100K Birds Flying on GPU: An Exploration of Algorithms and Nature

There was a question that bothered me for a long time: why do people and matter in this world cluster together and repel one another at different scales? People come together, yet form competing and hostile crowds; atoms and molecules come together, yet produce separate objects, and even planets and galaxies.

Starting from this simple question, I stepped beyond the limits of my art-school training and began studying concepts that were once far from me: Complex Systems, Swarm Intelligence, Metaheuristics, Self-Organization, Emergence, and Agent-Based Modeling (ABM). I turned these studies and discoveries into artworks and released my first App Store / Steam application.

If you're interested, let's learn about this story and explore together...

Initial Inspiration

I rarely play games, but "Frost" and its sequel "Lifelike" left an extraordinarily deep impression on me. The game's website explained that no CPU could simulate swarms at that performance, so the developers wrote their own GPU algorithms on top of Metal.

Later, I learned about Unity Visual Effect Graph, an advanced and easy-to-use GPU particle system, but this system couldn't simulate particle interactions, which deepened my confusion. How did Frost achieve it?

Shared my confusion with Professor Lu

Starting with Bird Flocks

In academia, the pioneering work on swarm systems is a 1986 bio-inspired program called Boids. It reduces bird flock movement to three basic rules for individual motion: Separation, Alignment, and Cohesion. Each individual can only observe others within a certain range, or even only within a certain field of view.

The classic bird flock algorithm Boids

Boids algorithm example on the Processing website, https://processing.org/examples/flocking.html
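
To make the three rules concrete, here is a minimal pure-Python sketch of one Boids update step. It is not the Processing example's code: the constants, the dictionary layout, and the function names step and make_flock are all assumptions chosen for illustration, and it uses only an observation radius, with no field-of-view limit.

```python
import math
import random

# Hypothetical tuning constants, chosen only for illustration.
VIEW_RADIUS = 40.0
SEPARATION_RADIUS = 15.0
W_SEPARATION, W_ALIGNMENT, W_COHESION = 0.05, 0.05, 0.005
MAX_SPEED = 4.0


def step(boids):
    """Advance every boid by one tick. Each boid is a dict with x, y, vx, vy."""
    updated = []
    for b in boids:
        sep_x = sep_y = 0.0      # separation: steer away from close neighbors
        avg_vx = avg_vy = 0.0    # alignment: match the neighbors' average velocity
        cen_x = cen_y = 0.0      # cohesion: steer toward the neighbors' center
        count = 0
        for other in boids:
            if other is b:
                continue
            dx, dy = other["x"] - b["x"], other["y"] - b["y"]
            dist = math.hypot(dx, dy)
            if dist >= VIEW_RADIUS:
                continue         # outside the observation range
            count += 1
            avg_vx += other["vx"]
            avg_vy += other["vy"]
            cen_x += other["x"]
            cen_y += other["y"]
            if 0.0 < dist < SEPARATION_RADIUS:
                sep_x -= dx / dist
                sep_y -= dy / dist
        vx, vy = b["vx"], b["vy"]
        if count:
            vx += W_SEPARATION * sep_x
            vy += W_SEPARATION * sep_y
            vx += W_ALIGNMENT * (avg_vx / count - b["vx"])
            vy += W_ALIGNMENT * (avg_vy / count - b["vy"])
            vx += W_COHESION * (cen_x / count - b["x"])
            vy += W_COHESION * (cen_y / count - b["y"])
        speed = math.hypot(vx, vy)
        if speed > MAX_SPEED:    # clamp the speed so the flock stays stable
            vx, vy = vx / speed * MAX_SPEED, vy / speed * MAX_SPEED
        updated.append({"x": b["x"] + vx, "y": b["y"] + vy, "vx": vx, "vy": vy})
    return updated


def make_flock(n, width=1280, height=720, seed=0):
    """Create n boids with random positions and small random velocities."""
    rng = random.Random(seed)
    return [{"x": rng.uniform(0, width), "y": rng.uniform(0, height),
             "vx": rng.uniform(-1, 1), "vy": rng.uniform(-1, 1)}
            for _ in range(n)]
```

Calling step(make_flock(50)) repeatedly and drawing the positions each frame is enough to see flocking emerge. Note that every boid loops over every other boid, which is what becomes the bottleneck later in this story.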

When Two Flocks Meet

Project website: https://bingweb.binghamton.edu/~sayama/SwarmChemistry/

When I discovered this research, called Swarm Chemistry, I was instantly excited. Its particle system inherits the three basic rules of the bird flock model, adds more behavioral rules, and puts different flocks in the same scene. Diversity emerged in an instant.

You can download the Java program from the project website, and the code is also open source.

However, the program limits the number of particles to 300. A single-threaded CPU neighbor search that traverses every pair of particles has a complexity of O(N²), meaning that when the number of particles doubles, the computation time quadruples. The simulation bottleneck appeared immediately.
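
The quadratic growth is easy to verify by counting distance checks. The sketch below is an illustration only, not code from the Swarm Chemistry program; the function name and the tuple-based point layout are assumptions.

```python
import math


def count_neighbors_brute_force(points, radius):
    """Return (neighbor count per particle, number of distance checks performed)."""
    checks = 0
    counts = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            checks += 1  # one distance check per ordered pair, N * (N - 1) in total
            if math.hypot(xj - xi, yj - yi) < radius:
                counts[i] += 1
    return counts, checks
```

With 300 particles this performs 300 × 299 = 89,700 checks per frame; with 600 particles it performs 600 × 599 = 359,400, roughly four times as many.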

GPU: 10,000x Performance Improvement

Jensen Huang of NVIDIA once said that, measured against Moore's Law, the speedup GPUs offer over CPUs is equivalent to giving researchers ten years' worth of computing resources today, which might explain why the gains from GPU computing are so noticeable.

A YouTuber, the creator of the game Headmaster, implemented a Boids simulation with 500k particles. The principle is the same as the example program in the PixelFlow library for Processing: it doesn't actually search for neighbors, but writes particle information into a vector field texture that is then used to change each particle's position.

Vector field texture-based simulation can easily reach an approximate Boids simulation at the 100,000-particle level, but this method isn't suitable for the complex model I wanted to implement.

First Attempt

Python + Numba

My first experiment was using the Numba library with CUDA acceleration in Python. Thanks to Python's extremely low barrier to entry and Numba's magical syntax, we quickly ported the CPU algorithm to the GPU platform and rendered it with OpenGL.

Numba library, just add a few lines of code to make a function run in parallel on GPU

First successful algorithm run, the display was still black and white then

Running the OpenGL rendered program

Testing in the classroom

Exhibition effect diagram

Second Attempt

TouchDesigner

First time using TouchDesigner, didn't know how to achieve many effects, but at least got it running

By the time of the final course presentation, we already had a 60x performance improvement

Unfortunately, we couldn't call OpenGL compute shaders directly in TouchDesigner (because macOS doesn't support them), so all computation was still done in Python with Numba. We had only moved the drawing part from OpenGL to TouchDesigner.

Additionally, Python (Numba) doesn't have structs or vector types. All data has to be stored in arrays, so every calculation on x and y coordinates had to be written twice, which was very cumbersome.
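
A tiny illustration of that duplication, using plain Python lists as a stand-in for the GPU arrays (the function name integrate and the array names are assumptions, not the original code):

```python
def integrate(pos_x, pos_y, vel_x, vel_y, dt):
    """Move every particle by velocity * dt, one coordinate array at a time."""
    for i in range(len(pos_x)):
        pos_x[i] += vel_x[i] * dt   # the x update ...
        pos_y[i] += vel_y[i] * dt   # ... and the same line written again for y
```

With a vector type, the two lines would collapse into a single position update, and moving to 3D would not require a third copy.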

Third Attempt

Houdini

After learning Houdini, I discovered a new world. OpenCL support gave me the opportunity to write low-level GPU programs (CUDA C++ was really too difficult for me), with convenient data binding and ultra-high performance. The algorithm was completely ported with just a few nodes and connections.

Using OpenCL nodes in Houdini to accelerate simulation, easily rewriting 3D with vector syntax

Rendered with Redshift (out of focus makes it artistic)

Houdini OpenCL's performance gave me the motivation for further optimization, and even the idea of making cross-platform applications. For an offline effects software, it had exceptionally smooth real-time performance, much better than Python's Numba...

Final Masterpiece

Unity

After stumbling through countless pitfalls, I finally found the most suitable platform.

Compute Shaders: Solving Interoperability Issues, Multi-platform Compilation

In the previous Python and Houdini programs, all particle data had to be copied back and forth between the CPU and the GPU for computation and rendering. Because memory is not shared in a traditional architecture, this severely hurt performance. In fact, GPU compute programs and rendering programs can interoperate, i.e., reference the same data (Buffer).

Unity provided a perfect solution: HLSL Compute Shaders. Unity compiles HLSL programs for Metal, DirectX, Vulkan, and OpenGL, eliminating the hassle of rewriting code for multiple platforms. Like regular shader programs, they can read Buffers that live on the GPU.

Even without any optimization, this easily achieved a several-fold performance improvement, which was especially noticeable with large numbers of particles.

Pursuing Ultimate Optimization: Spatial Partitioning Algorithm

Until now, every neighbor search had been a complete brute-force traversal. There are several interesting methods that let the program search only the particles around each particle, which saves a great deal of search time. Among them, the easiest to understand and the best fit for this scenario is spatial partitioning.

Since each particle has a maximum observation radius of 40, I could divide the 1280 × 720 activity area into a 32 × 18 grid of square cells with sides of 40. This way, each particle only needs to check nine cells: its own cell and the eight around it. The reason is that two particles whose cells are not adjacent have at least one full cell between them, so they must be at least 40 units apart.
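
The idea can be sketched in a few lines of Python. This is a CPU illustration under stated assumptions, not the Unity compute shader: the names cell_of, build_grid, and neighbors are hypothetical, positions are assumed to lie inside the activity area, and wrap-around at the borders is not handled.

```python
import math
from collections import defaultdict

CELL = 40.0                      # cell side equals the maximum observation radius
WIDTH, HEIGHT = 1280, 720
COLS, ROWS = int(WIDTH // CELL), int(HEIGHT // CELL)   # 32 x 18 cells


def cell_of(x, y):
    """Map a position to its (column, row) cell, clamped to the grid."""
    col = min(max(int(x // CELL), 0), COLS - 1)
    row = min(max(int(y // CELL), 0), ROWS - 1)
    return col, row


def build_grid(points):
    """Bucket particle indices by the cell that contains them."""
    grid = defaultdict(list)
    for index, (x, y) in enumerate(points):
        grid[cell_of(x, y)].append(index)
    return grid


def neighbors(i, points, grid, radius=CELL):
    """Find particles within radius of particle i by scanning only 3 x 3 cells."""
    xi, yi = points[i]
    col, row = cell_of(xi, yi)
    found = []
    for dc in (-1, 0, 1):
        for dr in (-1, 0, 1):
            for j in grid.get((col + dc, row + dr), ()):
                if j != i and math.hypot(points[j][0] - xi, points[j][1] - yi) < radius:
                    found.append(j)
    return found
```

The grid is rebuilt once per frame with build_grid(points), after which each call to neighbors touches only the particles in nine cells instead of the whole population. The result is identical to the brute-force search as long as the radius does not exceed the cell size.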

With spatial partitioning, performance improved threefold on average compared to the unoptimized version.

Besides this, there are sorting algorithms and K-D Tree partitioning worth exploring.

After basic grid spatial partitioning, performance reached 200k particles at 60 FPS on an RTX 3070 Ti, a 10,000x performance improvement compared to Processing's CPU program.

New Discovery: Particle System Generates Zebra Patterns

In a chance experiment, I discovered that adding just one small mechanism to the particles could produce very interesting self-organizing patterns. The mechanism is friend-or-foe recognition, i.e., treating nearby particles differently based on their type. The result could be a small flower, or it could be a zebra pattern.
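
One way to picture friend-or-foe recognition is an interaction table keyed by particle type. The sketch below is only a guess at the general shape of such a rule, not the rules used in the application: the AFFINITY values, the two types "A" and "B", and the steering function are all assumptions.

```python
import math

# Hypothetical interaction table: positive values attract, negative values repel.
AFFINITY = {
    ("A", "A"): 1.0,  ("A", "B"): -1.0,
    ("B", "A"): -1.0, ("B", "B"): 1.0,
}
VIEW_RADIUS = 40.0


def steering(particle, others):
    """Sum a type-dependent pull or push from every particle in range."""
    fx = fy = 0.0
    for other in others:
        if other is particle:
            continue
        dx, dy = other["x"] - particle["x"], other["y"] - particle["y"]
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist >= VIEW_RADIUS:
            continue
        weight = AFFINITY[(particle["kind"], other["kind"])]
        fx += weight * dx / dist    # unit direction toward the other particle,
        fy += weight * dy / dist    # scaled by how the two types relate
    return fx, fy
```

Here a particle of type "A" is pulled toward nearby "A" particles and pushed away from nearby "B" particles. Changing the entries of the table changes which types gather and which keep apart.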

Previously, I had only seen zebra patterns produced by Reaction-Diffusion algorithms or Multi-Neighborhood Cellular Automata. Now particle-based models can do it too, which might open new perspectives in the life sciences, because the foundation of life is cells, not pixels.

A small flower bloomed by particle motion, now the application icon

Worm-like effect, completely self-organized by particles, can recover after destruction

Unique zebra patterns

Effect after connecting MIDI keyboard, 24 parameters perfectly match the twenty-four knobs on AKAI MPD218, excellent experience

Photo taken during testing at the academy

Mobile UI I made for the work, you can choose interesting presets, take screenshots, copy and paste parameters.

VR version, under development~

From Idea to Launch

Some small insights

Taking the First Step, Persisting

I remember first sharing this idea with my teachers in March this year. I was somewhat afraid at the time. After all, I wasn't formally trained in computer science, yet I wanted to research GPU computing, something most computer science undergraduates don't even learn.

Suddenly one day, Sister Danqi pulled me to the academy and asked if we were interested in creating some works for a COP15 biodiversity conference exhibition that the Environmental Protection Department was organizing.

That's when I thought of this topic again and wondered, should I take up the challenge?

I found friends who shared the same pursuits: my high school buddy Chen (GitHub: CPunisher), who got the first Python program running; my Unity teammate Shichen (TikTok @时辰不死于背锅), who works on US time and hands over work at midnight; and Jin Xi and Master Du Xi, who worked together on creativity, planning, and exhibitions.

We met many masters along the way who gave us tons of help.

Github friends, those who know understand

Worked for three days and nights straight just to light up the black screen of the iOS program. The 0.72 (the name of a chuan chuan xiang skewer shop downstairs) that my teammate brought made my heart race with excitement: two of my favorite things at once

Class teacher Yang, our GPU dad, provided computing power for the entire major, leading the whole class to defeat R9000P

New media artist Ren Yuan, introduced by Teacher Shen Hao, author of "Processing Creative Programming", gave me a "small goal" that became my motivation to constantly pursue the limits

The gaming department's adorable Feng Hu Huan Yu, asked Teacher Fei and Senior Zheng to teach me spatial partitioning and other algorithms

This lovely Japanese professor researching swarm chemistry replied to my email and asked if I could make a version with evolutionary algorithms

Shared my program with the game developer mentioned above

Department head: You're plucking geese as they fly by, I should charge management fees

Masters, all masters

Course poster, thanks to the Environmental Protection Department for the honor!

Listening to WWF's official introduction to biodiversity

Special thanks to Teacher Sun, Sister Danqi, Brother Bo and the masters at Black Bow, Brother Ou, Old Cao, Sister Xiaolu, who squeezed out time for personal guidance and took us out to eat and drink. Hope to play again soon

(They say our class is the most unfortunate because all the study abroad programs for short semesters were cancelled due to the pandemic, but we still managed to have exciting short semesters. All I can say is, CUC Digital Media, worth investing in 👍)

Release and Submit for Review

Finally passed review!

I'd always heard Apple's review process was strict, and after experiencing it, I feel it's really troublesome, but worth stepping into these pitfalls.

Many things I'd done before stopped at prototype or demo stage, never thinking about releasing them for more people to see. This time I chose to put down my perfectionism and release first without waiting for 100% feature completion.

Then I was faced with things I'd never touched before: developer account registration, export compliance, US tax forms, foreign exchange accounts, age rating, packaging and uploading.

Paperwork required by the App Store: a Privacy Policy (nothing more to it, I just want to say it's really troublesome to create a separate webpage explaining that a program that collects no private data collects no data)

First successful package upload

Constantly packaging, constantly testing

Uploaded dozens of screenshots and promotional images of different sizes, finally saw my store page

Passed Steam review!

Passed App Store review!

Final Words

Not long ago, the 2021 Nobel Prize in Physics was awarded to three scientists who made important contributions to the understanding of complex physical systems. Complex systems thinking is becoming increasingly important. Weather changes unpredictably within deterministic systems; life accumulates order in a disordered world. Each individual has independent goals and transmits information through limited connections, yet together they can produce astonishing collective wisdom. This is probably why I'm so fascinated by it.

Slime mold constructing Tokyo railways

A network of "like" relationships on WeChat Moments that I drew in my freshman year, partitioned into clusters based on graph relationships

My strange attractor generative art work

Multi-Neighborhood Cellular Automata

Reaction-Diffusion

The End.

Click "Read More" or search for "Parameter Life" in the App Store to purchase the application (Steam version launching on 2021.12.3). If you have any questions, feel free to add my WeChat (nhciao, please note your name) to discuss!
