» dev performance optimization
This site relies heavily on Javascript. You should enable it if you want the full experience. Learn more.

dev performance optimization

acl(admin devvvv vvvvgroup)

Performace Optimization Tips

GDC 2005 presentations

    http://developer.nvidia.com/object/gdc_2005_presentations.html

locking & parallel processing

The most obvious solution would be to lock the back buffer for each frame in Direct3D (analogous to calling glFinish() in OpenGL). This ensures that all pending graphics commands are completed by the GPU before the CPU moves on. However, this completely removes any potential for asynchronous processing, as the CPU is unable to process the next frame until the current frame has finished rendering.

A better solution is double-buffered texture locking. This is a generalization of locking the back-buffer. At the end of your frame you render a single triangle to a tiny (2x2) texture, then read the contents of your texture. So far this solution is equivalent to locking the back-buffer, and suffers the same kind of stalls. It ensures that the GPU never gets more than 1 frame ahead of the CPU.

Now generalize it: use two tiny textures and alternately render to them and alternately lock them:

Render frame 1

Render a triangle to texture 0

Lock and read texture 1

Render frame 2

Render a triangle to texture 1

Lock and read texture 0

Render frame 3

Render triangle to texture 0

Lock and read texture 1

Render frame 4

Render a triangle to texture 1

Lock and read texture 0

...

Now, the GPU does not get stalled; it also never gets more than 2 frames ahead of the CPU. Lag is up to one frame, but overall efficiency is higher since the GPU is always busy (if you are GPU bound). You can further generalize it to use triple-buffered textures, and you may even be able to insert multiple sync points per frame to get finer control over lag.

A second solution is to use DirectX 9's Asynchronous Query functionality (analogous to using fences in OpenGL). At the end of your frame, insert a D3DQUERYTYPE_EVENT query into your rendering stream. You can then poll whether the GPU has reached this event yet by using GetData. As in 1) you can thus ensure (i.e., busy wait w/ the CPU) that the CPU never gets more than 2 frames ahead of the GPU, while the GPU is never idled. Similarly it is conceivable to insert multiple queries per frame to get finer control over lag.

anonymous user login

Shoutbox

~2d ago

joreg: 6 session beginner course part 2 "Deep Dive" starts January 13th: https://thenodeinstitute.org/courses/ws24-5-vvvv-beginners-part-ii/

~2d ago

joreg: 6 session beginner course part 1 "Playground" starts November 4th: https://thenodeinstitute.org/courses/ws24-5-vvvv-beginners-part-i/

~2d ago

joreg: Save the date: Oktober 17: vvvv meetup in Berlin!

~4d ago

joreg: 12 session online vvvv beginner course postponed to start November 4th: https://thenodeinstitute.org/courses/ws24-5-vvvv-beginners-class/

~15d ago

~24d ago

joreg: Webinar on October 2nd: Rhino meets Realtime with vvvv https://visualprogramming.net/blog/2024/webinar-rhino-meets-realtime-with-vvvv/

~29d ago

joreg: Introducing: Support for latest Ultraleap hand-tracking devices: https://visualprogramming.net/blog/2024/introducing-support-for-new-ultraleap-devices/

~1mth ago

joreg: 2 day vvvv/fuse workshop in Vienna as part of NOISE festival on Sept. 13 and 14: https://www.noise.ist/vienna

~1mth ago

joreg: New beginner video tutorial: World Cities https://youtu.be/ymzrK7tZLBI

~1mth ago

catweasel: https://colour-burst.com/2023/01/26/macroscopic/ yeah, ' is there anyone who cares about slides anymore...' Well me for a start! :D