Unmanaged spread pointers? Arithmetic operations on huge spreads?

Question:
Is it possible to get an unmanaged pointer to a spread somehow? Currently I use Marshal to copy arrays from managed to unmanaged memory, but I would like to avoid this operation because I want to handle huge arrays fast. I have made a managed C++ project that has unmanaged methods, some of them written in assembly. This kind of code is really fast, but the interoperation between managed code and it slows things down. Is it possible to write an unmanaged node plugin for vvvv? And if it is, is there an example somewhere? Most of the time when I want to do operations on huge arrays I don’t need double values, but mostly byte or 16-bit float values. Then I could write assembly that uses MMX/SSE instructions, which speeds up computation massively. With 64-bit processors the gain can be around 80%.
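For context, here is a minimal sketch of the two interop routes in question: the Marshal.Copy round-trip I currently use, and pinning the managed array so native code can work on it in place without a copy. The DLL name FastMath.dll and the ProcessBuffer export are hypothetical placeholders for the assembly/C++ routines.

```csharp
using System;
using System.Runtime.InteropServices;

static class InteropSketch
{
    // hypothetical native entry point, e.g. exported from a C++/assembly DLL
    [DllImport("FastMath.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void ProcessBuffer(IntPtr data, int count);

    // current approach: copy the managed array into unmanaged memory and back
    static void WithMarshalCopy(double[] values)
    {
        IntPtr unmanaged = Marshal.AllocHGlobal(values.Length * sizeof(double));
        try
        {
            Marshal.Copy(values, 0, unmanaged, values.Length); // copy in
            ProcessBuffer(unmanaged, values.Length);
            Marshal.Copy(unmanaged, values, 0, values.Length); // copy back
        }
        finally
        {
            Marshal.FreeHGlobal(unmanaged);
        }
    }

    // copy-free alternative: pin the managed array and hand its address to native code
    static void WithPinning(double[] values)
    {
        GCHandle handle = GCHandle.Alloc(values, GCHandleType.Pinned);
        try
        {
            ProcessBuffer(handle.AddrOfPinnedObject(), values.Length);
        }
        finally
        {
            handle.Free();
        }
    }
}
```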

Idea:
I wanted to create substitution nodes for arithmetic operations that use MMX/SSE instructions and can operate on big arrays. I would write assembly for manipulating 8-bit and 16-bit integer arrays with MMX/SSE instructions depending on need, which would improve performance.
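A rough sketch of what the managed side of such a substitution node could look like, calling a native SIMD routine via P/Invoke. The DLL name, the exported function and its signature are hypothetical, and the packing/unpacking loops are exactly the kind of copying that a direct spread pointer would make unnecessary.

```csharp
using System.Runtime.InteropServices;
using VVVV.PluginInterfaces.V2;

[PluginInfo(Name = "AddBytes", Category = "Value", Help = "SSE-accelerated add on byte data (sketch)")]
public class AddBytesNode : IPluginEvaluate
{
    // hypothetical native routine: elementwise add of two byte buffers
    [DllImport("simdmath.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void simd_add_u8(byte[] a, byte[] b, byte[] result, int count);

    [Input("Input 1")]
    public ISpread<double> FInput1;

    [Input("Input 2")]
    public ISpread<double> FInput2;

    [Output("Output")]
    public ISpread<double> FOutput;

    public void Evaluate(int spreadMax)
    {
        // pack the double spreads into byte buffers, run the SIMD kernel, unpack
        var a = new byte[spreadMax];
        var b = new byte[spreadMax];
        var r = new byte[spreadMax];
        for (int i = 0; i < spreadMax; i++)
        {
            a[i] = (byte)FInput1[i];
            b[i] = (byte)FInput2[i];
        }

        simd_add_u8(a, b, r, spreadMax);

        FOutput.SliceCount = spreadMax;
        for (int i = 0; i < spreadMax; i++)
            FOutput[i] = r[i];
    }
}
```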

Another thing that I have seen here is dottore’s approach for manipulating huge amounts of data (GPU particles) using textures and the GPU. I have been thinking about creating nodes that receive a spread, create a DX texture from it, and hold a device inside the node that runs the shader code. Once the shader has executed, the output texture would be converted back into a spread. While this is good for manipulating vectors or any large arrays, the problem is that shaders are limited to texture input, so I don’t know how to implement something like particle collision with the world, or how to get other kinds of data into shaders.

I haven’t looked into the CUDA or OpenCL documentation yet, but this might be a great solution for parallel processing of large arrays. While a 64-bit CPU can operate on eight 8-bit values simultaneously, a GPU can do much more, like 200-300 32-bit float values in one cycle. If nodes could be created that use OpenCL for arithmetic operations on value spreads in vvvv, this would create amazing possibilities and speed things up. You could have a few GPUs in a PC and use their memory for spread storage and operations. Most operations in vvvv can be done in parallel. This might seem complicated, but I think it is simple and can be done easily. The only thing I need is direct access to vvvv’s memory.
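As a rough illustration of the OpenCL idea, this is what an elementwise operation over a value spread could look like as a kernel, kept as a string on the C# side. The kernel name and argument layout are made up, and the host-side setup (platform, context, command queue, buffer upload/download) is omitted; it is only a sketch of the per-slice work the GPU would do.

```csharp
static class OpenClSketch
{
    // OpenCL C source for an elementwise multiply-add over a float buffer
    public const string MadSpreadKernel = @"
__kernel void mad_spread(__global const float* input,
                         const float factor,
                         const float offset,
                         __global float* output)
{
    int i = get_global_id(0);              /* one work-item per slice */
    output[i] = input[i] * factor + offset;
}";
}
```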

Project description:

I have created a managed implementation of PhysX nodes. So far I have implemented poly meshes, convex meshes, cloth, triggers and events. There were no problems and it is all working just fine. But now I have created a fluid node and emitters. These nodes can generate huge spreads of particle data (like GPU particles from dottore). When I look at the timings (debug mode), by far the slowest operations are the arithmetic operations inside vvvv. I output a spread of vectors representing particle positions and a spread of particle ages, then do some operations on them; let’s say I generate a spread of colors based on particle age. This results in a few double operations for each of the 3000 particles, and it impacts performance.
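To make the cost concrete, the per-frame work amounts to a loop like the one below (the lifetime value and the colour ramp are invented for illustration). In compiled code this is trivial for 3000 particles; done slice by slice through patched value nodes, it adds up.

```csharp
using System;

static class ColorSketch
{
    // derive an RGBA colour per particle from its age; colors holds 4 floats per particle
    static void AgesToColors(float[] ages, float maxAge, float[] colors)
    {
        for (int i = 0; i < ages.Length; i++)
        {
            float t = Math.Min(ages[i] / maxAge, 1.0f);   // normalised age 0..1
            colors[4 * i + 0] = 1.0f;                     // R
            colors[4 * i + 1] = 1.0f - t;                 // G fades with age
            colors[4 * i + 2] = 0.2f;                     // B
            colors[4 * i + 3] = 1.0f - t;                 // alpha fades out
        }
    }
}
```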

http://www.youtube.com/watch?v=ygn2Axc3qe0

This is a video of vvvv using PhysX and the fluid particle system. 60% of the CPU time goes into generating colors from particle lifetime, which is not good, considering that everything else is calculated on the CPU…

There are 3000 spheres rendered here; motion blur and radial blur are applied in two more passes over the particle texture… There is only one emitter in this scene, because this is the scene I built while writing the nodes for the PhysX implementation… I plan to create a demo containing all implemented features. Currently poly meshes (static geometry), convex meshes, triggers, events, cloth and fluids are implemented. The next step is to implement force fields, so I can create vortices for particles. This video shows what can be done using the emitter and fluid parameters.

The other nodes, meanwhile, are working fine, because I only manipulate DX mesh data, which is provided via a device pointer; there are no intensive operations going on in vvvv. I only supply the initial data and then PhysX does the rest… I manipulate the meshes in code and it works fine…

Another thing that I don’t know how to do is input a DX mesh or a DX texture into a C# node. I have created helper patches that split a mesh into its vertex buffer and feed the vector data into my node so it can cook the PhysX mesh and create the output mesh. Because this is done only once, it is not a performance problem…

Here is an old rigid-bodies video, if someone is interested…
https://vimeo.com/34857254

Sounds interesting.

Did you consider OpenCL too for parallel processing? I guess it would allow some things to be done on the GPU more easily than converting everything to a texture first.

To your initial question: yes, a pointer to the unmanaged memory can be retrieved. In the latest alpha (28) you can do this by referencing VVVV.Hosting.dll and using the classes in the VVVV.Hosting.IO.Pointers namespace.

for example:

using VVVV.Hosting.IO.Pointers;
...
// fast input pin, injected by the plugin host
[Input("Foo")]
FastValueInput FFooIn;
...
// Data points straight into vvvv's unmanaged memory, so no copy is made
var pointerToUnmanagedMemory = FFooIn.Data;
...

Now, I’m aware that this is still managed code, but at least you don’t have any memory copies going on.
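For illustration, a minimal sketch of consuming such a pointer with unsafe code. It assumes the value obtained from FFooIn.Data can be treated as a double* and that the slice count is known separately; the actual FastValueInput members may differ.

```csharp
static class PointerSketch
{
    // requires compiling with /unsafe; reads straight from the buffer, no copy
    public static unsafe double Sum(double* data, int count)
    {
        double sum = 0;
        for (int i = 0; i < count; i++)
            sum += data[i];
        return sum;
    }
}
```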

Writing purely unmanaged plugins is not possible, as the whole plugin-hosting infrastructure was built with C# in mind. In theory it could be done of course, as vvvv itself is unmanaged, but there are no plans to do so yet.

Have a look at this thread started by vux a while ago; it contains a lot of ideas on this topic: https://discourse.vvvv.org/t/6849

Oh, and texture/mesh inputs are on our todo list.

Are there any major obstacles to implementing OpenCL in vvvv?

And whatever happened to flateric’s massively optimised transform and colour nodes?