Kinect depth-to-world conversion in pixel shader! But…

After looking at the CIANT particle stuff, I decided to see if I could use a shader (effect) to convert a depth texture (16-bit greyscale) from a Kinect into XYZ data. Lo and behold, it works, and most surprisingly, without having to do any data scaling.

In other words, the data returned has the proper magnitude and polarity: by putting the world X, Y and Z data (in meters) into the R, G and B values, they can actually be negative and outside the range of 0 to 1.
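
For anyone curious what that looks like, here is a rough sketch of the idea as a vvvv DX9 effect. This is not my actual shader, just the general shape of it; the depth scale, the 640x480 map size and the reference intrinsics below are assumed values for illustration only:

```
// rough sketch of a depth-to-world effect; all constants below are assumptions,
// not values taken from the attached shader or from any device calibration

float4x4 tWVP : WORLDVIEWPROJECTION;

texture TexDepth <string uiname="Depth Texture";>;
sampler SampDepth = sampler_state
{
    Texture   = (TexDepth);
    MinFilter = POINT;
    MagFilter = POINT;
    MipFilter = NONE;
};

// assumed reference intrinsics of the Kinect depth camera (pixels)
float2 FocalLength   = float2(594.2, 591.0);
float2 OpticalCenter = float2(339.3, 242.7);
float2 DepthMapSize  = float2(640, 480);

// assumed scale from the normalized 16-bit sample back to meters
// (raw value in millimeters: sample * 65535, then / 1000)
float DepthScale = 65.535;

struct vs2ps
{
    float4 Pos   : POSITION;
    float2 TexCd : TEXCOORD0;
};

vs2ps VS(float4 Pos : POSITION, float2 TexCd : TEXCOORD0)
{
    vs2ps Out = (vs2ps)0;
    Out.Pos   = mul(Pos, tWVP);
    Out.TexCd = TexCd;
    return Out;
}

float4 PS(vs2ps In) : COLOR
{
    // the 16-bit depth texel arrives normalized to 0..1; rescale to meters
    float z = tex2D(SampDepth, In.TexCd).r * DepthScale;

    // pixel coordinate of this texel in the depth map
    float2 px = In.TexCd * DepthMapSize;

    // pinhole unprojection: X and Y in meters, both can go negative
    float x =  (px.x - OpticalCenter.x) * z / FocalLength.x;
    float y = -(px.y - OpticalCenter.y) * z / FocalLength.y;   // flip so up is positive

    // pack XYZ into RGB; rendering into a 32-bit float target keeps
    // negative values and values > 1 intact
    return float4(x, y, z, 1);
}

technique TDepthToWorld
{
    pass P0
    {
        VertexShader = compile vs_2_0 VS();
        PixelShader  = compile ps_2_0 PS();
    }
}
```

The exact constants aren’t the point; the point is that the render target is a floating-point format, so nothing gets clamped to 0..1 on the way out.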

Attached is the shader and an example file. You don’t need a Kinect as it uses an included BMP file of a captured Kinect image. Move the mouse cursor around in the render window and left-click to measure XYZ. Note that if you move the cursor “up” the drawers to the right of my knee, the X and Y remain (relatively) constant and only the Z changes.

BUT… I had naively assumed I would be able to do something useful with all this nice point-cloud data in the resulting texture, only to find that plugins can’t take textures as input! Bwahhhh!

So other than using “Pipet”, does anyone know of a method to get this nice point-cloud data OUT of the texture? Thanks!

Effect to convert kinect depth texture to world XYZ (77.7 kB)

if you need to access world data in a plugin, check
https://github.com/elliotwoods/VVVV.Nodes.OpenNI
specifically, the World image output provides an RGB32F image (i.e. xyz as an ‘array’ of 3x32-bit floats)

also
https://github.com/elliotwoods/VVVV.Nodes.OpenNI/blob/master/Package/Shaders/ToWorld.fx performs something similar to what you described above (depth to world on GPU)
One (minor) issue with this is that the calibration is different for each kinect device, and is stored inside the device itself. OpenNI and libfreenect can look up this data to perform the correct unprojection (and the alignment between rgb and depth).

So the unprojection performed in this shader is only exactly right for one particular kinect out there, and is ‘good enough’ for pretty much everything else.
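
as a sketch, the fix is just to not bake the reference calibration in, and feed it through effect parameters instead. this fragment would slot into an effect like the one sketched earlier in the thread; names and default values are made up:

```
// expose the depth camera intrinsics as effect parameters (pins in vvvv), so the
// per-device values looked up via OpenNI / libfreenect can replace the baked-in
// reference calibration. names and defaults here are assumptions.
float2 FocalLength   = float2(594.2, 591.0);   // fx, fy in pixels
float2 OpticalCenter = float2(339.3, 242.7);   // cx, cy in pixels

float3 Unproject(float2 px, float z)
{
    // pinhole unprojection driven by whatever calibration is currently set
    return float3( (px.x - OpticalCenter.x) * z / FocalLength.x,
                  -(px.y - OpticalCenter.y) * z / FocalLength.y,
                    z);
}
```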

I spoke to some guys from OpenNI who told me about how they do the calibration.

Thanks, Eliot. I’ve done pretty much the same thing in my experimenting with a plugin, but performance is still an issue. I’ve optimized my plugin by providing a spreadable input of bounding boxes to dynamically reduce the conversions done (and the amount of data returned) to only the areas of interest, but I still need more speed. This is a real issue when trying to keep frame rates and interactivity at 30 fps, particularly with two Kinects, and I’m on a fast machine. Doing all that in a shader is the obvious path, but extracting the data is the roadblock.

So from where I’m sitting, it would be great to have access to either geometry shaders (1st choice), textures into plugins (2nd choice), or a Pipet that could do block operations with subsampling more efficiently (last).

Concerning accuracy, there are really three kinds: absolute accuracy (is the data correct in the real world), relative accuracy (is a move of 25mm really 25mm), and repeatability (same data over time). For the stuff I’m doing (hand and body motion), I’m really only concerned with the last two, and even then my needs are probably well within the current limits without further calibration (as I suspect most applications are). I could understand, though, that for modeling and environment scanning absolute accuracy becomes more important.

Thanks again!

Hmmm, thinking about it more, I suspect that even for modeling and environment scanning absolute accuracy isn’t needed, since those are building up a collection of relative measurements. The only time I think you need absolute accuracy is when someone (or something) is interacting with a real-world object/position, such as a virtual/projected keyboard that has no features to do relative measurement against. Just thinking out loud.

What are you trying to do with the data? 640x480 xyz points is a significant amount of data to be processing on the cpu.

Why not explore options using shaders instead of plugins?

I am doing this exact thing to make these visuals: http://www.youtube.com/watch?v=rqPpSFa_bU4&feature=channel_video_title
It’s very similar in function to the ciant particle system.

I don’t really understand what you are asking; ciant particles doesn’t use plugins, but shaders… Just plug your xyz texture into the particle position inputs and voilà.

You are on the right track with your patch though. You just need to work with shaders instead of plugins… get rid of the pipet and replace it with ciant particles.
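
To sketch the idea (this is not the actual ciant code, just the general pattern, it assumes your card can do vertex texture fetch, and all names here are made up):

```
// point-cloud sketch: the vertex shader reads its positions straight from the
// world-XYZ float texture, so the whole cloud stays on the GPU

float4x4 tWVP : WORLDVIEWPROJECTION;

texture TexWorld <string uiname="World XYZ Texture";>;
sampler SampWorld = sampler_state
{
    Texture   = (TexWorld);
    MinFilter = POINT;
    MagFilter = POINT;
    MipFilter = NONE;
};

struct vs2ps
{
    float4 Pos  : POSITION;
    float  Size : PSIZE;
};

// each incoming vertex only carries the texture coordinate of "its" depth pixel;
// its actual position comes from the texture
vs2ps VS(float2 TexCd : TEXCOORD0)
{
    vs2ps Out = (vs2ps)0;

    float3 world = tex2Dlod(SampWorld, float4(TexCd, 0, 0)).rgb;

    Out.Pos  = mul(float4(world, 1), tWVP);
    Out.Size = 1;
    return Out;
}

float4 PS() : COLOR
{
    return 1;   // flat white points, just to show the cloud
}

technique TPointCloud
{
    pass P0
    {
        PointSpriteEnable = true;
        VertexShader = compile vs_3_0 VS();
        PixelShader  = compile ps_3_0 PS();
    }
}
```

The geometry you feed it is just a grid of 640x480 points whose texture coordinates cover the depth map; the XYZ never touches the CPU.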

I agree GPU is the way to go for major number crunching like this. But I want to reduce the data down in the shader, not just process it into visuals. To do things like feature extraction and fun stuff like gesture recognition and motion input into other processes, sooner or later the numbers have to come out of the shader and back to the CPU. Right now to do that kind of stuff it all has to happen on the CPU, a huge waste and really too slow for high-res seamless interactivity.

As I said above, I had ASS-U-ME’d there was at least a plugin way to do this, but the only way I can think of right now in vvvv is to put the results in pixels and Pipet them out. That should work for some low-data things, such as object positions, contours, etc., but I’d rather put my limited available effort into a better solution.
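
For the record, what I mean by reducing the data in the shader before Pipet-ing it out is a second pass along these lines. The sizes, names and the plain block average are just assumptions to show the shape of it:

```
// reduction sketch: render the 640x480 world-XYZ texture into a much smaller
// float target (80x60 here), averaging each 8x8 block, so far fewer pixels
// have to be read back to the CPU. sizes and names are assumptions.

float4x4 tWVP : WORLDVIEWPROJECTION;

texture TexWorld <string uiname="World XYZ Texture";>;
sampler SampWorld = sampler_state
{
    Texture   = (TexWorld);
    MinFilter = POINT;
    MagFilter = POINT;
    MipFilter = NONE;
};

float2 SourceSize = float2(640, 480);
float2 TargetSize = float2(80, 60);    // one output pixel per 8x8 input block

struct vs2ps
{
    float4 Pos   : POSITION;
    float2 TexCd : TEXCOORD0;
};

vs2ps VS(float4 Pos : POSITION, float2 TexCd : TEXCOORD0)
{
    vs2ps Out = (vs2ps)0;
    Out.Pos   = mul(Pos, tWVP);
    Out.TexCd = TexCd;
    return Out;
}

float4 PS(vs2ps In) : COLOR
{
    float2 blockSize   = SourceSize / TargetSize;                // 8x8 texels
    float2 blockOrigin = floor(In.TexCd * TargetSize) * blockSize;

    // plain average of the block; a min-Z, centroid or other feature
    // could go here instead
    float3 sum = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
        {
            float2 cd = (blockOrigin + float2(x, y) + 0.5) / SourceSize;
            sum += tex2Dlod(SampWorld, float4(cd, 0, 0)).rgb;
        }
    return float4(sum / 64.0, 1);
}

technique TReduce
{
    pass P0
    {
        VertexShader = compile vs_3_0 VS();
        PixelShader  = compile ps_3_0 PS();
    }
}
```

That way Pipet only has to pull 80x60 = 4800 pixels back instead of 640x480 = 307200, and the pass can be repeated (or the block made bigger) if that’s still too much.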

exactly what kind of processing do you need to do?
some stuff is inefficient on the GPU (spectral operations, e.g. find closest point)

accuracy is a separate discussion of course.
i’m not sure what the error level is, but i believe the rgb<>depth deviation between kinect devices is much bigger than the depth<>xyz deviation.
also the error is in the camera intrinsics/extrinsics, i.e. it isn’t any simple linear distortion, so it’s repeatable but wrong in both absolute and relative terms.
scanning a room would be affected; best use the OpenNI / libfreenect world xyz in those cases.
but as i said, the depth<>xyz error (i.e. intrinsics) may well be negligible.

to note:
i can easily run a 60fps mainloop with the cpu crunching world XYZ at 30fps (the sensor runs at 30fps, so that calc only needs to run at 30)
Core i7 3GHz.
perhaps you’ve got a threading bug / aren’t threading / are doing too many conversions(?)

final note:
yes it’s incredibly slow to get the data in the graph (e.g. 640x480x3 ‘Values’ going between nodes)
currently, the only way around this is custom data types between nodes (e.g. CVImageLink)

Where is this code running on the CPU, in a plugin? How is the data getting into it? Where is it going? I assume this must be in reference to your custom data type, since as you note using spreads of Vector3Ds only gets me a few FPS.

So how does one who is not a devvvv do a custom data type? Did I miss a doc page? I haven’t looked at your OpenCV stuff yet, so are you saying I can use it to move this kind of data around between plugins (dynamic!) efficiently? Thanks!