Can you tell us quickly how this works?
In particular: if I've got a single shared variable that is, say, 100 MB in size, and I have 100 threads that all need to use it at once, is this going to consume 100 MB in total, or 10 GB? That is, are you copying data back and forth between some shared place to make this work, or is the shared data operated on in place?