Some more boygroup issues

Some more issues found using boygroup:

  • Clients send their ping via TCP (e.g. since there is no timeout, it can potentially hang the machine). The ping should be sent via UDP.
  • If one client freezes for any reason and the server tries to send it a TCP message (either a patch message or a TCP message due to packet size), the server will also freeze. The only way to avoid that is to find out which client is frozen and restart it.
  • There are way too many active bridges, so network usage is rather high. Attached are 2 examples where, running as server, the bridge should not be active.
  • Also it’s for the moment very hard to see which link is a bridge and whether it’s active or not (not the count, but visualized in the patch). So a hint that a link is a bridge, plus a way to see whether it’s active, would be pretty handy (for now I have to capture network packets and check the node path in there).

Sample1.v4p (1.6 kB)
Sample2.v4p (6.0 kB)

hey vux!

  • If one client freezes for any reason and the server tries to send it a TCP message (either a patch message or a TCP message due to packet size), the server will also freeze. The only way to avoid that is to find out which client is frozen and restart it.

… is reproducible and we are working on that.
Our plans involve, again, a rewrite of one basic networking component where Indy just fails. The planned solution involves a server-side TCP send queue and a worker thread. Upon queue overflow we would then mark the client as gone.
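The planned send path above can be sketched roughly as follows. This is a minimal illustration, not the actual vvvv/boygroup code: the names (`SendWorker`, the `on_gone` callback) and the queue limit are assumptions; the point is that the server’s main loop only ever does a non-blocking enqueue, while the potentially blocking TCP write happens on a per-client worker thread.

```python
import queue
import threading

QUEUE_LIMIT = 256  # assumed overflow threshold, not the real value

class SendWorker:
    """One bounded outbound queue plus worker thread per client (sketch)."""

    def __init__(self, sock, on_gone):
        self.queue = queue.Queue(maxsize=QUEUE_LIMIT)
        self.sock = sock          # connected TCP socket to one client
        self.on_gone = on_gone    # callback: mark this client as gone
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def send(self, message: bytes):
        # Called from the server's main loop; never blocks.
        try:
            self.queue.put_nowait(message)
        except queue.Full:
            # Client stopped draining its TCP connection: mark it as gone
            # instead of letting the whole server freeze.
            self.on_gone()

    def _run(self):
        while True:
            msg = self.queue.get()
            if msg is None:          # shutdown sentinel
                break
            self.sock.sendall(msg)   # may block, but only this worker thread
```

A frozen client then only ever stalls its own worker thread until its queue overflows, at which point it is flagged as gone; the server and the other clients keep running.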

  • Clients send their ping via TCP (e.g. since there is no timeout, it can potentially hang the machine). The ping should be sent via UDP.

We may be able to skip the ping, we’ll see. Once the above solution works, we might be able to manage connection status from the server side only, and send clients that were marked as gone a new graph when they seem to be back again from the server’s perspective.

There are way too many active bridges, so network usage is rather high. Attached are 2 examples where, running as server, the bridge should not be active.

If I understand you right, you expect bridges to only be active when a client really needs the data. With many clients and UDP bridges this would involve quite a few checks; I don’t think it can be done in a reasonable way. Just use a grey S+H node and only change the values of bridges when you expect a client to use the data.

Also it’s for the moment very hard to see which link is a bridge and whether it’s active or not (not the count, but visualized in the patch). So a hint that a link is a bridge, plus a way to see whether it’s active, would be pretty handy (for now I have to capture network packets and check the node path in there).

In debug mode, at least the input pin on the boygrouped node should visualize whether data has been sent within the last second or so. When data is sent (active bridge) it turns yellow and from there fades to black (= inactive bridge).
Of course, showing the same on a link is not a bad idea.
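The fade described above is just a time-based interpolation from yellow to black. As a toy sketch (the RGB values and the 1-second window are assumptions based on the description, not the actual vvvv rendering code):

```python
import time

FADE_SECONDS = 1.0          # assumed fade duration
YELLOW = (255, 255, 0)      # active bridge
BLACK = (0, 0, 0)           # inactive bridge

def activity_color(last_send_time, now=None):
    """Color of the pin/link indicator, given the last send timestamp."""
    now = time.monotonic() if now is None else now
    # Fraction of the fade elapsed, clamped to [0, 1].
    t = min(max((now - last_send_time) / FADE_SECONDS, 0.0), 1.0)
    return tuple(round(y + (b - y) * t) for y, b in zip(YELLOW, BLACK))
```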

The TCP server now uses an inbound and an outbound queue (it was inbound-only before) per peer client to receive/send messages. This fixed the server freeze caused by a frozen client in a local test setup here.
A build containing this fix is available here if you need it right away.

Note that further logic, like disconnecting a frozen client when its queue runs full, still needs to be implemented; see gregsn’s post.

Ok so let’s take the first sample again (attached).

If I have AvoidNil as a module, the bridge reports as always active; if I have the module flattened (right), the bridge is not active unless I change a value.
Then if I show AvoidNil and hide it again, the bridge is not active anymore either.

So it’s not a matter of an S+H, it’s a matter of the bridge behavior being totally inconsistent. In the case shown in the patch I should have either 2 active bridges or 0, but not 1. This makes any kind of debugging particularly hard.

And this here is an extremely simple patch; most boygroup patches are quite a bit more complex.

I’ve got quite a lot of other boygroup-related issues; I’ll compile another list soon.

Sample1.v4p (6.6 kB)

yea ok, that was weird. even though, after debugging, it made far more sense than i would have expected at first (which was none).
quick fix coming up shortly