milos-91
Hi!
I have a problem running PureCN when the --parallel flag is active. The error is:
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection
Calls: runAbsoluteCN ... .send_EXEC -> <Anonymous> -> sendData.SOCK0node -> serialize
A similar error has already been reported, and there it was due to a lack of RAM. In this case, however, I've checked the metrics, and only around 15% of the RAM and 14% of the CPUs are in use.
Any idea why this is failing?
Thank you very much!
The error isn't that too much memory is being used, but that the amount of data being sent from the worker to the manager, or vice versa, is too large for the type of connection implemented. I'm not familiar with PureCN, but the overall strategy is to reduce the amount of data sent to or returned from the workers, e.g., by analyzing smaller chromosome regions.
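As a rough way to check how much data a single task would push through the socket, one can measure the serialized size of an object in base R. This is only a sketch: `payload` and `serialized_bytes` are hypothetical names with made-up data, not PureCN objects or functions, and the split by chromosome only illustrates the "smaller regions" idea.

```r
# Hypothetical stand-in for the data shipped to one worker (not a PureCN object)
payload <- list(
  counts  = matrix(rnorm(1e5), ncol = 10),
  regions = data.frame(
    chr   = rep(paste0("chr", 1:10), each = 1000),
    start = seq_len(1e4)
  )
)

# serialize(x, NULL) returns a raw vector, so its length is the size in bytes
serialized_bytes <- function(x) as.numeric(length(serialize(x, NULL)))

# Size of everything sent as a single task
total_bytes <- serialized_bytes(payload)

# Size of each task if the regions are split by chromosome instead
by_chr      <- split(payload$regions, payload$regions$chr)
chunk_bytes <- vapply(by_chr, serialized_bytes, numeric(1))

print(total_bytes)
print(chunk_bytes)
```

Comparing `total_bytes` with `chunk_bytes` gives a feel for how much smaller each message gets once the work is divided into per-chromosome pieces.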
Thanks Martin and Milos. There is definitely some room for improvement, although I've never seen this error even in our whole exomes. Martin, is there an easy way to profile the memory usage of workers?
Actually, I'm not 100% sure that the problem is the amount of data being serialized; the first step would be to get a reproducible example...
Is it possible that this happens when a worker has been idle for a long time? Some workers exit early after a few minutes, while others can run for more than an hour on big datasets. This is something I can probably find an easy workaround for.
Do you mind trying version 1.13.4? It should balance the workload much better across nodes. This should reduce the runtime significantly and might decrease the chance of such communication errors.