gilbert-mh - 2024-01-12

Hello all,
I have been trying to implement a TCP client on a Festo PLC (CPX-E-CEC-M1) and it looks like it works well except that after some time (greatly varies between a few hours to more than 100h) my PLC crash. When I look into the log file the only thing I see is that before the crash happens a few kernel warnings : N0HZ_local_softirq_pending 80 and then the crash.
I've looked into this warning and from what I could find on the net it seems that this is warning is triggered when the ethernet link is down.

I've tried to correct this bug for quite some time and what I know is that :
- The crash is caused by my TCP client, when I remove it from my code I see no crash
- The crash happens more quickly the more the TCP client is used.
- The time before the crash is not directly proportional to the number of communications or their size. But it looks like it is just more likely to happen if the client connect to the server at a higher frequency.
- The precedent observation makes it seem unlikely that the crash is caused by some memory overflow because then the crash speed would be more proportional to the amount of data exchanged.

SO from these observations, I believe that the crash could be caused by the PLC trying to connect to a server while there is some kind of issue with the ethernet link resulting in the PLC getting stuck in some indefinite state and making it crash. This still seems a bit unlikely to me because if the ethernet is down it simply shouldn't be able to contact the server and the communication would just fail which doesn't cause my PLC to crash.

Has anyone encountered the same kind of problem (with the same kernel message) ? I am pretty sure the warning is not the direct cause of the crash but just an indicator that something is wrong with my PLC.

Thanks in advance