This looks like a similar case, but there are no real answers there. I'm just going to throw out a bunch of my thoughts here.
Is the service truly stopped? The console isn't always a 100% accurate representation; I've seen it out of sync at times. Check your service manager (whatever platform you're on).
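One quick way to check, independent of what the console claims, is to look for the runtime process directly. A minimal sketch, assuming a Linux box and that the runtime shows up as a process named `java` (adjust the name for your setup):

```shell
# Check whether the runtime's JVM process is actually alive,
# regardless of what the console reports. "java" is an assumed
# process name -- match whatever your runtime runs as.
if pgrep -x java > /dev/null; then
  echo "runtime process found -- service is not really stopped"
else
  echo "no runtime process -- the service really is down"
fi
```

If the process is alive while the console says "stopped", the problem is responsiveness, not the service itself.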
It might be locking on a specific request and consuming your server CPU until you eventually get that exception, after which everything resumes as normal. That's also why I don't think the service has actually stopped.
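One standard JVM technique for confirming the "stuck request" theory (not something from the original answer, just a general suggestion) is to take a thread dump while the hang is in progress and see which thread is busy. A sketch, again assuming the runtime process is named `java`:

```shell
# While the server appears hung, grab a thread dump from the JVM
# to see which request/thread is burning CPU. The process name
# "java" is an assumption -- match your own runtime process.
PID=$(pgrep -x java | head -n 1)
if [ -n "$PID" ]; then
  # SIGQUIT makes the JVM print all thread stacks to its stdout/console log
  kill -3 "$PID"
  # with a JDK installed you could capture it to a file instead:
  # jstack "$PID" > threaddump.txt
else
  echo "no JVM process found"
fi
```

Two or three dumps taken a few seconds apart will show whether the same thread stays stuck in the same stack, which usually points straight at the offending request.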
Another tip for tracking the problem down: set your log levels to Trace (especially for Jetty) and see if you can pinpoint a particular microflow or scenario in which it keeps recurring.
Also, did this only start on a certain build of your project, and if so, can you revert? And lastly, can you do any testing on another server?
UPDATE:
Jason, thank you for responding.
That's the thing: the server isn't actually stopped, it just seems like it is. The moment the users log out, it becomes responsive again.
Setting logs to Trace is not viable: we have 100 users working in the client and over 500 stores connecting via web services (as well as numerous other systems pushing data into the system). Trace logging would write to the log files at an immense rate and consume significant resources.
We have test, dev and UAT servers, and this has never happened on any of them, as their traffic is nowhere near production levels.
Our last release was roughly six months ago (it contained crucial requirements), and this issue only started about two months ago.
We are busy planning an upgrade to Mendix 4, but it may still be months before that is productionised. I was hoping to resolve this in the interim.