AppEngines / Slots / Nodes / CPUs / CPU load / CPU usage

Can somebody explain the differences between the terms AppEngines / Slots / Nodes / CPUs, and how they relate to CPU load / CPU usage when running an application in the Mendix cloud?

For example: we have an application running in the Mendix cloud. On the deployment page we see that we use 3 app engines, while on the Application node CPU usage graph the idle time has a max of 400%. How does 3 app engines correspond to a CPU usage graph showing 400% idle time?

Another example: for the same application/setup we also get CPU warning alert emails with CPU loads around 1.2, while the CPU usage graph at that time shows the CPU idle time has a min of 300+%. How does a CPU load of 1.2 correspond to a CPU usage graph showing 300+% idle time?
asked
1 answer

There are probably a lot of people who can explain this better, but I'll make an attempt.

The scale of the graphs you see in the application monitoring does not directly represent the size of your Cloud Node. The 100% scale of the graph does not indicate the maximum you can use on your environment.

Load indicates how many threads are, on average, using or waiting for your (virtual) cores, so a load of 1.2 is roughly equivalent to 1 core being consistently at 100% and a second core at 20%. This is OK for a short period of time. But a load that stays above 1 usually means that a single microflow (thread) is running for a long period of time and consistently uses a whole core.
I found this website giving a very good explanation about load: http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
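To make the relation between load and the idle graph concrete, here is a rough sketch of the arithmetic. It assumes a node with 4 cores (inferred from the 400% maximum on the graph, not confirmed by Mendix) and it assumes the load consists only of CPU-bound threads. On Linux the load average also counts threads waiting on disk I/O, so the real idle time can be somewhat higher than this estimate. The function names are made up for illustration.

```python
def busy_cores(load: float, cores: int) -> float:
    """Approximate number of cores kept busy, capped at the core count."""
    return min(load, float(cores))


def expected_idle_percent(load: float, cores: int) -> float:
    """Estimated idle time on a graph that counts 100% per core."""
    return (cores - busy_cores(load, cores)) * 100.0


def load_per_core(load: float, cores: int) -> float:
    """Load divided over the cores; values near or above 1.0 mean saturation."""
    return load / cores


if __name__ == "__main__":
    cores = 4   # assumption: inferred from the 400% graph maximum
    load = 1.2  # value from the CPU warning email
    print(f"idle estimate: about {expected_idle_percent(load, cores):.0f}%")
    print(f"load per core: about {load_per_core(load, cores):.2f}")
```

Under these assumptions a load of 1.2 leaves most of the node unused, which is in line with an idle time around 300% at the moment the alert was sent.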

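You mention an idle time of 300+%. Idle time is the share of CPU capacity that is not being used, added up over all cores, so it measures unused CPU and not how long your activities have to wait. With 4 cores (400% in total) and a load of 1.2, a bit more than 1 core is in use and the rest is idle or waiting on I/O, which is in line with the roughly 300% idle you see.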

CPU usage literally means how active your CPUs are. The graph counts 100% per CPU, so if it goes up to 400% I would assume that you have 4 CPUs; an idle time of 400% then means all 4 have nothing to do at that moment.


An overly simplified scenario: you have 6 tasks and 4 CPUs, and each CPU can run 1 task at a time.
All 4 CPUs are 100% occupied processing the first 4 tasks.
The other 2 tasks are waiting until a CPU frees up.
In that situation the load is 6 (4 running plus 2 waiting) and the idle time is 0%. In your case, with a load of 1.2 on 4 CPUs, on average just over 1 task is running, nothing has to wait for a free CPU, and about 3 CPUs are idle.
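To put numbers on the simplified scenario, here is a small sketch. It assumes load is counted as tasks running plus tasks waiting for a CPU (the usual definition) and that the graph counts 100% idle per unused CPU. The `Snapshot` class and `snapshot` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    running: int       # tasks currently on a CPU
    waiting: int       # tasks queued until a CPU frees up
    load: int          # running + waiting
    idle_percent: int  # 100% per CPU that has nothing to do


def snapshot(tasks: int, cpus: int) -> Snapshot:
    """Describe one moment in time for a number of CPU-bound tasks."""
    running = min(tasks, cpus)
    waiting = tasks - running
    idle_percent = (cpus - running) * 100
    return Snapshot(running, waiting, running + waiting, idle_percent)


if __name__ == "__main__":
    print(snapshot(tasks=6, cpus=4))  # all CPUs busy, 2 tasks queued
    print(snapshot(tasks=1, cpus=4))  # one long-running task, 3 CPUs idle
```

With 6 tasks on 4 CPUs nothing is idle and 2 tasks are queued. With a single long-running task the load is 1 and 3 CPUs are idle, which is much closer to a load of 1.2 combined with roughly 300% idle time.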


What you should focus on first is finding the process that causes this load, because it isn't something that you want to keep. Getting a bigger server would only mask the problem. Once you have identified the exact process, you can post it on the forum too; there are probably enough people here who could give you suggestions on improving it.

answered