Why are my jobs waiting due to QOSGrpCpuLimit?
On the DCSR clusters, long jobs (more than one day) are only allowed to occupy 2/3 of the CPUs at any one time.
This is for the following reasons:
- To prevent the clusters being blocked by long-running jobs
- To allow short jobs (less than 24 hours) to run quickly
When you submit a job it is automatically assigned a Quality of Service (QoS) policy which is used to apply this restriction.
If you see your jobs pending with the reason QOSGrpCpuLimit, it means that long-running jobs are currently occupying all of the CPU slots available to them (2/3 of the total). Your jobs will not start until some of those long jobs complete.
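
To check which of your jobs are waiting for this reason, you can ask Slurm for the pending reason of each job. The following is a minimal sketch, not an official DCSR tool: it assumes the standard `squeue` command is on your PATH, and the function names `parse_pending` and `pending_on_qos_limit` are hypothetical helpers made up for this example.

```python
import getpass
import subprocess

REASON = "QOSGrpCpuLimit"


def parse_pending(squeue_output, reason=REASON):
    """Return the job IDs whose pending reason equals `reason`.

    Expects one job per line in the form "<jobid> <reason>",
    as produced by: squeue -h -o "%i %r"
    """
    job_ids = []
    for line in squeue_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == reason:
            job_ids.append(fields[0])
    return job_ids


def pending_on_qos_limit(user=None):
    """Run squeue for `user` (default: current user) and return the
    IDs of pending jobs that are held back by the QoS CPU limit."""
    user = user or getpass.getuser()
    result = subprocess.run(
        # -h: no header, -t PENDING: pending jobs only,
        # -o "%i %r": print job ID and pending reason
        ["squeue", "-h", "-u", user, "-t", "PENDING", "-o", "%i %r"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pending(result.stdout)
```

Calling `pending_on_qos_limit()` on a login node should return a list of job IDs; an empty list means none of your pending jobs are currently blocked by this limit. If a job is expected to finish in under 24 hours, requesting a shorter time limit at submission may allow it to avoid the long-job restriction.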