Why aren't my batch jobs starting?

The RALPP batch system uses the Maui scheduler to allocate CPU resources fairly between users, groups, and Tier 2 (Grid) and Tier 3 (local) use. This means that which job starts next depends not on how long it has been in the queue but on recent usage of the batch system by the different user groups. So, although your job might head the list of queued jobs reported by qstat, that does not mean that it will necessarily start soon.
Luckily, Maui provides a few commands to look at the current status, which allow you to work out what is going on.
The first, showq, is fairly straightforward in that it displays the current job queue in priority order:
ACTIVE JOBS--------------------
JOBNAME   USERNAME  STATE    PROC  REMAINING   STARTTIME

5768701   prdcms12  Running  1     2:09:06:18  Sat Oct 23 21:30:51
5768715   prdcms12  Running  1     2:09:06:18  Sat Oct 23 21:30:51
...
5792780   pltatl09  Running  1     4:00:00:00  Mon Oct 25 12:24:33
5792777   prdh107   Running  1     4:00:00:00  Mon Oct 25 12:24:33

1500 Active Jobs   1502 of 1636 Processors Active (91.81%)
                   223 of 242 Nodes Active (92.15%)

IDLE JOBS----------------------
JOBNAME   USERNAME  STATE    PROC  WCLIMIT     QUEUETIME

5831310   pltatl09  Idle     1     4:00:00:00  Thu Oct 28 15:04:06
5831311   pltatl09  Idle     1     4:00:00:00  Thu Oct 28 15:04:06
...
5825489   prdh107   Idle     1     4:00:00:00  Wed Oct 27 23:21:21
5830811   prdh107   Idle     1     4:00:00:00  Thu Oct 28 14:04:08

596 Idle Jobs

BLOCKED JOBS----------------
JOBNAME   USERNAME  STATE    PROC  WCLIMIT     QUEUETIME

5831115   atlassgm  Idle     1     4:00:00:00  Thu Oct 28 14:46:15
5831178   atlassgm  Idle     1     4:00:00:00  Thu Oct 28 14:52:33
5831179   atlassgm  Idle     1     4:00:00:00  Thu Oct 28 14:52:33

Total Jobs: 2099   Active Jobs: 1500   Idle Jobs: 596   Blocked Jobs: 3
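Because showq lists jobs in priority order, the position of a job within its section gives a rough idea of how far it is from the front. The following is a minimal Python sketch, not part of the RALPP setup, that reads showq text and reports which section a job is in and where it sits. The function name `locate_job`, the regular expressions, and the shortened `SAMPLE` text are illustrative assumptions based on the output layout shown here; other Maui versions may format their output differently.

```python
import re

# Assumed layout: section headers such as "IDLE JOBS-------" and job rows of the
# form "<id> <user> <state> <procs> <d:hh:mm:ss> <date>", as in the output above.
SECTION_RE = re.compile(r"^(ACTIVE|IDLE|BLOCKED) JOBS-+\s*$")
JOB_RE = re.compile(r"^(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+\d+(?::\d+)+\s")

# Shortened sample taken from the showq output above (illustrative only).
SAMPLE = """\
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME

5768701 prdcms12 Running 1 2:09:06:18 Sat Oct 23 21:30:51
5792777 prdh107 Running 1 4:00:00:00 Mon Oct 25 12:24:33

1500 Active Jobs 1502 of 1636 Processors Active (91.81%)

IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

5831310 pltatl09 Idle 1 4:00:00:00 Thu Oct 28 15:04:06
5831311 pltatl09 Idle 1 4:00:00:00 Thu Oct 28 15:04:06

596 Idle Jobs

BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

5831115 atlassgm Idle 1 4:00:00:00 Thu Oct 28 14:46:15
"""


def locate_job(showq_text, job_id):
    """Return (section, position) for job_id, or None if it is not listed.

    position is 1-based and counts job rows within the section, in the order
    showq printed them.
    """
    section = None
    position = 0
    for raw_line in showq_text.splitlines():
        line = raw_line.strip()
        header = SECTION_RE.match(line)
        if header:
            # A new section starts: reset the per-section counter.
            section = header.group(1)
            position = 0
            continue
        job = JOB_RE.match(line)
        if section and job:
            position += 1
            if job.group(1) == job_id:
                return section, position
    return None


if __name__ == "__main__":
    print(locate_job(SAMPLE, "5831311"))
```

On a real system the text could be captured with something like `subprocess.run(["showq"], capture_output=True, text=True).stdout`, assuming showq is on the PATH. A job found in the BLOCKED section is not currently eligible to start, whatever its position in the list.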
-- ChrisBrew - 2010-10-25