Current Status of the PPD Linux Systems
Current
PPD Linux systems are running "at Risk" on backup air conditioning. However, the current setup appears to be performing well, and Estates and the PPD Computing Group are fairly confident that we will be able to lift the "at Risk" status on Monday or Tuesday.
- Most Services are up and running normally
- 52 Worker Nodes in Lab 8 (208 batch slots) have been shut down - This includes all the SL4 batch nodes
- We had problems overnight with one worker node not accepting jobs, which caused a large number of jobs to be lost.
- The main home file server for Linux is being rebuilt; the home file system is currently being served from a backup server.
25/02/2010
PPD Linux systems are currently running at risk with reduced capacity while additional temporary air conditioning is installed in R1 Lab 8.
- Interactive Linux nodes heplnx103, heplnx105, heplnx106, heplnx107 and heplnx108 have been shut down to reduce the heat load
- A number of disk storage nodes have been shut down
- Grid and Local Batch jobs have been paused because of this
- Some files will be unavailable; however, it should still be possible to write new files
- hepcvs has been shut down
- Three of the four TCAD nodes have been shut down.
- 52 Worker Nodes in Lab 8 (208 batch slots) have been shut down - This includes all the SL4 batch nodes
Last 24 hour status according to WLCG SAM tests
Jobs currently running/queued on the farm
Future planned interventions affecting the PPD Linux Systems
%CALENDAR{topic="PPDLinuxStatusEvents" showweekdayheaders="1" width="90%" cellheight="100"}%
%CALENDAR{topic="PPDLinuxStatusEvents" month="+1" showweekdayheaders="1" width="90%" cellheight="100"}%
as a list
-- ChrisBrew - 2010-02-25