How to run DaVinci on the Grid

Reconsider

First off, please reconsider: the ppd batch system works a lot more seamlessly. If you can, use it. GridTips might also be a useful page.

Setup

You will need all the files mentioned on the HowToRunDaVinciAtRAL page, with one exception: the python file listing the data you are running over needs to contain LFNs (logical file names) rather than the PFNs (physical file names) used on that page. The LFNs can be found with the LHCb bookkeeping software.
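As a quick sanity check on a data file, the sketch below scans the text of an options file and lists any entries that are not LFNs. It assumes the entries use the Gaudi-style `DATAFILE='LFN:...'` form; the function name `find_non_lfn_entries` and the sample paths are made up for illustration.

```python
import re

# Matches the DATAFILE='...' part of a Gaudi-style input line (assumed format).
DATAFILE_RE = re.compile(r"DATAFILE\s*=\s*'([^']+)'")


def find_non_lfn_entries(options_text):
    """Return every DATAFILE value that does not start with 'LFN:'."""
    entries = DATAFILE_RE.findall(options_text)
    return [e for e in entries if not e.upper().startswith("LFN:")]


# Hypothetical sample: one LFN entry and one PFN entry.
sample = '''
EventSelector().Input = [
  "DATAFILE='LFN:/lhcb/data/example_1.dst' TYP='POOL_ROOTTREE' OPT='READ'",
  "DATAFILE='PFN:/some/local/path/example_2.dst' TYP='POOL_ROOTTREE' OPT='READ'",
]
'''
print(find_non_lfn_entries(sample))
```

An empty list means every entry found is an LFN and the file should be usable on the grid.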

First, let's create a grid proxy and run the LHCb bookkeeping command, lhcb_bkk:

lhcb-proxy-init
lhcb_bkk

This window allows you to save files of LFNs pointing to all the latest and greatest LHCb data.

In the default view (top radio button) the data is in /LHCb/Collisions10/Beam3500GeV-VeloClosed-MagDown/ and /LHCb/Collisions10/Beam3500GeV-VeloClosed-MagUp/. There should be 1000/nb in Reco05-Stripping09-Merged; all newer data is added to Reco05-Stripping09-Prescaled-Merged, which is prescaled. This should stay the same until around Christmas 2010, so if it's after then there are probably new folders in here to investigate. A few levels into these sub folders you will find CALIBRATION.DST, DIELECTRON.DST, etc. For B2Dpi or similar, the correct one is the Hadronic one (or the Bhadron one): Bhadron and Charm were combined to make the Hadronic one.

Monte Carlo data is stored in /MC/2010/Beam3500GeV-VeloClosed-MagUp-Nu1/ and /MC/2010/Beam3500GeV-VeloClosed-MagDown-Nu1/. The sub folders in here have all the different decays you could want; if the latest one doesn't have your decay, check the older ones in that directory. Currently 2010-Sim03Reco03-withTruth has the most decays, but there are also 2010-Sim04Reco03-withTruth and 2010-Sim07Reco06-withTruth, which have fewer decays.

Once you have found the data you want, open up the sub folders as far as they go and double click "Nb of events/files", then click "save files" in the window that appears to save the python file.

This is the data you will be running over on the grid.

Use the files from HowToRunDaVinciAtRAL, but change the ganga script so that the backend is Dirac and not the ppd batch or interactive, like so:

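# Template for a DaVinci job, using a specific release
t = JobTemplate( application = DaVinci(version = "v25r7p1"))

# Directory holding the options files
wd = "/home/hep/uoh35620/stuff/batch-generate/str09Data/grid/merged-all/"
# DaVinci options first, then the LFN data file saved from the book keeping
t.application.optsfile = [wd+"DaVinciB2DX.py", wd+"LHCb_Collision10_429155_Reco05-Stripping09-Merged_90000000__HADRONICDST.py"]
# ROOT file to bring back from each sub job
t.outputsandbox = ["B2Dpi_str09.root"]

# Split the job into sub jobs of 12 input files each
t.splitter = DiracSplitter()
t.splitter.filesPerJob = 12
t.splitter.ignoremissing = True

# Run on the grid through Dirac
t.backend = Dirac()
# Uncomment to ban a site that keeps failing
#t.backend.settings['BannedSites'] = ['LCG.CERN.ch']

# Create the job from the template and submit it
j = Job(t)
j.submit()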

If one particular site is failing a lot you can ban it with the commented-out option above. However, I often found it best to wait until the next day and just re-submit the sub job, as often that site is the only place with the data. I also found that a newly released version of DaVinci might not yet be installed on all grid machines, which made jobs fail; specifying a slightly older release than the current one seemed to work nicely. It is advisable to run a test job with just one input file so you can see how long it takes per file. Your grid jobs should aim to be at least 4 hours long, or longer if you have many input files; I think the maximum is 24 hours.
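To turn the timing from a one-file test job into a filesPerJob value, a rough calculation like the one below can help. It is only a sketch: the helper name `files_per_job` is made up, the 4 hour target and 24 hour cap are the figures quoted above, and it assumes every file takes about as long as the test file.

```python
import math


def files_per_job(seconds_per_file, target_hours=4.0, max_hours=24.0):
    """Smallest number of files per sub job that reaches target_hours,
    capped so that the sub job stays within max_hours."""
    if seconds_per_file <= 0:
        raise ValueError("seconds_per_file must be positive")
    wanted = math.ceil(target_hours * 3600 / seconds_per_file)
    allowed = math.floor(max_hours * 3600 / seconds_per_file)
    # Always process at least one file, even if a single file exceeds the cap.
    return max(1, min(wanted, allowed))


# Example: the test job took 20 minutes for one file.
print(files_per_job(20 * 60))
```

With 20 minutes per file this gives 12 files per sub job, which is the sort of number to put in t.splitter.filesPerJob.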

Submitting

If the above is saved as ganga.py, the following will submit the jobs.

SetupProject Ganga
ganga ganga.py

I suggest doing all this on heplnx111 or 112 as they are much faster than 108. It might also be wise to do it in a screen (man screen) instance as then you can go home and turn off your workstation.

If it cannot find replica information, or you have any other issues, delete your proxy and re-create it; see GridTips.

If that does not work, congratulations: your first grid issue! Ask Raja about the grid outages today, ask lhcb-distributed-analysis@cern.ch, or try again tomorrow. Often it just gets fixed, so it works the next day or after your proxy has expired and been recreated.
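When trying again the next day, resubmitting only the sub jobs that failed saves redoing the ones that worked. The helper below is a sketch, not a tested recipe: `resubmit_failed` is a made-up name, and it assumes the job object exposes `subjobs`, a `status` string and a `resubmit()` method in the way Ganga jobs are expected to.

```python
def resubmit_failed(job):
    """Resubmit every sub job of `job` whose status is 'failed'.

    Assumes `job.subjobs` is iterable and each sub job has a `status`
    string and a `resubmit()` method (assumed Ganga-like interface).
    Returns the number of sub jobs that were resubmitted.
    """
    count = 0
    for sj in job.subjobs:
        if sj.status == "failed":
            sj.resubmit()
            count += 1
    return count
```

Inside a ganga session this might be called as resubmit_failed(jobs(42)), where 42 stands in for your own job id.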

Getting the output

Once the jobs have been submitted, the next mammoth task begins: getting ganga to download the data from the grid. Keep an eye on the log in ~/.ganga.log, with "less +F ~/.ganga.log" for example. If it has not done anything for a few hours, restart ganga. When you stop it, if it complains that it still has threads running (they're probably stalled), say yes to get out of there, then kill or kill -9 everything related to ganga and python that shows up in ps, "kill 1234" for example. Then run ganga again to carry on downloading the data.
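Rather than watching the log by eye, a small check of when it was last written can tell you whether ganga has gone quiet. This is a sketch under assumptions: `log_is_stale` is a made-up helper, "a few hours" is taken to mean 3, and the file's modification time is used as a stand-in for ganga activity.

```python
import os
import time


def log_is_stale(path, max_idle_hours=3.0, now=None):
    """True if the file at `path` was last modified more than max_idle_hours ago."""
    if now is None:
        now = time.time()
    idle_seconds = now - os.path.getmtime(path)
    return idle_seconds > max_idle_hours * 3600


if __name__ == "__main__":
    log_path = os.path.expanduser("~/.ganga.log")
    if os.path.exists(log_path) and log_is_stale(log_path):
        print("No log activity for a few hours; consider restarting ganga.")
```

Run it from a second terminal (or from cron) while ganga is downloading; it only reports, it does not kill anything.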

If it cannot find replica information, or you have any other issues, delete your proxy and re-create it; see GridTips.

Analysing the output

It's a lot easier to analyse the output if it's in one big file, so see HowToMergeRootFiles#Merging_many_jobs.

-- ThomasBird - 2010-09-07
