How to run DaVinci at RAL

Creating an ntuple

Real Data

To create an ntuple with real data you need
Data Available at RAL

There are several sets of data at RAL: MC with dipole up, MC with dipole down, and low mu (number of interactions per event) data from before the St. Petersburg conference with both dipole up and down. The data is from the Bhadron B->Dpi stripping stream, while the MC data is full of signal (B->Dpi) events.

data_2010_down -> /opt/ppd/lhcb/nraja/MC2010/6_7_2010_down
data_2010_up   -> /opt/ppd/lhcb/nraja/MC2010/6_7_2010_up
MC_2010_down   -> /opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST
MC_2010_up     -> /opt/ppd/lhcb/nraja/MC2010/MC_2010_429376_DST
The Bhadron line has been joined with the Charm stripping line to form the Hadronic stripping line, so you can use this to run on the grid with the more recent data in stripping-09-merged in the book keeping. If you want to run on the grid, read HowToRunDaVinciOnTheGrid once you are able to run locally. Alternatively, if the data you want is not currently at RAL and it can fit in a few 1000GB, it can be downloaded locally to RAL.
Running over local data

My python file pointing to my data looks like:

from Gaudi.Configuration import *
EventSelector().Input = [
    # for i in /opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/* ; do echo "\" DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
    " DATAFILE='pfn:/opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/00006639_00000001_1.dst' TYP='POOL_ROOTTREE' OPT='READ'",
    " DATAFILE='pfn:/opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/00006639_00000002_1.dst' TYP='POOL_ROOTTREE' OPT='READ'",
    # ...
]

It can be mostly generated with the for command commented in the file, with the directory changed to any of the ones listed in the section above. In case these don't work, the most recent files for running on local data can all be found in the sub-directories of /home/hep/uoh35620/stuff/batch-generate/dataSingleTree/.

The DaVinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt (you might want to remove the .txt from the file name). Several things will need to be regularly changed in this file for your specific needs.
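The for command above can be wrapped into a small script that writes the whole options file. This is a minimal sketch, not the exact command from this page: it creates a throwaway demo_dst directory with empty placeholder .dst files so it can run anywhere; in real use point DATA_DIR at one of the directories listed above and drop the mkdir/touch lines. It also runs readlink -f on each file, since symlinked inputs have caused DaVinci to silently process 0 events (see the warnings section).

```shell
#!/bin/bash
set -e
# DATA_DIR is a placeholder demo directory; in real use set it to e.g.
# /opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST and remove the next two lines.
DATA_DIR=${DATA_DIR:-./demo_dst}
mkdir -p "${DATA_DIR}"
touch "${DATA_DIR}/00006639_00000001_1.dst" "${DATA_DIR}/00006639_00000002_1.dst"

{
  echo "from Gaudi.Configuration import *"
  echo "EventSelector().Input = ["
  for i in "${DATA_DIR}"/*.dst ; do
    # resolve symlinks to real paths; symlinked inputs have been seen to
    # make DaVinci skip files without any error
    real=$(readlink -f "$i")
    echo "    \" DATAFILE='pfn:${real}' TYP='POOL_ROOTTREE' OPT='READ'\","
  done
  echo "]"
} > data.py

cat data.py
```

The generated data.py is then passed to gaudirun.py alongside the DaVinci options file as shown below.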
Before running this you need to set up your environment to find the LHCb programs like SetupProject. To do this add the following line to your .bashrc and restart your shell. Be careful: the dot at the start of the line is very important.

. /opt/ppd/lhcb/lhcb/scripts/lhcbsetup.sh

After restarting your shell all this can be run interactively with the following commands:

SetupProject DaVinci
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt

The output will be copied to stdout and to a file called output.txt. This will create a .root file which contains the ntuple and is readable with the macros mentioned in the analysis section below.

Several warnings, README!

Symlinks

Be careful when creating the file pointing to the data, as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors; DaVinci seemed to just skip the file it was looking at, resulting in 0 events processed.

OR mode sequencers

When creating the DecayTreeTuple in the DaVinci python file, it has been found that creating the ntuple using a GaudiSequencer in OR mode with another ntuple (see example below) seems to randomly throw away events and not record all the ones it should.

# example of what not to do
BachelorOrKstarDecayTreeTuple = GaudiSequencer("BachelorOrKstarDecayTreeTuple")
BachelorOrKstarDecayTreeTuple.ModeOR = 1
BachelorOrKstarDecayTreeTuple.Members = [BachelorDecayTreeTuple, KstarDecayTreeTuple]

Running on ppd batch

You can run these tasks on the ppd batch system, since processing a large number of events can take many hours. Below is a script I use to run these tasks.

#!/bin/bash
stda="`date`"
echo $stda
. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci
dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all
gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt
echo
echo $stda
date
echo
echo Done.

The dot at the beginning of the line ending in "SetupProject.sh DaVinci" is very important. If it is not there the paths will not be set up properly for your script.
When using this script there are some settings you might want to change:
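As a sketch of those changes, the script can be parameterized so the directory and file names are overridable per job. This is a hypothetical variant, not the script actually used on this page; DRY_RUN=1 (the default here) only prints the command, so the wrapper can be previewed without an LHCb installation.

```shell
#!/bin/bash
# Parameterized variant of the batch script above. Defaults are the example
# paths from this page; override DIR, OPTS and DATA per job.
DIR=${DIR:-/home/hep/uoh35620/stuff/batch-generate/data/up/all}
OPTS=${OPTS:-DaVinciAndCandidatesUp.py}
DATA=${DATA:-6_7_2010_up.py}
DRY_RUN=${DRY_RUN:-1}   # set to 0 to actually run the job

CMD="gaudirun.py ${DIR}/${OPTS} ${DIR}/${DATA}"
echo "will run: ${CMD}"

if [ "${DRY_RUN}" != "1" ]; then
  # the leading dot is essential: SetupProject.sh must be sourced, not executed
  . /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci
  ${CMD} | tee "${DIR}/output.txt"
fi
```

The job is then submitted with qsub exactly as shown below.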
qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script

-q chooses the queue you wish to submit the job to, -j oe joins stderr and stdout together into one file, -o puts the output file into the given directory, and script is the above script to run the job.

Monitoring

The following command will list your currently running jobs (replace uoh35620 with your user name):

qstat -u uoh35620

For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tees in the script and uoh35620 will need to be changed to your user name. There are also various other paths after that which would also need to be changed, good luck; you could just delete the end, as it only monitors the file sizes of root files.

watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'

You might want to make aliases of these commands in your ~/.bashrc file. For example, the following in ~/.bashrc will make a new command called qstatme which is like typing the full command:

alias qstatme="qstat -u uoh35620"

PBS Batch Errors

I have gotten this email a few times and I'm not really sure of the cause, maybe an error on the batch machine, but if you get this just keep re-submitting exactly the same job until it works. The most I've ever had to do it is about 5 times.

PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name:   script
Exec host:  heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0
Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output

Splitting the job with ganga on the ppd batch

Using ganga to run DaVinci can be advantageous since it can run on CERN batch, PPD batch, the grid, or locally very easily, and it can split the job into many pieces for faster processing. Ganga likes to control the input and output files of the various jobs and subjobs, so it is best not to use absolute output paths for the ntuples; otherwise all the subjobs overwrite one another's root files. This can be prevented by changing the TupleFile property in the DaVinci python file to something like "B2Dpi_Up_6_7_2010.split.direct.root". To fire up ganga, in a new shell do:

SetupProject Ganga
ganga

Then in ganga create a job template with the correct options, convert it into a job, and submit it. You will want to change the directory where the python files are stored and also the filenames if they are different to the ones I have used; "6_7_2010_up.direct.py" is the name of my python file pointing to my data. Additionally, the name of the file to merge from the subjobs will need to be changed to the one specified in the DaVinci python file. The name option, which would be useful for keeping track of submitted jobs, does not appear to work with the PBS system: ganga tries to pass the qsub -N name option, which qsub does not like, so it won't submit the job when you set a name.
t = JobTemplate( application = DaVinci() )
dir = "/home/hep/uoh35620/stuff/batch-generate/data/up/all/"
t.application.optsfile = [dir+"DaVinciAndCandidatesUp.py", dir+"6_7_2010_up.direct.py"]
t.splitter = SplitByFiles()
t.splitter.filesPerJob = 2
t.backend = PBS()
t.backend.queue = "prod"
t.merger = RootMerger()
t.merger.files = ['Bu2DStarplusX.root']
j = Job(t)
j.submit()

You can use the qstat commands mentioned before to check that it has worked. You will also be able to look at the values of j.status and j.subjobs to see the progress of the job as a whole and of the subjobs.

MC Data

Creating the ntuple with monte carlo data is harder than with the real data, since you must first strip it before creating the ntuple, but it can be done with a similar set up as before. Again you will need
The latest versions of all my MC files can be found in /home/hep/uoh35620/stuff/stripping/MCtriggerSettings

The python file pointing to the MC data is the same format as before, just with different filenames pointing to the MC data; see the Real Data section for where exactly the MC data is and how to generate the necessary file.
The DaVinci file that you need is DaVinci.py.txt. Again you might want to change a few things to make this work how you want it to.
Then this can again be run like the real data, with the following commands:

SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt
The output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before, but change the filenames where appropriate.
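Since the PBS errors described earlier are cured by re-submitting the same job, the resubmission itself can be scripted. This is a hypothetical helper, not something from this page: retry_submit runs any command (for example the qsub line from the submission section) up to MAX_TRIES times, stopping at the first success.

```shell
#!/bin/bash
# Hypothetical retry helper for flaky PBS submissions: runs the given
# command up to MAX_TRIES times (default 5, matching the worst case
# mentioned above), stopping at the first success.
retry_submit() {
  local max=${MAX_TRIES:-5} n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after ${n} attempts" >&2
      return 1
    fi
    n=$((n + 1))
    echo "attempt ${n}..." >&2
    sleep 1
  done
}

# usage, with the qsub flags from the submission section above:
# retry_submit qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script
```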
List of Files

Below are all the files mentioned on this wiki.
How to run DaVinci at RALCreating an ntupleReal Datato create an ntuple with real data you need
Data Available at RALThere are several sets of data at RAL MC with dipole up, MC with dipole down, low mu (number of interactions per event) data from before the St. Petersberg conference with both dipole up and down. The data is from the Bhadron B->Dpi stripping stream while the MC data is full of signal (B->Dpi) events.data_2010_down -> /opt/ppd/lhcb/nraja/MC2010/6_7_2010_down data_2010_up -> /opt/ppd/lhcb/nraja/MC2010/6_7_2010_up MC_2010_down -> /opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST MC_2010_up -> /opt/ppd/lhcb/nraja/MC2010/MC_2010_429376_DST | ||||||||
Changed: | ||||||||
< < | Since the Bhadron stripping line is starting to be available for current data stripping slowly there is more data appearing in book keeping, so you can run jobs on the grid | |||||||
> > | The Bhadron line has been joined with the Charm stripping line to form the Hadronic stripping line, so you can use this to run on the grid with the more recent data in the stripping-09-merged in book keeping. More on this later. | |||||||
Changed: | ||||||||
< < | Running over the data | |||||||
> > | Running over local data | |||||||
My python file pointing to my data looks like:
from Gaudi.Configuration import * EventSelector().Input = [ # for i in /opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/* ; do echo "\" DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done " DATAFILE='pfn:/opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/00006639_00000001_1.dst' TYP='POOL_ROOTTREE' OPT='READ'", " DATAFILE='pfn:/opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/00006639_00000002_1.dst' TYP='POOL_ROOTTREE' OPT='READ'", # ... ] and can be mostly generated with the for command commented in the file and the directory changed for any of the ones listed in the above section. | ||||||||
Changed: | ||||||||
< < | the davinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name. | |||||||
> > | All of the most recent files for running on local data can be found in the sub-directories of this directory /home/hep/uoh35620/stuff/batch-generate/dataSingleTree/ incase these dont work. | |||||||
Added: | ||||||||
> > | The DaVinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name. | |||||||
Several things will need to be regularly changed in this file for your specific needs.
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
Added: | ||||||||
> > |
| |||||||
Deleted: | ||||||||
< < | It is also worth noting that recently a new version of DaVinci seems to have come out where DV.Lumi is set to true by default, which it was not previously, this has the effect of putting all your data in lumi.root whether or not you want it there, since I don't know the point of this variable, I set it to false. Check the DaVinci.py file for this if not add it. | |||||||
Before running this you need to setup your environment to find the LHCb programs like SetupProject, to do this add the following line to your .bashrc and restart your shell. Be careful, the dot at the start of the line is very important.
. /opt/ppd/lhcb/lhcb/scripts/lhcbsetup.sh | ||||||||
Changed: | ||||||||
< < | After restarting your shell all this can be run with the following commands. | |||||||
> > | After restarting your shell all this can be run interactively with the following commands. | |||||||
SetupProject DaVinci gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txtthe output will be copied to stdout and a file called output.txt. This will create a .root file which contains the ntuple and is readable with the macros mentioned in the analysis section below. Several warnings, README!SymlinksBe careful when creating the file pointing to the data as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed.OR mode sequencersWhen creating the DecayTreeTuple in the DaVinci python file it has been found that creating the ntuple using a GaudiSequencer in OR mode with another ntuple (see example below) seems to randomly throw away events and not record all the ones it should.// example of what not to do BachelorOrKstarDecayTreeTuple = GaudiSequencer("BachelorOrKstarDecayTreeTuple") BachelorOrKstarDecayTreeTuple.ModeOR = 1 BachelorOrKstarDecayTreeTuple.Members = [BachelorDecayTreeTuple,KstarDecayTreeTuple] Running on ppd batch | ||||||||
Changed: | ||||||||
< < | you can run these taks on the ppd batch system, since doing a large number of events can take many hours. below is a script I use to run these taks. | |||||||
> > | you can run these tasks on the ppd batch system, since doing a large number of events can take many hours. below is a script I use to run these tasks. | |||||||
#!/bin/bash stda="`date`" echo $stda . /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt echo echo $stda date echo echo Done. | ||||||||
Changed: | ||||||||
< < | The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important. If this is not there the paths will not be set up properly for your script. | |||||||
> > | The dot at the beginning of the line ending in "SetupProject.sh DaVinci" is very important. If this is not there the paths will not be set up properly for your script. | |||||||
When using this script there are some things you settings you might want to change: | ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script-q chooses the queue you wish to submit the job to while -j oe joins the stderr and stdout together into one file. -o puts the output file into a given directory and script is the above script to run the job. MonitoringThe following command will list your currently running jobs. you will need to replace uoh35620 with your user name.qstat -u uoh35620For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tee's in the script and uoh35620 will need to be changed to your user name. There are also other various paths after that which would alos need to be changed, good luck, but you could just delete the end it only monitors the file sizes of root files. watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'you might want to make alias of these commands in your ~/.bashrc file so the following in ~/.bashrc will make a new command called qstatme which will be like typing the full command. alias qstatme="qstat -u uoh35620" PBS Batch ErrorsI have gotten this email a few times and I'm not really sure of the cause, maybe an error on the bathch macine, but anyway, if you get this just keep re-submitting exactly the same job until it works. The most I've ever had to do it is about 5 times.PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk Job Name: script Exec host: heplnc308.pp.rl.ac.uk/0 An error has occurred processing your job, see below. 
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0 Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/ >>> error from copy /bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory >>> end error output Splitting the job with ganga on the ppd batchUsing ganga to run DaVinci can be advantageous since it is able to run it on CERN batch, PPD batch, the grid or locally very easily also it can split the job into many pieces for faster processing. | ||||||||
Changed: | ||||||||
< < | Ganga likes to control the input and output files of the various jobs and sub jobs so it is best not to use absolute output paths for the ntuples otherwise all the sub jobs overwrite one another's root files, so by changing the NTupleSvc().Output property in the DaVinci python file to something like [ "FILE1 DATAFILE=B2Dpi_Up_6_7_2010.split.direct.root' TYP='ROOT' OPT='NEW'" ], this can be prevented. | |||||||
> > | Ganga likes to control the input and output files of the various jobs and sub jobs so it is best not to use absolute output paths for the ntuples otherwise all the sub jobs overwrite one another's root files, so by changing the TupleFile property in the DaVinci python file to something like "B2Dpi_Up_6_7_2010.split.direct.root", this can be prevented. | |||||||
Changed: | ||||||||
< < | Fire up ganga with | |||||||
> > | To fire up ganga, in a new shell do: | |||||||
SetupProject Ganga ganga | ||||||||
Changed: | ||||||||
< < | then in ganga create a job template with the correct options like so, convert it into a job and submit it. You will want to change the directory where the python files are stored and also the filenames if the are differnt to the ones I have used. "6_7_2010_up.direct.py" is the name of my python file pointing to my data. Additionally the name of the file to merge from the subjobs will need to be changed to the one specified in the DaVinci python file. The name option which would be useful to keep track of the submitted jobs does not appear to work with the PBS system as ganga trys to pass qsub -N name option which qsub does not like so it wont submit the job when you set a name. | |||||||
> > | then in ganga create a job template with the correct options, convert it into a job and submit it. You will want to change the directory where the python files are stored and also the filenames if the are different to the ones I have used. "6_7_2010_up.direct.py" is the name of my python file pointing to my data. Additionally the name of the file to merge from the subjobs will need to be changed to the one specified in the DaVinci python file. The name option which would be useful to keep track of the submitted jobs does not appear to work with the PBS system as ganga tries to pass qsub -N name option which qsub does not like so it wont submit the job when you set a name. | |||||||
t = JobTemplate( application = DaVinci() ) dir= "/home/hep/uoh35620/stuff/batch-generate/data/up/all/" t.application.optsfile = [dir+"DaVinciAndCandidatesUp.py", dir+"6_7_2010_up.direct.py"] t.splitter = SplitByFiles() t.splitter.filesPerJob = 2 t.backend = PBS() t.backend.queue = "prod" t.merger= RootMerger() t.merger.files = ['Bu2DStarplusX.root'] j = Job(t) j.submit()you can use the qstat commands mentioned before to check that it has worked. also you will be able to look at the value of j.status and j.subjobs to see the progress of the job as a whole and also the subjobs. MC DataCreating the ntuple with monte carlo data is harder than with the actual data since you must first strip it before creating the ntuple, but can be done with a simalar set up as before. Again you will need
SetupProject DaVinci gaudirun.py DaVinci.py data.py | tee output.txtThe output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before but cahge the filenames where appropriate. List of FilesBelow are all the files mentioned on this wiki
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
How to run DaVinci at RALCreating an ntupleReal Datato create an ntuple with real data you need
| ||||||||
Added: | ||||||||
> > | Data Available at RALThere are several sets of data at RAL MC with dipole up, MC with dipole down, low mu (number of interactions per event) data from before the St. Petersberg conference with both dipole up and down. The data is from the Bhadron B->Dpi stripping stream while the MC data is full of signal (B->Dpi) events.data_2010_down -> /opt/ppd/lhcb/nraja/MC2010/6_7_2010_down data_2010_up -> /opt/ppd/lhcb/nraja/MC2010/6_7_2010_up MC_2010_down -> /opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST MC_2010_up -> /opt/ppd/lhcb/nraja/MC2010/MC_2010_429376_DSTSince the Bhadron stripping line is starting to be available for current data stripping slowly there is more data appearing in book keeping, so you can run jobs on the grid Running over the data | |||||||
My python file pointing to my data looks like:
from Gaudi.Configuration import * EventSelector().Input = [ # for i in /opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/* ; do echo "\" DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done " DATAFILE='pfn:/opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/00006639_00000001_1.dst' TYP='POOL_ROOTTREE' OPT='READ'", " DATAFILE='pfn:/opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/00006639_00000002_1.dst' TYP='POOL_ROOTTREE' OPT='READ'", # ... ] | ||||||||
Changed: | ||||||||
< < | and can be mostly generated with the for command commented in the file. | |||||||
> > | and can be mostly generated with the for command commented in the file and the directory changed for any of the ones listed in the above section. | |||||||
Deleted: | ||||||||
< < | Be careful when creating this file as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed. | |||||||
the davinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name.
Several things will need to be regularly changed in this file for your specific needs.
. /opt/ppd/lhcb/lhcb/scripts/lhcbsetup.shAfter restarting your shell all this can be run with the following commands. SetupProject DaVinci gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txtthe output will be copied to stdout and a file called output.txt. This will create a .root file which contains the ntuple and is readable with the macros mentioned in the analysis section below. | ||||||||
Added: | ||||||||
> > | Several warnings, README!SymlinksBe careful when creating the file pointing to the data as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed.OR mode sequencersWhen creating the DecayTreeTuple in the DaVinci python file it has been found that creating the ntuple using a GaudiSequencer in OR mode with another ntuple (see example below) seems to randomly throw away events and not record all the ones it should.// example of what not to do BachelorOrKstarDecayTreeTuple = GaudiSequencer("BachelorOrKstarDecayTreeTuple") BachelorOrKstarDecayTreeTuple.ModeOR = 1 BachelorOrKstarDecayTreeTuple.Members = [BachelorDecayTreeTuple,KstarDecayTreeTuple] | |||||||
Running on ppd batchyou can run these taks on the ppd batch system, since doing a large number of events can take many hours. below is a script I use to run these taks.#!/bin/bash stda="`date`" echo $stda . /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt echo echo $stda date echo echo Done.The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important. If this is not there the paths will not be set up properly for your script. When using this script there are some things you settings you might want to change:
qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script-q chooses the queue you wish to submit the job to while -j oe joins the stderr and stdout together into one file. -o puts the output file into a given directory and script is the above script to run the job. MonitoringThe following command will list your currently running jobs. you will need to replace uoh35620 with your user name.qstat -u uoh35620For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tee's in the script and uoh35620 will need to be changed to your user name. There are also other various paths after that which would alos need to be changed, good luck, but you could just delete the end it only monitors the file sizes of root files. watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'you might want to make alias of these commands in your ~/.bashrc file so the following in ~/.bashrc will make a new command called qstatme which will be like typing the full command. alias qstatme="qstat -u uoh35620" PBS Batch ErrorsI have gotten this email a few times and I'm not really sure of the cause, maybe an error on the bathch macine, but anyway, if you get this just keep re-submitting exactly the same job until it works. The most I've ever had to do it is about 5 times.PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk Job Name: script Exec host: heplnc308.pp.rl.ac.uk/0 An error has occurred processing your job, see below. 
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0 Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/ >>> error from copy /bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory >>> end error output Splitting the job with ganga on the ppd batchUsing ganga to run DaVinci can be advantageous since it is able to run it on CERN batch, PPD batch, the grid or locally very easily also it can split the job into many pieces for faster processing. Ganga likes to control the input and output files of the various jobs and sub jobs so it is best not to use absolute output paths for the ntuples otherwise all the sub jobs overwrite one another's root files, so by changing the NTupleSvc().Output property in the DaVinci python file to something like [ "FILE1 DATAFILE=B2Dpi_Up_6_7_2010.split.direct.root' TYP='ROOT' OPT='NEW'" ], this can be prevented. Fire up ganga withSetupProject Ganga gangathen in ganga create a job template with the correct options like so, convert it into a job and submit it. You will want to change the directory where the python files are stored and also the filenames if the are differnt to the ones I have used. "6_7_2010_up.direct.py" is the name of my python file pointing to my data. Additionally the name of the file to merge from the subjobs will need to be changed to the one specified in the DaVinci python file. The name option which would be useful to keep track of the submitted jobs does not appear to work with the PBS system as ganga trys to pass qsub -N name option which qsub does not like so it wont submit the job when you set a name. 
t = JobTemplate( application = DaVinci() ) dir= "/home/hep/uoh35620/stuff/batch-generate/data/up/all/" t.application.optsfile = [dir+"DaVinciAndCandidatesUp.py", dir+"6_7_2010_up.direct.py"] t.splitter = SplitByFiles() t.splitter.filesPerJob = 2 t.backend = PBS() t.backend.queue = "prod" t.merger= RootMerger() t.merger.files = ['Bu2DStarplusX.root'] j = Job(t) j.submit()you can use the qstat commands mentioned before to check that it has worked. also you will be able to look at the value of j.status and j.subjobs to see the progress of the job as a whole and also the subjobs. MC DataCreating the ntuple with monte carlo data is harder than with the actual data since you must first strip it before creating the ntuple, but can be done with a simalar set up as before. Again you will need
SetupProject DaVinci gaudirun.py DaVinci.py data.py | tee output.txtThe output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before but cahge the filenames where appropriate. List of FilesBelow are all the files mentioned on this wiki
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
How to run DaVinci at RALCreating an ntupleReal Datato create an ntuple with real data you need
from Gaudi.Configuration import * EventSelector().Input = [ | ||||||||
Changed: | ||||||||
< < | # for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\" DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done " DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'", " DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'", | |||||||
> > | # for i in /opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/* ; do echo "\" DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done " DATAFILE='pfn:/opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/00006639_00000001_1.dst' TYP='POOL_ROOTTREE' OPT='READ'", " DATAFILE='pfn:/opt/ppd/lhcb/nraja/MC2010/MC_2010_428935_DST/00006639_00000002_1.dst' TYP='POOL_ROOTTREE' OPT='READ'", | |||||||
# ...
]
and can be mostly generated with the for command commented in the file. Be careful when creating this file as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed. the davinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name. Several things will need to be regularly changed in this file for your specific needs.
It is also worth noting that a recent version of DaVinci seems to set DV.Lumi to true by default, which it was not previously. This has the effect of putting all your data in lumi.root whether or not you want it there; since I don't know the point of this variable, I set it to false. Check your DaVinci python file for this and add it if it is not there.
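As a sketch of that fix (assuming the standard DaVinci configurable; check the property name against your DaVinci version), the options file would contain something like:

```python
# Hedged sketch of an options-file fragment: explicitly disable the
# luminosity ntuple so event data is not written to lumi.root.
# The DaVinci configurable and its Lumi property are assumed to match
# the DaVinci version described above.
from Configurables import DaVinci

DaVinci().Lumi = False
```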
Before running this you need to set up your environment to find the LHCb programs such as SetupProject. To do this, add the following line to your .bashrc and restart your shell. Be careful: the dot at the start of the line is very important.
. /opt/ppd/lhcb/lhcb/scripts/lhcbsetup.sh
After restarting your shell, all this can be run with the following commands.
SetupProject DaVinci
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt
The output will be copied to stdout and to a file called output.txt. This will create a .root file which contains the ntuple and is readable with the macros mentioned in the analysis section below.

Running on ppd batch

You can run these tasks on the ppd batch system, since processing a large number of events can take many hours. Below is a script I use to run these tasks.
#!/bin/bash
stda="`date`"
echo $stda
. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci
dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all
gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt
echo
echo $stda
date
echo
echo Done.
The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important: without it the paths will not be set up properly for your script. When using this script there are some settings you might want to change:
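As an aside on the leading dot mentioned above: sourcing a script with "." runs it in your current shell, so the environment it sets persists, whereas running it as a child process does not. A self-contained demonstration (the file name and variable here are invented for the demo, not part of the LHCb setup):

```shell
# Create a tiny stand-in for a setup script (invented for this demo).
cat > /tmp/setup_demo.sh <<'EOF'
MYPATH=/opt/demo/bin
EOF

# Running it as a child process: MYPATH is set in the child shell only,
# so here afterwards it is still empty (assuming it was not already set).
sh /tmp/setup_demo.sh
echo "after sh: '${MYPATH}'"

# Sourcing it with a leading dot: it runs in the current shell,
# so the variable persists afterwards.
. /tmp/setup_demo.sh
echo "after dot: '${MYPATH}'"
```

This is why omitting the dot leaves the script without the paths that SetupProject.sh is supposed to provide.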
qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script
-q chooses the queue you wish to submit the job to, -j oe joins stderr and stdout together into one file, -o puts the output file into the given directory, and script is the above script to run the job.

Monitoring

The following command will list your currently running jobs. You will need to replace uoh35620 with your user name.
qstat -u uoh35620
For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tee's in the script and uoh35620 will need to be changed to your user name. There are also various other paths after that which would need to be changed; alternatively you can just delete the end of the command, since it only monitors the file sizes of the root files.
watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'
You might want to make aliases of these commands in your ~/.bashrc file. For example, the following line in ~/.bashrc will make a new command called qstatme which will be like typing the full command.
alias qstatme="qstat -u uoh35620"

PBS Batch Errors

I have gotten this email a few times and I'm not really sure of the cause, maybe an error on the batch machine, but if you get this just keep re-submitting exactly the same job until it works. The most I've ever had to do it is about 5 times.
PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name: script
Exec host: heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0

Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output

Splitting the job with ganga on the ppd batch

Using ganga to run DaVinci can be advantageous since it can run the job on the CERN batch, the PPD batch, the grid or locally very easily, and it can split the job into many pieces for faster processing. Ganga likes to control the input and output files of the various jobs and subjobs, so it is best not to use absolute output paths for the ntuples, otherwise all the subjobs overwrite one another's root files. This can be prevented by changing the NTupleSvc().Output property in the DaVinci python file to something like [ "FILE1 DATAFILE='B2Dpi_Up_6_7_2010.split.direct.root' TYP='ROOT' OPT='NEW'" ]. Fire up ganga with
SetupProject Ganga
ganga
then in ganga create a job template with the correct options like so, convert it into a job and submit it. You will want to change the directory where the python files are stored, and also the filenames if they are different to the ones I have used. "6_7_2010_up.direct.py" is the name of my python file pointing to my data. Additionally, the name of the file to merge from the subjobs will need to be changed to the one specified in the DaVinci python file. The name option, which would be useful for keeping track of submitted jobs, does not appear to work with the PBS system: ganga tries to pass a qsub -N name option which qsub does not like, so it won't submit the job when you set a name.
t = JobTemplate( application = DaVinci() )
dir = "/home/hep/uoh35620/stuff/batch-generate/data/up/all/"
t.application.optsfile = [dir+"DaVinciAndCandidatesUp.py", dir+"6_7_2010_up.direct.py"]
t.splitter = SplitByFiles()
t.splitter.filesPerJob = 2
t.backend = PBS()
t.backend.queue = "prod"
t.merger = RootMerger()
t.merger.files = ['Bu2DStarplusX.root']
j = Job(t)
j.submit()
You can use the qstat commands mentioned before to check that it has worked. You will also be able to look at the values of j.status and j.subjobs to see the progress of the job as a whole and of the subjobs.

MC Data

Creating the ntuple with Monte Carlo data is harder than with the real data, since you must first strip it before creating the ntuple, but it can be done with a similar set-up as before. Again you will need
SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt
The output will be copied to output.txt. To run it on the ppd batch system, use the same method and scripts as before but change the filenames where appropriate.

List of Files

Below are all the files mentioned on this wiki
-- ThomasBird - 2010-07-26