Data Analysis
Creating an ntuple
Real Data
To create an ntuple with real data you need:
- the DST files you wish to run on
- a python file pointing to the data
- the DaVinci python file
My python file pointing to my data looks like:
from Gaudi.Configuration import *
EventSelector().Input = [
# for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\" DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
" DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
" DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
# ...
]
and can be mostly generated with the for loop in the comment at the top of the list.
Be careful when creating this file: symlinks have caused me problems and prevented DaVinci from reading the data. Because the files existed there were no errors; DaVinci just skipped the files it was looking at, resulting in 0 events processed.
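To guard against the symlink problem you can resolve each path with readlink -f before writing it out. Below is a sketch of such a loop (the directory is the one from my setup, so adjust it, and check the generated lines before using them):
for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do
    # resolve any symlinks to the real file before writing the DATAFILE line
    real=$(readlink -f "${i}")
    echo "\" DATAFILE='pfn:${real}' TYP='POOL_ROOTTREE' OPT='READ'\","
done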
The DaVinci file needed to create the ntuple can be found here:
DaVinciAndCandidatesUp.py.txt; you will want to remove the .txt from the file name.
Then all this can be run with the following commands.
SetupProject DaVinci
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt
The output will also be copied to the output.txt file.
Running on ppd batch
You can run these tasks on the ppd batch system, since processing a large number of events can take many hours. Below is a script I use to run them.
#!/bin/bash
# Record the start time so it can be printed next to the finish time.
stda="`date`"
echo $stda
# The leading dot is important: SetupProject.sh must be sourced so the
# DaVinci environment is set up in this shell (see note below).
. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci
dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all
# Run DaVinci, copying the log to output.txt and to a second file that can
# be watched while the job runs (see Monitoring below).
gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt
echo
echo $stda
date
echo
echo Done.
The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important: it sources the script into the current shell, so the environment it sets up persists. Without it the script runs in a child shell and the paths will not be set up properly for your script.
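As a quick illustration of the difference (a generic sketch, with setup.sh standing in for SetupProject.sh):
# sourced: variables and PATH changes made by setup.sh stay in this shell
. setup.sh
# run without the dot: setup.sh runs in a child shell, so anything it sets
# up is lost as soon as it returns
./setup.sh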
Placing this in a file called, say, script and running the following will submit it to the batch system.
qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script
-q chooses the queue you wish to submit the job to, -S /bin/bash sets the shell the job runs under, -j oe joins stderr and stdout together into one file, -o puts the output file into the given directory, and script is the script above.
Monitoring
The following command will list your currently running jobs. You will need to replace uoh35620 with your user name.
qstat -u uoh35620
For watching jobs in real time I like to use the following. The file at the beginning is the output of one of the tee's in the script, uoh35620 will need to be changed to your user name, and the various other paths will also need changing. You can simply delete everything from find onwards if you like; that part only monitors the file sizes of the ROOT files.
watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'
You might want to make aliases of these commands in your ~/.bashrc file. For example, the following line in ~/.bashrc creates a new command called qstatme which is equivalent to typing the full command.
alias qstatme="qstat -u uoh35620"
Errors
I have received this email a few times and I'm not really sure of the cause, possibly an error on the batch machine. If you get it, just keep re-submitting exactly the same job until it works; the most I've ever had to resubmit is about 5 times.
PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name: script
Exec host: heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0
Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output
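If you get tired of resubmitting by hand, a loop along the lines of the sketch below should do it. This is not part of my original setup: the output directory, the job name script, and the limit of 5 attempts are assumptions, and it relies on PBS naming the joined output file script.o<job number>.
#!/bin/bash
# Hypothetical resubmission helper: submit the job, wait for it to leave the
# batch system, and resubmit if the joined output file never arrived.
out=/home/hep/uoh35620/job-out
for attempt in 1 2 3 4 5 ; do
    jobid=$(qsub -q prod -S /bin/bash -j oe -o ${out}/ script)
    num=${jobid%%.*}
    echo "attempt ${attempt}: submitted ${jobid}"
    # poll until the job is no longer known to qstat
    while qstat ${jobid} > /dev/null 2>&1 ; do
        sleep 60
    done
    # -j oe means everything ends up in one file called script.o<number>
    if [ -f ${out}/script.o${num} ] ; then
        echo "output file found after attempt ${attempt}"
        break
    fi
    echo "no output file, resubmitting"
done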
MC Data
Creating the ntuple with Monte Carlo data is harder than with real data, since you must first strip it before creating the ntuple, but it can be done with a similar setup to before. Again you will need:
- the DST files you wish to run on
- a python file pointing to the data
- the DaVinci python file
- the stripping python file
The stripping python file can be found here:
StrippingB2DPiLoose.py.txt; again you will probably want to remove the .txt extension.
The python file pointing to the MC data has the same format as before, just with different filenames; see the Real Data section.
The DaVinci file that you need is this one:
DaVinci.py.txt.
Then this can again be run like the real data, with the following commands.
SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt
The output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before, but change the filenames where appropriate (see the line below).
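Concretely, only the gaudirun.py line of the batch script needs to change, to something like the line below; DaVinci.py is the file above, mc_data.py is just a stand-in name for your MC data file, and dir should point at wherever you keep them.
gaudirun.py ${dir}/DaVinci.py ${dir}/mc_data.py | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt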
Analysing an ntuple
List of Files
Below are all the files mentioned on this wiki:
DaVinciAndCandidatesUp.py.txt
StrippingB2DPiLoose.py.txt
DaVinci.py.txt