
Data Analysis

Creating an ntuple

Real Data

To create an ntuple with real data you need:

  • the DST files you wish to run on
  • a python file pointing to the data
  • the DaVinci python file

My python file pointing to my data looks like:

from Gaudi.Configuration import *

EventSelector().Input   = [
#  for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\"   DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
# ...
]


It can be mostly generated with the for loop shown in the comment at the top of the file.
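
For convenience, here is that loop as a standalone shell command (a sketch only; the directory is my example and the output file name input_files.txt is just a placeholder):

for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do
  # print one EventSelector entry per DST file
  echo "\"   DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\","
done > input_files.txt
# paste the contents of input_files.txt into the EventSelector().Input list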

Be careful when creating this file: symlinks have caused me issues and prevented DaVinci from reading the data. Because the files existed, no error was raised; DaVinci just seemed to skip the file it was looking at, resulting in 0 events processed.
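
A quick way to check for this before running (a sketch; adjust the directory to wherever your DSTs live) is to list any symlinks among the input files:

# lists any input files that are actually symlinks rather than real DSTs
find /home/hep/uoh35620/stuff/data/data_2010_up -type l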

The DaVinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt; you might want to remove the .txt from the file name.

Several things will need to be changed regularly in this file for your specific needs (a sketch of these settings follows the list).

  • The number of events to analyse, EvtMax, which takes a value of -1 for all events or a number > 0 for that many events.
  • Whether the input data is simulated: set DV.Simulation to True for simulated data.
  • Where the final ROOT ntuple is saved, NTupleSvc().Output. I usually point it at something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]
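
A minimal sketch of how these settings look inside a DaVinci options file; check DaVinciAndCandidatesUp.py for the exact layout it uses:

from Gaudi.Configuration import *
from Configurables import DaVinci

DV = DaVinci()
DV.EvtMax = -1          # -1 processes all events; any number > 0 processes that many
DV.Simulation = False   # set to True when running over simulated data

# Register the ntuple service and tell it where to write the final ROOT file
ApplicationMgr().ExtSvc += [ "NTupleSvc" ]
NTupleSvc().Output = [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]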

Then all this can be run with the following command.

SetupProject DaVinci
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt

The output will be copied to stdout and to a file called output.txt. This will create a .root file containing the ntuple, which is readable with the macros mentioned in the analysis section below.
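
To quickly check the resulting file, a PyROOT sketch like the following works (the file name is the example output from above; what it contains depends on your DecayTreeTuple configuration):

import ROOT

# open the ntuple file produced by the gaudirun.py job and list its contents
f = ROOT.TFile.Open("B2Dpi_Up_6_7_2010.direct.root")
f.ls()
f.Close()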

Running on the PPD batch

You can run these tasks on the PPD batch system, since processing a large number of events can take many hours. Below is a script I use to run them.

#!/bin/bash

# record the start time so it can be printed again at the end of the job
stda="`date`"
echo $stda

# source (note the leading dot) SetupProject.sh to set up the DaVinci environment
. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci

# directory holding the DaVinci options file and the file pointing to the data
dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all

# run DaVinci, keeping a copy of the output next to the options and in a monitoring file
gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py  | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt

echo
echo $stda
date
echo
echo Done.

The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important: it sources the script so the environment is set up in the script's own shell. If it is missing, the paths will not be set up properly for your script.

When using this script there are some settings you might want to change:

  • The dir variable, which should point to the directory where the scripts are.
  • Make sure the filenames in the gaudirun command are correct and point to data that does not involve symlinks.

Placing this in a file called, say, script and running the following will make it run on the batch system.

qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script

-q chooses the queue you wish to submit the job to, -j oe joins stderr and stdout together into one file, -o puts the output file into the given directory, and script is the above script to run.

Monitoring

The following command will list your currently running jobs. You will need to replace uoh35620 with your user name.

qstat -u uoh35620

To look at jobs in real time I like to use the following. The file at the beginning is the output of one of the tee commands in the script, and uoh35620 will need to be changed to your user name. There are various other paths after that which would also need to be changed, but you could just delete the end of the command; it only monitors the file sizes of the ROOT files.

watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'

You might want to make aliases of these commands in your ~/.bashrc file. For example, the following line in ~/.bashrc creates a new command called qstatme that is equivalent to typing the full command.

alias qstatme="qstat -u uoh35620"
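
A similar alias can wrap a shortened version of the watch command above (a sketch; the alias name watchjobs is just an example, and you can trim or extend the monitored paths to taste):

alias watchjobs="watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620'"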

PBS Batch Errors

I have gotten this email a few times and I'm not really sure of the cause, maybe an error on the batch machine, but if you get it just keep re-submitting exactly the same job until it works. The most I've ever had to resubmit is about 5 times.

PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name:   script
Exec host:  heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0

Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output

Splitting the job with Ganga on the PPD batch

Using Ganga to run DaVinci can be advantageous since it can very easily run on the CERN batch, the PPD batch, the Grid, or locally, and it can split the job into many pieces for faster processing.

Ganga likes to control the input and output files of the various jobs and subjobs, so it is best not to use absolute output paths for the ntuples; otherwise all the subjobs overwrite one another's ROOT files. This can be prevented by changing the NTupleSvc().Output property in the DaVinci python file to something like [ "FILE1 DATAFILE='B2Dpi_Up_6_7_2010.split.direct.root' TYP='ROOT' OPT='NEW'" ].

Fire up ganga with

SetupProject Ganga
ganga

Then, in Ganga, create a job template with the correct options like so, convert it into a job, and submit it. You will want to change the directory where the python files are stored, and also the filenames if they are different from the ones I have used; "6_7_2010_up.direct.py" is the name of my python file pointing to my data. Additionally, the name of the file to merge from the subjobs will need to be changed to the one specified in the DaVinci python file. The name option, which would be useful for keeping track of submitted jobs, does not appear to work with the PBS system: Ganga tries to pass the -N name option to qsub, which qsub does not like, so it won't submit the job when you set a name.

t = JobTemplate( application = DaVinci() )

# directory containing the DaVinci options file and the file pointing to the data
dir = "/home/hep/uoh35620/stuff/batch-generate/data/up/all/"
t.application.optsfile = [dir+"DaVinciAndCandidatesUp.py", dir+"6_7_2010_up.direct.py"]

# split the input data files across subjobs, two DSTs per subjob
t.splitter = SplitByFiles()
t.splitter.filesPerJob = 2

# run the subjobs on the PPD PBS batch system
t.backend = PBS()
t.backend.queue = "prod"

# merge the subjobs' ROOT ntuples into a single file
t.merger = RootMerger()
t.merger.files = ['Bu2DStarplusX.root']

j = Job(t)
j.submit()

You can use the qstat commands mentioned before to check that it has worked. You can also look at the values of j.status and j.subjobs to see the progress of the job as a whole and of the individual subjobs.
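
For example, at the Ganga prompt (a sketch; j is the job object created above):

jobs                          # table of all known jobs and their statuses
print j.status                # status of the job as a whole
for sj in j.subjobs:
    print sj.id, sj.status    # status of each individual subjob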

MC Data

Creating the ntuple with Monte Carlo data is harder than with the real data, since you must first strip it before creating the ntuple, but it can be done with a similar setup to before. Again you will need:

  • the DST files you wish to run on
  • a python file pointing to the data
  • the DaVinci python file
  • the stripping python file

The stripping python file can be found here: StrippingB2DPiLoose.py.txt; again, you probably want to change the file extension.

The python file pointing to the MC data is the same format as before just with different filenames, see the Real Data section.

The DaVinci file that you need is here: DaVinci.py.txt.

Again you might want to change a few things to make this work how you want it to.

  • The path to StrippingB2DPiLoose.py will need to be changed (see the sketch after this list).
  • The number of events to analyse, EvtMax, which takes a value of -1 for all events or a number > 0 for that many events.
  • Whether the input data is simulated: set DV.Simulation to True for simulated data.
  • Where the final ROOT ntuple is saved, NTupleSvc().Output. I usually point it at something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]
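
How the stripping options are actually pulled in depends on DaVinci.py itself; a minimal sketch, assuming they are included with importOptions and using an example path, would be:

from Gaudi.Configuration import importOptions

# path to the (renamed) stripping options file; change this to wherever you saved it
importOptions("/home/hep/uoh35620/stuff/batch-generate/StrippingB2DPiLoose.py")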

Then this can again be run like the real data, with the following command.

SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt

The output will be copied to output.txt. To run it on the PPD batch system use the same method and scripts as before, but change the filenames where appropriate.

Analysing an ntuple

List of Files

Below are all the files mentioned on this wiki

-- ThomasBird - 2010-07-20

  • StrippingB2DPiLoose.py.txt (file needed to strip MC data)
  • DaVinciAndCandidatesUp.py.txt (file to generate the ntuple for actual data)
  • DaVinci.py.txt (DaVinci file to run the stripping of MC data)

Revision 72010-07-23 - ThomasBird

 
META TOPICPARENT name="Main.ThomasBird"

Data Analysis

Creating an ntuple

Real Data

to create an ntuple with real data you need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file

My python file pointing to my data looks like:

from Gaudi.Configuration import *

EventSelector().Input   = [
#  for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\"   DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
# ...
]


and can be mostly generated with the for command commented in the file.

Be careful when creating this file as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed.

the davinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name.

Several things will need to be regularly changed in this file for your specific needs.

  • The number of events to analyse, EvtMax which takes a value of -1 for all or a number>0 for that number of events.
  • If the input data is simulated or not change DV.Simulation to True for simulated
  • The place where the final root ntuple is saved NTupleSvc().Output I usually have it pointing to something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]

Then all this can be run with the following command.

SetupProject DaVinci
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt

the output will be copied to stdout and a file called output.txt. This will create a .root file which contains the ntuple and is readable with the macros mentioned in the analysis section below.

Running on ppd batch

you can run these taks on the ppd batch system, since doing a large number of events can take many hours. below is a script I use to run these taks.

#!/bin/bash

stda="`date`"
echo $stda

. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci

dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all

gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py  | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt

echo
echo $stda
date
echo
echo Done.

The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important. If this is not there the paths will not be set up properly for your script.

When using this script there are some things you settings you might want to change:

  • The dir variable to the correct directory where the scripts are.
  • Make sure the filenames in the gaudirun command are correct and pointing to data that does not involve symlinks

Placing this in a file called say script and running the following will make it run on the batch system.

qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script

-q chooses the queue you wish to submit the job to while -j oe joins the stderr and stdout together into one file. -o puts the output file into a given directory and script is the above script to run the job.

Monitoring

The following command will list your currently running jobs. you will need to replace uoh35620 with your user name.

qstat -u uoh35620

For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tee's in the script and uoh35620 will need to be changed to your user name. There are also other various paths after that which would alos need to be changed, good luck, but you could just delete the end it only monitors the file sizes of root files.

watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'

you might want to make alias of these commands in your ~/.bashrc file so the following in ~/.bashrc will make a new command called qstatme which will be like typing the full command.

alias qstatme="qstat -u uoh35620"

PBS Batch Errors

I have gotten this email a few times and I'm not really sure of the cause, maybe an error on the bathch macine, but anyway, if you get this just keep re-submitting exactly the same job until it works. The most I've ever had to do it is about 5 times.

PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name:   script
Exec host:  heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0

Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output

Splitting the job with ganga on the ppd batch

Using ganga to run DaVinci can be advantageous since it is able to run it on CERN batch, PPD batch, the grid or locally very easily also it can split the job into many pieces for faster processing.

Ganga likes to control the input and output files of the various jobs and sub jobs so it is best not to use absolute output paths for the ntuples otherwise all the sub jobs overwrite one another's root files, so by changing the NTupleSvc().Output property in the DaVinci python file to something like [ "FILE1 DATAFILE=B2Dpi_Up_6_7_2010.split.direct.root' TYP='ROOT' OPT='NEW'" ], this can be prevented.

Fire up ganga with

SetupProject Ganga
ganga
Changed:
<
<
then in ganga create a job template with the correct options like so, convert it into a job and submit it. You will want to change the directory where the python files are stored and also the filenames if the are differnt to the ones I have used. "6_7_2010_up.direct.py" is the name of my python file pointing to my data.
>
>
then in ganga create a job template with the correct options like so, convert it into a job and submit it. You will want to change the directory where the python files are stored and also the filenames if the are differnt to the ones I have used. "6_7_2010_up.direct.py" is the name of my python file pointing to my data. Additionally the name of the file to merge from the subjobs will need to be changed to the one specified in the DaVinci python file. The name option which would be useful to keep track of the submitted jobs does not appear to work with the PBS system as ganga trys to pass qsub -N name option which qsub does not like so it wont submit the job when you set a name.
 
t = JobTemplate( application = DaVinci() )

dir= "/home/hep/uoh35620/stuff/batch-generate/data/up/all/"
t.application.optsfile = [dir+"DaVinciAndCandidatesUp.py", dir+"6_7_2010_up.direct.py"]

t.splitter = SplitByFiles()
Changed:
<
<
t.splitter.filesPerJob = 5
>
>
t.splitter.filesPerJob = 2
  t.backend = PBS() t.backend.queue = "prod"
Added:
>
>
t.merger= RootMerger() t.merger.files = ['Bu2DStarplusX.root']
  j = Job(t) j.submit()

you can use the qstat commands mentioned before to check that it has worked. also you will be able to look at the value of j.status and j.subjobs to see the progress of the job as a whole and also the subjobs.

MC Data

Creating the ntuple with monte carlo data is harder than with the actual data since you must first strip it before creating the ntuple, but can be done with a simalar set up as before. Again you will need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file
  • the stripping python file

The stripping python file can be found here StrippingB2DPiLoose.py.txt, again you probably want to change the file extension.

The python file pointing to the MC data is the same format as before just with different filenames, see the Real Data section.

the DaVinci file that you need is this one here DaVinci.py.txt.

Again you might want to change a few things to make this work how you want it to.

  • The path to StrippingB2DPiLoose.py will need to be changed.
  • The number of events to analyse, EvtMax which takes a value of -1 for all or a number>0 for that number of events.
  • If the input data is simulated or not change DV.Simulation to True for simulated
  • The place where the final root ntuple is saved NTupleSvc().Output I usually have it pointing to something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]

Then this can again be run like the real data, with the following command.

SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt

The output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before but cahge the filenames where appropriate.

Analysing an ntuple

List of Files

Below are all the files mentioned on this wiki

-- ThomasBird - 2010-07-20

META FILEATTACHMENT attachment="StrippingB2DPiLoose.py.txt" attr="" comment="File needed to strip MC data" date="1279621911" name="StrippingB2DPiLoose.py.txt" path="StrippingB2DPiLoose.py" size="9349" stream="StrippingB2DPiLoose.py" tmpFilename="/usr/tmp/CGItemp19331" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinciAndCandidatesUp.py.txt" attr="" comment="File to generate ntuple for actual data" date="1279622034" name="DaVinciAndCandidatesUp.py.txt" path="DaVinciAndCandidatesUp.py" size="6554" stream="DaVinciAndCandidatesUp.py" tmpFilename="/usr/tmp/CGItemp19303" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinci.py.txt" attr="" comment="DaVinci file to run the stripping of MC data" date="1279622448" name="DaVinci.py.txt" path="DaVinci.py" size="3306" stream="DaVinci.py" tmpFilename="/usr/tmp/CGItemp19265" user="ThomasBird" version="1"

Revision 62010-07-22 - ThomasBird

 
META TOPICPARENT name="Main.ThomasBird"

Data Analysis

Creating an ntuple

Real Data

to create an ntuple with real data you need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file

My python file pointing to my data looks like:

from Gaudi.Configuration import *

EventSelector().Input   = [
#  for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\"   DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
# ...
]


and can be mostly generated with the for command commented in the file.

Be careful when creating this file as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed.

the davinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name.

Several things will need to be regularly changed in this file for your specific needs.

  • The number of events to analyse, EvtMax which takes a value of -1 for all or a number>0 for that number of events.
  • If the input data is simulated or not change DV.Simulation to True for simulated
  • The place where the final root ntuple is saved NTupleSvc().Output I usually have it pointing to something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]

Then all this can be run with the following command.

SetupProject DaVinci
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt

the output will be copied to stdout and a file called output.txt. This will create a .root file which contains the ntuple and is readable with the macros mentioned in the analysis section below.

Running on ppd batch

you can run these taks on the ppd batch system, since doing a large number of events can take many hours. below is a script I use to run these taks.

#!/bin/bash

stda="`date`"
echo $stda

. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci

dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all

gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py  | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt

echo
echo $stda
date
echo
echo Done.

The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important. If this is not there the paths will not be set up properly for your script.

When using this script there are some things you settings you might want to change:

  • The dir variable to the correct directory where the scripts are.
  • Make sure the filenames in the gaudirun command are correct and pointing to data that does not involve symlinks

Placing this in a file called say script and running the following will make it run on the batch system.

qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script

-q chooses the queue you wish to submit the job to while -j oe joins the stderr and stdout together into one file. -o puts the output file into a given directory and script is the above script to run the job.

Monitoring

The following command will list your currently running jobs. you will need to replace uoh35620 with your user name.

qstat -u uoh35620

For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tee's in the script and uoh35620 will need to be changed to your user name. There are also other various paths after that which would alos need to be changed, good luck, but you could just delete the end it only monitors the file sizes of root files.

watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'

you might want to make alias of these commands in your ~/.bashrc file so the following in ~/.bashrc will make a new command called qstatme which will be like typing the full command.

alias qstatme="qstat -u uoh35620"

PBS Batch Errors

I have gotten this email a few times and I'm not really sure of the cause, maybe an error on the bathch macine, but anyway, if you get this just keep re-submitting exactly the same job until it works. The most I've ever had to do it is about 5 times.

PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name:   script
Exec host:  heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0

Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output

Splitting the job with ganga on the ppd batch
Changed:
<
<
Using ganga to run DaVinci can be advantageous since it is able to run it on cern batch, ppd batch, the grid or locally very easily also it can split the job into many pieces for faster processing.
>
>
Using ganga to run DaVinci can be advantageous since it is able to run it on CERN batch, PPD batch, the grid or locally very easily also it can split the job into many pieces for faster processing.
 
Changed:
<
<
First fire up ganga with
>
>
Ganga likes to control the input and output files of the various jobs and sub jobs so it is best not to use absolute output paths for the ntuples otherwise all the sub jobs overwrite one another's root files, so by changing the NTupleSvc().Output property in the DaVinci python file to something like [ "FILE1 DATAFILE=B2Dpi_Up_6_7_2010.split.direct.root' TYP='ROOT' OPT='NEW'" ], this can be prevented.
Added:
>
>
Fire up ganga with
 
SetupProject Ganga
ganga

then in ganga create a job template with the correct options like so, convert it into a job and submit it. You will want to change the directory where the python files are stored and also the filenames if the are differnt to the ones I have used. "6_7_2010_up.direct.py" is the name of my python file pointing to my data.

t = JobTemplate( application = DaVinci() )

dir= "/home/hep/uoh35620/stuff/batch-generate/data/up/all/"
t.application.optsfile = [dir+"DaVinciAndCandidatesUp.py", dir+"6_7_2010_up.direct.py"]

t.splitter = SplitByFiles()
t.splitter.filesPerJob = 5

t.backend = PBS()
t.backend.queue = "prod"

j = Job(t)
j.submit()

you can use the qstat commands mentioned before to check that it has worked. also you will be able to look at the value of j.status and j.subjobs to see the progress of the job as a whole and also the subjobs.

MC Data

Creating the ntuple with monte carlo data is harder than with the actual data since you must first strip it before creating the ntuple, but can be done with a simalar set up as before. Again you will need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file
  • the stripping python file

The stripping python file can be found here StrippingB2DPiLoose.py.txt, again you probably want to change the file extension.

The python file pointing to the MC data is the same format as before just with different filenames, see the Real Data section.

the DaVinci file that you need is this one here DaVinci.py.txt.

Again you might want to change a few things to make this work how you want it to.

  • The path to StrippingB2DPiLoose.py will need to be changed.
  • The number of events to analyse, EvtMax which takes a value of -1 for all or a number>0 for that number of events.
  • If the input data is simulated or not change DV.Simulation to True for simulated
  • The place where the final root ntuple is saved NTupleSvc().Output I usually have it pointing to something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]

Then this can again be run like the real data, with the following command.

SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt

The output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before but cahge the filenames where appropriate.

Analysing an ntuple

List of Files

Below are all the files mentioned on this wiki

-- ThomasBird - 2010-07-20

META FILEATTACHMENT attachment="StrippingB2DPiLoose.py.txt" attr="" comment="File needed to strip MC data" date="1279621911" name="StrippingB2DPiLoose.py.txt" path="StrippingB2DPiLoose.py" size="9349" stream="StrippingB2DPiLoose.py" tmpFilename="/usr/tmp/CGItemp19331" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinciAndCandidatesUp.py.txt" attr="" comment="File to generate ntuple for actual data" date="1279622034" name="DaVinciAndCandidatesUp.py.txt" path="DaVinciAndCandidatesUp.py" size="6554" stream="DaVinciAndCandidatesUp.py" tmpFilename="/usr/tmp/CGItemp19303" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinci.py.txt" attr="" comment="DaVinci file to run the stripping of MC data" date="1279622448" name="DaVinci.py.txt" path="DaVinci.py" size="3306" stream="DaVinci.py" tmpFilename="/usr/tmp/CGItemp19265" user="ThomasBird" version="1"

Revision 52010-07-21 - ThomasBird

 
META TOPICPARENT name="Main.ThomasBird"

Data Analysis

Creating an ntuple

Real Data

to create an ntuple with real data you need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file

My python file pointing to my data looks like:

from Gaudi.Configuration import *

EventSelector().Input   = [
#  for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\"   DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
# ...
]


and can be mostly generated with the for command commented in the file.

Be careful when creating this file as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed.

the davinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name.

Several things will need to be regularly changed in this file for your specific needs.

  • The number of events to analyse, EvtMax which takes a value of -1 for all or a number>0 for that number of events.
  • If the input data is simulated or not change DV.Simulation to True for simulated
  • The place where the final root ntuple is saved NTupleSvc().Output I usually have it pointing to something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]

Then all this can be run with the following command.

SetupProject DaVinci
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt

the output will be copied to stdout and a file called output.txt. This will create a .root file which contains the ntuple and is readable with the macros mentioned in the analysis section below.

Running on ppd batch

you can run these taks on the ppd batch system, since doing a large number of events can take many hours. below is a script I use to run these taks.

#!/bin/bash

stda="`date`"
echo $stda

. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci

dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all

gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py  | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt

echo
echo $stda
date
echo
echo Done.

The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important. If this is not there the paths will not be set up properly for your script.

When using this script there are some things you settings you might want to change:

  • The dir variable to the correct directory where the scripts are.
  • Make sure the filenames in the gaudirun command are correct and pointing to data that does not involve symlinks

Placing this in a file called say script and running the following will make it run on the batch system.

qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script

-q chooses the queue you wish to submit the job to while -j oe joins the stderr and stdout together into one file. -o puts the output file into a given directory and script is the above script to run the job.

Monitoring

The following command will list your currently running jobs. you will need to replace uoh35620 with your user name.

qstat -u uoh35620

For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tee's in the script and uoh35620 will need to be changed to your user name. There are also other various paths after that which would alos need to be changed, good luck, but you could just delete the end it only monitors the file sizes of root files.

watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'

you might want to make alias of these commands in your ~/.bashrc file so the following in ~/.bashrc will make a new command called qstatme which will be like typing the full command.

alias qstatme="qstat -u uoh35620"
Changed:
<
<
Errors
>
>
PBS Batch Errors
  I have gotten this email a few times and I'm not really sure of the cause, maybe an error on the bathch macine, but anyway, if you get this just keep re-submitting exactly the same job until it works. The most I've ever had to do it is about 5 times.

PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name:   script
Exec host:  heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0

Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output
Added:
>
>
Splitting the job with ganga on the ppd batch

Using ganga to run DaVinci can be advantageous since it is able to run it on cern batch, ppd batch, the grid or locally very easily also it can split the job into many pieces for faster processing.

First fire up ganga with

SetupProject Ganga
ganga

then in ganga create a job template with the correct options like so, convert it into a job and submit it. You will want to change the directory where the python files are stored and also the filenames if the are differnt to the ones I have used. "6_7_2010_up.direct.py" is the name of my python file pointing to my data.

t = JobTemplate( application = DaVinci() )

dir= "/home/hep/uoh35620/stuff/batch-generate/data/up/all/"
t.application.optsfile = [dir+"DaVinciAndCandidatesUp.py", dir+"6_7_2010_up.direct.py"]

t.splitter = SplitByFiles()
t.splitter.filesPerJob = 5

t.backend = PBS()
t.backend.queue = "prod"

j = Job(t)
j.submit()

you can use the qstat commands mentioned before to check that it has worked. also you will be able to look at the value of j.status and j.subjobs to see the progress of the job as a whole and also the subjobs.

 

MC Data

Creating the ntuple with monte carlo data is harder than with the actual data since you must first strip it before creating the ntuple, but can be done with a simalar set up as before. Again you will need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file
  • the stripping python file

The stripping python file can be found here StrippingB2DPiLoose.py.txt, again you probably want to change the file extension.

The python file pointing to the MC data is the same format as before just with different filenames, see the Real Data section.

the DaVinci file that you need is this one here DaVinci.py.txt.

Again you might want to change a few things to make this work how you want it to.

  • The path to StrippingB2DPiLoose.py will need to be changed.
  • The number of events to analyse, EvtMax which takes a value of -1 for all or a number>0 for that number of events.
  • If the input data is simulated or not change DV.Simulation to True for simulated
  • The place where the final root ntuple is saved NTupleSvc().Output I usually have it pointing to something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]

Then this can again be run like the real data, with the following command.

SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt

The output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before but cahge the filenames where appropriate.

Analysing an ntuple

List of Files

Below are all the files mentioned on this wiki

-- ThomasBird - 2010-07-20

META FILEATTACHMENT attachment="StrippingB2DPiLoose.py.txt" attr="" comment="File needed to strip MC data" date="1279621911" name="StrippingB2DPiLoose.py.txt" path="StrippingB2DPiLoose.py" size="9349" stream="StrippingB2DPiLoose.py" tmpFilename="/usr/tmp/CGItemp19331" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinciAndCandidatesUp.py.txt" attr="" comment="File to generate ntuple for actual data" date="1279622034" name="DaVinciAndCandidatesUp.py.txt" path="DaVinciAndCandidatesUp.py" size="6554" stream="DaVinciAndCandidatesUp.py" tmpFilename="/usr/tmp/CGItemp19303" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinci.py.txt" attr="" comment="DaVinci file to run the stripping of MC data" date="1279622448" name="DaVinci.py.txt" path="DaVinci.py" size="3306" stream="DaVinci.py" tmpFilename="/usr/tmp/CGItemp19265" user="ThomasBird" version="1"

Revision 42010-07-20 - ThomasBird

 
META TOPICPARENT name="Main.ThomasBird"
Changed:
<
<

Data Analysis

>
>

Data Analysis

 
Added:
>
>
 

Creating an ntuple

Real Data

to create an ntuple with real data you need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file

My python file pointing to my data looks like:

from Gaudi.Configuration import *

EventSelector().Input   = [
#  for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\"   DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
# ...
]


and can be mostly generated with the for command commented in the file.

Be careful when creating this file as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed.

the davinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name.

Added:
>
>
Several things will need to be regularly changed in this file for your specific needs.
  • The number of events to analyse, EvtMax which takes a value of -1 for all or a number>0 for that number of events.
  • If the input data is simulated or not change DV.Simulation to True for simulated
  • The place where the final root ntuple is saved NTupleSvc().Output I usually have it pointing to something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]
 Then all this can be run with the following command.
SetupProject DaVinci
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt
Changed:
<
<
the output will be copied to the output.txt file.
>
>
the output will be copied to stdout and a file called output.txt. This will create a .root file which contains the ntuple and is readable with the macros mentioned in the analysis section below.
 

Running on ppd batch

you can run these taks on the ppd batch system, since doing a large number of events can take many hours. below is a script I use to run these taks.

#!/bin/bash

stda="`date`"
echo $stda

. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci

dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all

gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py  | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt

echo
echo $stda
date
echo
echo Done.

The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important. If this is not there the paths will not be set up properly for your script.

Added:
>
>
When using this script there are some things you settings you might want to change:
  • The dir variable to the correct directory where the scripts are.
  • Make sure the filenames in the gaudirun command are correct and pointing to data that does not involve symlinks
 Placing this in a file called say script and running the following will make it run on the batch system.

qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script

-q chooses the queue you wish to submit the job to while -j oe joins the stderr and stdout together into one file. -o puts the output file into a given directory and script is the above script to run the job.

Monitoring

The following command will list your currently running jobs. you will need to replace uoh35620 with your user name.

qstat -u uoh35620

For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tee's in the script and uoh35620 will need to be changed to your user name. There are also other various paths after that which would alos need to be changed, good luck, but you could just delete the end it only monitors the file sizes of root files.

watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'

you might want to make alias of these commands in your ~/.bashrc file so the following in ~/.bashrc will make a new command called qstatme which will be like typing the full command.

alias qstatme="qstat -u uoh35620"

Errors

I have gotten this email a few times and I'm not really sure of the cause, maybe an error on the bathch macine, but anyway, if you get this just keep re-submitting exactly the same job until it works. The most I've ever had to do it is about 5 times.

PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name:   script
Exec host:  heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0

Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output

MC Data

Creating the ntuple with monte carlo data is harder than with the actual data since you must first strip it before creating the ntuple, but can be done with a simalar set up as before. Again you will need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file
  • the stripping python file

The stripping python file can be found here StrippingB2DPiLoose.py.txt, again you probably want to change the file extension.

The python file pointing to the MC data is the same format as before just with different filenames, see the Real Data section.

the DaVinci file that you need is this one here DaVinci.py.txt.

Added:
>
>
Again you might want to change a few things to make this work how you want it to.
  • The path to StrippingB2DPiLoose.py will need to be changed.
  • The number of events to analyse, EvtMax which takes a value of -1 for all or a number>0 for that number of events.
  • If the input data is simulated or not change DV.Simulation to True for simulated
  • The place where the final root ntuple is saved NTupleSvc().Output I usually have it pointing to something like [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.direct.root' TYP='ROOT' OPT='NEW'" ]
 Then this can again be run like the real data, with the following command.
SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt
Changed:
<
<
The output will be copied to output.txt. To run it on the ppd batch system use the same method and scrips as before but cahge the filenames where appropriate.
>
>
The output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before but cahge the filenames where appropriate.
 

Analysing an ntuple

List of Files

Below are all the files mentioned on this wiki

Changed:
<
<
>
>
-- ThomasBird - 2010-07-20
Deleted:
<
<
DaVinci.py.txt
 
META FILEATTACHMENT attachment="StrippingB2DPiLoose.py.txt" attr="" comment="File needed to strip MC data" date="1279621911" name="StrippingB2DPiLoose.py.txt" path="StrippingB2DPiLoose.py" size="9349" stream="StrippingB2DPiLoose.py" tmpFilename="/usr/tmp/CGItemp19331" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinciAndCandidatesUp.py.txt" attr="" comment="File to generate ntuple for actual data" date="1279622034" name="DaVinciAndCandidatesUp.py.txt" path="DaVinciAndCandidatesUp.py" size="6554" stream="DaVinciAndCandidatesUp.py" tmpFilename="/usr/tmp/CGItemp19303" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinci.py.txt" attr="" comment="DaVinci file to run the stripping of MC data" date="1279622448" name="DaVinci.py.txt" path="DaVinci.py" size="3306" stream="DaVinci.py" tmpFilename="/usr/tmp/CGItemp19265" user="ThomasBird" version="1"

Revision 32010-07-20 - ThomasBird

 
META TOPICPARENT name="Main.ThomasBird"
Deleted:
<
<
-- ThomasBird - 2010-07-19
 

Data Analysis

Creating an ntuple

Real Data

to create an ntuple with real data you need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file

My python file pointing to my data looks like:

from Gaudi.Configuration import *

EventSelector().Input   = [
#  for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\"   DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
# ...
]


and can be mostly generated with the for command commented in the file.

Be careful when creating this file as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed.

Changed:
<
<
the davinci file needed to create the ntuple is as follows
>
>
the davinci file needed to create the ntuple can be found here: DaVinciAndCandidatesUp.py.txt, you might want to remove the .txt from the file name.
Deleted:
<
<
import GaudiPython
from Gaudi.Configuration import *
from GaudiPython.Bindings import gbl, AppMgr, Helper
from Configurables import GaudiSequencer, HltSelReportsMaker, OutputStream, DaVinci
from Configurables import PrintDecayTree
from os import environ
from Configurables import DecayTreeTuple, EventTuple
from Configurables import CombineParticles, FilterDesktop
from Configurables import TupleToolMCBackgroundInfo, TupleToolMCTruth, TupleToolTrigger, TupleToolTISTOS, TupleToolDecay, BackgroundCategory
from StrippingConf.StrippingLine import StrippingLine
 
Deleted:
<
<

DV = DaVinci() EvtMax = -1 PrintFreq = 100 SkipEvents = 0 DataType = "2010" DV.Simulation = True

DV.L0 = False #want this to be False for data; True will overwrite decisions. the same applies to the next few settings DV.Hlt = False ReplaceL0BanksWithEmulated = False #reruns the L0. need more info #DV.HltThresholdSettings = 'Physics_25Vis_25L0_2Hlt1_2Hlt2_May10' #only makes sense if re-running trigger

ApplicationMgr().ExtSvc += [ "NTupleSvc" ] NTupleSvc().Output = [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.root' TYP='ROOT' OPT='NEW'" ] ().Output = [ "FILE1 DATAFILE='castor:/castor/cern.ch/user/j/jnardull/RealDataRootFiles/B2DX_UP_FromHadron_6_7_2010.root' TYP='ROOT' OPT='NEW'" ]

from Configurables import CondDB CondDB().IgnoreHeartBeat = True

#stripping stream location location = '/Event/Bhadron/Phys/B2DXStrippingSelChi2Loose'

from PhysSelPython.Wrappers import AutomaticData, Selection, SelectionSequence # Get the Candidates from the DST. # Treat particles already on the DST as data-on-demand, but give the full path. B2DXSel = AutomaticData(Location = location) # Filter the Candidate. Replace 'ALL' by some cuts.

B2DXFilter = FilterDesktop('B2DXFilter') B2DXFilter.Code = "ALL" #replace this if wanted

# make a Selection B2DXFilterSel = Selection(name = 'B2DXFilterSel', Algorithm = B2DXFilter, RequiredSelections = [ B2DXSel])

# build the SelectionSequence B2DXSeq = SelectionSequence('B2DXSeq', TopSelection = B2DXFilterSel) DV.appendToMainSequence( [ B2DXSeq.sequence() ] )

######################################################################## # DecayTreeTuples ########################################################################

# # Initialisation #

TupleSequence = GaudiSequencer ("TupleSequencer")

AnalysisDecayTreeTuple = DecayTreeTuple("AnalysisDecayTreeTuple") BachelorDecayTreeTuple = DecayTreeTuple("BachelorDecayTreeTuple") KstarDecayTreeTuple = DecayTreeTuple("KstarDecayTreeTuple")

BachelorOrKstarDecayTreeTuple = GaudiSequencer("BachelorOrKstarDecayTreeTuple") ModeOR = 1 BachelorOrKstarDecayTreeTuple.Members = [BachelorDecayTreeTuple,KstarDecayTreeTuple]

TupleSequence.Members += [AnalysisDecayTreeTuple] TupleSequence.Members += [BachelorOrKstarDecayTreeTuple]

#Plots InputLocations = ["B2DXFilterSel"] AnalysisDecayTreeTuple.Decay = "[[B0]cc -> (^D~0 => {^K+ ^pi-, ^K- ^pi+,^K+ ^K-,^pi+ ^pi-}) ? ]cc" AnalysisDecayTreeTuple.Branches = { "D" : "[[B0]cc -> (^D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-}) {K+, pi+}]cc" ,"Positive_D_daughter" : "[[B0]cc -> (D~0 => {^K+ pi-, K- ^pi+, ^K+ K-, ^pi+ pi-, ^K+ ^pi+,K- pi-}) ? ]cc" ,"Negative_D_daughter" : "[[B0]cc -> (D~0 => {K+ ^pi-, ^K- pi+, K+ ^K-, pi+ ^pi-, K+ pi+,^K- ^pi-}) ? ]cc" ,"B" : "[B0]cc : [[B0]cc -> (D~0 => {K+ pi-, K- pi+,K+ K-,pi+ pi-, K+ pi+, K- pi-}) ? ]cc" }

BachelorDecayTreeTuple = DecayTreeTuple("BachelorDecayTreeTuple") InputLocations = ["B2DXFilterSel"] BachelorDecayTreeTuple.Decay = "[[B+]cc -> (^D~0 => {^K+ ^pi-, ^K- ^pi+,^K+ ^K-,^pi+ ^pi-,^K+ ^pi+,^K- ^pi-}) {^K+, ^pi+}]cc" BachelorDecayTreeTuple.Branches = { "D" : "[[B+]cc -> (^D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) {K+, pi+}]cc" ,"Positive_D_Daughter" : "[[B+]cc -> (D~0 => {^K+ pi-, K- ^pi+, ^K+ K-, ^pi+ pi-,^K+ ^pi+,K- pi-}) {K+, pi+}]cc" ,"Negative_D_Daughter" : "[[B+]cc -> (D~0 => {K+ ^pi-, ^K- pi+, K+ ^K-, pi+ ^pi-,K+ pi+,^K- ^pi-}) {K+, pi+}]cc" ,"Bachelor" : "[[B+]cc -> (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) {^K+, ^pi+}]cc" ,"B" : "[B+]cc : [[B+]cc -> (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) {K+, pi+}]cc" }

KstarDecayTreeTuple = DecayTreeTuple("KstarDecayTreeTuple") InputLocations = ["B2DXFilterSel"] KstarDecayTreeTuple.Decay = "[[B0]cc -> (^D~0 => {^K+ ^pi-, ^K- ^pi+,^K+ ^K-,^pi+ ^pi-,^K+ ^pi+,^K- ^pi-}) (^K*(892)0 => ^K+ ^pi-) ]cc" KstarDecayTreeTuple.Branches = { "KStar" : "[[B0]cc -> (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) (^K*(892)0 => K+ pi-) ]cc" ,"D" : "[[B0]cc -> (^D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) (K*(892)0 => K+ pi-) ]cc" ,"Positive_D_Daughter" : "[[B0]cc -> (D~0 => {^K+ pi-, K- ^pi+, ^K+ K-, ^pi+ pi-,^K+ ^pi+,K- pi-}) (K*(892)0 => K+ pi-) ]cc" ,"Negative_D_Daughter" : "[[B0]cc -> (D~0 => {K+ ^pi-, ^K- pi+, K+ ^K-, pi+ ^pi-,K+ pi+,^K- ^pi-}) (K*(892)0 => K+ pi-) ]cc" ,"K_from_kstar" : "[[B0]cc -> (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) (K*(892)0 => ^K+ pi-) ]cc" ,"Pi_from_kstar" : "[[B0]cc -> (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) (K*(892)0 => K+ ^pi-) ]cc" ,"B" : "[B0]cc : [[B0]cc -> (D~0 => {K+ pi-, K- pi+,K+ K-,pi+ pi-}) (K*(892)0 => K+ pi-) ]cc" }

ToolList += [ "TupleToolGeometry" , "TupleToolAngles" , "TupleToolKinematic" , "TupleToolPrimaries" , "TupleToolPid" , "TupleToolEventInfo" , "TupleToolTrackInfo" ]

ToolList += [ "TupleToolGeometry" , "TupleToolAngles" , "TupleToolKinematic" , "TupleToolPrimaries" , "TupleToolPid" , "TupleToolEventInfo" , "TupleToolTrackInfo" ]

ToolList += [ "TupleToolGeometry" , "TupleToolAngles" , "TupleToolKinematic" , "TupleToolPrimaries" , "TupleToolPid" , "TupleToolEventInfo" , "TupleToolTrackInfo" , "LoKi::Hybrid::TupleTool/LoKi_All" ]

######################################################################## # Add sequence to DaVinci ########################################################################

UserAlgorithms += [TupleSequence]

 Then all this can be run with the following command.
SetupProject DaVinci
Changed:
<
<
gaudirun.py DaVinci.py data.py | tee output.txt
>
>
gaudirun.py DaVinciAndCandidatesUp.py data.py | tee output.txt
  the output will be copied to the output.txt file.

Running on ppd batch

you can run these taks on the ppd batch system, since doing a large number of events can take many hours. below is a script I use to run these taks.

#!/bin/bash

stda="`date`"
echo $stda

. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci

dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all

gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py  | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt

echo
echo $stda
date
echo
echo Done.

The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important. If this is not there the paths will not be set up properly for your script.

Placing this in a file called say script and running the following will make it run on the batch system.

qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script

-q chooses the queue you wish to submit the job to while -j oe joins the stderr and stdout together into one file. -o puts the output file into a given directory and script is the above script to run the job.

Monitoring

The following command will list your currently running jobs. you will need to replace uoh35620 with your user name.

qstat -u uoh35620
For looking at jobs in real time I like to use the following, where the file at the beginning is the output of one of the tees in the script and uoh35620 will need to be changed to your user name. The various other paths further along would also need to be changed; alternatively you can just delete the end of the command, since that part only monitors the file sizes of the root files.

watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620 ; echo ; echo ; du -hs ~/stuff ; echo ; echo ; find ~/stuff -iname "*.root" -print0 | xargs --null du | sort -nr | cut -f 2 | tr "\\n" "\\0" | xargs --null du -h | sed "s#/home/hep/uoh35620/stuff/##"'
You might want to make aliases for these commands in your ~/.bashrc file; for example, the following line in ~/.bashrc creates a new command called qstatme which is equivalent to typing the full command.

alias qstatme="qstat -u uoh35620"
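
The long watch command can be wrapped up in the same way. Here is a shortened version (without the file-size monitoring, to keep the quoting simple); the name watchme is arbitrary and the paths again need changing to your own:

alias watchme='watch -n 10 "tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620"'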

Errors

I have gotten this email a few times and I'm not really sure of the cause, perhaps an error on the batch machine. If you get it, just keep re-submitting exactly the same job until it works; the most I've ever had to resubmit is about 5 times.

PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name:   script
Exec host:  heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0

Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output

MC Data

Creating the ntuple with Monte Carlo data is harder than with the real data since you must first strip it before creating the ntuple, but it can be done with a similar set-up as before. Again you will need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file
  • the stripping python file

The stripping python file can be found here: StrippingB2DPiLoose.py.txt; again, you probably want to remove the .txt from the file name.

The python file pointing to the MC data has the same format as before, just with different filenames; see the Real Data section above.
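
For illustration only, such a file might look like the following; the paths are placeholders and must be replaced with the locations of your own MC DST files:

from Gaudi.Configuration import *

EventSelector().Input   = [
# placeholder paths: point these at your own MC DST files
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/mc_2010/mc_file_01.dst' TYP='POOL_ROOTTREE' OPT='READ'",
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/mc_2010/mc_file_02.dst' TYP='POOL_ROOTTREE' OPT='READ'",
]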

The DaVinci file that you need is this one: DaVinci.py.txt.

Then this can again be run like the real data, with the following command.

SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt

The output will be copied to output.txt. To run it on the ppd batch system use the same method and scripts as before, but change the filenames where appropriate.
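
For the batch job only the directory and the gaudirun line in the script really change. A sketch, where the mc directory and mc_data.py are placeholder names for your own MC setup:

dir=/home/hep/uoh35620/stuff/batch-generate/mc

# run the MC stripping + ntuple job, keeping a copy of the log
gaudirun.py ${dir}/DaVinci.py ${dir}/mc_data.py | tee ${dir}/output.txt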

 

Analysing an ntuple

Added:
>
>

List of Files

Below are all the files mentioned on this wiki:

  • DaVinciAndCandidatesUp.py.txt (file to generate the ntuple for real data)
  • StrippingB2DPiLoose.py.txt (file needed to strip MC data)
  • DaVinci.py.txt (DaVinci file to run the stripping of MC data)

META FILEATTACHMENT attachment="StrippingB2DPiLoose.py.txt" attr="" comment="File needed to strip MC data" date="1279621911" name="StrippingB2DPiLoose.py.txt" path="StrippingB2DPiLoose.py" size="9349" stream="StrippingB2DPiLoose.py" tmpFilename="/usr/tmp/CGItemp19331" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinciAndCandidatesUp.py.txt" attr="" comment="File to generate ntuple for actual data" date="1279622034" name="DaVinciAndCandidatesUp.py.txt" path="DaVinciAndCandidatesUp.py" size="6554" stream="DaVinciAndCandidatesUp.py" tmpFilename="/usr/tmp/CGItemp19303" user="ThomasBird" version="1"
META FILEATTACHMENT attachment="DaVinci.py.txt" attr="" comment="DaVinci file to run the stripping of MC data" date="1279622448" name="DaVinci.py.txt" path="DaVinci.py" size="3306" stream="DaVinci.py" tmpFilename="/usr/tmp/CGItemp19265" user="ThomasBird" version="1"
 

Revision 2 2010-07-19 - ThomasBird

 
META TOPICPARENT name="Main.ThomasBird"
-- ThomasBird - 2010-07-19

Data Analysis

Creating an ntuple

Real Data

To create an ntuple with real data you need

  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file

My python file pointing to my data looks like:

from Gaudi.Configuration import *

EventSelector().Input   = [
#  for i in /home/hep/uoh35620/stuff/data/data_2010_up/* ; do echo "\"   DATAFILE='pfn:${i}' TYP='POOL_ROOTTREE' OPT='READ'\"," ; done
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000001_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
"   DATAFILE='pfn:/home/hep/uoh35620/stuff/data/data_2010_up/00007054_00000002_1.bhadron.dst' TYP='POOL_ROOTTREE' OPT='READ'",
# ...
]

and can be mostly generated with the for command commented in the file.

Be careful when creating this file as symlinks have caused me issues and prevented DaVinci from reading the data. Since the files existed it caused no errors and it seemed to just skip the file it was looking at resulting in 0 events processed.

The DaVinci file needed to create the ntuple is as follows:
import GaudiPython
from Gaudi.Configuration import *
from GaudiPython.Bindings import gbl, AppMgr, Helper
from Configurables import GaudiSequencer, HltSelReportsMaker, OutputStream, DaVinci
from Configurables import PrintDecayTree
from os import environ
from Configurables import DecayTreeTuple, EventTuple
from Configurables import CombineParticles, FilterDesktop
from Configurables import TupleToolMCBackgroundInfo, TupleToolMCTruth, TupleToolTrigger, TupleToolTISTOS, TupleToolDecay, BackgroundCategory
from StrippingConf.StrippingLine import StrippingLine



DV = DaVinci()
DV.EvtMax = -1
DV.PrintFreq = 100
DV.SkipEvents = 0
DV.DataType = "2010"
DV.Simulation = True

DV.L0 = False #want this to be False for data; True will overwrite decisions. the same applies to the next few settings
DV.Hlt = False
DV.ReplaceL0BanksWithEmulated = False #reruns the L0. need more info
#DV.HltThresholdSettings = 'Physics_25Vis_25L0_2Hlt1_2Hlt2_May10' #only makes sense if re-running trigger

ApplicationMgr().ExtSvc +=  [ "NTupleSvc" ]
NTupleSvc().Output =  [ "FILE1 DATAFILE='/home/hep/uoh35620/stuff/batch-generate/data/up/all/B2Dpi_Up_6_7_2010.root' TYP='ROOT' OPT='NEW'" ]
#NTupleSvc().Output =  [ "FILE1 DATAFILE='castor:/castor/cern.ch/user/j/jnardull/RealDataRootFiles/B2DX_UP_FromHadron_6_7_2010.root' TYP='ROOT' OPT='NEW'" ]

from Configurables import CondDB
CondDB().IgnoreHeartBeat = True

#stripping stream location
location = '/Event/Bhadron/Phys/B2DXStrippingSelChi2Loose'

from PhysSelPython.Wrappers import AutomaticData, Selection, SelectionSequence
# Get the Candidates from the DST.
# Treat particles already on the DST as data-on-demand, but give the full path.
B2DXSel = AutomaticData(Location = location)
# Filter the Candidate. Replace 'ALL' by some cuts.

B2DXFilter = FilterDesktop('B2DXFilter')
B2DXFilter.Code = "ALL" #replace this if wanted

# make a Selection
B2DXFilterSel = Selection(name = 'B2DXFilterSel',
                          Algorithm = B2DXFilter,
                          RequiredSelections = [ B2DXSel])

# build the SelectionSequence
B2DXSeq = SelectionSequence('B2DXSeq', TopSelection = B2DXFilterSel)
DV.appendToMainSequence( [ B2DXSeq.sequence() ] )

########################################################################
# DecayTreeTuples
########################################################################

#
# Initialisation
#

TupleSequence = GaudiSequencer ("TupleSequencer")

AnalysisDecayTreeTuple = DecayTreeTuple("AnalysisDecayTreeTuple")
BachelorDecayTreeTuple = DecayTreeTuple("BachelorDecayTreeTuple")
KstarDecayTreeTuple = DecayTreeTuple("KstarDecayTreeTuple")

BachelorOrKstarDecayTreeTuple = GaudiSequencer("BachelorOrKstarDecayTreeTuple")
BachelorOrKstarDecayTreeTuple.ModeOR = 1
BachelorOrKstarDecayTreeTuple.Members = [BachelorDecayTreeTuple,KstarDecayTreeTuple]

TupleSequence.Members += [AnalysisDecayTreeTuple]
TupleSequence.Members += [BachelorOrKstarDecayTreeTuple]


#Plots
AnalysisDecayTreeTuple.InputLocations = ["B2DXFilterSel"]
AnalysisDecayTreeTuple.Decay = "[[B0]cc ->  (^D~0 => {^K+ ^pi-, ^K- ^pi+,^K+ ^K-,^pi+ ^pi-}) ? ]cc"
AnalysisDecayTreeTuple.Branches = {
    "D" : "[[B0]cc ->  (^D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-}) {K+, pi+}]cc"
    ,"Positive_D_daughter" : "[[B0]cc ->  (D~0 => {^K+ pi-, K- ^pi+, ^K+ K-, ^pi+ pi-, ^K+ ^pi+,K- pi-}) ? ]cc"
    ,"Negative_D_daughter" : "[[B0]cc ->  (D~0 => {K+ ^pi-, ^K- pi+, K+ ^K-, pi+ ^pi-, K+ pi+,^K- ^pi-}) ? ]cc"
    ,"B" : "[B0]cc : [[B0]cc ->  (D~0 => {K+ pi-, K- pi+,K+ K-,pi+ pi-, K+ pi+, K- pi-}) ? ]cc"
    }

BachelorDecayTreeTuple = DecayTreeTuple("BachelorDecayTreeTuple")
BachelorDecayTreeTuple.InputLocations = ["B2DXFilterSel"]
BachelorDecayTreeTuple.Decay = "[[B+]cc ->  (^D~0 => {^K+ ^pi-, ^K- ^pi+,^K+ ^K-,^pi+ ^pi-,^K+ ^pi+,^K- ^pi-}) {^K+, ^pi+}]cc"
BachelorDecayTreeTuple.Branches = {
    "D" : "[[B+]cc ->  (^D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) {K+, pi+}]cc"
    ,"Positive_D_Daughter" : "[[B+]cc ->  (D~0 => {^K+ pi-, K- ^pi+, ^K+ K-, ^pi+ pi-,^K+ ^pi+,K- pi-}) {K+, pi+}]cc"
    ,"Negative_D_Daughter" : "[[B+]cc ->  (D~0 => {K+ ^pi-, ^K- pi+, K+ ^K-, pi+ ^pi-,K+ pi+,^K- ^pi-}) {K+, pi+}]cc"
    ,"Bachelor" : "[[B+]cc ->  (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) {^K+, ^pi+}]cc"
    ,"B" : "[B+]cc : [[B+]cc ->  (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) {K+, pi+}]cc"
    }

KstarDecayTreeTuple = DecayTreeTuple("KstarDecayTreeTuple")
KstarDecayTreeTuple.InputLocations = ["B2DXFilterSel"]
KstarDecayTreeTuple.Decay = "[[B0]cc -> (^D~0 => {^K+ ^pi-, ^K- ^pi+,^K+ ^K-,^pi+ ^pi-,^K+ ^pi+,^K- ^pi-}) (^K*(892)0 => ^K+ ^pi-) ]cc"
KstarDecayTreeTuple.Branches = {
    "KStar" : "[[B0]cc ->  (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) (^K*(892)0 => K+ pi-) ]cc"
    ,"D" : "[[B0]cc ->  (^D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) (K*(892)0 => K+ pi-) ]cc"
    ,"Positive_D_Daughter" : "[[B0]cc ->  (D~0 => {^K+ pi-, K- ^pi+, ^K+ K-, ^pi+ pi-,^K+ ^pi+,K- pi-}) (K*(892)0 => K+ pi-) ]cc"
    ,"Negative_D_Daughter" : "[[B0]cc ->  (D~0 => {K+ ^pi-, ^K- pi+, K+ ^K-, pi+ ^pi-,K+ pi+,^K- ^pi-}) (K*(892)0 => K+ pi-) ]cc"
    ,"K_from_kstar" : "[[B0]cc ->  (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) (K*(892)0 => ^K+ pi-) ]cc"
    ,"Pi_from_kstar" : "[[B0]cc ->  (D~0 => {K+ pi-, K- pi+, K+ K-, pi+ pi-,K+ pi+,K- pi-}) (K*(892)0 => K+ ^pi-) ]cc"
    ,"B" : "[B0]cc : [[B0]cc ->  (D~0 => {K+ pi-, K- pi+,K+ K-,pi+ pi-}) (K*(892)0 => K+ pi-) ]cc"
    }


KstarDecayTreeTuple.ToolList +=  [
    "TupleToolGeometry"
    , "TupleToolAngles"
    , "TupleToolKinematic"
    , "TupleToolPrimaries"
    , "TupleToolPid"
    , "TupleToolEventInfo"
    , "TupleToolTrackInfo"
     ]


BachelorDecayTreeTuple.ToolList +=  [
    "TupleToolGeometry"
    , "TupleToolAngles"
    , "TupleToolKinematic"
    , "TupleToolPrimaries"
    , "TupleToolPid"
    , "TupleToolEventInfo"
    , "TupleToolTrackInfo"
    ]

AnalysisDecayTreeTuple.ToolList +=  [
    "TupleToolGeometry"
    , "TupleToolAngles"
    , "TupleToolKinematic"
    , "TupleToolPrimaries"
    , "TupleToolPid"
    , "TupleToolEventInfo"
    , "TupleToolTrackInfo"
    , "LoKi::Hybrid::TupleTool/LoKi_All" 
    ]

########################################################################
# Add sequence to DaVinci
########################################################################

DV.UserAlgorithms += [TupleSequence]

Then all this can be run with the following command.

SetupProject DaVinci
gaudirun.py DaVinci.py data.py | tee output.txt

The output will be copied to the output.txt file.

Running on ppd batch

You can run these tasks on the ppd batch system, since processing a large number of events can take many hours. Below is a script I use to run these tasks.

#!/bin/bash

stda="`date`"
echo $stda

. /afs/rl.ac.uk/lhcb/lhcb/LBSCRIPTS/LBSCRIPTS_v5r2/InstallArea/scripts/SetupProject.sh DaVinci

dir=/home/hep/uoh35620/stuff/batch-generate/data/up/all

gaudirun.py ${dir}/DaVinciAndCandidatesUp.py ${dir}/6_7_2010_up.py  | tee ${dir}/output.txt | tee /home/hep/uoh35620/stuff/batch-generate/test.txt

echo
echo $stda
date
echo
echo Done.

The dot at the beginning of the line ending in SetupProject.sh DaVinci is very important. If this is not there the paths will not be set up properly for your script.

Placing this in a file called say script and running the following will make it run on the batch system.

qsub -q prod -S /bin/bash -j oe -o /home/hep/uoh35620/job-out/ script

-q chooses the queue you wish to submit the job to while -j oe joins the stderr and stdout together into one file. -o puts the output file into a given directory and script is the above script to run the job.

Monitoring

The following command will list your currently running jobs. You will need to replace uoh35620 with your user name.

qstat -u uoh35620

For looking at jobs in real time I like to use the following, where the file is the output of one of the tees in the script and uoh35620 will need to be changed to your user name.

watch -n 10 'tail -n 20 /home/hep/uoh35620/stuff/batch-generate/test.txt ; echo ; echo ; qstat -u uoh35620'
You might want to make aliases for these commands in your ~/.bashrc file; for example, the following line in ~/.bashrc creates a new command called qstatme which is equivalent to typing the full command.

alias qstatme="qstat -u uoh35620"

Errors

I have gotten this email a few times and I'm not really sure of the cause, perhaps an error on the batch machine. If you get it, just keep re-submitting exactly the same job until it works; the most I've ever had to resubmit is about 5 times.

PBS Job Id: 4121274.heplnx208.pp.rl.ac.uk
Job Name:   script
Exec host:  heplnc308.pp.rl.ac.uk/0
An error has occurred processing your job, see below.
Post job file processing error; job 4121274.heplnx208.pp.rl.ac.uk on host heplnc308.pp.rl.ac.uk/0

Unable to copy file /var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU to /home/hep/uoh35620/job-out/
>>> error from copy
/bin/cp: cannot stat `/var/spool/pbs/spool/4121274.heplnx208.pp.rl.ac.uk.OU': No such file or directory
>>> end error output
 

MC Data

Creating the ntuple with Monte Carlo data is harder than with the actual data since you must first strip it before creating the ntuple, but it can be done with a similar set-up as before. Again you will need
  • the DST files you wish to run on
  • a python file pointing to the data
  • the davinci python file
 

Analysing an ntuple
