-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moved Egamma TensorFlow sessions to global cache #46655
base: master
Are you sure you want to change the base?
Conversation
assign ml |
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46655/42576 |
A new Pull Request was created by @valsdav for master. It involves the following packages:
@cmsbuild, @jfernan2, @mandrenguyen, @valsdav, @y19y19 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
please test |
-1 Failed Tests: UnitTests RelVals RelVals-INPUT AddOn Unit TestsI found 4 errors in the following unit tests: ---> test test_MC_22_crosscheck had ERRORS ---> test test_MC_23_crosscheck had ERRORS ---> test test_MC_23_setup had ERRORS and more ... RelVals
Expand to see more relval errors ...RelVals-INPUT
Expand to see more relval errors ...
AddOn Tests
Expand to see more addon errors ... |
please test |
please abort test |
90d7ded
to
83d1d98
Compare
please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46655/42587 |
Pull request #46655 was updated. @jfernan2, @mandrenguyen, @valsdav, @y19y19 can you please check and sign again. |
-1 Failed Tests: Build ClangBuild BuildI found compilation warning when building: See details on the summary page. Clang BuildI found compilation error while trying to compile with clang. Command used:
>> Entering Package RecoEgamma/EgammaPhotonProducers >> Entering Package RecoEgamma/EgammaTools >> Entering Package RecoEgamma/ElectronIdentification >> Entering Package RecoEgamma/PhotonIdentification >> Compile sequence completed for CMSSW CMSSW_14_2_X_2024-11-11-1100 gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1 + eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_2_X_2024-11-11-1100/tmp/el8_amd64_gcc12/cache/log/src '||' 'true)' ++ scram build outputlog >> Entering Package RecoEgamma/EgammaElectronProducers Entering library rule at src/RecoEgamma/EgammaElectronProducers/plugins >> Compiling edm plugin src/RecoEgamma/EgammaElectronProducers/plugins/ElectronNHitSeedProducer.cc |
EgammaDNNHelper was storing TF graphs globally by job, but TF sessions were owned my the GSFElectronProducer and GEDPhotonProducers. This commit moves the TFSessions in the EgammaDNNHelper, making them global by job.
83d1d98
to
22246a6
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46655/42591 |
please test |
+1 Size: This PR adds an extra 20KB to repository Comparison SummarySummary:
|
Below is an untested recipe to reproduce what I did for #46494 # Set up developer area
cmssw-el7
export SCRAM_ARCH=slc7_amd64_gcc12
cmsrel CMSSW_14_0_16
cd CMSSW_14_0_16/src
cmsenv
# Set up the configuration
cp /eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1/job/WMTaskSpace/cmsRun1/PSet.p* .
xrdcp root://eoscms.cern.ch//eos/cms/tier0/store/data/Run2024H/Muon1/RAW/v1/000/386/319/00000/27604955-1005-455b-90eb-8195e2f02cb8.root .
cat >>PSet.py <<EOF
process.source.fileNames = ["file:27604955-1005-455b-90eb-8195e2f02cb8.root"]
process.maxEvents.input = 20
process.options.numberOfThreads = 4
process.options.numberOfStreams = 4
process.IgProfService = cms.Service('IgProfService',
reportFirstEvent = cms.untracked.int32(0),
reportEventInterval = cms.untracked.int32(10),
reportToFileAtPostEvent = cms.untracked.string('| gzip -c > 4th_4st_07.mem.%I.gz')
)
EOF
# Run IgProf memory profiler
igprof -d -t cmsRunGlibC -z -mp -o test_07.mem.gz cmsRunGlibC PSet.py > out.txt 2>&1
igprof-analyse --sqlite -d -v -g -r MEM_LIVE test_4th_4st_07.mem.20.gz | sqlite3 test_4th_4st_07.20_live.sql3
# Make the memory profile web browseable (needs a web area with cgi-bin access)
cp (which igprof-navigator) <web dir>/cgi-bin
mkdir <web dir>/cgi-bin/data
cp test_4th_4st_07.20_live.sql3 <web dir>/cgi-bin/data Open web browser to |
Thanks a lot! Unfortunately the data file is no more available, can I just use another one? |
@@ -65,6 +66,7 @@ namespace egammaTools { | |||
std::vector<uint> nInputs_; | |||
|
|||
std::vector<std::unique_ptr<const tensorflow::GraphDef>> graphDefs_; | |||
std::vector<std::unique_ptr<tensorflow::Session>> sessions_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming tensor flow::Session
is const thread-safe this should be std::vector<std::unique_ptr<const tensorflow::Session>>
as it is used across threads concurrently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC tensorflow::Session
interface is not const
, even if it is thread safe and does not mutate the Session
object (i.e. in principle the interface could be const
) :(
PR description:
Solves #46494. Related to #46040
EgammaDNNHelper
was storing TF graphs globally by job, but TF sessions were owned my theGSFElectronProducer
andGEDPhotonProducers
by stream.This PR moves the TFSessions in the
EgammaDNNHelper
, making them global by job.PR validation:
Purely technical PR, no changes expected.