Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation violation for StandAloneMuonProducer in CMSSW_14_0_7 #45035

Closed
francescobrivio opened this issue May 24, 2024 · 30 comments
Closed

Comments

@francescobrivio
Copy link
Contributor

Dear all,
As reported in
https://cms-talk.web.cern.ch/t/segmentation-violation-in-promptreco-for-parkingdoublemuonlowmass1-run-380963/41520
we have a segmentation violation for the ParkingDoubleMuonLowMass dataset in Run 380963, with the following stack trace:

Begin processing the 1590th record. Run 380963, Event 73250130, LumiSection 85 on stream 3 at 24-May-2024 02:57:25.663 UTC
%MSG-e KFUpdator:  StandAloneMuonProducer:displacedStandAloneMuons  24-May-2024 02:57:26 UTC Run: 380963 Event: 73250130
 could not invert martix:
[   3.0909e+13 4.23366e+13 6.14231e+14 8.06196e+14
   4.23366e+13 5.79893e+13 8.41324e+14 1.10426e+15
   6.14231e+14 8.41324e+14 1.22062e+16 1.60209e+16
   8.06196e+14 1.10426e+15 1.60209e+16 2.10279e+16 ]
%MSG

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

...
Thread 7 (Thread 0x14b2a4ffd700 (LWP 1099) "cmsRun"):
#0  0x000014b2fe1bd301 in poll () from /lib64/libc.so.6
#1  0x000014b2fa18c6af in full_read.constprop () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x000014b2fa140dbc in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  0x000014b2fa141720 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000014b2c4879cb7 in Propagator::propagateWithPath(TrajectoryStateOnSurface const&, Plane const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libTrackP
ropagationSteppingHelixPropagator.so
#6  0x000014b2f4a1b219 in ForwardDetLayer::compatible(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0
_7/lib/el8_amd64_gcc12/libTrackingToolsDetLayers.so
#7  0x000014b2c47ef36f in MuRingForwardLayer::compatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMS
SW_14_0_7/lib/el8_amd64_gcc12/libRecoMuonDetLayers.so
#8  0x000014b2c47efc73 in MuRingForwardDoubleLayer::groupedCompatibleDets(TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/
cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoMuonDetLayers.so
#9  0x000014b28ba43c78 in MuonDetLayerMeasurements::groupedMeasurements(DetLayer const*, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&, edm::Event const&) () from /cvmfs/
cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoMuonMeasurementDet.so
#10 0x000014b28ba44989 in MuonDetLayerMeasurements::groupedMeasurements(DetLayer const*, TrajectoryStateOnSurface const&, Propagator const&, MeasurementEstimator const&) () from /cvmfs/cms.cern.ch/el8_amd
64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoMuonMeasurementDet.so
#11 0x000014b2683aed74 in StandAloneMuonFilter::findBestMeasurements(DetLayer const*, TrajectoryStateOnSurface const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12
/libRecoMuonStandAloneTrackFinder.so
#12 0x000014b2683b2585 in StandAloneMuonFilter::refit(TrajectoryStateOnSurface const&, DetLayer const*, Trajectory&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/l
ibRecoMuonStandAloneTrackFinder.so
#13 0x000014b2683b46a4 in StandAloneMuonTrajectoryBuilder::trajectories(TrajectorySeed const&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_7/lib/el8_amd64_gcc12/libRecoMuonStandAloneTr
ackFinder.so
...
Current Modules:

Module: StandAloneMuonProducer:displacedStandAloneMuons (crashed)
Module: CkfTrackCandidateMaker:lowPtTripletStepTrackCandidates
Module: CkfTrackCandidateMaker:lowPtQuadStepTrackCandidates
Module: TrackingMonitor:TrackSeedMondetachedQuadStep
Module: PoolOutputModule:write_MINIAOD
Module: SiStripMonitorTrack:SiStripMonitorTrackCommon
Module: MkFitProducer:detachedTripletStepTrackCandidatesMkFit
Module: CkfTrackCandidateMaker:pixelPairStepTrackCandidates

A fatal system signal has occurred: segmentation violation
Complete

A minimal recipe to reproduce the error is:

cmsrel CMSSW_14_0_7
cd CMSSW_14_0_7/src
cmsenv
cp /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/displacedStandAloneMuons/job_3221638/job/WMTaskSpace/cmsRun1/PSet.p* .
cat <<EOF >> PSet.py
    process.options.numberOfThreads=cms.untracked.uint32(1)
    process.options.numberOfStreams=cms.untracked.uint32(1)
    process.source.skipEvents = cms.untracked.uint32(1589)
EOF
cmsRun -e PSet.py
@cmsbuild
Copy link
Contributor

cmsbuild commented May 24, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

A new Issue was created by @francescobrivio.

@Dr15Jones, @antoniovilela, @rappoccio, @sextonkennedy, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@francescobrivio
Copy link
Contributor Author

assign reconstruction

@cmsbuild
Copy link
Contributor

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

@francescobrivio
Copy link
Contributor Author

francescobrivio commented May 24, 2024

FYI @cms-sw/muon-pog-l2 and @cms-sw/muon-dpg-l2

@francescobrivio
Copy link
Contributor Author

type muon

@cmsbuild cmsbuild added the muon label May 24, 2024
@mmusich
Copy link
Contributor

mmusich commented May 24, 2024

For the record with this:

diff --git a/TrackingTools/DetLayers/src/ForwardDetLayer.cc b/TrackingTools/DetLayers/src/ForwardDetLayer.cc
index d6f0afdf22c..b5d2eaa27ac 100644
--- a/TrackingTools/DetLayers/src/ForwardDetLayer.cc
+++ b/TrackingTools/DetLayers/src/ForwardDetLayer.cc
@@ -88,7 +88,13 @@ pair<bool, TrajectoryStateOnSurface> ForwardDetLayer::compatible(const Trajector
     edm::LogError("DetLayers") << "ERROR: BarrelDetLayer::compatible() is used before the layer surface is initialized";
   // throw an exception? which one?
 
-  TrajectoryStateOnSurface myState = prop.propagate(ts, specificSurface());
+  TrajectoryStateOnSurface myState;
+  if UNLIKELY (!ts.isValid()) {
+    edm::LogError("DetLayers") << "ERROR: trying to propagate invalid TSOS" << std::endl;
+  } else {
+    myState = prop.propagate(ts, specificSurface());
+  }
+
   if UNLIKELY (!myState.isValid())
     return make_pair(false, myState);
 

one gets past the error and in the log running the script at #45035 (comment) one can find:

%MSG-e KFUpdator:  StandAloneMuonProducer:displacedStandAloneMuons  24-May-2024 11:33:49 CEST Run: 380963 Event: 73250130
 could not invert martix:
[   3.0909e+13 4.23366e+13 6.14231e+14 8.06196e+14
   4.23366e+13 5.79893e+13 8.41324e+14 1.10426e+15
   6.14231e+14 8.41324e+14 1.22062e+16 1.60209e+16
   8.06196e+14 1.10426e+15 1.60209e+16 2.10279e+16 ]
%MSG
%MSG-e DetLayers:  StandAloneMuonProducer:displacedStandAloneMuons  24-May-2024 11:33:49 CEST Run: 380963 Event: 73250130
ERROR: trying to propage invalid TSOS
%MSG
%MSG-e DetLayers:  StandAloneMuonProducer:displacedStandAloneMuons  24-May-2024 11:33:49 CEST Run: 380963 Event: 73250130
ERROR: trying to propage invalid TSOS
%MSG

it seems plausible that then the problem originates here:

} else {
edm::LogError("KFUpdator") << " could not invert martix:\n" << (V + VMeas);
return TrajectoryStateOnSurface();
}

where upon being unable to invert a matrix, a default constructed TSOS (with invalid state) is returned.

@jfernan2
Copy link
Contributor

Let me tag the Muon Reco contacts too: @24LopezR @rbhattacharya04

@slava77
Copy link
Contributor

slava77 commented May 24, 2024

it seems plausible that then the problem originates here:

cmssw/TrackingTools/KalmanUpdators/src/KFUpdator.cc

it's not in the crash stack.
Do you mean the preceding LogError?
For a fix, instead of ForwardDetLayer::compatible, an upstream caller may be a better place to decide what to do with a bad state that originated there.

@mmusich
Copy link
Contributor

mmusich commented May 25, 2024

Do you mean the preceding LogError?

yes

an upstream caller may be a better place to decide what to do with a bad state that originated there.

is:

diff --git a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
index 5253626ab46..0150b405859 100644
--- a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
+++ b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
@@ -299,6 +299,11 @@ std::vector<TrajectoryMeasurement> StandAloneMuonFilter::findBestMeasurements(co
   std::vector<TrajectoryMeasurement> result;
   std::vector<TrajectoryMeasurement> measurements;
 
+  if (!tsos.isValid()) {
+    edm::LogError(metname) << "ERROR: trying to propagate invalid TSOS" << std::endl;
+    return result;
+  }
+
   if (theOverlappingChambersFlag && layer->hasGroups()) {
     std::vector<TrajectoryMeasurementGroup> measurementGroups =
         theMeasurementExtractor->groupedMeasurements(layer, tsos, *propagator(), *estimator());

upstream enough ?

@slava77
Copy link
Contributor

slava77 commented May 25, 2024

@@ -299,6 +299,11 @@ std::vector<TrajectoryMeasurement> StandAloneMuonFilter::findBestMeasurements(co

StandAloneMuonFilter::refit is missing checks of validity, although it does check the empty output from the findBestMeasurements in lieu of that.
In case the invalid TSOS is the StandAloneMuonFilter::refit input, then going further up would be more clear.

@slava77
Copy link
Contributor

slava77 commented May 25, 2024

on a second thought, I'm not really against the edm::LogError in the ForwardDetLayers; it's still more appropriate than a crash.

The hope from going upstream would be to see if there it could become a case more suitable for a LogInfo

@slava77
Copy link
Contributor

slava77 commented May 25, 2024

type tracking

@mmusich
Copy link
Contributor

mmusich commented May 25, 2024

In case the invalid TSOS is the StandAloneMuonFilter::refit input, then going further up would be more clear.
[...]
The hope from going upstream would be to see if there it could become a case more suitable for a LogInfo

for this case I tested (successfully) also this variant:

diff --git a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
index 5253626ab46..08deb0200a1 100644
--- a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
+++ b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
@@ -233,7 +233,13 @@ void StandAloneMuonFilter::refit(const TrajectoryStateOnSurface& initialTSOS,
     LogTrace(metname) << "search Trajectory Measurement from: " << lastTSOS.globalPosition();
 
     // pick the best measurement from each group
-    std::vector<TrajectoryMeasurement> bestMeasurements = findBestMeasurements(*layer, lastTSOS);
+    std::vector<TrajectoryMeasurement> bestMeasurements{};
+
+    if (lastTSOS.isValid()) {
+      bestMeasurements = findBestMeasurements(*layer, lastTSOS);
+    } else {
+      edm::LogInfo(metname) << "Invalid last TSOS, will not find best measurements ";
+    }
 
     // RB: Different ways can be choosen if no bestMeasurement is available:
     // 1- check on lastTSOS-initialTSOS eta difference
@@ -248,10 +254,16 @@ void StandAloneMuonFilter::refit(const TrajectoryStateOnSurface& initialTSOS,
     // wrt the initial one (i.e. seed), then try to find the measurements
     // according to the lastButOne FTS. (1B)
     double lastdEta = fabs(lastTSOS.freeTrajectoryState()->momentum().eta() - eta0);
+
     if (bestMeasurements.empty() && lastdEta > 0.1) {
       LogTrace(metname) << "No measurement and big eta variation wrt seed" << endl
                         << "trying with lastButOneUpdatedTSOS";
-      bestMeasurements = findBestMeasurements(*layer, theLastButOneUpdatedTSOS);
+
+      if (theLastButOneUpdatedTSOS.isValid()) {
+        bestMeasurements = findBestMeasurements(*layer, theLastButOneUpdatedTSOS);
+      } else {
+        edm::LogInfo(metname) << "Invalid last but one updated TSOS, will not find best measurements ";
+      }
     }
 
     //if no measurement found and the current FTS has an eta very different
@@ -259,7 +271,12 @@ void StandAloneMuonFilter::refit(const TrajectoryStateOnSurface& initialTSOS,
     //according to the initial FTS. (1A)
     if (bestMeasurements.empty() && lastdEta > 0.1) {
       LogTrace(metname) << "No measurement and big eta variation wrt seed" << endl << "tryng with seed TSOS";
-      bestMeasurements = findBestMeasurements(*layer, initialTSOS);
+
+      if (initialTSOS.isValid()) {
+        bestMeasurements = findBestMeasurements(*layer, initialTSOS);
+      } else {
+        edm::LogInfo(metname) << "Invalid initial TSOS, will not find best measurements ";
+      }
     }
 
     // FIXME: uncomment this line!!

(in the specifics of this crash, it was actually theLastButOneUpdatedTSOS not being valid that lead to the crash in StandAloneMuonFilter::findBestMeasurements.).

I'm not really against the edm::LogError in the ForwardDetLayers; it's still more appropriate than a crash.

if either way is fine, I'd let the domain experts to decide / implement a protection.

@francescobrivio
Copy link
Contributor Author

We currently have other two failures all reporting

Module: StandAloneMuonProducer:displacedStandAloneMuons (crashed)
  • PromptReco_Run381065_ParkingDoubleMuonLowMass3
    • Tarball in
      /eos/cms/tier0/store/unmerged/data/logs/prod/2024/5/25/PromptReco_Run381065_ParkingDoubleMuonLowMass3/Reco/0000/3/3508ad47-e770-4fff-bf82-400ad0f155cd-63-3-logArchive.tar.gz
      
  • PromptReco_Run381067_ParkingDoubleMuonLowMass2
    • Tarball in
      /eos/cms/tier0/store/unmerged/data/logs/prod/2024/5/26/PromptReco_Run381067_ParkingDoubleMuonLowMass2/Reco/0000/3/a85151ad-f91a-4493-ac15-756b7c6fba0f-3-3-logArchive.tar.gz
      

@francescobrivio
Copy link
Contributor Author

@cms-sw/reconstruction-l2 @24LopezR @rbhattacharya04
Do you have any update?
@mmusich proposed two different possible solutions in the comments above...

@francescobrivio
Copy link
Contributor Author

francescobrivio commented May 27, 2024

I have tested the provided fix on the crashing jobs, and while it solves the first issue we observe, it doesn't help with the crashes observed in the other 2 jobs:

  • PromptReco_Run380963_ParkingDoubleMuonLowMass1 --> solved
  • PromptReco_Run381065_ParkingDoubleMuonLowMass3 --> still crashing
  • PromptReco_Run381067_ParkingDoubleMuonLowMass2 --> still crashing

Recipes to reproduce the second two jobs
Run381065

cmsrel CMSSW_14_0_7
cd CMSSW_14_0_7/src
cmsenv
cp /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/displacedStandAloneMuons/job_3544004/job/WMTaskSpace/cmsRun1/PSet* .
cat <<EOF >> PSet.py
    process.options.numberOfThreads=cms.untracked.uint32(1)
    process.options.numberOfStreams=cms.untracked.uint32(1)
    process.source.skipEvents = cms.untracked.uint32(1807)
EOF
cmsRun PSet.py

Run381067

cmsrel CMSSW_14_0_7
cd CMSSW_14_0_7/src
cmsenv
cp /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/displacedStandAloneMuons/job_3585803/job/WMTaskSpace/cmsRun1/PSet* .
cat <<EOF >> PSet.py
    process.options.numberOfThreads=cms.untracked.uint32(1)
    process.options.numberOfStreams=cms.untracked.uint32(1)
    process.source.skipEvents = cms.untracked.uint32(1463)
EOF
cmsRun PSet.py

And Matt's fix can be fetched with:

cmsrel CMSSW_14_0_7
cd CMSSW_14_0_7/src
cmsenv
git cms-addpkg RecoMuon/StandAloneTrackFinder
git remote add matt [email protected]:mandrenguyen/cmssw.git
git fetch matt muSegFault_140X
git cherry-pick addd479c0ea87af0b586ed366c64d0de3212e4df
scram b -j 8

@mmusich
Copy link
Contributor

mmusich commented May 27, 2024

PromptReco_Run381065_ParkingDoubleMuonLowMass3 --> still crashing
PromptReco_Run381067_ParkingDoubleMuonLowMass2 --> still crashing

if you have reproduced offline, can you update the recipe with the number of events to skip to get faster at the crash?

@francescobrivio
Copy link
Contributor Author

PromptReco_Run381065_ParkingDoubleMuonLowMass3 --> still crashing
PromptReco_Run381067_ParkingDoubleMuonLowMass2 --> still crashing

if you have reproduced offline, can you update the recipe with the number of events to skip to get faster at the crash?

sorry i forgot to add it in the recipe! I've edited the message above.

@mmusich
Copy link
Contributor

mmusich commented May 27, 2024

Run381065

cmsrel CMSSW_14_0_7
cd CMSSW_14_0_7/src
cmsenv
cp /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/displacedStandAloneMuons/job_3544004/job/WMTaskSpace/cmsRun1/PSet* .
cat <<EOF >> PSet.py
    process.options.numberOfThreads=cms.untracked.uint32(1)
    process.options.numberOfStreams=cms.untracked.uint32(1)
    process.source.skipEvents = cms.untracked.uint32(1807)
EOF
cmsRun PSet.py

Run381067

cmsrel CMSSW_14_0_7
cd CMSSW_14_0_7/src
cmsenv
cp /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/displacedStandAloneMuons/job_3585803/job/WMTaskSpace/cmsRun1/PSet* .
cat <<EOF >> PSet.py
    process.options.numberOfThreads=cms.untracked.uint32(1)
    process.options.numberOfStreams=cms.untracked.uint32(1)
    process.source.skipEvents = cms.untracked.uint32(1463)
EOF
cmsRun PSet.py

the two other crashes happen at a different location:

double lastdEta = fabs(lastTSOS.freeTrajectoryState()->momentum().eta() - eta0);

in this case lastTSOS is not valid.
By initializing lastdEta only if lastTSOS is valid fixes both crashes, though I don't have any feeling about which value lastdEta should be initialized to (in my example below I initialize to 0. so that the two other clauses triggered by lastdEta > 0.1 are never entered).

diff --git a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
index 5253626ab46..30f1003b9eb 100644
--- a/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
+++ b/RecoMuon/StandAloneTrackFinder/src/StandAloneMuonFilter.cc
@@ -233,7 +233,15 @@ void StandAloneMuonFilter::refit(const TrajectoryStateOnSurface& initialTSOS,
     LogTrace(metname) << "search Trajectory Measurement from: " << lastTSOS.globalPosition();
 
     // pick the best measurement from each group
-    std::vector<TrajectoryMeasurement> bestMeasurements = findBestMeasurements(*layer, lastTSOS);
+    std::vector<TrajectoryMeasurement> bestMeasurements{};
+
+    double lastdEta{0.f};
+    if (lastTSOS.isValid()) {
+      lastdEta = fabs(lastTSOS.freeTrajectoryState()->momentum().eta() - eta0);
+      bestMeasurements = findBestMeasurements(*layer, lastTSOS);
+    } else {
+      edm::LogInfo(metname) << "Invalid last TSOS, will not find best measurements ";
+    }

@24LopezR
Copy link
Contributor

Hi all,
It looks like the crash is ultimately coming from this part of MuonTrajectoryUpdator.cc (which is one step below StandAloneMuonFilter.cc which Marco is addressing):

lastUpdatedTSOS = measurementUpdator()->update(propagatedTSOS, *((*recHit).get()));

which calls KFUpdator::update and returns an invalid TSOS. A possible workaround could be to skip the recHit whenever this occurs.

diff --git a/RecoMuon/TrackingTools/src/MuonTrajectoryUpdator.cc b/RecoMuon/TrackingTools/src/MuonTrajectoryUpdator.cc
index f4afc5ba640..30c3ac95fd3 100644
--- a/RecoMuon/TrackingTools/src/MuonTrajectoryUpdator.cc
+++ b/RecoMuon/TrackingTools/src/MuonTrajectoryUpdator.cc
@@ -156,6 +156,12 @@ pair<bool, TrajectoryStateOnSurface> MuonTrajectoryUpdator::update(const Traject
 
             lastUpdatedTSOS = measurementUpdator()->update(propagatedTSOS, *((*recHit).get()));
 
+            if (!lastUpdatedTSOS.isValid()) {
+              edm::LogInfo(metname) << "Invalid last TSOS, will skip RecHit ";
+              lastUpdatedTSOS = propagatedTSOS; // Revert update
+              continue;
+            }
+
             LogTrace(metname) << "  Fit   Position : " << lastUpdatedTSOS.globalPosition()

I just tested and this modification avoids the crash in both jobs, but implies a slight modification in muon trajectory updator, which may be sensitive. As this condition (returning invalid TSOS) should never happen, I think we can go with it for the moment.

But I would like to hear a second opinion on it from CMSSW experts, thanks.

@mandrenguyen
Copy link
Contributor

mandrenguyen commented May 27, 2024

Thanks @24LopezR
May I suggest you make a PR with that solution to 14_1_X and a backport to 14_0_X?
People can then give feedback in the PR thread while the tests are running in parallel.

@mmusich
Copy link
Contributor

mmusich commented May 27, 2024

I just tested and this modification avoids the crash in both jobs, but implies a slight modification in muon trajectory updator, which may be sensitive. As this condition (returning invalid TSOS) should never happen, I think we can go with it for the moment.

so this fixes all the 3 instances, (including the one in run-380963 ), right?
Then perhaps #45049 should be reverted.

@24LopezR
Copy link
Contributor

so this fixes all the 3 instances, (including the one in run-380963 ), right? Then perhaps #45049 should be reverted.

Well, I tested my modification on top of PR #45049. So in principle it can't be safely reverted.
Let me do in the meanwhile more standalone tests of my PR.

@mmusich
Copy link
Contributor

mmusich commented May 27, 2024

Well, I tested my modification on top of PR #45049.

I did test it without and it didn't crash.

@mpresill
Copy link

mpresill commented May 28, 2024

We have a have a similar segmentation violation in Run 381115 . Here is the tarball with the log:
/eos/cms/tier0/store/unmerged/data/logs/prod/2024/5/28/PromptReco_Run381115_ParkingSingleMuon0/Reco/0000/3/0609ffe4-6b01-45f9-bb3f-b9263252040e-268-3-logArchive.tar.gz
If anyone wants to test the patch for that too.

Matteo (ORM)

@gpetruc
Copy link
Contributor

gpetruc commented Jun 11, 2024

Considering that the fix is merged and already included in 14_0_8 then maybe the issue can be closed?

Giovanni (ORM)

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This issue is fully signed and ready to be closed.

@francescobrivio
Copy link
Contributor Author

Closing as completed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants