Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash at the end of REST file processing with Jana2 #834

Open
aaust opened this issue Aug 30, 2024 · 15 comments
Open

Crash at the end of REST file processing with Jana2 #834

aaust opened this issue Aug 30, 2024 · 15 comments
Labels

Comments

@aaust
Copy link
Contributor

aaust commented Aug 30, 2024

Reported by @zihlmann

I tried to use JANA2 on the ifarm using version_5.19.1_jana2.xml and I looked at CPP/NPP data REST files.
To me it seems to always crash at the end or close to the end. I tried 2 files and in both cases the crash is around event # 2million. So it looks like at the end of the file.

The command I used was:
hd_root -PLUGINS monitoring_hists /cache/halld/RunPeriod-2022-05/recon/ver01/REST/101534/dana_rest_101534_000.hddm

@aaust
Copy link
Contributor Author

aaust commented Aug 31, 2024

I just built a new version set version_5.19.2_jana2.xml with Raiqa's fix. The hd_root process seems to end correctly at the end of the file, but the output from monitoring_hists looks very different from the Jana1 output. Somehow, I also get all these error messages:

DL1MCTrigger_factory:  This program MUST be used with an HDDM file as input!
   Default seeds will be used for the random generator

Beni @zihlmann , could you please run the test again?

@RaiqaRasool
Copy link

The difference in results between JANA1 and JANA2 in monitoring_hists is expected due to the changes in how the Get function behaves in JANA2 compared to JANA1.

Screenshot

If the level of discrepancy you've observed is somewhat the same as it is shown in the image above then it is indeed due to the updated Get function. To achieve consistent results with what produced in JANA1, you'll need to build halld_recon against the rasool_jana2_mimicking_jana1 branch rather than the rasool_jana2 branch.

Regarding the DL1MCTrigger_factory error you mentioned, I am not sure why you are getting that. I ran the monitoring_hists plugin using the following command and it ran without any errors for me:

hd_root /cache/halld/RunPeriod-2023-01/rawdata/Run121120/hd_rawdata_121120_000.evio -PPLUGINS=monitoring_hists

Could you please share the exact command you used? This would help me reproduce the issue and investigate further.

@zihlmann
Copy link
Contributor

zihlmann commented Sep 3, 2024

using the new xml file (jana2 15.19.2) the command:
hd_root -PLUGINS=monitoring_hists /cache/halld/RunPeriod-2022-05/recon/ver01/REST/101534/dana_rest_101534_000.hddm

does run through without error. However, the hd_root.root output file is empty.

@aaust
Copy link
Contributor Author

aaust commented Sep 3, 2024

Sorry, there was one P missing:
hd_root -PPLUGINS=monitoring_hists /cache/halld/RunPeriod-2022-05/recon/ver01/REST/101534/dana_rest_101534_000.hddm

@nsjarvis
Copy link
Contributor

nsjarvis commented Sep 3, 2024

Referencing this: #435

@zihlmann
Copy link
Contributor

zihlmann commented Sep 3, 2024

ups, i should have seen that myself.
now I get the error similar to you:

DL1MCTrigger_factory: This program MUST be used with an HDDM file as input!
Default seeds will be used for the random generator

on another note: I still get the same error when trying to compile a new plugin under JANA2 with scons
[zihlmann@ifarm2401 TESTJANA2]$ scons
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
Compiling [JEventProcessor_TESTJANA2.cc]
In file included from JEventProcessor_TESTJANA2.cc:8:
JEventProcessor_TESTJANA2.h:13:61: error: expected class-name before ‘{’ token
13 | class JEventProcessor_TESTJANA2:public jana::JEventProcessor{
| ^
JEventProcessor_TESTJANA2.h:20:17: error: ‘jerror_t’ does not name a type; did you mean ‘error_t’?
20 | jerror_t init(void); ///< Called once at program start.
| ^~~~~~~~
| error_t
JEventProcessor_TESTJANA2.h:21:17: error: ‘jerror_t’ does not name a type; did you mean ‘error_t’?
21 | jerror_t brun(jana::JEventLoop *eventLoop, int32_t runnumber); ///< Called everytime a new run number is detected.
| ^~~~~~~~
| error_t

@RaiqaRasool
Copy link

The tests I performed before were using .evio files as input to monitoring_hists and I never got this error during those tests. However, I encountered the error mentioned above when running it with an .hddm file, so I’ll need to investigate this further. Thanks for sharing the command.

As for the error you're seeing when trying to compile a new plugin under JANA2 with scons, this is happening because the mkplugin scripts haven’t been updated yet for JANA2. We're in the final stages of updating them, but we’re holding off on completing this until we finalize the documentation on the syntax changes from JANA1 to JANA2. This way, we can ensure any additional updates required for the code generated by these scripts are included. I’ll do my best to have this ready before the next GlueX meeting. I apologize for the inconvenience.

@aaust
Copy link
Contributor Author

aaust commented Sep 5, 2024

Thank you! I repeated your test with hd_rawdata_121120_000.evio. I processed the file with 12 threads with JANA1 (-PNTHREADS=12) which resulted in 120Hz. On the other hand, I was not able to use JANA2 in multithreaded mode so I only got a 14Hz average.

@aaust
Copy link
Contributor Author

aaust commented Sep 5, 2024

Here is a link to a pdf of all monitoring histograms for Jana1 and Jana2 for this raw data file:
https://halldweb.jlab.org/talks/2024/jana2_comparison_independent_121120_000.pdf
Besides minor differences in the pulls (pages 96-113), I see a few other issues:

  • SC matching (pages 127-130)
  • TOF matching (pages 154-157, 161, 163, 164, 165, 180, 181)
  • FCAL matching (pages 172-175)

Are these differences that are expected due to the change of the Get() function? Some of the look quite drastic.

@RaiqaRasool
Copy link

Thank you very much for sharing the results in PDF format. Would it be possible for you to build a version of JANA2 that mimics JANA1 in terms of Get() function behavior by using rasool_jana2_mimicking_jana1 branch instead of rasool_jana2? If feasible, could you also generate the same set of results you provided in the PDF using that version? This would help us quickly identify whether the observed differences are solely due to the Get() function.

I can certainly handle this myself, but I don't have the script you used to generate these results. If it wouldn’t be too much trouble, could you assist with this?

In the meantime, I’m actively investigating the issue with:

DL1MCTrigger_factory: This program MUST be used with an HDDM file as input!
Default seeds will be used for the random generator

and I'm close to pinpointing the cause. Additionally, I'm also looking into why multithreading isn't functioning properly for .evio files in the context of monitoring histograms.

@sdobbs
Copy link
Contributor

sdobbs commented Sep 6, 2024 via email

@RaiqaRasool
Copy link

Thanks for pointing that out, Sean. I searched for DL1MCTrigger in the PRs and commits on the master branch but couldn't find anything. Could you share the specific PRs where this was addressed?

I last rebased the rasool_jana2 branch with the master branch on August 12th.

@sdobbs
Copy link
Contributor

sdobbs commented Sep 6, 2024 via email

@RaiqaRasool
Copy link

Thanks, Sean. I reviewed the PR and found the changes made in DTrigger_factory.cc.

@aaust
Copy link
Contributor Author

aaust commented Sep 9, 2024

@RaiqaRasool I repeated the comparison with the branch rasool_jana2_mimicking_jana1 (version_5.19.3_jana2.xml). Indeed, most differences disappeared or became very small. Only the difference in the dead zone of the 2-dimensional TOF matching histograms (pages 161, 163, 230, 231) remain striking. It could still be that your halld_recon branch is not 100% aligned with what I was using for comparison (https://github.com/jeffersonlab/halld_recon/releases/tag/4.49.0), so I don't worry too much about it.
Here is a link to the pdf file:
https://halldweb.jlab.org/talks/2024/jana2_comparison_independent_121120_000_mimicking_jana1.pdf

RaiqaRasool added a commit that referenced this issue Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants