Tracing cleanup #7168

macladson · 2025-03-18T14:54:24Z

Issue Addressed

#7153
#7146
#7147
#7148 -> Thanks to @ackintosh

Proposed Changes

This PR does the following:

Disables logging to file when using either --logfile-max-number 0 or --logfile-max-size 0. Note that disabling the log file in this way will also disable discv5 and libp2p logging.
discv5 and libp2p logging will be disabled by default unless running beacon_node or boot_node. This also should fix the VC panic we were seeing.
Removes log rotation and compression from libp2p and discv5 logs. It is now limited to 1 file and will rotate based on the value of the --logfile-max-size flag. We could potentially add flags specifically to control the size/number of these, however I felt a single log file was sufficient. Perhaps @AgeManning has opinions about this?
Removes all dependency logging and references to dep_log.
Introduces workspace filtering to file and stdout. This explicitly allows logs from members of the Lighthouse workspace, disallowing all others. It uses a proc macro which pulls the member list from cargo metadata at compile time. This might be over-engineered but my hope is that this list will not require maintenance.
Unifies file and stdout JSON format. With slog, the formats were slightly different. @ThreeHrSleep worked to maintain that format difference, to ensure there were no breaking changes. If these format differences are actually problematic we can restore them, however I felt the added complexity wasn't worth it.
General code improvements and cleanup.
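
As a rough illustration of the workspace filtering idea, the sketch below keeps an event only if the crate name at the start of its target matches a known workspace member. The member list and the `is_workspace_target` helper are hypothetical: in the PR the list is generated by a proc macro from cargo metadata, and the real filter may match targets differently.

```rust
/// Hypothetical list of workspace crate names. In the PR this list is
/// generated at compile time from `cargo metadata`; it is hard-coded here
/// purely for illustration.
const WORKSPACE_CRATES: &[&str] = &["beacon_node", "logging", "lighthouse_network"];

/// Returns true if `target` (a module path such as `beacon_node::sync`)
/// starts with the name of one of the allowed crates.
fn is_workspace_target(target: &str, members: &[&str]) -> bool {
    // The crate name is the first `::`-separated segment of the target.
    let crate_name = target.split("::").next().unwrap_or(target);
    members.iter().any(|member| *member == crate_name)
}
```

With this list, a target like `beacon_node::sync` would be allowed, while a target from a crate outside the list would be dropped.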

Outstanding Issues

This does not fix certain debug logs related to the sync span being included in INFO logs. I suspect our spans will need to be reworked to fix this.
Trying to set --debug-level crit or --logfile-debug-level crit will still fail. (Fix available) -> Remove crit as an option from the CLI entirely #7169
Logfile permission issues: "Logfiles do not restrict permissions after rotation" (#7170) and "libp2p and discv5 logfiles are not created with restricted permissions" (#7171).

ackintosh · 2025-03-20T22:45:28Z

@macladson I've filed a PR to fix missing libp2p(gossipsub) logs. macladson#4 Please have a look when you have time.

Fix missing gossipsub logs

macladson · 2025-03-21T02:09:01Z

@macladson I've filed a PR to fix missing libp2p(gossipsub) logs. macladson#4 Please have a look when you have time.

Amazing, thank you so much for noticing this! ❤️

ackintosh · 2025-03-21T08:56:31Z

Additionally, I noticed that the --logfile-debug-level option does not affect discv5.log or libp2p.log. Currently both logs contain TRACE logs, so the file size increases very quickly.

I've created macladson#5 to fix this. 🙏

michaelsproul · 2025-03-23T22:31:40Z

Removed the v7.0.0 label as we aren't releasing v7.0.0 from unstable, so it won't include tracing at all

mergify · 2025-03-23T22:31:46Z

This pull request has merge conflicts. Could you please resolve them @macladson? 🙏

AgeManning · 2025-03-24T01:48:06Z

Single log file is fine with me :)

macladson · 2025-03-24T01:59:54Z

Additionally, I noticed that the --logfile-debug-level option does not affect discv5.log or libp2p.log. Currently both logs contain TRACE logs, so the file size increases very quickly.

I wonder if defaulting to DEBUG is actually fine here, then we can use RUST_LOG to control the verbosity of the discv5 and libp2p logs? Seems to me that --logfile-debug-level info (or higher) would yield mostly empty discv5/libp2p logfiles anyway, whereas --logfile-debug-level info is perfectly sensible for beacon.log. I agree that we should not default to TRACE though
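
One way to picture this proposal: the discv5 and libp2p log files get a fixed default level, and an environment override wins when it is present and valid. The sketch below models only that precedence, using a hypothetical `Level` enum and `resolve_dep_level` helper. It does not parse real RUST_LOG directive syntax, which also supports per-target filters.

```rust
/// Simplified log levels, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parses a bare level name such as "debug"; anything else yields None.
fn parse_level(s: &str) -> Option<Level> {
    match s.trim().to_ascii_lowercase().as_str() {
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// Picks the level for the dependency log files: a valid override wins,
/// otherwise the supplied default is used.
fn resolve_dep_level(env_override: Option<&str>, default: Level) -> Level {
    env_override.and_then(parse_level).unwrap_or(default)
}
```

In a real binary the override could be read with `std::env::var("RUST_LOG").ok()` and passed in via `as_deref()`.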

common/lighthouse_macros/Cargo.toml

jimmygchen

Hey @macladson

Thanks for the clean up and fixing up the issues! 🙏

I've added some comments, let me know your thoughts.

With the default debug-level and logfile-debug-level, I noticed a large amount of trace level logs in the discv5.log and libp2p.log files. @ackintosh has noticed this and created a fix for this (see his comment above). However, even debug level logs for these are quite huge, and libp2p and gossip log files were INFO level by default previously (different to beacon.log), because they're very noisy, and we had to use the RUST_LOG env var to enable deps debug logs. I think we might want to maintain this behaviour due to rate of file size growth and disk IO?

lighthouse/environment/src/lib.rs

common/lighthouse_macros/Cargo.toml

Cargo.toml

lighthouse/environment/src/tracing_common.rs

common/lighthouse_macros/src/lib.rs

common/lighthouse_macros/Cargo.toml

common/logging/src/tracing_libp2p_discv5_logging_layer.rs

jimmygchen · 2025-03-31T05:57:22Z

common/logging/src/tracing_libp2p_discv5_logging_layer.rs

+pub fn create_libp2p_discv5_tracing_layer(
+    base_tracing_log_path: Option<PathBuf>,
+    max_log_size: u64,
+) -> Option<Libp2pDiscv5TracingLayer> {


The API for this function can be simplified: we already know base_tracing_log_path at the call site, so we can drop the Option here and only call this function when the log path is Some.
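
A minimal sketch of the suggested shape, under stated assumptions: `DepLogLayer`, `create_dep_log_layer` and `maybe_dep_log_layer` are hypothetical stand-ins, not the names used in the PR, and the real constructor may still need to return an Option or Result if creating the files can fail. The point is only that the constructor takes a concrete path and the caller handles the missing-path case.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real layer type.
#[derive(Debug)]
struct DepLogLayer {
    discv5_path: PathBuf,
    libp2p_path: PathBuf,
    max_log_size: u64,
}

/// Takes a concrete path: the "no log file" case is left to the caller.
fn create_dep_log_layer(base_path: &Path, max_log_size: u64) -> DepLogLayer {
    DepLogLayer {
        discv5_path: base_path.join("discv5.log"),
        libp2p_path: base_path.join("libp2p.log"),
        max_log_size,
    }
}

/// Caller side: build the layer only when a log path is configured.
fn maybe_dep_log_layer(base_path: Option<&Path>, max_log_size: u64) -> Option<DepLogLayer> {
    base_path.map(|path| create_dep_log_layer(path, max_log_size))
}
```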

common/logging/src/tracing_libp2p_discv5_logging_layer.rs

common/logging/src/lib.rs

lighthouse/src/main.rs

macladson · 2025-03-31T08:31:38Z

Thanks for the review, I'll fix these up soon!

However, even debug level logs for these are quite huge, and libp2p and gossip log files were INFO level by default previously (different to beacon.log), because they're very noisy, and we had to use the RUST_LOG env var to enable deps debug logs. I think we might want to maintain this behaviour due to rate of file size growth and disk IO?

I think the main thing to consider is what the most useful information in these logs is. It could be the case that INFO logs generally don't provide anything worthwhile for debugging and so we have to set RUST_LOG=debug manually after discovering some issue. Seems to me this friction could be avoided if we logged at debug in the first place. I'm not convinced the old behaviour was better but I'm also not really in a position to make this determination.

As an aside about file size growth and IO, my understanding is that database read/writes dwarf disk IO to the point where logging is pretty insignificant. And the files themselves are each limited to 10MB so there's no concern about unbounded file growth.

jimmygchen · 2025-03-31T12:33:22Z

common/logging/src/tracing_libp2p_discv5_logging_layer.rs

+    fn on_event(&self, event: &tracing::Event<'_>, _ctx: Context<S>) {
+        let meta = event.metadata();
+        let log_level = meta.level();
+        let timestamp = Local::now().format("%Y-%m-%d %H:%M:%S").to_string();


We currently use this approach Local::now().format() for formatting timestamp in 3 places in logging, and this format function seems relatively expensive - it adds up significantly given it gets called very frequently.

I did a brief profiling and it showed that ~11.44% of total CPU time is spent in chrono formatting functions, if I calculated correctly:

$ grep chrono lighthouse_2025-03-31-182111.collapsed | wc -l
1426
$ wc -l lighthouse_2025-03-31-182111.collapsed
12468 lighthouse_2025-03-31-182111.collapsed
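
The ~11.44% figure is the ratio of the two line counts above. A small sketch of that arithmetic follows; `line_share_percent` is a hypothetical helper. Note that this is a ratio of lines in the collapsed file, so it approximates CPU share only to the extent that each line carries a similar sample weight.

```rust
/// Percentage of `matching` lines out of `total`, or None when `total` is zero.
fn line_share_percent(matching: u64, total: u64) -> Option<f64> {
    if total == 0 {
        return None;
    }
    Some(matching as f64 / total as f64 * 100.0)
}
```

Calling `line_share_percent(1426, 12468)` gives roughly 11.44.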

Might be worth optimising a bit here, e.g. by considering alternatives such as StrftimeItems or the time crate.

Happy to create a separate issue to address this, given this PR already fixed a bunch of things and would be good to merge soon.

I made an attempt using StrftimeItems and it's showing much better results (~3.58% overhead) - although still quite high just for logging timestamps.
macladson#6

I wonder how much of that impact is due to using Utc::now vs using StrftimeItems. It would be nice if we could keep local time timestamps. From memory I believe this has been a point of feedback from users in the past

Oh yeah, I reckon most of the impact is from switching to StrftimeItems; DateTime::format seems to be the more expensive operation (the former is recommended in the chrono repo over DateTime::format for performance-critical applications). I tried using Utc::now to avoid timezone conversion to see how much improvement we can get.

IMO local timezone is useful for users (mostly consuming stdout rather than the debug files), and debug files are mostly for us, in which case I think UTC would be less confusing, but yeah, I don't have a strong opinion on timezone. I think it would be useful to have a summary of logging changes from tracing somewhere, e.g. default logging level change, timestamp logging, CLI flags etc.

Created #7232 to address this.

jimmygchen

Hey @macladson ,

Thanks for the updates. It's looking much cleaner now!

I'm keen to merge this in soon and happy to leave the remaining comments if you prefer to address them separately, since none of them were introduced from your PR?

mergify · 2025-04-01T05:20:01Z

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

☑️ test-suite-success

You may have to fix your CI before adding the pull request to the queue again.
If you update this pull request, to fix the CI, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

jimmygchen · 2025-04-01T06:18:14Z

@mergify dequeue

mergify · 2025-04-01T06:18:25Z

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #7168 has been dequeued by a dequeue command

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.
If you do update this pull request, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

mergify · 2025-04-01T06:18:26Z

dequeue

✅ The pull request has been removed from the queue `default`

jimmygchen · 2025-04-01T06:40:43Z

Waiting for #7235 to be merged to unstable to unblock CI.

jimmygchen · 2025-04-01T09:37:07Z

@mergify queue

mergify · 2025-04-01T09:37:18Z

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 4839ed6

Cleanup

d063422

macladson added ready-for-review The code is ready for review UX-and-logs v7.0.0 New release c. Q1 2025 labels Mar 18, 2025

macladson requested a review from jimmygchen March 18, 2025 14:54

macladson changed the title from "Cleanup" to "Tracing cleanup" Mar 18, 2025

macladson added 2 commits March 19, 2025 01:55

cargo sort

ab8b8cf

cargo udeps

1c5043b

ThreeHrSleep mentioned this pull request Mar 19, 2025

Disable discv5 & libp2p file logging when running VC #7163

Closed

Fix target name for gossipsub

9d08e77

ackintosh mentioned this pull request Mar 20, 2025

Fix missing gossipsub logs macladson/lighthouse#4

Merged

Merge pull request #4 from ackintosh/tracing-cleanup-fix-gossipsub

ed693db

Fix missing gossipsub logs

michaelsproul added v7.1.0 Post-Electra release and removed v7.0.0 New release c. Q1 2025 labels Mar 23, 2025

Merge branch 'unstable' into tracing-cleanup

581c788

jimmygchen reviewed Mar 26, 2025

common/lighthouse_macros/Cargo.toml (outdated)

jimmygchen reviewed Mar 31, 2025

Rename macro and use EnvFilter for dep logging

6f5d316

macladson commented Mar 31, 2025

lighthouse/src/main.rs

macladson added 3 commits March 31, 2025 20:46

Visibility and macro fixes

6e48dd4

Add docs and fix typo

764f0bb

Formatting

ef89437

jimmygchen reviewed Mar 31, 2025

jimmygchen mentioned this pull request Mar 31, 2025

Try different date time formatting method macladson/lighthouse#6

Closed

jimmygchen approved these changes Apr 1, 2025

jimmygchen added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Apr 1, 2025

jimmygchen mentioned this pull request Apr 1, 2025

Optimise tracing timestamp formatting to reduce logging overhead #7232

Closed

jimmygchen added ready-for-merge This PR is ready to merge. and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels Apr 1, 2025

mergify bot added a commit that referenced this pull request Apr 1, 2025

Merge of #7168

c1c4b0e

mergify bot mentioned this pull request Apr 1, 2025

merge queue: embarking unstable (bde0f1e) and #7168 together #7233

Closed

6 tasks

mergify bot added a commit that referenced this pull request Apr 1, 2025

Merge of #7168

7166bca

mergify bot mentioned this pull request Apr 1, 2025

merge queue: embarking unstable (bde0f1e) and #7168 together #7236

Closed

6 tasks

jimmygchen mentioned this pull request Apr 1, 2025

Fix panic when running VC #7140

Closed

Merge branch 'unstable' into tracing-cleanup

d3278bc

mergify bot merged commit 4839ed6 into sigp:unstable Apr 1, 2025
31 checks passed
