Tracing cleanup #7168
Conversation
@macladson I've filed a PR to fix missing libp2p (gossipsub) logs: macladson#4. Please have a look when you have time.
Fix missing gossipsub logs
Amazing, thank you so much for noticing this! ❤️
Additionally, I noticed that the `discv5.log` and `libp2p.log` files contain a large amount of trace-level logs. I've created macladson#5 to fix this. 🙏
Removed the v7.0.0 label as we aren't releasing v7.0.0 from unstable, so it won't include tracing at all.
This pull request has merge conflicts. Could you please resolve them @macladson? 🙏
Single log file is fine with me :)
I wonder if defaulting to …
Hey @macladson,
Thanks for the cleanup and for fixing up the issues! 🙏
I've added some comments, let me know your thoughts.
With the default `debug-level` and `logfile-debug-level`, I noticed a large number of trace-level logs in the `discv5.log` and `libp2p.log` files. @ackintosh has noticed this and created a fix for it (see his comment above). However, even the debug-level logs for these are quite large, and the libp2p and gossip log files were previously `INFO` level by default (unlike `beacon.log`), because they're very noisy, and we had to use the `RUST_LOG` env var to enable debug logs for deps. I think we might want to maintain this behaviour due to the rate of file size growth and disk IO?
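For reference, a minimal sketch of keeping dependency targets at `INFO` by default while our own crates stay at `DEBUG`, using tracing-subscriber's `EnvFilter`. This is not the PR's actual wiring, and the target names `libp2p`, `gossipsub` and `discv5` are illustrative:

```rust
use tracing_subscriber::{fmt, prelude::*, EnvFilter};

fn main() {
    // Respect RUST_LOG if the user set it; otherwise default to DEBUG overall
    // while capping the noisy dependency targets at INFO.
    let filter = EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new("debug,libp2p=info,gossipsub=info,discv5=info"));

    tracing_subscriber::registry()
        .with(filter)
        .with(fmt::layer())
        .init();

    // Dropped by the default directives above.
    tracing::debug!(target: "discv5", "discv5 debug noise");
    // Still emitted.
    tracing::info!("logging initialised");
}
```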
pub fn create_libp2p_discv5_tracing_layer(
    base_tracing_log_path: Option<PathBuf>,
    max_log_size: u64,
) -> Option<Libp2pDiscv5TracingLayer> {
The API for this function can be simplified: we already know `base_tracing_log_path` when calling it, so we can remove the `Option` here and only call this function if the log path is `Some`.
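To illustrate the suggested shape, a minimal sketch where the caller checks the path first and the function takes `PathBuf` directly. The `Libp2pDiscv5TracingLayer` struct and its field names here are placeholders standing in for the real type:

```rust
use std::path::PathBuf;

// Placeholder for the real layer type in this PR.
pub struct Libp2pDiscv5TracingLayer {
    pub base_path: PathBuf,
    pub max_log_size: u64,
}

// The caller has already established that a log path exists, so the Option can
// be dropped from both the argument and the return type.
pub fn create_libp2p_discv5_tracing_layer(
    base_tracing_log_path: PathBuf,
    max_log_size: u64,
) -> Libp2pDiscv5TracingLayer {
    Libp2pDiscv5TracingLayer {
        base_path: base_tracing_log_path,
        max_log_size,
    }
}

fn main() {
    let base_tracing_log_path: Option<PathBuf> = Some(PathBuf::from("/tmp/lighthouse-logs"));
    // Only construct the layer when a log path was actually configured.
    if let Some(path) = base_tracing_log_path {
        let _layer = create_libp2p_discv5_tracing_layer(path, 10 * 1024 * 1024);
    }
}
```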
Thanks for the review, I'll fix these up soon!
I think the main thing to consider is what the most useful information in these logs is. It could be the case that …

As an aside about file size growth and IO, my understanding is that database reads/writes dwarf disk IO to the point where logging is pretty insignificant. And the files themselves are each limited to 10MB, so there's no concern about unbounded file growth.
fn on_event(&self, event: &tracing::Event<'_>, _ctx: Context<S>) {
    let meta = event.metadata();
    let log_level = meta.level();
    let timestamp = Local::now().format("%Y-%m-%d %H:%M:%S").to_string();
We currently use this approach, `Local::now().format()`, for formatting timestamps in 3 places in logging, and this `format` call seems relatively expensive - it adds up significantly given how frequently it gets called. I did some brief profiling and it showed that ~11.44% of total CPU time is spent in `chrono` formatting functions, if I calculated correctly:
$ grep chrono lighthouse_2025-03-31-182111.collapsed | wc -l
1426
$ wc -l lighthouse_2025-03-31-182111.collapsed
12468 lighthouse_2025-03-31-182111.collapsed
Might be worth optimising a bit here, e.g. considering alternatives such as `StrftimeItems` or the `time` crate.
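For illustration, a minimal sketch of the precompiled-format idea with chrono's `StrftimeItems`: the pattern is parsed once up front rather than on every `format()` call. This is a standalone example, not the PR's actual layer code:

```rust
use chrono::format::strftime::StrftimeItems;
use chrono::format::Item;
use chrono::Local;

/// Timestamp formatter that parses the strftime pattern a single time.
struct TimestampFormatter {
    items: Vec<Item<'static>>,
}

impl TimestampFormatter {
    fn new() -> Self {
        // Parsing happens once here, instead of inside every format() call.
        let items = StrftimeItems::new("%Y-%m-%d %H:%M:%S").collect();
        Self { items }
    }

    /// Hot path: only the actual formatting work remains per event.
    fn now(&self) -> String {
        Local::now().format_with_items(self.items.iter()).to_string()
    }
}

fn main() {
    let formatter = TimestampFormatter::new();
    println!("{}", formatter.now());
}
```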
Happy to create a separate issue to address this, given this PR already fixed a bunch of things and would be good to merge soon.
I made an attempt using `StrftimeItems` and it's showing much better results (~3.58% overhead) - although still quite high just for logging timestamps.
macladson#6
I wonder how much of that impact is due to using `Utc::now` vs using `StrftimeItems`. It would be nice if we could keep local time timestamps. From memory, I believe this has been a point of feedback from users in the past.
Oh yeah, I reckon most of the impact is from switching to `StrftimeItems`, since `DateTime::format` seems to be the more expensive operation (the former is recommended in the `chrono` repo over `DateTime::format` for performance-critical applications). I tried using `Utc::now` to avoid the timezone conversion, to see how much improvement we can get.

IMO the local timezone is useful for users (who mostly consume stdout rather than the debug files), while the debug files are mostly for us - in that case I think UTC would be less confusing - but yeah, I don't have a strong opinion on the timezone. I think it would be useful to have a summary of the logging changes from tracing somewhere, e.g. the default logging level change, timestamp logging, CLI flags, etc.
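To make the comparison concrete, a rough sketch of the two variables being discussed (assuming chrono 0.4): both calls use the same precompiled items, so the remaining difference is that `Local::now()` has to resolve the local UTC offset while `Utc::now()` does not:

```rust
use chrono::format::strftime::StrftimeItems;
use chrono::{Local, Utc};

fn main() {
    // Pattern parsed once and shared by both calls.
    let items: Vec<_> = StrftimeItems::new("%Y-%m-%d %H:%M:%S").collect();

    // UTC: formatting only, no timezone conversion.
    let utc = Utc::now().format_with_items(items.iter()).to_string();

    // Local: same formatting, plus a local-offset lookup.
    let local = Local::now().format_with_items(items.iter()).to_string();

    println!("utc:   {utc}");
    println!("local: {local}");
}
```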
Created #7232 to address this.
Hey @macladson,
Thanks for the updates. It's looking much cleaner now!
I'm keen to merge this in soon, and happy to leave the remaining comments if you prefer to address them separately, since none of them were introduced by your PR?
This pull request has been removed from the queue for the following reason: the merge conditions cannot be satisfied due to failing checks. You may have to fix your CI before adding the pull request to the queue again.
@mergify dequeue
This pull request has been removed from the queue for the following reason: pull request #7168 has been dequeued by a `dequeue` command. You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.
✅ The pull request has been removed from the queue
Waiting for #7235 to be merged to unstable to unblock CI.
@mergify queue
✅ The pull request has been merged automatically at 4839ed6
Issue Addressed
#7153
#7146
#7147
#7148 -> Thanks to @ackintosh
Proposed Changes
This PR does the following:
- The log file can be disabled by setting `--logfile-max-number 0` or `--logfile-max-size 0`. Note that disabling the log file in this way will also disable `discv5` and `libp2p` logging.
- `discv5` and `libp2p` logging will be disabled by default unless running `beacon_node` or `boot_node`. This also should fix the VC panic we were seeing.
- `libp2p` and `discv5` logs: this is now limited to 1 file and will rotate based on the value of the `--logfile-max-size` flag. We could potentially add flags specifically to control the size/number of these, however I felt a single log file was sufficient. Perhaps @AgeManning has opinions about this?
- `dep_log`.
Outstanding Issues
- `debug` logs related to the `sync` span being included in `INFO` logs. I suspect our spans will need to be reworked to fix this.
- `--debug-level crit` or `logfile-debug-level crit` will still fail. (Fix available) -> Remove `crit` as an option from the CLI entirely #7169
- `libp2p` and `discv5` logfiles are not created with restricted permissions #7171