Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async logging using MDC #3083

Open
felixjulianheitmann opened this issue May 2, 2024 · 6 comments
Open

Async logging using MDC #3083

felixjulianheitmann opened this issue May 2, 2024 · 6 comments

Comments

@felixjulianheitmann
Copy link

With #2907 it became possible to log additional key-value formatted attributes with a message.

We like the idea could use something like this for a first step in the direction of structured logging.

The approach is a little limited, though. In our app, we use multiple different async loggers. The PR #2907 uses thread local storage to store the key-value pairs.

This means, multi-threaded systems get different messages depending on which thread logs the actual message. Async logging is not possible this way as noted by the original author.

Additionally, even in a single-threaded application multiple loggers would all be logging the same attributes as they all query the same global context variable. This is not ideal - at least not for our use-case.

To allow async logging, we would need to make the context variable not a thread_local static variable but an actual global static variable:

static thread_local mdc_map_t context;
return context;

Additionally, of course all access would need to be secured with a mutex. With a static class member however, we would need a source file to define it in (at least in C++11).

That only solves the async logging. Not the global state across all loggers.

To solve the second issue, we would need to make the mdc class/context a member of the mdc_formatter.

class mdc_formatter : public flag_formatter {

At that point we are already instantiating the mdc class and don't need to resort to static and thread_local variables. We could just make the context a regular member that gets initialized on creation.

This is a breaking change to the previous behavior, however.

What I am getting at is the following:

  • do you think it's work investigating this further?
  • how would we deal with backwards compatibility?
    • runtime options: incur a runtime overhead
    • compile time options: incur complexity during testing / conditional builds

I am pretty confident, I can implement this and make a PR but I would rather get your blessing about how you want me to approach this, beforehand.

@felixjulianheitmann
Copy link
Author

felixjulianheitmann commented May 2, 2024

An MWE which unfortunately, doesn't compile on godbolt is this:

#include <spdlog/spdlog.h>
#include <spdlog/sinks/stdout_sinks.h>
#include <spdlog/mdc.h>

int main()
{
    auto create_logger = [](auto name) {
        auto logger = std::make_shared<spdlog::logger>(
            name,
            std::make_shared<spdlog::sinks::stdout_sink_st>()
        );
        logger->set_pattern("[%H:%M:%S %z] [%^%L%$] [%&] %v");
        logger->info("Logger created");
        return logger;
    };

    auto log_a = create_logger("log_a");
    auto log_b = create_logger("log_b");

    spdlog::mdc::put("my_key", "my_value");

    log_a->info("Hello World from log_a");
    log_b->info("Hello World from log_b");
}

It produces the following output:

[15:02:46 +02:00] [I] [] Logger created
[15:02:46 +02:00] [I] [] Logger created
[15:02:46 +02:00] [I] [my_key:my_value] Hello World from log_a
[15:02:46 +02:00] [I] [my_key:my_value] Hello World from log_b

@tt4g
Copy link
Contributor

tt4g commented May 2, 2024

You should decide if MDC will be copied when another thread is spawned from one thread.
When a new thread is spawned, the MDC of the new thread will be in one of the following states:

  • MDC is copied from the original thread.
  • MDC is empty It is not a copy of the original thread.

It should be necessary to decide whether to implement both of these two behaviors or only one of them.

@felixjulianheitmann
Copy link
Author

I don't think MDC should be linked to the thread at all.

Maybe, I am misunderstanding the feature but I would argue it's much more desirable to have attributes associated to logger objects than to threads.

If I were to implement it from scratch, I would ditch thread_local storage and just go for regular members. When a new thread is spawned it's accessing the same variables as the original thread. Obviously, these variables would have to be locked behind a mutex.

Having thread specific variables loses it's gain as soon as you create a thread-pool.
When using async loggers, I need to allocate a thread pool for them with spdlog::init_thread_pool(...).

I haven't looked into the implementation but it feels like none of the contextual attributes I would set within the business logic would ever make it to any of the sinks when using async loggers and thread_local storage.

So my approach for a re-implementation would be:

  • make the MDC map containing the key-values a regular member variable of spdlog::mdc
  • make spdlog::mdc a member of the spdlog::logger
  • provide functions like add_mdc/remove_mdc/get_mdc to spdlog::logger
  • have the logger format these during logging if the pattern requires it

It's difficult however, to keep the current behavior of using thread local contexts while implementing it this way.

I am not sure yet but I would maybe try to find some runtime function that switches from current behavior to the one, I am trying to describe. I want to keep backwards compatibility and have the current behavior be the default to not break user-code.

@tt4g
Copy link
Contributor

tt4g commented May 3, 2024

First, I am not a native speaker and may write in hard-to-read English. Please point out any parts that do not make sense to you.

I understand your point of view.
The implementation of MDCs being owned by threads has been found to be problematic in frameworks that sequentially execute dependent tasks in a thread pool, where MDCs tied to threads cannot propagate data correctly.
This is because the tasks executed by the thread pool are interchanged, and if the MDC of the immediately preceding task is inherited, the wrong MDC will be referenced.

The best way to solve this problem is not yet available to my knowledge, but many tools and frameworks often take the solution of providing a context that is tied to a task or thread, and the MDC is owned by the context.
Many developers want to share loggers with threads and tasks, but MDC is independent data in units of tasks and threads.
So there are many frameworks that provide contexts as independent data repositories on a per-thread or per-task basis.
This is a more developer-friendly implementation than having MDCs owned by something like a logger that is shared by multiple threads or tasks.
However, this is possible because the framework provides a mechanism for transparently switching the MDC that the logger refers to when switching threads or tasks.

I did not initially offer this as an opinion because I thought it would be very difficult to provide this implementation and would be well beyond the scope of the logging functionality provided by spdlog.

@gabime
Copy link
Owner

gabime commented May 3, 2024

Current implementation is just the simplest we could think of without breaking the v1.x API and without hurting the performance.

The intention is to explore this approach and revise it in the v2.x branch once we have a deeper understanding and gather user feedback.

@felixjulianheitmann you can open a pr in branch v2.x if you feel you have better solution. Please make sure to make it is as simple as possible as it is easy to over complicate this.

@felixjulianheitmann
Copy link
Author

Hi, thanks for the blessing, I will give it a try!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants