Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48059][CORE] Implement the structured log framework on the java side #46301

Closed
wants to merge 8 commits into from

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Apr 30, 2024

What changes were proposed in this pull request?

The pr aims to implement the structured log framework on the java side.

Why are the changes needed?

Currently, the structured log framework on the scala side is basically available, but theSpark Core code also includes some Java code, which also needs to be connected to the structured log framework.

Does this PR introduce any user-facing change?

Yes, only for developers.

How was this patch tested?

  • Add some new UT.
  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

@panbingkun
Copy link
Contributor Author

@gengliangwang
In the migration of logs to the structured log framework, I found that we missed the corresponding implementation on the java side. This PR is to supplement it.

@panbingkun
Copy link
Contributor Author

panbingkun commented Apr 30, 2024

We currently support three ways to write logs:

  • loggger.{error/warn/info/debug/trace}(string)
  • loggger.{error/warn/info/debug/trace}(string, throwable)
  • loggger.{error/warn/info/debug/trace}({string template with {} inside}, JMDC...), eg:
    logger.info("Executor {} failed", JMDC(EXECUTOR_ID, "xxxx")

After this pr, we can prohibit importing the following two classes in the java code:

  • import org.slf4j.Logger;
  • import org.slf4j.LoggerFactory;

@gengliangwang
Copy link
Member

@panbingkun Thanks for bringing this up.
I check the current Spark java code:

find . -name "*.java"|xargs grep "logInfo\|logWarn\|logError"|grep -v target|grep -v test

and found no log4j usages. Is this for external systems? If yes, the external system should be able to use the current Logging trait as well, right?

@panbingkun
Copy link
Contributor Author

@panbingkun Thanks for bringing this up. I check the current Spark java code:

find . -name "*.java"|xargs grep "logInfo\|logWarn\|logError"|grep -v target|grep -v test

and found no log4j usages. Is this for external systems? If yes, the external system should be able to use the current Logging trait as well, right?

@gengliangwang
Hmm..., let me give you some examples, eg:

logger.info(
"{} bytes of memory were used by task {} but are not associated with specific consumers",
memoryNotAccountedFor, taskAttemptId);
logger.info(
"{} bytes of memory are used for execution and {} bytes of memory are used for storage",
memoryManager.executionMemoryUsed(), memoryManager.storageMemoryUsed());

logger.error("Was unable to delete spill file {}", file.getAbsolutePath());

LOGGER.warn("Unable to report histogram {}", sanitizedName, e);

image

@gengliangwang
Copy link
Member

@panbingkun I see. There are about 88 loggings with variables

find . -name "*.java"|xargs grep -i "logger.info\|logger.warn\|logger.error"|grep "{}"| grep -v target|grep -v test

Since all the logs are using {} as variable placeholders, shall we try following the approach Sequence of Key-Value Pairs (Seq[(K, V)]) from the SPIP?
Also, I think we can reference the LogKey.scala in the java side.

@panbingkun
Copy link
Contributor Author

@panbingkun I see. There are about 88 loggings with variables

find . -name "*.java"|xargs grep -i "logger.info\|logger.warn\|logger.error"|grep "{}"| grep -v target|grep -v test

Since all the logs are using {} as variable placeholders, shall we try following the approach Sequence of Key-Value Pairs (Seq[(K, V)]) from the SPIP? Also, I think we can reference the LogKey.scala in the java side.

I think this is a good suggestion. Let me try it.

}
}

public void error(String msg, MDC... mdcs) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the following design is shown in the document,
image
But I think the parameter void error(String msg, MDC... mdcs) is more in line with the style of Java? WDYT @gengliangwang

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way is fine to me

@panbingkun panbingkun marked this pull request as ready for review May 2, 2024 14:31
@panbingkun
Copy link
Contributor Author

@gengliangwang
This pr is ready for review.

if (mdcs == null || mdcs.length == 0) {
slf4jLogger.warn(msg);
} else {
withLogContext(msg, mdcs, null, mt -> slf4jLogger.warn(mt.message));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check the log level before calling withLogContext. For example, if the log level is ERROR, we don't need to enter the method withLogContext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import org.apache.spark.internal.LogKeys;
import org.apache.spark.internal.MDC;

public abstract class LoggerSuiteBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ideally, we can have a followup to reduce the duplicated test code.

@gengliangwang
Copy link
Member

Thanks, merging to master

@panbingkun
Copy link
Contributor Author

Thanks, merging to master

Thank you very much.
Based on this, I will do log migration and auxiliary work on the java side.
(eg, after the migration is completed, we can add a rule in checkstyle that prohibits importing org.slf4j.Logger and org.slf4j.LoggerFactory)

@gengliangwang
Copy link
Member

@panbingkun Awesome, thanks for the work!

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…a side

### What changes were proposed in this pull request?
The pr aims to implement the structured log framework on the `java side`.

### Why are the changes needed?
Currently, the structured log framework on the `scala side` is basically available, but the`Spark Core` code also includes some `Java code`, which also needs to be connected to the structured log framework.

### Does this PR introduce _any_ user-facing change?
Yes, only for developers.

### How was this patch tested?
- Add some new UT.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46301 from panbingkun/structured_logger_java.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
gengliangwang added a commit that referenced this pull request May 15, 2024
### What changes were proposed in this pull request?

Two new classes `org.apache.spark.internal.Logger` and `org.apache.spark.internal.LoggerFactory` were introduced from #46301.
Given that Logger is a widely recognized **interface** in Log4j, it may lead to confusion to have a class with the same name. To avoid this and clarify its purpose within the Spark framework, I propose renaming `org.apache.spark.internal.Logger` to `org.apache.spark.internal.SparkLogger`. Similarly, to maintain consistency, `org.apache.spark.internal.LoggerFactory` should be renamed to `org.apache.spark.internal.SparkLoggerFactory`.

### Why are the changes needed?

To avoid naming confusion and clarify the java Spark logger purpose within the logging framework

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

GA tests
### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46600 from gengliangwang/refactorLogger.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants