In the AtlasGenerator job, calculating statistics is a separate (optional) stage.
The idea is to replace this stage with a Spark Double Accumulator that is updated earlier in the flow. The accumulator could still support the custom AtlasStatistics class and remain optional.
This might improve the overall runtime of the statistics portion, since statistics would be computed inline with Atlas creation instead of in a separate pass. However, I don't have data to back up this assumption. I'm filing this task in case there is interest in streamlining this part of the job.
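To make the inline-accumulator idea concrete, here is a minimal sketch of a mergeable statistics accumulator. It is not the AtlasStatistics API. The class name `StatisticsAccumulator` and the "statistic name → running total" map are hypothetical stand-ins. The methods mirror the contract of Spark's `AccumulatorV2` (`isZero`, `copy`, `reset`, `add`, `merge`, `value`), but the class deliberately does not extend it, so the sketch compiles without Spark on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a custom statistics accumulator.
 * Mirrors Spark's AccumulatorV2 method contract without extending it.
 */
class StatisticsAccumulator {
    // Statistic name -> running total (placeholder for the real AtlasStatistics type)
    private final Map<String, Double> totals = new HashMap<>();

    /** True when nothing has been accumulated yet. */
    public boolean isZero() {
        return totals.isEmpty();
    }

    /** Independent copy; Spark gives each task its own copy of an accumulator. */
    public StatisticsAccumulator copy() {
        StatisticsAccumulator other = new StatisticsAccumulator();
        other.totals.putAll(totals);
        return other;
    }

    /** Clears all totals, returning the accumulator to its zero state. */
    public void reset() {
        totals.clear();
    }

    /** Records one observation inline, e.g. while a shard's Atlas is being built. */
    public void add(String key, double amount) {
        totals.merge(key, amount, Double::sum);
    }

    /** Folds another task's partial totals into this one (commutative and associative). */
    public void merge(StatisticsAccumulator other) {
        other.totals.forEach((key, amount) -> totals.merge(key, amount, Double::sum));
    }

    /** Snapshot of the merged totals. */
    public Map<String, Double> value() {
        return new HashMap<>(totals);
    }

    public static void main(String[] args) {
        // Simulate two tasks, each building one shard and recording statistics inline
        StatisticsAccumulator driver = new StatisticsAccumulator();
        StatisticsAccumulator task1 = driver.copy();
        StatisticsAccumulator task2 = driver.copy();
        task1.add("roads_km", 12.5);
        task1.add("buildings", 40);
        task2.add("roads_km", 7.5);
        // The driver merges the partial results after the stage completes
        driver.merge(task1);
        driver.merge(task2);
        System.out.println(driver.value().get("roads_km")); // prints 20.0
    }
}
```

In a real job, the class would extend `org.apache.spark.util.AccumulatorV2<IN, OUT>` and be registered with `SparkContext.register(accumulator, name)`. A plain `SparkContext.doubleAccumulator(name)` covers the single-number case. One caveat matters here: Spark only guarantees exactly-once accumulator updates inside actions. Updates made inside transformations (where Atlas creation likely happens) can be applied more than once if a task is retried or a stage is recomputed, so the inline statistics could overcount unless the job accounts for that.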