- Added Core-hours and Heap-hours allocation info to report
- Added new JSON report format that can be generated in addition to the HTML report
- Added sending of diagnostics data
- New warnings added:
  - Data Spill from memory to disk warning
  - Long time spent in Garbage Collection warning
- Logging enhancements:
  - Logging level can now be set to DEBUG/INFO/WARN/ERROR (illustrative example below)
  - Log file can now be written to a different directory than the HTML report
  - Log file can now be written to a different filesystem than the HTML report
  - Full package and class name added to log format
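
  An illustrative way to set these options from a Spark job; the property keys below are hypothetical placeholders, not confirmed SparkScope configuration names:

  ```scala
  // Hypothetical property keys, shown only to illustrate the new logging options;
  // check SparkScope's documentation for the real configuration names.
  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.sparkscope.log.level", "DEBUG")                 // assumed key: DEBUG/INFO/WARN/ERROR
    .set("spark.sparkscope.log.path", "hdfs:///tmp/sparkscope") // assumed key: log dir/filesystem separate from the HTML report
  ```
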
- Replaced stage chart with a chart of number of tasks vs CPU capacity
- Removed SparkScope logs from HTML report
  - SparkScope logs are now written to a separate .log file
- Chart data points limitation:
  - All charts have a maximum number of rendered data points
  - When the number of data points exceeds the limit, chart values are interpolated (see the sketch below)
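
  A minimal sketch of the idea (illustrative only, not SparkScope's actual implementation): a series longer than the limit is resampled onto evenly spaced timestamps and values are linearly interpolated:

  ```scala
  // Illustrative sketch: resample a sorted (timestamp -> value) series down to `limit` points
  // (limit >= 2) using linear interpolation. Names and types are assumptions, not SparkScope code.
  def downsample(points: Seq[(Long, Double)], limit: Int): Seq[(Long, Double)] = {
    if (points.length <= limit) points
    else {
      val (tMin, tMax) = (points.head._1, points.last._1)
      val step = (tMax - tMin).toDouble / (limit - 1)
      (0 until limit).map { i =>
        val t = tMin + (i * step).toLong
        // find the two original samples surrounding t and interpolate between them
        val right = points.indexWhere(_._1 >= t) match { case -1 => points.length - 1; case idx => idx }
        val left  = math.max(right - 1, 0)
        val ((t0, v0), (t1, v1)) = (points(left), points(right))
        val v = if (t1 == t0) v0 else v0 + (v1 - v0) * (t - t0).toDouble / (t1 - t0)
        (t, v)
      }
    }
  }
  ```
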
- EventLog prefiltering:
  - EventLog context loader prefilters events before parsing them to JSON and applying filtering (see the sketch below)
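
  Conceptually (a hedged sketch; the event list and reading code are assumptions, not the actual loader), raw event log lines are screened with a cheap substring check before the comparatively expensive JSON parsing:

  ```scala
  // Illustrative sketch: substring prefilter on raw event log lines before JSON parsing.
  import scala.io.Source

  val interestingEvents = Seq(
    "SparkListenerApplicationStart",
    "SparkListenerEnvironmentUpdate",
    "SparkListenerExecutorAdded",
    "SparkListenerExecutorRemoved",
    "SparkListenerStageCompleted"
  )

  def prefilteredLines(eventLogPath: String): Iterator[String] =
    Source.fromFile(eventLogPath).getLines()
      .filter(line => interestingEvents.exists(event => line.contains(event)))  // skip lines that cannot match
  ```
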
- Metrics spill refactor
  - a single merged CSV file per instance containing all metrics is now spilled instead of 5 separate files (heap used, heap used in %, heap size, non-heap used, cpuTime), as in the sketch below
  - local & Hadoop metrics are grouped by appName+appId on storage (just like S3 metrics)
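
  For illustration only (column names and order are assumptions, not the exact on-disk schema), a merged per-instance row can carry all five metrics at once:

  ```scala
  // Illustrative sketch of one merged metrics row per instance;
  // SparkScope's actual CSV column names and order may differ.
  case class MergedMetricsRow(
    timestamp: Long,     // epoch seconds of the sample
    heapUsed: Long,      // jvm.heap.used
    heapUsage: Double,   // jvm.heap.usage (heap used in %)
    heapMax: Long,       // jvm.heap.max (heap size)
    nonHeapUsed: Long,   // jvm.non-heap.used
    cpuTime: Long        // executor.cpuTime (executors only)
  )

  def toCsvLine(row: MergedMetricsRow): String =
    Seq(row.timestamp, row.heapUsed, row.heapUsage, row.heapMax, row.nonHeapUsed, row.cpuTime).mkString(",")
  ```
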
- All sinks/reporters rewritten from Java to Scala
- New charts
  - chart of stages over time with number of tasks per stage
  - chart of active executors over time
  - separate memory charts for each executor instead of aggregated
- UI enhancements
  - appName field added to application summary
  - added executor and driver memoryOverheads to charts
- S3 compatibility
  - spilling metrics to S3 (SparkScopeCsvSink will spill metrics to S3 for directories starting with the s3/s3a/s3n prefix)
  - analyzing metrics spilled to S3 (SparkScopeListener can read metrics from S3)
  - running offline on an eventLog stored in S3 (SparkScopeApp can run offline on an eventLog stored in S3)
  - saving HTML reports to S3
- Added custom SparkScopeCsvSink (see the sketch below)
  - hdfs:/ and maprfs:/ directories are treated as Hadoop directories and are handled by HdfsCsvReporter
  - file:/ and other directories are treated as local and are handled by LocalCsvReporter
  - sink now dumps only useful metrics:
    - 5 executor metrics: jvm.heap.used, jvm.heap.usage, jvm.heap.max, jvm.non-heap.used, executor.cpuTime
    - 4 driver metrics: jvm.heap.used, jvm.heap.usage, jvm.heap.max, jvm.non-heap.used
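
  A hedged sketch of the scheme-based routing and metric filtering described above (type and value names are illustrative; only the scheme prefixes and metric names come from the lists above):

  ```scala
  // Illustrative sketch: choose a reporter based on the metrics directory scheme.
  // The actual SparkScopeCsvSink wiring and reporter constructors may differ.
  sealed trait CsvReporterKind
  case object S3Reporter    extends CsvReporterKind
  case object HdfsReporter  extends CsvReporterKind  // handled by HdfsCsvReporter
  case object LocalReporter extends CsvReporterKind  // handled by LocalCsvReporter

  def reporterFor(metricsDir: String): CsvReporterKind = metricsDir match {
    case dir if dir.startsWith("s3:") || dir.startsWith("s3a:") || dir.startsWith("s3n:") => S3Reporter
    case dir if dir.startsWith("hdfs:") || dir.startsWith("maprfs:") => HdfsReporter
    case _ => LocalReporter  // file:/ and anything else
  }

  // Only these metrics are kept by the sink (per the lists above).
  val executorMetrics = Set("jvm.heap.used", "jvm.heap.usage", "jvm.heap.max", "jvm.non-heap.used", "executor.cpuTime")
  val driverMetrics   = executorMetrics - "executor.cpuTime"
  ```
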
- Fixed calculation of wasted CPU/Memory (see the sketch below)
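
  For context on the Core-hours/Heap-hours allocation and waste figures, a hedged sketch; these formulas are illustrative approximations, not SparkScope's exact calculation:

  ```scala
  // Illustrative sketch: deriving core-hours/heap-hours allocation and CPU/memory waste.
  // Field and function names are assumptions for the example.
  case class ExecutorUsage(cores: Int, heapGb: Double, uptimeHours: Double, cpuTimeHours: Double, avgHeapUsedGb: Double)

  def coreHoursAllocated(execs: Seq[ExecutorUsage]): Double = execs.map(e => e.cores * e.uptimeHours).sum
  def heapGbHoursAllocated(execs: Seq[ExecutorUsage]): Double = execs.map(e => e.heapGb * e.uptimeHours).sum

  // Waste ~ allocated minus what was actually used over the same window.
  def coreHoursWasted(execs: Seq[ExecutorUsage]): Double =
    coreHoursAllocated(execs) - execs.map(_.cpuTimeHours).sum
  def heapGbHoursWasted(execs: Seq[ExecutorUsage]): Double =
    heapGbHoursAllocated(execs) - execs.map(e => e.avgHeapUsedGb * e.uptimeHours).sum
  ```
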
- Created SparkScopeApp with CLI to run SparkScope as a standalone app
  - reads application context, Spark conf and events from the eventLog
  - can be run on a finished or a running Spark application
Initial SparkScope-spark3 release:
- Compatibility:
  - JDK: 8/11/17
  - Spark: 3.2/3.3/3.4/3.5
- Charts:
  - heap & non-heap usage charts
  - CPU utilization charts
  - charts for driver, executors and aggregated charts for the whole application
- Stats:
  - heap & non-heap usage stats
  - CPU utilization and memory utilization stats
  - stats for driver, executors and aggregated stats for the whole application
  - CPU and Heap Memory Waste stats