Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genie.Logger.initialize_logging causes slow Arrow writes #673

Open
poulter7 opened this issue Sep 14, 2023 · 5 comments
Open

Genie.Logger.initialize_logging causes slow Arrow writes #673

poulter7 opened this issue Sep 14, 2023 · 5 comments
Assignees

Comments

@poulter7
Copy link

poulter7 commented Sep 14, 2023

Describe the bug
I observe that after calling Genie.Logger.initialize_logging() invocations to Arrow.write() are much slower.

To reproduce
Source script:

# slow.jl
using Pkg
Pkg.add("Arrow")
Pkg.add("Genie")
Pkg.add("DataFrames")
using Arrow
using Genie
using DataFrames
if length(ARGS) > 0
    Genie.Logger.initialize_logging()
end
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))
@time Arrow.write(IOBuffer(), DataFrame(rand(2000, 2000), :auto))

First invocation of the script:

julia ./src/demo/slow.jl
  4.969270 seconds (5.48 M allocations: 461.108 MiB, 3.13% gc time, 95.31% compilation time: 3% of which was recompilation)
  0.213863 seconds (89.38 k allocations: 107.136 MiB, 4.35% gc time)
  0.212987 seconds (89.38 k allocations: 107.136 MiB, 4.44% gc time)

Second invocation of the script:

julia ./src/demo/slow.jl 1
 17.421436 seconds (49.03 M allocations: 6.738 GiB, 3.36% gc time, 34.72% compilation time: 2% of which was recompilation)
 11.818701 seconds (42.00 M allocations: 6.284 GiB, 3.92% gc time)
 12.145096 seconds (42.00 M allocations: 6.275 GiB, 4.37% gc time)

Expected behavior
I'd not expect any performance effects on Arrow performance.

Additional context

julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 64 virtual cores
Environment:
  JULIA_GPG = 3673DF529D9049477F76B37566E3C7DC03D6E495
  JULIA_VERSION = 1.9.3
  JULIA_PATH = /usr/local/julia

Package version:

pkg> st 
  [69666777] Arrow v2.6.2
  [a93c6f00] DataFrames v1.6.1
  [c43c736e] Genie v5.18.2
@poulter7
Copy link
Author

If I remove global_logger from the chain:
LoggingExtras.TeeLogger(LoggingExtras.MinLevelLogger(logger, Genie.config.log_level)) |> timestamp_logger |> global_logger

In: /root/.julia/packages/Genie/.../src/Logger.jl
I observe it speeds up again.

@essenciary
Copy link
Member

@poulter7 Wow, thanks, I'll take a look, very interesting.

@antholzer
Copy link
Contributor

The problem is that the created Logger always applies the timestamp_logger transformation even for debug messages. Arrow prints a lot of debug messages and is thus very slow.

@poulter7 can you check if the pr #693 fixes your issue.

@essenciary
Copy link
Member

I've merged #693 - let me know if all good and I'll tag a release.

@essenciary
Copy link
Member

It's being released now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants