Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some small printing upgrades #2344

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

mcabbott
Copy link
Member

@mcabbott mcabbott commented Oct 13, 2023

This is intended for FluxML/Fluxperimental.jl#20 but probably a good idea anyway. Motivating example is something like this (where bigger arrays will print pages of numbers):

julia> struct Tmp2; x; y; end; Flux.@functor Tmp2

julia> Chain(Tmp2([Dense(2,3), randn(3,4)'], (x=1:3, y=Dense(3,4), z=rand(3))))
Chain(
  Tmp2(
    Array(
      Dense(2 => 3),                    # 9 parameters
      [0.351978391016603 0.6408681372462821 -1.326533184688648; 0.09481930831795712 
1.430103476272605 0.7250467613675332; 2.03372151428719 -0.015879812799495713 
1.9499692162118236; -1.6346846180722918 -0.8364610153059454 -1.2907265737483433],  # 12 parameters
    ),
    NamedTuple(
      1:3,                              # 3 parameters
      Dense(3 => 4),                    # 16 parameters
      [0.9666158193429335, 0.01613900990539574, 0.0205920186127464],  # 3 parameters
    ),
  ),
)                   # Total: 7 arrays, 43 parameters, 644 bytes.

Notice that Array() and NamedTuple() aren't actually valid syntax. Also that it prints whole arrays if they aren't inside a layer the way it expects. After this PR:

julia> Chain(Tmp2([Dense(2,3), randn(3,4)'], (x=1:3, y=Dense(3,4), z=rand(3))))
Chain(
  Tmp2(
    [
      Dense(2 => 3),                    # 9 parameters
      4×3 Adjoint,                      # 12 parameters
    ],
    (;
      x = 3-element UnitRange,          # 3 parameters
      y = Dense(3 => 4),                # 16 parameters
      z = 3-element Array,              # 3 parameters
    ),
  ),
)                   # Total: 7 arrays, 43 parameters, 644 bytes.

@ToucheSir
Copy link
Member

Is having the summary instead of the actual contents important for @compact? If not, my preference would be to keep the more accurate printing but drop that part.

@mcabbott
Copy link
Member Author

mcabbott commented Oct 14, 2023

The idea is that printing the values of a huge matrix isn't helpful, but showing the size is. This is what the present printing of e.g. Dense achieves.

But @compact seems to often want to store the arrays themselves without some layer.

(It's not exactly summary as this gets quite long for e.g. CuArrays)

@ToucheSir
Copy link
Member

Revisiting this, it would be nice to retain some more info such as eltype if the contents aren't printed. Though it's far from perfect, could we avoid a lot of complexity by setting the :limit and :compact IOContext properties when showing these leaf nodes?

@mcabbott
Copy link
Member Author

mcabbott commented Nov 28, 2023

Latest commit keeps eltype as suggested (or really, the first type parameter):

julia> struct Tmp2; x; y; end; Flux.@functor Tmp2

julia> Chain(Tmp2([Dense(2,3), randn(3,4)'], (x=1:3, y=Dense(3,4), z=rand(3))))
Chain(
  Tmp2(
    [
      Dense(2 => 3),                    # 9 parameters
      4×3 Adjoint{Float64,...},         # 12 parameters
    ],
    (;
      x = 3-element UnitRange{Int64},   # 3 parameters
      y = Dense(3 => 4),                # 16 parameters
      z = 3-element Vector{Float64},    # 3 parameters
    ),
  ),
)                   # Total: 7 arrays, 43 parameters, 780 bytes.

Note also that no model using existing layers is affected by this. The goal is mainly to make @compact layers print more like the existing ones, by describing their parameter arrays, not printing the actual numbers.

@darsnack
Copy link
Member

darsnack commented Dec 1, 2023

We were discussing this during our call this week. The consensus was that using IOContext(..., :limit => true) is preferable to a custom summary for arrays. This would keep things succinct while still matching Base standards/expectations.

There are times when the type (e.g. Adjoint) is desired. This case might be better addressed by offering something like Flax's tabulate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants