Read-ahead or concurrent fetching? #13
Are you able to draw a waterfall graph of each file that needed to be fetched, or provide some details as to the size of the files and the relative time to first byte? There are many ways to pre-compute parts; yes, this would include using an intermediary layer to concurrently prepare some responses, etc.
I think this reproduces the problem in a reasonable experimental way, a livebook:

# Packmatic starvation

```elixir
Mix.install([:packmatic, :kino, :kino_vega_lite])

alias VegaLite, as: Vl
```

## Section

```elixir
delay = Kino.Input.number("Delay, ms", default: 300) |> Kino.render()
file_size = Kino.Input.number("File size, kb", default: 512) |> Kino.render()
entry_count = Kino.Input.number("Files", default: 200)
```

```elixir
chart =
  Vl.new(width: 400, height: 400)
  |> Vl.mark(:line)
  |> Vl.encode_field(:x, "x", type: :quantitative)
  |> Vl.encode_field(:y, "y", type: :quantitative)
  |> Kino.VegaLite.new()
```

```elixir
t1 = System.os_time(:millisecond)

log_event = fn event ->
  t2 = System.os_time(:millisecond)
  offset = t2 - t1

  case event do
    %Packmatic.Event.EntryUpdated{stream_bytes_emitted: bytes} ->
      seconds = offset / 1000

      if seconds > 0 do
        kb_per_second = bytes / 1024 / seconds
        IO.inspect(kb_per_second, label: "kb/s")
        point = %{x: seconds, y: kb_per_second}
        Kino.VegaLite.push(chart, point)
      end

    %Packmatic.Event.EntryCompleted{} ->
      IO.inspect(event)

    _ ->
      nil
  end

  :ok
end
```

```elixir
latency = Kino.Input.read(delay)
size = Kino.Input.read(file_size)

small_remote_file = fn ->
  # Overhead latency for request
  :timer.sleep(latency)
  # size 512 kb
  {:ok, {:random, size * 1024}}
end
```

```elixir
count = Kino.Input.read(entry_count)

entries =
  1..count
  |> Enum.map(fn num ->
    [
      source: {:dynamic, small_remote_file},
      path: "#{num}.txt"
    ]
  end)

{t, _} =
  :timer.tc(fn ->
    entries
    |> Packmatic.build_stream(on_event: log_event)
    |> Stream.run()
  end)

IO.inspect(t / 1000, label: "took ms")
IO.inspect(count * latency, label: "entries * delay, ms")
```
Pasting a livebook in GitHub is kinda weird :D
@lawik Revisiting the problem, there are some solutions around this.

It would depend on:

- Whether the sources are on different hosts that resolve to different IP/port pairs, which would require separate connections
- Whether the individual files are large or small
- Etc.

There is also another solution, which is to keep the encoding entries hot-addable, so you have a producer and the consumer just goes on and on until it gets an end message. Then the intermediary layer can be added.
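For illustration, the hot-addable idea could be modeled as a lazy stream of entries fed by a producer process. This is only a sketch using the Elixir standard library; the message shapes are hypothetical, and whether `Packmatic.build_stream/2` accepts a lazy enumerable of entries is an assumption that would need checking:

```elixir
# Hypothetical hot-addable entry source: entries arrive as messages to
# the consuming process, and the stream halts when :done is received.
entry_stream =
  Stream.resource(
    # No setup state needed
    fn -> :ok end,
    # Emit each entry as it arrives; halt on the end message
    fn state ->
      receive do
        {:entry, entry} -> {[entry], state}
        :done -> {:halt, state}
      end
    end,
    # No cleanup needed
    fn _state -> :ok end
  )
```

A producer elsewhere would then `send(consumer_pid, {:entry, [source: ..., path: ...]})` as fetches complete, finishing with `send(consumer_pid, :done)`.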
I no longer have the problem because we optimized away the need for about 7000 files, and suddenly things are quite snappy. The last option you mention would let the developer determine their own level of look-ahead. I assume this could be modeled as a stream of entries instead of a finalized list?
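For illustration, a rough way to approximate that kind of look-ahead with Packmatic as-is: start each fetch as a `Task` up front and have the `:dynamic` source merely await its own task, so the per-entry fetch latency overlaps instead of accumulating serially. This is a sketch against the same simulated sources as the livebook above (`count`, `latency`, `size` are the livebook's variables), not a confirmed Packmatic feature:

```elixir
# Hypothetical look-ahead: kick off all fetches as Tasks, then let each
# :dynamic source only await its result when Packmatic reaches it.
entries =
  1..count
  |> Enum.map(fn num ->
    task =
      Task.async(fn ->
        # Simulated per-request latency, as in the livebook
        :timer.sleep(latency)
        {:ok, {:random, size * 1024}}
      end)

    [
      source: {:dynamic, fn -> Task.await(task, :infinity) end},
      path: "#{num}.txt"
    ]
  end)
```

With thousands of entries you would want to bound this (for example with `Task.async_stream/3` and `max_concurrency` as the look-ahead window) rather than start everything at once.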
I have an archive that has a lot of small files, and while I haven't measured to confirm, I'm pretty sure the streaming is slowing down a lot (download drops from multiple MB/s to a few KB/s) because the overhead/latency of each file fetch is more significant than the transfer time.
Thousands of files in this case.
It does complete eventually, but it would be neat to be able to ask Packmatic to buffer at least X bytes ahead of the current need, or similar.
Is there a way to do it that I've missed or would this be a good addition in your eyes?