Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config/index does not always preserve index correctly #957

Open
jarpy opened this issue Oct 1, 2019 · 3 comments
Open

config/index does not always preserve index correctly #957

jarpy opened this issue Oct 1, 2019 · 3 comments

Comments

@jarpy
Copy link
Contributor

jarpy commented Oct 1, 2019

The index function in config.clj doesn't always preserve the existing index as expected. In particular, when index is called multiple times within the same config, the index is replaced, which results in odd behaviour.

Given this config:

(info "Let binding index.")
(let [index (index)]
  (streams prn))

(info "Let binding index again.")
(let [index (index)]
  (streams prn))

(info "Calling (index).")
(index)

and adding some debug logs to the index function, we see this output:

INFO [2019-10-01 22:25:22,834] main - riemann.bin - Loading /home/jarpy/src/riemann/riemann/repro.config
INFO [2019-10-01 22:25:22,838] main - riemann.bin - PID 5181
INFO [2019-10-01 22:25:22,840] main - riemann.config - Let binding index.
INFO [2019-10-01 22:25:22,843] main - riemann.config - ❌ Index replaced.
INFO [2019-10-01 22:25:22,844] main - riemann.config - Let binding index again.
INFO [2019-10-01 22:25:22,844] main - riemann.config - ❌ Index replaced.
INFO [2019-10-01 22:25:22,845] main - riemann.config - Calling (index).
INFO [2019-10-01 22:25:22,845] main - riemann.config - ❌ Index replaced.
INFO [2019-10-01 22:25:22,850] main - riemann.core - Hyperspace core online

This causes weird behaviour with the index in any config where index is called (probably let-bound) multiple times. In particular, expiry gets messed up. With this config:

(instrumentation {:enabled? false})
(tcp-server  {:host "0.0.0.0"})
(periodically-expire 1)

(info "Let binding index.")
(let [index (index)]
  (streams
   (not-expired index prn)
   (expired prn)))

(info "Calling (index).")
(index)

when injecting an event with :ttl 1, we'd expect to print an expired event within 2 seconds, but it never arrives:

INFO [2019-10-01 22:29:17,943] main - riemann.bin - Loading /home/jarpy/src/riemann/riemann/repro.config
INFO [2019-10-01 22:29:17,948] main - riemann.bin - PID 6131
INFO [2019-10-01 22:29:17,969] main - riemann.config - Let binding index.
INFO [2019-10-01 22:29:17,973] main - riemann.config - ❌ Index replaced.
INFO [2019-10-01 22:29:17,975] main - riemann.config - Calling (index).
INFO [2019-10-01 22:29:17,975] main - riemann.config - ❌ Index replaced.
INFO [2019-10-01 22:29:18,065] clojure-agent-send-off-pool-4 - riemann.transport.tcp - TCP server 0.0.0.0 5555 online
INFO [2019-10-01 22:29:18,065] main - riemann.core - Hyperspace core online
#riemann.codec.Event{:host "host-one", :service "service-one", :state nil, :description nil, :metric nil, :tags nil, :time 1569932975, :ttl 1.0}
[...time passes]

Is this because the reaper is attached to a different index than the one that contains the event?

Partly, I think this is related to the behaviour of index vs its docstring:

"Set the index used by this core. Returns the index."

It looks like it technically sets the index used by the next core, and that change is not made concrete until something or someone calls for a core transition. Adding some more debug, it looks like (:index @core) remains nil even after several calls to (index):

INFO [2019-10-01 22:36:41,367] main - riemann.bin - Loading /home/jarpy/src/riemann/riemann/repro.config
INFO [2019-10-01 22:36:41,372] main - riemann.bin - PID 8433
INFO [2019-10-01 22:36:41,391] main - riemann.config - Let binding index.
INFO [2019-10-01 22:36:41,394] main - riemann.config - ❌ Index replaced.
INFO [2019-10-01 22:36:41,394] main - riemann.config - Class of index is: nil
INFO [2019-10-01 22:36:41,394] main - riemann.config - Class of index' is: class riemann.core$wrap_index$reify__11753
INFO [2019-10-01 22:36:41,396] main - riemann.config - Calling (index).
INFO [2019-10-01 22:36:41,396] main - riemann.config - ❌ Index replaced.
INFO [2019-10-01 22:36:41,396] main - riemann.config - Class of index is: nil
INFO [2019-10-01 22:36:41,396] main - riemann.config - Class of index' is: class riemann.core$wrap_index$reify__11753
INFO [2019-10-01 22:36:41,489] clojure-agent-send-off-pool-3 - riemann.transport.tcp - TCP server 0.0.0.0 5555 online
INFO [2019-10-01 22:36:41,490] main - riemann.core - Hyperspace core online

A work-around in the config is to manually "fix" the index into the core by calling a combination of index and apply! before doing anything serious:

(index)
(apply!)

(instrumentation {:enabled? false})
(tcp-server  {:host "0.0.0.0"})
(periodically-expire 1)

(info "Let binding index.")
(let [index (index)]
  (streams
   (not-expired index prn)
   (expired prn)))

(info "Calling (index).")
(index)

This config preserves the index for future calls to index, and expires the test event as expected:

INFO [2019-10-01 22:41:38,791] main - riemann.bin - Loading /home/jarpy/src/riemann/riemann/repro.config
INFO [2019-10-01 22:41:38,796] main - riemann.bin - PID 10259
INFO [2019-10-01 22:41:38,801] main - riemann.config - ❌ Index replaced.
INFO [2019-10-01 22:41:38,801] main - riemann.config - Class of index is: nil
INFO [2019-10-01 22:41:38,801] main - riemann.config - Class of index' is: class riemann.core$wrap_index$reify__11753
INFO [2019-10-01 22:41:38,807] main - riemann.core - Hyperspace core online
INFO [2019-10-01 22:41:38,835] main - riemann.config - Let binding index.
INFO [2019-10-01 22:41:38,836] main - riemann.config - 💚 Index preserved.
INFO [2019-10-01 22:41:38,836] main - riemann.config - Class of index is: class riemann.core$wrap_index$reify__11753
INFO [2019-10-01 22:41:38,836] main - riemann.config - Class of index' is: class riemann.core$wrap_index$reify__11753
INFO [2019-10-01 22:41:38,837] main - riemann.config - Calling (index).
INFO [2019-10-01 22:41:38,838] main - riemann.config - 💚 Index preserved.
INFO [2019-10-01 22:41:38,838] main - riemann.config - Class of index is: class riemann.core$wrap_index$reify__11753
INFO [2019-10-01 22:41:38,838] main - riemann.config - Class of index' is: class riemann.core$wrap_index$reify__11753
INFO [2019-10-01 22:41:38,928] clojure-agent-send-off-pool-3 - riemann.transport.tcp - TCP server 0.0.0.0 5555 online
INFO [2019-10-01 22:41:38,929] main - riemann.core - Hyperspace core online
#riemann.codec.Event{:host "host-one", :service "service-one", :state nil, :description nil, :metric nil, :tags nil, :time 1569933820, :ttl 1.0}
{:host "host-one", :service "service-one", :state "expired", :time 1569933821847/1000}

I wonder if the index function itself should ensure that the index is fixed on startup like this. Alternatively, should there be an additional index! function that really does set the index on the current core?

So, please excuse the wall-o-text. I'd love to hear any ideas from experienced Riemann hackers on whether this looks like a real problem, and if so, what we could (safely) do to change it.

Thank you.

@jarpy jarpy changed the title config/index does not preserve index correctly config/index does not always preserve index correctly Oct 1, 2019
@faxm0dem
Copy link
Contributor

faxm0dem commented Oct 2, 2019

For me or has always been one of the golden Riemann rules: thou shalt only use one index. It's stated in the documentation pretty clearly, don't you think?

@jarpy
Copy link
Contributor Author

jarpy commented Oct 2, 2019

I must admit that I don't remember reading that in the documentation, but memory is not my best talent! :) Do you know where it is? The API documentation is the stuff that I always have open when I'm working with Riemann, so I might add that warning to the docstring if we need it (similar to the one that async-queue! has.)

But I think we can do more, and ask whether that rule should be necessary. On my team, we have another rule. It's:

Thou shalt not write an s-expression that fits not on thy 4K monitor.

If we can only call riemann.config/index once, then we end up with a let form that is hundreds of lines long. Thankfully, we have a work-around now, but I'd love to fix it in Riemann proper.

Currently, the behaviour of index is a bit subtle. It's interesting to note that the repro config from before:

(instrumentation {:enabled? false})
(tcp-server  {:host "0.0.0.0"})
(periodically-expire 1)

(info "Let binding index.")
(let [index (index)]
  (streams
   (not-expired index prn)
   (expired prn)))

(info "Calling (index).")
(index)

works correctly if you send Riemann a SIGHUP and trigger a reload. This means that the behaviour of a loaded config can be different from the behaviour of the same config that has been reloaded. I think it would be a better user experience if a config always behaved the same, and I'd love to write a patch that would get us there.

A couple of ideas:

  1. Have riemann.config/index do what is says on the tin. Make it set the index of the current core, at least if (:index @core) is currently nil, like during the first evaluation of the config.
  2. Provide a function that returns the index without changing anything. The index function is currently dual-purpose. It returns the index, but also does things to potentially change the index of the next core. In my config, at least, I'm interested in returning the index multiple times, but not in changing it. It's also worth noting that currently, the index will never change in practice, because it's always the same type (NBHM) and can thus be preserved across core transitions (though I have an experimental branch that uses MapDB to provide a disk-persisted index).
  3. Provide documentation about the behaviour, with tips on how to work around it if you're allergic to ultra-long s-expressions, like my team.

These options are presented in descending order of my personal preference, but also in descending order of invasiveness. What do folks think about them (or others)? Cheers.

jarpy added a commit to jarpy/goodwillow that referenced this issue Jul 2, 2021
Fix the "orphaned indicies" problem described in riemann#957.

Allows a Riemann config to make multiple calls to riemann.config/index
with predictable results, both on initial load and after a reload.

Without this change, multiple calls are OK after reload or explicit
apply, but create orphan indices on initial load.
jarpy added a commit to jarpy/goodwillow that referenced this issue Jul 2, 2021
Fix the "orphaned indicies" problem described in riemann#957.

Allows a Riemann config to make multiple calls to riemann.config/index
with predictable results, both on initial load and after a reload.

Without this change, multiple calls are OK after reload or explicit
apply, but create orphan indices on initial load.

Closes: riemann#957
jarpy added a commit to jarpy/goodwillow that referenced this issue Jul 2, 2021
Fix the "orphaned indicies" problem described in riemann#957.

Allows a Riemann config to make multiple calls to riemann.config/index
with predictable results, both on initial load and after a reload.

Without this change, multiple calls are OK after reload or explicit
apply, but create orphan indices on initial load.

Closes: riemann#957
@jarpy
Copy link
Contributor Author

jarpy commented Sep 10, 2021

I ended up going with option 1, which is implemented in #997.

jarpy added a commit to jarpy/goodwillow that referenced this issue Jan 12, 2022
Set current index on first call of config/index

Fix the "orphaned indices" problem described in riemann#957.

Allows a Riemann config to make multiple calls to riemann.config/index
with predictable results, both on initial load and after a reload.

Without this change, multiple calls are OK after reload or explicit
apply, but create orphan indices on initial load.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants