Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inject! hangs up with current timestamps #941

Open
0xRand0m opened this issue Feb 11, 2019 · 3 comments
Open

inject! hangs up with current timestamps #941

0xRand0m opened this issue Feb 11, 2019 · 3 comments

Comments

@0xRand0m
Copy link

0xRand0m commented Feb 11, 2019

Hello fellow riemann enthusiasts,

while running tests with current event timestamps I encountered the problem that my test events were not injected into the streams. The following snippet showcases the issue. When run as is the test does not terminate. If you remove the last event (its timestamp has one more digit than the first two) it terminates within seconds.

(tests
  (deftest timestamp-test
    (prn (inject! [{:host "test.example.com" :service "test" :metric 1337 :time 109889374}
                   {:host "test.example.com" :service "test" :metric 1337 :time 109889375}
                   {:host "test.example.com" :service "test" :metric 1337 :time 1109889374}]))))

I took a peek into the source code but could not find any reasons for this behavior.
The obvious workaround is to use smaller timestamps but it would be nice to be able to use real life examples as test cases.

Thanks in advance for any help!

@mcorbin
Copy link
Contributor

mcorbin commented Feb 11, 2019

Hello,

I'm not able to reproduce your issue:

(logging/init {:file "/var/log/riemann/riemann.log"})

(let [host "127.0.0.1"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (repl-server {:host host})
  (ws-server  {:host host}))

(streams
  (tap :foo))

(tests (deftest timestamp-test
         (let [result
               (inject! [{:host "test.example.com" :service "test" :metric 1337 :time 109889374}
                         {:host "test.example.com" :service "test" :metric 1337 :time 109889375}
                         {:host "test.example.com" :service "test" :metric 1337 :time 1109889374}])]
           (= (:foo result)
              [{:host "test.example.com" :service "test" :metric 1337 :time 109889374}
               {:host "test.example.com" :service "test" :metric 1337 :time 109889375}
               {:host "test.example.com" :service "test" :metric 1337 :time 1109889374}]))))
riemann test config.clj 
INFO [2019-02-11 19:30:55,013] main - riemann.bin - Loading /home/mathieu/Documents/config.clj
INFO [2019-02-11 19:30:55,073] main - riemann.repl - REPL server {:port 5557, :host 127.0.0.1} online

Testing riemann.config-test

Ran 1 tests containing 0 assertions.
0 failures, 0 errors.

What are you testing ?

@0xRand0m
Copy link
Author

Hi,

my apologies I should have added the whole config - but I managed to single out a possible culprit.
Your example ran fine for me too - but then I added (periodically-expire 5) before streams and this time I got a NullPointerException that does not occur when I remove the last test event.
I took a look at the reaper code but my understanding of the riemann internals is unfortunately limited.

(logging/init {:file "/var/log/riemann/riemann.log"})

(let [host "127.0.0.1"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (repl-server {:host host})
  (ws-server  {:host host}))

(periodically-expire 5)
(let [index (index)]
(streams
  index
  (tap :foo)))

(tests (deftest timestamp-test
         (let [result
               (inject! [{:host "test.example.com" :service "test" :metric 1337 :time 109889374}
                         {:host "test.example.com" :service "test" :metric 1337 :time 109889375}
                         {:host "test.example.com" :service "test" :metric 1337 :time 1109889374}])]
           (= (:foo result)
              [{:host "test.example.com" :service "test" :metric 1337 :time 109889374}
               {:host "test.example.com" :service "test" :metric 1337 :time 109889375}
               {:host "test.example.com" :service "test" :metric 1337 :time 1109889374}]))))
WARN [2019-02-12 12:17:41,688] Thread-7 - riemann.core - instrumentation service caught
java.lang.NullPointerException: null
	at clojure.core$swap_BANG_.invokeStatic(core.clj:2345)
	at clojure.core$swap_BANG_.invoke(core.clj:2337)
	at riemann.test$tap_stream$stream__9675.invoke(test.clj:41)
	at riemann.core$stream_BANG_$fn__9747.invoke(core.clj:20)
	at riemann.core$stream_BANG_.invokeStatic(core.clj:19)
	at riemann.core$stream_BANG_.invoke(core.clj:15)
	at riemann.core$instrumentation_service$measure__9756.invoke(core.clj:59)
	at riemann.service.ThreadService$thread_service_runner__6513$fn__6514.invoke(service.clj:101)
	at riemann.service.ThreadService$thread_service_runner__6513.invoke(service.clj:100)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.lang.Thread.run(Thread.java:748)

@sanel
Copy link
Contributor

sanel commented Jul 29, 2019

@bickerle few tips:

  1. use (is (= (:foo result) ...
  2. add (instrumentation {:enabled? false}) before declaring index

@mcorbin this is a weird bug that, when instrumentation is explicitly enabled or disabled, it will not happen. Unless instrumentation call is made with (instrumentation), instrumentation service will kick in from different thread automatically, while tests are running, causing riemann.test/*results* to be set to nil for instrumentation thread (*results* is per-thread variable).

I'm not able to track what is difference between explicit (instrumentation {}) call and implicit one, started from riemann by default. The only thing I noticed is that, when (instrumentation {}) is called even with :enabled? true, instrumentation thread will not be started during tests.

For now, here is pretty dumb patch that take care of *results* if multiple threads are involved.

diff --git a/src/riemann/test.clj b/src/riemann/test.clj
index 09f224d..b112945 100644
--- a/src/riemann/test.clj
+++ b/src/riemann/test.clj
@@ -36,9 +36,9 @@
   [name child]
   (fn stream [event]
     ; Record event
-    (-> *results*
-        (get name)
-        (swap! conj event))
+    (some-> *results*
+            (get name)
+            (swap! conj event))
 
     ; Forward downstream
     (child event)))

sanel added a commit to sanel/riemann that referenced this issue Apr 27, 2020
Behavior is explained on [1]

In short, unless (instrumentation) is explicitly set and tests has
time gap that will put Riemann in sleep, another thread will kick
instrumentation while tests are running, causing NullPointerException.

[1] riemann#941 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants