Investigate if multiple sources can use the same tag #2
It seems yes, but it also seems that not all sources can have a custom tag. This might affect whether or not we need to have a single session for all blocks.
I'm not that versed in Ruby, but is it possible to create some sort of static/shared variable? For now, we are definitely going for a simple implementation and see how it goes.

On Thu, Apr 2, 2015 at 7:14 PM, Imron [email protected] wrote:
You can have class variables (e.g. the equivalent of static variables in Java). The code to take care of threading issues is basically all there already to handle the fact that fluentd's BufferedOutput can use multiple threads; it's just a matter of using @@ instead of @ in front of the variables that would be shared by all threads (currently @last_timestamp, @thread_ids and @next_id, plus a mutex) and adding the session id into that mix as well.
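A rough Ruby sketch of what that could look like, with the shared state promoted to class variables guarded by one mutex. The class and method names here are illustrative, not the plugin's actual ones:

```ruby
# Hypothetical sketch: per-session state shared across all plugin
# instances via class (@@) variables instead of instance (@) variables.
class ScalyrOut
  @@mutex = Mutex.new      # guards all the shared state below
  @@last_timestamp = 0     # last timestamp handed out, across all threads
  @@thread_ids = {}        # map of fluentd tag -> Scalyr thread id
  @@next_id = 1

  # Return a strictly increasing timestamp, no matter which thread asks.
  def self.next_timestamp(candidate)
    @@mutex.synchronize do
      candidate = @@last_timestamp + 1 if candidate <= @@last_timestamp
      @@last_timestamp = candidate
    end
  end

  # Allocate (or look up) a stable thread id for a given tag.
  def self.thread_id_for(tag)
    @@mutex.synchronize do
      unless @@thread_ids.key?(tag)
        @@thread_ids[tag] = @@next_id
        @@next_id += 1
      end
      @@thread_ids[tag]
    end
  end
end
```

Since `Mutex#synchronize` returns the value of its block, `next_timestamp` can both update and return the shared counter in one guarded step.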
After a bit more thought, I'd probably move all the session state into a separate Session class which would take care of providing thread-safe access to the last timestamp, the map of Scalyr thread_ids, and any other info. The Session class would be a Ruby singleton (which is thread safe in terms of no data races when being created/initialised) and the Scalyr fluentd plugin would just call out to that to get timestamps and thread_ids.
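A minimal sketch of that Session singleton, using Ruby's stdlib `Singleton` module (which makes `.new` private and creates the instance lazily and thread-safely). The names are illustrative, not the plugin's actual API:

```ruby
require 'singleton'

# Hypothetical Session singleton holding all shared per-session state.
class Session
  include Singleton

  def initialize
    @mutex = Mutex.new
    @last_timestamp = 0
    @thread_ids = {}
    @next_id = 1
  end

  # Hand out a strictly increasing timestamp across all threads.
  def timestamp(candidate)
    @mutex.synchronize do
      candidate = @last_timestamp + 1 if candidate <= @last_timestamp
      @last_timestamp = candidate
    end
  end

  # Look up (or allocate) the Scalyr thread id for a fluentd tag.
  def thread_id(tag)
    @mutex.synchronize do
      unless @thread_ids.key?(tag)
        @thread_ids[tag] = @next_id
        @next_id += 1
      end
      @thread_ids[tag]
    end
  end
end
```

The plugin would then call `Session.instance.timestamp(...)` and `Session.instance.thread_id(tag)` instead of touching its own instance variables.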
Ok, let's go ahead and do that whenever you have time. It seems worthwhile to me to take this approach early, and it doesn't seem like much work on your end.
Ok, this is now done, however there is one potential issue I wanted to check on. The Scalyr API docs state that:
Which is all well and good, but what happens when we have multiple blocks and multiple threads and encounter a situation like:

Thread 1 -> Start processing 100 events
Thread 2 -> Start processing its own batch of events
Thread 1 -> Finish stamping its events and send its batch
Thread 2 -> Send its batch, whose lowest timestamp is now below ones the server has already accepted

From here we can see that for the batch of events sent by thread 2, even though each timestamp was strictly increasing when it was assigned, by the time we send the entire batch the server has already received a different batch for the same session that includes timestamps higher than the lowest value of the current batch. e.g. Thread 2's batch contains a timestamp of 1001, but the server has already received a batch of events from Thread 1 containing 2001.

So the timestamps for each event are unique across all threads and are strictly increasing when the event is created. By the time the events reach the server, however, there is the possibility that it has already received events with a later timestamp.

If that is going to be an issue, the only way I can see around it is to hold the mutex for the entire event processing and sending, which is possible, but has performance implications that negate the point of having multiple threads in the first place.
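The interleaving above can be reproduced deterministically with a toy counter: timestamps are issued strictly increasing, yet a batch can still arrive after the server has accepted higher values. The numbers and names here are illustrative, not from the plugin:

```ruby
# Shared counter handing out strictly increasing timestamps.
counter = 1000
next_ts = -> { counter += 1 }

t2_first = next_ts.call                      # Thread 2 stamps its first event: 1001
t1_batch = Array.new(1000) { next_ts.call }  # Thread 1 stamps 1002..2001

# Thread 1's send wins the race, so the server's high-water mark is
# already 2001 when Thread 2's batch (starting at 1001) arrives.
server_high_water = t1_batch.max

puts t2_first < server_high_water            # true: 1001 < 2001
```

Every individual timestamp was strictly increasing at creation time; the ordering only breaks at the granularity of whole batches arriving at the server.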
There are two general approaches to avoid the problems you mention. If either of these is too much to implement, or if you feel like they would cost too much time, then let's not do it for now and leave it as a TODO. The current version will work well enough and this is just fine tuning it.
Anyway, unless you say otherwise, it feels to me that making this perfect is going to cost too much work for now and should just be done after the syslog work and when more customers are using the fluentd plugin.
Option 1 would work, but then we'd lose all the backoff/retry functionality from fluentd and would need to implement that ourselves. We'd also then be holding all the events in memory (unless we wanted to do something more complex), whereas fluentd backs the buffered output with a file by default, so we could potentially lose events if there was an error that shut down the process (e.g. something caused by an out-of-memory error).

With option 2 and using the standard Ruby http code, we'd have to hold the lock until we get the response back (there's no way to split request/response). Blocking for the entire request/response is possibly going to cause issues if there's a network problem and the request delays for several tens of seconds before timing out (the fluentd buffers might start to fill up). There are other http libs that support asynchronous request/responses, however I'm not sure how well maintained they are (especially with respect to older versions) and whether you want to introduce a dependency on a third-party lib.

How big of a problem is it if the scenario I mentioned in my previous comment happens? I've already pushed the code that has a single session across all threads, so if it's an issue I'd need to revert that back to the previous commit, which has a session per thread.
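For concreteness, option 2 amounts to holding the session mutex across the whole request/response. A minimal sketch, where `post_batch` is a stand-in for the real HTTP call (with `Net::HTTP` it would wrap `http.request(req)`) and is an assumption, not the plugin's API:

```ruby
# Hypothetical sketch of option 2: serialise uploads by holding the
# session mutex for the entire request/response round trip.
SESSION_MUTEX = Mutex.new

def upload(batch, post_batch)
  SESSION_MUTEX.synchronize do
    # While this call waits on the server, no other thread can stamp
    # and send events, which is exactly the performance cost above:
    # a slow or timed-out request stalls every other upload thread.
    post_batch.call(batch)
  end
end
```

This guarantees batches reach the server in timestamp order, at the price of making all uploads sequential.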
Ugh, ok, sorry for the trouble on this. Yes, please go ahead and revert your change. If we do receive events with lower timestamps than what was previously accepted, those new events will be dropped and recorded in the database. Let's leave this issue open and come back when we need to fix it. It could also be that when we design the new API, we could make this problem go away.
(I meant "... will be dropped and NOT recorded in the database".)
Ok. The change has been reverted on master, and I've added a new branch, single-session, which has all the single session code. I'll leave this issue open.
Hah, need to make sure I don't click 'close and comment'. |
Check to see if multiple sources can use the same tag.