-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supervisor/gen spawn options #9256
Supervisor/gen spawn options #9256
Conversation
CT Test Results 2 files 96 suites 1h 8m 53s ⏱️ Results for commit 2096c7d. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts// Erlang/OTP Github Action Bot |
I think this is unnecessary as the tls_dyn_connection_sup is a one_for_all-supervisor which means that it will hibernate. From supervisor.erl:
|
@IngelaAndin good hint, but that doesn't apply here, it will only happen after the initial otp/lib/ssl/src/tls_dyn_connection_sup.erl Lines 55 to 62 in ac7be0d
otp/lib/ssl/src/tls_gen_connection.erl Lines 102 to 107 in ac7be0d
This I also saw by tracing, as described in the post on the forums, the supervisor was indeed not hibernating. It might actually be a good change to repeat this hibernation logic on all places where a static supervisor finishes (re-)starting children, but that might be a lot more complex change considering the many places and different strategies to restart children, but might also be a better option, wdyt? Overall one of the two hibernation strategies would be a really good move for supervisors, either the one available in this PR or the one proposed above. I can work on it if you have any preference. |
@NelsonVides This will no longer be true when #9231 is merged. |
Oooooooooooh, right, I didn't see that PR! Amazing, really amazing! 🤩 |
@IngelaAndin though, wait... when you do |
Huum .. I do not know how much garbage that will generate but I guess the it will be a lot less than start up. Will have to think about it. |
So in general the issue is with supervisors that are pretty much entirely inactive, however never go to hibernation again. I see three solutions:
My thoughts. I have some availability to work on any of them :) |
My thinking right now is that the ssl use case for which_child/2 is going to bee quite a common use case. So maybe it would be nice if there was a new optional flag to supervisor that would make it possible to configure an idle timeout for the static supervisors, so that the application can configure that timeout to an appropriate value so that the hibernation will occur after initial calls to which_child/2. And maybe even occur later too if woken up later but then probably becomes idle again. Keeping the default to hibernate right away as is what happens now. @Maria-12648430, @u3s, @dgud |
Hi all, happy new year :) @IngelaAndin (Re-)hibernating after some idle timeout seems to me a good way to go, yes. This could even be allowed for I may be mistaken, but at a quick skim, One somewhat aesthetic point is that we would have to put a second meaning on the existing |
Hey @Maria-12648430! 👋🏽 I think that being able to pass gen_server flags to the underlying gen_server would be the ideal solution, the user often plans for what it is expected a gen_server to do and pass |
Yes I think it should be accomplished using the gen_server flags, although I do not think that the user of a supervisor should need to know that supervisors are gen_servers and I think think supervisors could have a configuration of an idle timeout. I think it feels like there should be an additional supervisor flag that could be called idle_timeout. It could default to infinity that is no timeout. Then static supervisors could do hibernation after init only if the timeout is infinity for backwards compatibility. |
Fair enough. In that case, of the available |
I think what @IngelaAndin meant was an additional option (?) That is, have one option to specify when to go into hibernation after supervisor start, and another to specify when the operating supervisor goes into hibernation when not having received a message for a while. This can't be done by just passing the |
@Maria-12648430 No I mean I only want one option to the supervisor, additional supervisor flag, we can call it hibernate_after, |
@NelsonVides No, I do not think we should overlad start_link/3, I want it to be configured in the supervisor. Like:
|
Oh, so changing childspecs actually, that's more changes I'd have imagined. What about the case when the children is not a supervisor but a worker? Workers are of many types and have many APIs, so this flag would then make sense only for when the child is a supervisor, which is a new thing in these specs, all other flags are valid for both supervisors and workers, is that ok? 🤔 |
@NelsonVides Yes good point, maybe it would make sense to extend the type argument {supervisor, HibernateAfter} ?! We need to think about it I will bring the question up in my team too. Will get back to you. |
@IngelaAndin Hm, in the child specs, really? 🤔 By intuition, I don't think that is a good way to go. It would mean that the supervisor has to know more about the child it wants to start than it actually should. I think this belongs in the supervisor flags instead. Those concern the supervisor itself (vs its children), so there is no outside-of-the-module knowledge needed. Arguably, a supervisor knows best what it is for and how it should behave regarding hibernation. If it wants or needs to be told how to behave hibernation-wise by a parent supervisor or something else, that is possible to do via Roughly sketching this up in my mind (and assuming we put the option in the supervisor flags), it could look simple as this: We put the timeout into all returns from |
@Maria-12648430 yes maybe I was being too reactive here, and maybe hibernate_after could actually be applied to workers too if you really want, as they need to be well behaved OTP-processes that would be expected to handle hibernate_after at least for the common case and if you make a very customized one you could also be expected to know that hibernate_after will not work. |
I think I agree with @Maria-12648430 here, adding this to the child_spec means the supervisor needs to keep track of more things, and, it needs to pass this information to the children in a way that the children need to understand: that means, all the custom supervisors and worker children out there in the world would need to extend their APIs to understand this new information, and the supervisor would need to check if the given children is compatible with one or another API... and then we one day want to add And the child_spec currently already has a way to pass information to its children: as given at their If you wanted to keep that hibernate_after flag in the supervisor for SSL, we could do
and then in the implementation of -module(tls_dyn_connection_sup).
start_link(Options) ->
supervisor:start_link(?MODULE, [], Options). So if we nevertheless need to expand the supervisor API to get the hibernate flag, it seems to me like we don't need to introduce a second mechanism in the child_spec to pass flags (which are just input data for the children really), we can just let child-specs keep passing the init arguments, and in that case we as a user can decide whether to pass hibernate arguments to children we know will understand the parameter and to not pass them to children that won't, leaving that decision to the supervisor's client code, and not to the internal supervisor implementation. |
@IngelaAndin not sure we're talking about the same thing 😅 I made a quick proof of concept here, to show what I have in mind: Maria-12648430#1 |
@Maria-12648430 and @NelsonVides Have not had time yet to check, but I am starting to feel that maybe I wrote to much gen_statem code and to little gen_server code lately and I think that this is part of the confusion. I will check your suggestion tomorrow. I am starting to feel that "over-load" start_link for supervisor could be ok. I will see if I can also merge my PR using which_child/2 tomorrow. |
I had a deeper look into the related parts of the code (including @NelsonVides' suggestions in this PR), and this is what I think now. Comparing @NelsonVides' changes with what I drafted, the difference is that his approach passes the One downside of @NelsonVides' approach however is that the option can only be set from the "outside", ie from the starting process, by passing the Finally, this also means that the hibernation value can not be changed (AFAICS) during a release upgrade. A downside of my drafted approach is, apart from requiring a custom hibernate-after handling implementation, that it relies on yet another message ( It may be possible to combine both approaches, like this: Do not pass the On a kinda side note, going into instant hibernation after/in |
Ok, so in my world static supervisors are mostly very inactive processes that are started at application startup and that in turn start up their worker processes that may be more supervisors and some actual workers. Then the supervisors hopefully have nothing to do until the application is stopped when they are involved stopping everything as gracefully as possible. So this of course is not entirely true as there might occur intermediate problems and process that will be restarted and soft-code upgrades and the So now anyway with supervisor:which_children/2 I guess it will be more common that a hibernated supervisor is immediately woke up, so this is a problem that I think we would like to avoid. Which is also motivated for some of the other cases you mentioned. So I guess we are free to change how the static supervisor chooses to hibernate but we do still need this to happen for most of them without extra configuration. So not really reaching any conclusion yet, just trying to contribute to the considerations we have to make. Having a an extra idle-timout for a process that will be mostly hibernated is probably ok, but not so much for a busy process that a simple_one_for_one-supervisor normaly is. |
@IngelaAndin Well... I think we have reached a point where we need a decision from you (and/or the OTP team) to progress further. Let me sum up the options we discussed until now:
Open questions:
Re-reading the above, it looks like I'm biased towards "my" or a crossover implementation, even though I did my best to keep it down 🙀 Anyway, feel free to add whatever you think I forgot 😉 |
Hm, even if they do happen occasionally, a supervisor will not go back into hibernation afterwards 🤔 With |
Yes I think this is the sane way to do it, especially since we got which_child/2. Mostly I have seen registered names used, which in some cases lead to really awful naming conventions and unexpected race-condition problems. The only other way I used was to start the children in a more manual fashion in order to be able to use supervisor function return as init-argument to second child. And this will only work for more dynamic parts of a supervisor tree, and I did not like it, I think it became ugly and need temporary processes to work properly. |
I do not think we need to be backwards compatible in the way that we only ever hibernate after init, that is an implementation detail. The thing I think is problematic is to remove it unless explicit configured as we probably would get complaints that memory usage has gone up. I will involve more team members also in discussion but is nice to have more people to discuss with :) |
I updated my PoC by removing the strict backwards compatibility. By default (ie, if no |
Hey there! I was wondering, did anything come out of this discussion in the OTP team? @IngelaAndin I find @Maria-12648430 draft pretty nice, code is clean, I just find unfortunate to have to duplicate the logic that |
This comes from my thoughts on https://erlangforums.com/t/passing-gen-servers-start-opts-to-supervisors/4325.
Supervisors are too
gen
behaviours, and we can also plan for them to for example go into hibernation when appropriate, but currently they don't expose passing start options. One example is for SSL connections, where I can set to hibernate the two SSL connection workers if I know they'll be long-lived but not specially active, but the dynamic supervisor doesn't hibernate and therefore will likely never GC until it dies.