[feat] redeployment long-lived actors when node left cluster #290
Comments
@qazwsxedckll Thank you so much for this issue. There is already an issue I created some time back to tackle that; you can find it here. At the moment I need to rethink it properly. However, I am open to any PR in that regard.
First of all, I am really impressed by your hard work; the documentation is very clear and the code is very clean (P.S. I haven't checked the entire codebase yet). I would like to know which framework it is most similar to: Akka, Orleans, or Erlang?
@qazwsxedckll one can spawn an actor remotely, with some caveats. :)
To be honest, I have used all of these actor frameworks in the past, and was contributing to proto-actor-go for some time before starting this. So I pick ideas from all of them. :)
@qazwsxedckll the challenge Go-Akt is having is to be able to serialize the
Sorry, but I don't get it. Why is serialization related to redeployment? Who should send the serialized data, and who should receive it? I saw the RemoteSpawn method; all it needs is the actor type and the actor name. I did see that the Register method is not automatically called when starting the actor system. How about adding the registered types to Olric? Since nodes are aware of the node-dead event, all nodes that can spawn the dead actor could decide among themselves who should spawn it. Usually all actor types are accessible on all nodes; a heterogeneous cluster is not common, I think.

Here is my use case: a user can create an MQTT client on the web and subscribe to some topics. I want to create an actor for each client, and when the node goes down I need to recreate the actor and subscribe to the topics again. So recreating the actor only when it is next accessed is not going to work for me.
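The idea above, where every surviving node independently decides who respawns a dead node's actors, can be sketched in plain Go. This is only an illustration of the decision step, not Go-Akt code: `pickOwner`, the node names, and the choice of rendezvous (highest-random-weight) hashing are all assumptions of this sketch.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// pickOwner deterministically chooses which surviving node should
// respawn a given actor, using rendezvous hashing: every node scores
// (node, actor) pairs the same way, so all survivors reach the same
// decision without coordinating.
func pickOwner(actorName string, nodes []string) string {
	sorted := append([]string(nil), nodes...)
	sort.Strings(sorted) // stable tie-breaking across callers

	var best string
	var bestScore uint64
	for _, n := range sorted {
		h := fnv.New64a()
		h.Write([]byte(n + "/" + actorName))
		if s := h.Sum64(); best == "" || s > bestScore {
			best, bestScore = n, s
		}
	}
	return best
}

func main() {
	// After a node-dead event, each survivor runs the same computation.
	survivors := []string{"node-a", "node-b"}
	owner := pickOwner("mqtt-client-42", survivors)
	fmt.Println("respawn on:", owner)
}
```

Because the score depends only on the node name and the actor name, every node agrees on the owner, and when the cluster membership changes only the actors hashed to the departed node move.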
@qazwsxedckll I will take a deep look into this use case. However, a PR will also be welcome in case you think of something you would like to share. That said, I really don't understand this. The redeployment involves a little more than creating actors remotely, because we need to at least cater for the cluster topology and answer the following question: which node is going to receive the actors of the down node in the cluster?
You mentioned this in the previous issue. I just want to clarify that in my use case the actor should start automatically, and no other actor actually calls it, since subscribing to topics is an active action. This is the downside of the so-called 'virtual actor', so I am looking for other solutions.
@qazwsxedckll This was what I meant:
Apologies for the misunderstanding.
@qazwsxedckll I will start some draft work on the redeployment feature. However, I cannot promise when it will be ready, because I need to dig a bit into Olric's capabilities and see what can help implement it, or whether I have to add some layer on top of it.
Just for your reference: when using actors, I always load data from the database on initialization and continually save data to the database while processing messages. If I store some state in memory, I assume it can be lost at any time. If what you mean is something like the Dapr actor store, I don't find it very useful: for business data, a key-value store is far from enough, and for actor state I have not come up with a good use case yet. Anyway, I think it is nice to have, but not necessary.
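The load-on-init, save-on-every-message pattern described above can be sketched in plain Go. Note this is not Go-Akt's actor API: `Store`, `CounterActor`, `PreStart`, and `Receive` are hypothetical names chosen for the sketch, and the in-memory store stands in for a real database.

```go
package main

import (
	"fmt"
	"sync"
)

// Store abstracts whatever database the actor persists to.
type Store interface {
	Load(id string) (int, bool)
	Save(id string, state int)
}

// memStore is a stand-in for a real database.
type memStore struct {
	mu sync.Mutex
	m  map[string]int
}

func (s *memStore) Load(id string) (int, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[id]
	return v, ok
}

func (s *memStore) Save(id string, v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[id] = v
}

// CounterActor reloads its state on start and writes it back after
// every message, so a respawn on another node loses nothing.
type CounterActor struct {
	id    string
	state int
	store Store
}

func (a *CounterActor) PreStart() {
	if v, ok := a.store.Load(a.id); ok {
		a.state = v // recover state on (re)initialization
	}
}

func (a *CounterActor) Receive(increment int) {
	a.state += increment
	a.store.Save(a.id, a.state) // persist after each message
}

func main() {
	db := &memStore{m: map[string]int{}}

	a := &CounterActor{id: "counter-1", store: db}
	a.PreStart()
	a.Receive(3)
	a.Receive(4)

	// Simulate the node dying and the actor respawning elsewhere.
	b := &CounterActor{id: "counter-1", store: db}
	b.PreStart()
	fmt.Println("recovered state:", b.state) // 7
}
```

With this discipline, redeployment only needs to recreate the actor; the actor's own initialization recovers its state, which is exactly why in-memory state can be treated as disposable.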
Yes, exactly.
@qazwsxedckll I think I have understood what you meant. That is why I was saying I need to implement some binary mechanism for
I have not checked Olric yet. Is using a plain string for the actor type not enough?
@qazwsxedckll I hope you are doing well. In case you have any suggestions, please bring them on. I hope I will not have to change the design; however, I may change the design of the cluster engine.
Great library; it covers many of the things that seem to be missing from protoactor. This specific issue is a bit of a blocker, though. Any idea when it might be resolved?
@waeljammal there is draft PR #319. I am still thinking about the best way to implement this due to the underlying cluster engine library I am using.
@Tochemey have you thought about using Olric locks to have a single instance act as the leader? The leader can catch member-left events and then decide whether it can redistribute the actors, provided another instance that can spawn the same actors is available. If there is only one instance of a service that spawns a specific type of actor, you simply can't bring those actors back to life until either the node that went down comes back or other instances of that service that can spawn the actor are alive.

For this you need to track which node can spawn which actors, and have one node act as the rebalancing orchestrator; if Olric locks work fine, you can use them to make sure only one node at any given time is the orchestrator. I would even say that each group of microservices should have its own leader, since each microservice hosts specific actors. You would need to let developers set a cluster name and a service ID, so you can form the overall cluster out of different types of services and track which service can spawn which actor types.

You would also need to cater for software updates, e.g. a rolling update in Kubernetes where each pod is replaced one at a time. You might want to calculate a quorum based on existing values so you do not rebalance prematurely. That is, if you had 3 instances and 2 suddenly went down for a software upgrade, you do not want to rebalance unless at least 2 instances are alive again; otherwise, if the leader goes down, the first instance that comes back to life would end up with all of the actors restored on it, instead of distributed across several instances. You can use an n-1 formula to decide when it is time to rebalance: with 3 instances and 1 down, you can safely rebalance straight away across the remaining 2; if 2 go down, wait a little while, and if the quorum is not recovered you can restore to the remaining instance, but if they do come back and you reach the n-1 quorum again, you can rebalance across multiple instances.

The trick is not to rely only on a member-left event, but to have a second event, fired after some configurable period, that flags the member as dead: since members will go down because of a crash or a new version being released, you need to allow a little time for the instances to recover. You can also look at how NATS implements only the leader-election part of Raft in their graft library; you could do the same using your pub/sub for more reliable leader election rather than relying on locks.
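The n-1 quorum rule suggested above can be sketched as a small decision function. This is only an illustration of the rule as described in the comment: `decide`, the `Decision` type, and its constant names are assumptions of this sketch, not part of any library. The timeout-based "member dead" escalation is left out and noted in a comment.

```go
package main

import "fmt"

// Decision describes what the orchestrator should do after members leave.
type Decision int

const (
	RebalanceNow  Decision = iota // enough members alive: redistribute immediately
	WaitForQuorum                 // several left at once: give them time to return
)

// decide applies the n-1 rule: losing a single member out of n is safe
// to rebalance right away; losing more suggests a rolling update, so
// wait until at least n-1 members are alive again. In a real system a
// configurable timeout would eventually force a restore onto whatever
// members remain, even if the quorum never recovers.
func decide(previous, alive int) Decision {
	if previous <= 1 {
		return RebalanceNow
	}
	if alive >= previous-1 {
		return RebalanceNow
	}
	return WaitForQuorum
}

func main() {
	// 1 of 3 down: rebalance straight away across the remaining 2.
	fmt.Println(decide(3, 2) == RebalanceNow)
	// 2 of 3 down: likely a rolling upgrade, so wait for the quorum.
	fmt.Println(decide(3, 1) == WaitForQuorum)
}
```

Keeping the decision a pure function of the previous and current member counts makes it easy to unit-test the policy separately from the leader election and the member-left event plumbing.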
@waeljammal thanks for the suggestions. I hope to have some bandwidth to tackle this in the short term. 😉
@waeljammal and @qazwsxedckll I believe I have found a simpler way to get this out. Hopefully I will push it over the weekend. I am currently working on a solution that I need to test.