Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not manage events on non-consul apps #234

Closed
wants to merge 2 commits into from
Closed

Do not manage events on non-consul apps #234

wants to merge 2 commits into from

Conversation

guilhem
Copy link
Contributor

@guilhem guilhem commented May 19, 2017

This fix a lot of issues when dealing with apps not managed by marathon-consul

@@ -160,6 +161,18 @@ func (fh *eventHandler) handleStatusEvent(body []byte) error {
"TaskStatus": task.TaskStatus,
}).Info("Got StatusEvent")

app, err := fh.marathon.App(task.AppID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This operation is quite expensive and could fail. Why you want to cut non consul app on this level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, when a "Killing" or "Failed" event happen, it try to deregister application even if it's not consul.

We have a lot of level=error msg="There was a problem deregistering task" Id={APP}.e41a91e6-3be7-11e7-bcb0-c4346bb6b3b0 error="Couldn't find any service matching task id {APP}.e41a91e6-3be7-11e7-bcb0-c4346bb6b3b0"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to test it, this could double the traffic to Marathon for a deployment. Becouse new tasks will need to be registered and old will check if should be deregistered.

Instead of asking marathon I'll prefer to ask consul if it has specific task and then deregister it. Third solution is to check error and log if it's not error that means there is no such task. If we want to deregister task and it's already gone we are done and this is not an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the code and we already did it. IMO best to change error and check if this is real error. Again if we want to deregister a service that is not registered this is not a problem. The problem was before, why we haven't registered it if we should?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem is that we have task managed by tool mesos-consul when tag consul isn't present. and tasks with tag consul managed by marathon-consul (as a transition step).

We really don't want to see marathon-consul deregistering apps managed by mesos-consul.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marathon-consul sets different service ID then mesos-consul so it won't delete apps registered by mesos-consul.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right :)

Maybe a change on log is a good trade-off.
Best will be to create an internal "cached" list of applications that we managed (sync periodically).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand. Caching applications/tasks that are manages could help with performance (less calls to Marathon and Consul) and can be first step to fix #170

This fix a lot of issues when dealing with apps not managed by marathon-consul
@guilhem
Copy link
Contributor Author

guilhem commented May 19, 2017

@janisz I dig really into code and don't find why tests are failing...

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.3%) to 87.105% when pulling 476cfee on guilhem:filterTask into 64b04b0 on allegro:master.

@guilhem
Copy link
Contributor Author

guilhem commented May 20, 2017

I'm closing this PR because as seen in comments it's not the right way to do it.

@guilhem guilhem closed this May 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants