-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Flink API to retrieve savepoint name #22
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #22 +/- ##
=========================================
- Coverage 55.85% 53.4% -2.45%
=========================================
Files 12 11 -1
Lines 478 440 -38
=========================================
- Hits 267 235 -32
+ Misses 172 168 -4
+ Partials 39 37 -2
Continue to review full report at Codecov.
|
Hi Nick, Thanks for the contribution. Both me and Niels have had quite a few debates about how to properly implement the savepoint retrieval. Referring to your table, option 2 would work if cancel would return the full savepoint-path. If not, restarting from an existing savepoint would require the caller of the deploy action to know the full path to the savepoint. That means that an external system would need to store the savepoint path and pass it to the deployer. This is something we didn't want to impose on our users immediately. This problem also applies to supporting a deploy with a savepoint path without cancelling first. There's no way of knowing the full path to the savepoint. |
@mrooding I think Nicks intentions were correct. We should be retrieving savepoint path from Flink API. There are some edge cases where two consecutive savepoints are triggered, or savepoints are triggered for different jobs (parallel invocation of flink-deployer, or one from flink-deployer and one from flinkUI), where simply retrieving the latest savepoint will not give the correct savepoint required to restart the job. Also, Nick's change support all savepoint schemes that flink supports and will support in the future (s3, hdfs) without having to implement specifically for each scheme like I saw in the other PR created by kerinin. Also currently flink-deployer only supports local file system, where as must flink deployments these days run in k8s where there is no local storage. What we need is part of Nick's changes merged into the latest code base:
What do you think? If Nick is busy I can do it. |
Hi @joshuavijay! |
Thank you for this Project.
The PR changes the code in such a way that the Flink API endpoint
/jobs/:jobid/savepoints/:triggerid
is used to retrieve the name of the latest savepoint when updating a job.How I imagine the deployer API should be used:
update
with job-name-basecancel
(not implemented yet) +deploy
with savepoint-pathdeploy
I removed the logic for retrieval of the most current savepoint through the file system completely. I think using the Flink API is a more flexible approach because you don't need to mount a volume with the snapshots, therefore allowing the usage of different storage solutions for savepoints / checkpoints such as blob stores like AWS S3 or Google Cloud Storage.
Also, the job name base mechanism didn't seem functional yet. It should work as advertised now.
Docker image: https://hub.docker.com/r/nicktriller/flink-deployer/