Add terraform to setup Dataflow on GCP #2
This repo is awesome! Thanks for getting this started. I also really like the name. Re: GCP terraform,
Now that the runner is using Flink (pangeo-forge/pangeo-forge-runner#21), is an external Beam cluster (Dataflow on GCP) still needed? I am still trying to understand the architecture by reading https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html and https://beam.apache.org/documentation/runners/flink, and I wonder if Beam is still in the picture or if Flink is enough to handle the jobs?
Well, I guess Dataflow is still needed; I am still trying to find where Flink is configured to use it. Another question: is there any appetite to run Beam on Kubernetes and get rid of Dataflow, as described in https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb?
Hi @echarles, thanks for chiming in here. This repo is a placeholder that we have not done much work on. Currently I can say we are interested in supporting Flink in addition to Dataflow, but not as a replacement for it. Some basic Flink configuration can be found in these tests, but we do not currently run any Flink in production. All of our production workloads are currently on Dataflow. If you're interested in participating in the conversation, we'd welcome you to join our recurring Pangeo Forge coordination call, which is listed on this calendar and also discussed here for any on-the-fly schedule adjustments.
Thx @cisaacstern, I will join next Monday's (Jan 2nd) meeting.
Great, @echarles! Looking forward to it.
Thx for the warm welcome at today's meeting. I understand things are evolving ATM with the introduction of the new GCP Cloud Runner. My goal is to run the services on K8s and not depend on GCP. Is it already possible/documented? If not, what is missing to make this happen?
We should add a `gcp` directory in the terraform folder to provision Dataflow on GCP, so the setup of GCP bakeries can be managed within this repository. From chat with @yuvipanda: "it just needs to provision the service account for dataflow".
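Per @yuvipanda's note, the module would mainly need to create a service account for Dataflow and bind it the worker role. A minimal sketch of what a `terraform/gcp` module could contain (the variable name, account ID, and role binding below are illustrative assumptions, not part of this repository yet):

```hcl
# Hypothetical terraform/gcp/main.tf sketch — names are placeholders.
variable "project" {
  description = "GCP project ID hosting the bakery"
  type        = string
}

provider "google" {
  project = var.project
}

# Service account that Dataflow jobs would run as
resource "google_service_account" "dataflow" {
  account_id   = "pangeo-forge-dataflow"
  display_name = "Pangeo Forge Dataflow worker"
}

# Grant the role Dataflow workers typically need
resource "google_project_iam_member" "dataflow_worker" {
  project = var.project
  role    = "roles/dataflow.worker"
  member  = "serviceAccount:${google_service_account.dataflow.email}"
}
```

Additional bindings (e.g. storage access for staging buckets) would likely be needed in practice, depending on how the bakery's buckets are provisioned.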