FIX since Airflow 1.10.8 AIRFLOW__ variables are eligible to _CMD behaviour #503

Open · wants to merge 1 commit into base: master

Conversation

@NBardelot (Contributor) commented Feb 14, 2020

See the pull request on the official Airflow repository: apache/airflow#6801

It was integrated in 1.10.8. The following configuration variables handled by the image entrypoint are affected by this change:

  • Core fernet_key configuration can now be managed with AIRFLOW__CORE__FERNET_KEY_CMD as well as the usual AIRFLOW__CORE__FERNET_KEY
  • Core sql_alchemy_conn configuration can now be managed with AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD as well as the usual AIRFLOW__CORE__SQL_ALCHEMY_CONN
  • Celery broker_url configuration can now be managed with AIRFLOW__CELERY__BROKER_URL_CMD as well as the usual AIRFLOW__CELERY__BROKER_URL

This PR takes that change into account and fixes errors where the REDIS_*, POSTGRES_* and Fernet key variables used in the entrypoint were not computed correctly.
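For readers skimming the diff, the shape of the fix is a guard like the one below (a minimal sketch, not the exact commit; variable names follow the entrypoint's existing conventions):

```bash
# Only derive the connection string from the POSTGRES_* defaults when the user
# supplied neither the plain variable nor its _CMD counterpart.
if [ -z "$AIRFLOW__CORE__SQL_ALCHEMY_CONN" ] && [ -z "$AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD" ]; then
  AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://$POSTGRES_USER:$POSTGRES_PASSWORD@$POSTGRES_HOST:$POSTGRES_PORT/$POSTGRES_DB"
  export AIRFLOW__CORE__SQL_ALCHEMY_CONN
fi
```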

@dinigo commented Apr 2, 2020

Would this allow for a secret to be set like so?

```yaml
services:
  airflow:
    image: puckel/airflow
    secrets:
      - fernet-key
    environment:
      - AIRFLOW__CORE__FERNET_KEY=$(cat /run/secrets/fernet-key)
```

@NBardelot (Contributor, Author)

@dinigo yes, but you have to replace AIRFLOW__CORE__FERNET_KEY with AIRFLOW__CORE__FERNET_KEY_CMD and use the correct secret mount to make sure the /run/secrets/fernet-key file contains the secret's value.
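Applied to your snippet, it would look something like this (a sketch; it assumes the fernet-key secret already exists and Swarm mounts it at /run/secrets/fernet-key):

```yaml
services:
  airflow:
    image: puckel/airflow
    secrets:
      - fernet-key
    environment:
      # note: no $(...) here — Airflow runs the command itself
      - AIRFLOW__CORE__FERNET_KEY_CMD=cat /run/secrets/fernet-key
```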

@dinigo commented Apr 12, 2020

I'm rebasing your changes onto my fork, thanks :)

@aalemanq commented Apr 28, 2020

I'm having problems using AIRFLOW__CELERY__BROKER_URL_CMD (there are no more entries on Google, just this topic).

Is it correct to point this value at the whole string, like "pyamqp://airflow:airflow@rabbitmq:5672/airflow", in the secret?

AIRFLOW__CELERY__BROKER_URL_CMD=$(cat /run/secrets/broker_url) ?

It never gets the value. I'm also in this topic: #545

@dinigo commented Apr 28, 2020

@aalemanq, from the documentation of Airflow v1.10.10:

The _cmd config options can also be set using a corresponding environment variable the same way the usual config options can. For example:

```bash
export AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD=bash_command_to_run
```

The idea behind this is to not store passwords on boxes in plain text files.

The universal order of precedence for all configuration options is as follows:

  1. set as an environment variable
  2. set as a command environment variable
  3. set in airflow.cfg
  4. command in airflow.cfg
  5. Airflow’s built in defaults
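In shell terms, the top of that precedence list plays out like this (illustrative values, not from the docs):

```bash
# 2. is used only when 1. is unset; 3./4. live in airflow.cfg
#    (sql_alchemy_conn / sql_alchemy_conn_cmd), and 5. is Airflow's built-in default.
export AIRFLOW__CORE__SQL_ALCHEMY_CONN='postgresql://direct-value'     # 1. wins
export AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD='cat /run/secrets/db_conn'  # 2. ignored here
```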

It could be that this repo is still on v1.10.9, I don't really know. But you can use the official image apache/airflow:1.10.10, which supports it. The entrypoint changes a little bit, though it's easily manageable. You can read about it in the blog post about the release.

Anyway, why do you want to store the broker URL as a Docker secret? Do you use a third-party managed Redis service that needs passwords? Otherwise I don't see the point.

@aalemanq

Hello, I have to secure passwords so that no password gets uploaded to git. I use RabbitMQ.

I tried with 1.10.10 and got the same result. I don't know what happens with these #### secrets when applied to Airflow. It can't be this complicated to pass secrets through the environment like any other software, omg... :(

I don't know why the official doc only gives this info:

https://airflow.readthedocs.io/en/stable/howto/set-config.html — and it doesn't work.

Can anybody pass secrets via environment variables in docker-compose to deploy on Swarm????

@dinigo commented Apr 29, 2020

If you want more visibility, I suggest you file an issue in the apache/airflow repo or on StackOverflow (or, why not, both).

@NBardelot (Contributor, Author) commented Apr 30, 2020

@aalemanq
The idea with the _CMD-style variables is the following:

  • you store the confidential value in a secret (Swarm or Kubernetes)
  • you mount the secret as a file in your Airflow containers
  • you set the xxxx_CMD environment variable corresponding to a valid Airflow config to a command that reads the file (for example with cat, or any mechanism you like best)

Note that not all variables are eligible for the _CMD behaviour (see the config documentation). In your case, the broker endpoint config is eligible.
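Concretely, on Swarm those three steps might look like this (a sketch; the service name my_airflow_worker is made up):

```bash
# 1. store the confidential value in a secret (-n: no trailing newline)
echo -n 'pyamqp://airflow:airflow@rabbitmq:5672/airflow' | docker secret create broker_url -

# 2. mount the secret as a file in the container (Swarm exposes it at /run/secrets/broker_url)
docker service update --secret-add broker_url my_airflow_worker

# 3. set the _CMD variable to a command that reads the file — no $(...) around it
docker service update \
  --env-add 'AIRFLOW__CELERY__BROKER_URL_CMD=cat /run/secrets/broker_url' \
  my_airflow_worker
```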

This should work with any recent Airflow version (I use 1.10.9), but be aware that the official apache/airflow Docker image, which backports the Airflow 2.0 image, has issues with its entrypoint that I fixed in puckel/docker-airflow. I need to take a look at the official image and propose a fix.

@aalemanq commented May 4, 2020

Thanks for your reply NBardelot! Your work here is awesome, so many thanks.

I understand, it is that simple, no?

I create my secret in docker-compose, it is saved in Swarm and mounted in the container on deploy.
I apply the environment variable in docker-compose like:

AIRFLOW__CELERY__BROKER_URL_CMD=$(cat /run/secrets/broker_url)

(In Docker Swarm this fails because you need to escape the $ with another $:) AIRFLOW__CELERY__BROKER_URL_CMD=$$(cat /run/secrets/broker_url)
If I use env_file, with a single $, same issue.

This does not work, and the broker URL stays empty, so the default (redis) is used and I can't connect to my RabbitMQ using secrets.

If you want, I can show you my whole workflow: applying secrets in Docker Swarm, deploying, and the logs with the _CMD environment variables applied in docker-compose.yml.

I also tried the latest puckel version, 1.10.9, and the _CMD variable never gets the value :(. If I use normal environment variables without secrets, it works.

@NBardelot (Contributor, Author) commented May 4, 2020

@aalemanq You do not need the $() around the command. The Python code in charge of managing Airflow's configuration already takes care of executing the command (see airflow/configuration.py and the run_command function if you'd like to take a look at the details).

You can just set AIRFLOW__CELERY__BROKER_URL_CMD="cat /run/secrets/broker_url" and it should work.

In order to debug such a configuration you can:

  • run the container using docker-compose
  • exec into the container (you'll probably need to force the entrypoint, e.g. --entrypoint /bin/bash)
  • check the mounted files to verify that the secret is present in the file you expect
  • run yourself the command you'd expect Airflow to use
  • run eval "$AIRFLOW__CELERY__BROKER_URL_CMD" to test that everything is working
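Condensed into a session, that procedure might look like this (a sketch; it assumes you can start the image with a shell and that the secret is mounted):

```bash
# start a shell instead of the normal entrypoint
docker run --rm -it --entrypoint /bin/bash puckel/docker-airflow:1.10.9

# inside the container:
ls -l /run/secrets/                       # is the secret mounted?
cat /run/secrets/broker_url               # does it hold the expected value?
echo "$AIRFLOW__CELERY__BROKER_URL_CMD"   # is the variable set as intended?
eval "$AIRFLOW__CELERY__BROKER_URL_CMD"   # does the command Airflow will run succeed?
```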

@aalemanq commented May 4, 2020

Well, I have found several errors that confuse me. Thank you very much for your interest and for the support, I will never tire of repeating it.

As you say, I applied the environment variable:
AIRFLOW__CELERY__BROKER_URL_CMD="cat /run/secrets/broker_url"

Same error: the variable does not work at Airflow startup.

Debug:

  • docker-compose running with entrypoint: ["sh", "-c", "sleep 2073600"] and the worker command disabled (I'm testing the worker service with RabbitMQ and Postgres up)
  • inside the container:

```
root@ac88785b0b0c:/usr/local/airflow# cat /run/secrets/broker_url
pyamqp://airflow:airflow@rabbitmq:5672/airflow
```

First of all, before executing /entrypoint.sh manually, I checked whether the environment variables are correct.

If I execute eval "$AIRFLOW__CELERY__BROKER_URL_CMD" I get:

```
airflow@0e6c1e5913d8:~$ eval "$AIRFLOW__CELERY__BROKER_URL_CMD"
bash: cat /run/secrets/broker_url: No such file or directory
```

It seems that, set through docker-compose, the quotes end up inside the value, so the shell looks for a single command literally named "cat /run/secrets/broker_url". I had to restart docker-compose with the following value in the broker_url env (without the quotes):
AIRFLOW__CELERY__BROKER_URL_CMD=cat /run/secrets/broker_url
and now, inside the container, the variable works:

```
airflow@7ebed0b08ca7:~$ eval "$AIRFLOW__CELERY__BROKER_URL_CMD"
pyamqp://airflow:airflow@rabbitmq:5672/airflow
```

So far everything is perfect, but...

The next step is to run the entrypoint manually to simulate the start of puckel/docker-airflow:1.10.9.

When I run /entrypoint.sh, Airflow updates pip and ignores my _CMD environment variable :( connecting to the default, redis (I imagine it enters some branch that ignores the _CMD variables):

```
Mon 04 May 2020 01:26:21 PM UTC - waiting for Redis... 1/20
Mon 04 May 2020 01:26:26 PM UTC - waiting for Redis... 2/20
Mon 04 May 2020 01:26:31 PM UTC - waiting for Redis... 3/20
```

And it never picks up my _CMD variable for broker_url when starting via the entrypoint.

On the other hand, if I execute the command directly instead of entrypoint.sh manually, I get this ¿?

```
airflow@7ebed0b08ca7:~$ airflow worker
[2020-05-04 13:29:18,910] {{settings.py:253}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=115
[2020-05-04 13:29:19,669] {{cli_action_loggers.py:107}} WARNING - Failed to log action with (psycopg2.errors.UndefinedTable) relation "log" does not exist
LINE 1: INSERT INTO log (dttm, dag_id, task_id, event, execution_dat...
                    ^

[SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (%(dttm)s, %(dag_id)s, %(task_id)s, %(event)s, %(execution_date)s, %(owner)s, %(extra)s) RETURNING log.id]
[parameters: {'dttm': datetime.datetime(2020, 5, 4, 13, 29, 19, 645785, tzinfo=<Timezone [UTC]>), 'dag_id': None, 'task_id': None, 'event': 'cli_worker', 'execution_date': None, 'owner': 'airflow', 'extra': '{"host_name": "7ebed0b08ca7", "full_command": "[\'/usr/local/bin/airflow\', \'worker\']"}'}]
(Background on this error at: http://sqlalche.me/e/f405)

 -------------- celery@7ebed0b08ca7 v4.4.0 (cliffs)
--- ***** -----
-- ******* ---- Linux-4.15.0-99-generic-x86_64-with-debian-10.3 2020-05-04 13:29:19
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         airflow.executors.celery_executor:0x7f134c1d4ed0
- ** ---------- .> transport:   amqp://airflow:**@rabbitmq:5672/airflow%0A
- ** ---------- .> results:     postgresql://airflow:**@postgres/airflow
- *** --- * --- .> concurrency: {min=12, max=16} (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> default          exchange=default(direct) key=default

[tasks]
  . airflow.executors.celery_executor.execute_command

[2020-05-04 13:29:22,139] {{settings.py:253}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=122
[2020-05-04 13:29:22,935: ERROR/MainProcess] consumer: Cannot connect to amqp://airflow:**@rabbitmq:5672/airflow%0A: Connection.open: (530) NOT_ALLOWED - vhost airflow
 not found.
Trying to reconnect...
```

¿?¿? What is this?

amqp://airflow:**@rabbitmq:5672/airflow%0A <---- ¿?¿ %0A ¿?

Is it normal that inside the entrypoint of the 1.10.9 image there is no reference to the _CMD environment variables?

```
airflow@7ebed0b08ca7:~$ grep -i CMD /entrypoint.sh
airflow@7ebed0b08ca7:~$
```

edit:
Another test...:
I'm trying the entrypoint that exists in this topic (0f5b8b1), because I don't understand why the current entrypoint in 1.10.9 doesn't handle the _CMD environment variables. I get the same error as in my last post: Airflow gets the value but appends "%0A" at the end of the string O_O. But at least eval "works" and my environment variable exists, pointing to the secrets.

Apologies for my English and skill; I'm trying to be clear and not disturb you!

Regards!

@NBardelot (Contributor, Author) commented May 4, 2020

@aalemanq pleasure to help, I've worked on this, so happy to share...

The %0A is a newline character (\n). When the file's last character is a newline, Airflow thinks it's part of the config value.

You can check this with wc -l /run/secrets/broker_url, which counts the newline characters in your secret file. It should print 0 to be OK; if you get 1, the extra \n is the issue.
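A quick way to see what wc -l reports in each case (illustrative files):

```bash
printf 'no trailing newline' > /tmp/s1 ; wc -l /tmp/s1   # 0 /tmp/s1  -> clean secret
printf 'trailing newline\n'  > /tmp/s2 ; wc -l /tmp/s2   # 1 /tmp/s2  -> the %0A culprit
```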

Be careful: when you create the secret, you probably insert a newline at the end of the string without wanting one. Example with the newline (due to echo in this case):

```bash
echo "the sensitive data" | docker secret create mysecret -
```

Without the newline (-n switch of the echo command):

```bash
echo -n "the sensitive data" | docker secret create mysecret -
```

@NBardelot (Contributor, Author) commented May 4, 2020

> Is it normal that inside the entrypoint of the 1.10.9 image there is no reference to the _CMD environment variables?

It depends on the Airflow image. Are you using the image from puckel/docker-airflow (this project) or the image from the official Airflow project?

puckel/docker-airflow should be OK.

But the official image is not. See the following issue and PR I'm currently proposing to Airflow:

@aalemanq commented May 5, 2020

Hello, I'm using puckel/docker-airflow:1.10.9, and if you grep for CMD in the entrypoint you can't see anything related to the _CMD environment variables inside. Is that normal? O_o. I had to override the entrypoint with 0f5b8b1 in my compose file to stop Airflow from replacing my broker_url with redis. Do you think I'm on the right track? It works....

About secrets...

```
airflow@8b0ecb633439:~$ wc -l /run/secrets/broker_url
1 /run/secrets/broker_url
```

It shows 1!! O_O.

I created the secret with docker-compose (file version 3.7), not by hand. On the other hand, I created the secrets manually with echo -n and used them as external, and IT WORKS!!!! It fking works! I can't believe it, 1 month wasted O_O.

To summarize, to set up Airflow with the _CMD env vars via docker-compose I had to:

  • Use this syntax (maybe Airflow should put an example in their doc...) in the environment section:

```
AIRFLOW__CORE__FERNET_KEY_CMD=cat /run/secrets/fernet_key
AIRFLOW__CELERY__BROKER_URL_CMD=cat /run/secrets/broker_url
AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD=cat /run/secrets/sql_alchemy_conn
AIRFLOW__CELERY__RESULT_BACKEND_CMD=cat /run/secrets/result_backend
```

  • Override the entrypoint in my compose file using entrypoint: /entrypoint2.sh from here (0f5b8b1), because otherwise broker_url uses the default (redis) and not my RabbitMQ.

  • Create the broker_url secret by hand and use it as external from the compose file (the other secrets, like fernet/sql_alchemy..., work automatically via docker-compose secrets without external)... otherwise you get an error from this character inside the secret: %0A.

Really, I'm astonished. When you go to the official doc you read this: https://airflow.readthedocs.io/en/stable/howto/set-config.html and you end up cutting your veins trying stuff O_O.

And if I use the official image... bye bye, it's worse: same entrypoint with no _CMD handling O_O. These guys put it in their official doc but the entrypoint can't manage it?¿? I don't understand anything. Sorry for the last words talking about my life X)

Regards NBardelot, you saved my day, my month, my mind and more.

@NBardelot (Contributor, Author)

You'll need to rebuild the Docker image for puckel/docker-airflow with the commit I propose in this PR in order to make it work. That's the goal of the PR :)
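If it helps, a rebuild might look like this (a sketch; the local branch name and image tag are made up):

```bash
git clone https://github.com/puckel/docker-airflow.git
cd docker-airflow
# fetch this PR's commit via GitHub's pull refs
git fetch origin pull/503/head:fix-cmd-vars
git checkout fix-cmd-vars
docker build -t docker-airflow:1.10.9-cmdfix .
```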

@aalemanq commented May 5, 2020

Thanks a lot! Do you think it's the same if I just copy the entrypoint, as I do now? Or do I lose some features?
If I can avoid building an image, all the better for my devs...

Thanks!

PS: thanks, I had never read about PRs :D

@NBardelot (Contributor, Author)

The previous commit concerning the script/entrypoint.sh file is a1d70c6 (a commit I made to fix other issues where AIRFLOW__CELERY__BROKER_URL and AIRFLOW__CORE__SQL_ALCHEMY_CONN were not being taken into account). It was added in 1.10.8.

So if you use 1.10.9 you can safely replace the whole script with the one from this PR's commit.

@aalemanq commented May 6, 2020

Thanks for staying around ;)

Does anybody know whether AIRFLOW__CELERY__FLOWER_BASIC_AUTH_CMD is implemented? I can't see any trace of this environment variable in the entrypoint.

Regards!!

@NBardelot (Contributor, Author)

It is not implemented yet. That's why I propose this PR, with a commit that implements it.

```bash
AIRFLOW__CORE__LOAD_EXAMPLES \

# Setup the Fernet Key only if the user didn't provide it explicitly as an Airflow configuration
if [[ -z "$AIRFLOW__CORE__FERNET_KEY" && -z "$AIRFLOW__CORE__FERNET_KEY_CMD" ]]; then
```

This can cause the user to lose data if they forget to keep the Fernet key. For Airflow 1.10, the fernet key is optional. For Airflow 2.0, it is required.
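On that point, one way to generate a Fernet key once and store it without a trailing newline might be (a sketch; it assumes the cryptography package is installed, as Airflow's docs suggest for key generation):

```bash
# generate a Fernet key and store it as a Swarm secret, stripping the trailing newline
python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())' \
  | tr -d '\n' | docker secret create fernet_key -
```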
