If you use the default entrypoint of the production image, a few actions are automatically performed when the container starts. In some cases, you can pass environment variables to the image to trigger some of these behaviours.
The variables that control the "execution" behaviour start with `_AIRFLOW` to distinguish them from the variables used to build the image, which start with `AIRFLOW`.
Allow any user to run containers¶
The Airflow image is OpenShift compatible, which means that you can run it with an arbitrary user ID as long as the group ID is 0 (root). If you want to run the image with a user other than the default airflow user, you MUST set the user's GID to 0. If you try to use a different group, the entrypoint will fail.

OpenShift randomly assigns a UID when launching a container, but you can also make use of this flexible UID when running the image manually. This can be useful, for example, if you want to mount the dags and logs directories from the host system on Linux; in that case, the UID should be set to the same ID as your host user.
This can be achieved in several ways: you can change the USER when extending or customizing the image, or you can dynamically pass the user to the docker run command by adding the --user flag in one of these formats (see the Docker run reference for details):

`[ user | user:group | uid | uid:gid | user:gid | uid:group ]`
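For example, a minimal sketch (the UID 1001 below is purely illustrative) that starts a shell as an arbitrary user belonging to the root group:

```
# Run the image with an arbitrary UID (1001 is illustrative);
# the GID must be 0 (root) or the entrypoint will refuse to start.
docker run -it --user "1001:0" apache/airflow:2.7.0-python3.8 bash
```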
In the case of a Docker Compose environment, this can be changed via the `user:` entry in `docker-compose.yaml`. See the Docker Compose reference for details. In our Quickstart with Docker Compose, the UID can be passed via the `AIRFLOW_UID` variable as described in Initializing the Docker Compose environment.
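In practice, with the Docker Compose quickstart this usually boils down to exporting your host UID before starting the stack; a hedged sketch (the `.env` location follows the quickstart layout and is an assumption here):

```
# Store the host UID so docker-compose can pass it to the containers as AIRFLOW_UID.
echo "AIRFLOW_UID=$(id -u)" > .env
docker compose up
```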
The user can be any UID. If the UID is different from the default airflow user (UID=50000), the user will be created automatically when entering the container.
To accommodate a number of external libraries and projects, Airflow will automatically create such an arbitrary user in /etc/passwd and point its home directory to /home/airflow. Many third-party libraries and packages require the user's home directory to be present because they need to write cache information there, so this dynamic user creation is necessary.
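As a quick sanity check, you can observe the dynamically created user and its home directory (the UID 4071 below is arbitrary; only the GID of 0 matters):

```
# 4071 is an arbitrary UID; id should show gid=0(root) and HOME should be /home/airflow.
docker run --rm --user "4071:0" apache/airflow:2.7.0-python3.8 \
    bash -c 'id && echo "HOME=${HOME}"'
```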
Such an arbitrary user has to be able to write to certain directories that require write access, and since allowing write access to "other" is not recommended for security reasons, the OpenShift guidelines introduce the concept of giving all such directories the 0 (root) group ID (GID). All directories that require write access in the Airflow production image have their GID set to 0 (and are writable by the group). We follow this concept, and all directories that need write access follow it.
GID=0 is set as the default for the airflow user, so all directories it creates have their GID set to 0 by default. The entrypoint sets the umask to 0002, which means that all directories created by the user also have group write access for group 0 - they can be written to by other users in the root group. Additionally, whenever an "arbitrary" user creates a directory (for example, in a mounted volume), that directory will have group write access and GID=0, so execution with another arbitrary user will still work, even if such a directory is later mounted by yet another arbitrary user.
The umask setting, however, only works at container runtime - it is not used during image building. If you want to extend the image and add your own packages, remember to add umask 0002 in front of your docker command - this way, directories created by installations that need group access will also be writable by the group. This can be done, for example, this way:
```
RUN umask 0002; \
    do_something; \
    do_something_else;
```
You can read more about this in the "Support arbitrary user ids" chapter of the OpenShift best practices.
Wait for the Airflow DB connection¶
The entrypoint waits for a connection to the database, independently of the database engine. This allows us to increase the stability of the environment.
Waiting for the connection involves executing the `airflow db check` command, which means that a `SELECT 1 AS is_alive;` statement is executed. It then loops until the command succeeds: it tries `CONNECTION_CHECK_MAX_COUNT` times and sleeps `CONNECTION_CHECK_SLEEP_TIME` seconds between checks. To disable the check, set `CONNECTION_CHECK_MAX_COUNT=0`.
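For example, to let the entrypoint retry for longer against a slow-starting database, you can pass these variables explicitly (the numbers below are illustrative):

```
# Try the DB check up to 50 times, sleeping 5 seconds between attempts
# (setting CONNECTION_CHECK_MAX_COUNT=0 would disable the check entirely).
docker run -it \
    --env "CONNECTION_CHECK_MAX_COUNT=50" \
    --env "CONNECTION_CHECK_SLEEP_TIME=5" \
    apache/airflow:2.7.0-python3.8 webserver
```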
Wait for the Celery broker connection¶
If the CeleryExecutor is used and one of the `scheduler` or `celery` commands is used, the entrypoint will wait until the Celery broker connection becomes available.

The script detects the backend type depending on the URL scheme and assigns default port numbers if none are specified in the URL. It then loops until a connection to the specified host/port can be established: it tries `CONNECTION_CHECK_MAX_COUNT` times and sleeps `CONNECTION_CHECK_SLEEP_TIME` seconds between checks. To disable the check, set `CONNECTION_CHECK_MAX_COUNT=0`.
Supported schemes:

- `amqp(s)://` (rabbitmq) - default port 5672
- `redis://` - default port 6379
- `postgres://` - default port 5432
- `mysql://` - default port 3306
While waiting for the connection, the entrypoint checks whether the corresponding port is open. The host information is derived from the Airflow configuration.
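As an illustration, pointing the broker URL at a Redis host (the host name `redis` is hypothetical) makes the entrypoint wait for that host on the default port 6379 before starting the worker:

```
# "redis" is a hypothetical host name reachable from the container network;
# no port is given in the URL, so the default 6379 is assumed by the check.
docker run -it \
    --env "AIRFLOW__CORE__EXECUTOR=CeleryExecutor" \
    --env "AIRFLOW__CELERY__BROKER_URL=redis://redis/0" \
    apache/airflow:2.7.0-python3.8 celery worker
```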
Executing commands¶
If the first argument equals "bash", you are dropped into a bash shell, or a bash command is executed if you provide additional arguments. For example:
```
docker run -it apache/airflow:2.7.0-python3.8 bash -c "ls -la"
total 16
drwxr-xr-x 4 airflow root 4096 Jun  5 18:12 .
drwxr-xr-x 1 root    root 4096 Jun  5 18:12 ..
drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 dags
drwxr-xr-x 2 airflow root 4096 Jun  5 18:12 logs
```
If the first argument is equal to `python`, you are dropped into a python shell, or python commands are executed if you pass additional parameters. For example:
```
docker run -it apache/airflow:2.7.0-python3.8 python -c "print('test')"
test
```
If the first argument equals "airflow", the remaining arguments are treated as the airflow command to execute. Example:
```
docker run -it apache/airflow:2.7.0-python3.8 airflow webserver
```
If there are any other arguments, they are simply passed to the "airflow" command.
```
docker run -it apache/airflow:2.7.0-python3.8 help
usage: airflow [-h] GROUP_OR_COMMAND ...

positional arguments:
  GROUP_OR_COMMAND

    Groups:
      celery         Celery components
      config         View configuration
      connections    Manage connections
      dags           Manage DAGs
      db             Database operations
      jobs           Manage jobs
      kubernetes     Tools to help run the KubernetesExecutor
      pools          Manage pools
      providers      Display providers
      roles          Manage roles
      tasks          Manage tasks
      users          Manage users
      variables      Manage variables

    Commands:
      cheat-sheet    Display cheat sheet
      info           Show information about current Airflow and environment
      kerberos       Start a kerberos ticket renewer
      plugins        Dump information about loaded plugins
      rotate-fernet-key
                     Rotate encrypted connection credentials and variables
      scheduler      Start a scheduler instance
      sync-perm      Update permissions for existing roles and optionally DAGs
      version        Show the version
      webserver      Start a Airflow webserver instance

optional arguments:
  -h, --help         show this help message and exit
```
Execute custom code before the Airflow entrypoint¶
If you want to run custom code before the Airflow entrypoint, you can do so by using a custom script and calling the Airflow entrypoint as the last `exec` instruction in your custom script. However, you must remember to use `dumb-init` in the same way it is used with the Airflow entrypoint; otherwise you may have problems with proper signal propagation (see the next section).
```
FROM airflow:2.7.0
COPY my_entrypoint.sh /
ENTRYPOINT ["/usr/bin/dumb-init", "--", "/my_entrypoint.sh"]
```
Your entrypoint can, for example, modify or add variables on the fly. The entrypoint below sets the maximum number of DB checks from the first parameter passed when launching the image (a somewhat pointless example, but it should give the reader an idea of how to use it).
```
#!/bin/bash
export CONNECTION_CHECK_MAX_COUNT=${1}
shift
exec /entrypoint "${@}"
```
Make sure the Airflow entrypoint is run with `exec /entrypoint "${@}"` as the last command in your custom entrypoint. This way, signals are properly propagated and arguments are passed to the entrypoint as usual (you can use `shift` as above if you need to pass extra arguments). Note that passing secret values this way, or storing secrets inside the image, is a bad idea from a security point of view, since both the image and the parameters used to launch it are accessible to anyone who has access to your Kubernetes logs or your image registry.
Also be aware that code executed before the Airflow entrypoint should not create any files or directories inside the container, and not everything will work the same way as it does after the entrypoint has run. Before the Airflow entrypoint is executed, the following functionality is not available:

- the umask is not yet set properly to allow group write access
- the user has not yet been created in /etc/passwd if an arbitrary user is used to run the image
- the database and brokers might not be available yet
Add custom image behavior¶
The Airflow entrypoint performs many steps and sets up the right environment, but you may want to run additional code after the entrypoint creates the user, sets the umask, sets the variables, and checks that the database is running.
Instead of running the regular commands - `scheduler`, `webserver` - you can run a custom script that you embed in the image. You can even run the usual Airflow components - `scheduler`, `webserver` - from the custom script once you have finished your custom setup. Like the custom entrypoint, it can be added to the image by extending it.
```
FROM airflow:2.7.0
COPY my_after_entrypoint_script.sh /
```
Build your image and then you can run this script by running the command:
```
docker build . --pull --tag my-image:0.0.1
docker run -it my-image:0.0.1 bash -c "/my_after_entrypoint_script.sh"
```
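For illustration, a hypothetical `my_after_entrypoint_script.sh` could perform some extra setup and then hand over to a regular Airflow component:

```
#!/bin/bash
# Hypothetical extra setup, executed after the entrypoint has created the user,
# set the umask and verified the database connection.
airflow variables set deployment_flavor "testing"
# Hand over to a regular Airflow component as the final step.
exec airflow webserver
```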
Signal propagation¶
Airflow uses `dumb-init` to run as "init" in the entrypoint. This is in order to propagate signals and reap child processes properly. This means that the process you run does not have to install signal handlers to work properly and to be terminated when the container is gracefully stopped. The behaviour of signal propagation is configured by the `DUMB_INIT_SETSID` variable, which is set to `1` by default - meaning that signals are propagated to the whole process group - but you can set it to `0` to enable the single-child behaviour of `dumb-init`, which forwards signals to only a single child process.
The table below summarizes the possible values of `DUMB_INIT_SETSID` and their use cases.
| Variable value | Use case |
|---|---|
| 1 (default) | Propagates signals to all processes in the process group of the main process running in the container. If you run your processes via `bash -c` and bash spawns new processes without `exec`, signals are also propagated to them. |
| 0 | Propagates signals to the main process only. This is useful if your main process handles signals gracefully. A good example is the warm shutdown of Celery workers. For the Airflow Celery worker, you should set the variable to 0. |
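For example, to get the warm-shutdown behaviour for a Celery worker described in the table, you could start the worker with the variable set to 0:

```
# Propagate signals only to the main process so Celery can do a warm shutdown.
docker run -it \
    --env "DUMB_INIT_SETSID=0" \
    apache/airflow:2.7.0-python3.8 celery worker
```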
Additional quick test options¶
The options below are mostly used for quick testing of the image - for example with the docker-compose quickstart, or when you want to perform a local test with newly added packages. They should not be run in a production environment, as they add the overhead of executing extra commands. In production, those operations should be performed as database maintenance or baked into the custom image used (when you want to add new packages).
Upgrading the Airflow database¶
If you set the `_AIRFLOW_DB_MIGRATE` variable to a non-empty value, the entrypoint will run the `airflow db migrate` command right after verifying the connection. You can also use this when running Airflow with the internal (default) SQLite database to upgrade the database and create the admin user at the entrypoint, so that you can start the webserver immediately. Note: SQLite is used for testing purposes only. Never use SQLite in production, as it has severe limitations when it comes to concurrency.
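A minimal sketch that only runs the migration against the default SQLite database and then drops you into a shell to inspect the result (this assumes the entrypoint performs the migration step before handing over to the `bash` command):

```
# Run the entrypoint's "airflow db migrate" step, then open a shell.
docker run -it \
    --env "_AIRFLOW_DB_MIGRATE=true" \
    apache/airflow:2.7.0-python3.8 bash
```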
Create an administrator user¶
The entrypoint can also automatically create a webserver user when you enter the container. You need to set `_AIRFLOW_WWW_USER_CREATE` to a non-empty value to do so. This is not intended for production; it is only useful if you want to quickly test the production image. To create such a user, you have to provide at least a password with `_AIRFLOW_WWW_USER_PASSWORD` or `_AIRFLOW_WWW_USER_PASSWORD_CMD`; as with other `*_CMD` variables, the content of `*_CMD` is evaluated as a shell command and its output is used as the password.

User creation will fail if none of the `PASSWORD` variables is set - there is no default password, for security reasons.
| Parameter | Default | Environment variable |
|---|---|---|
| username | admin | `_AIRFLOW_WWW_USER_USERNAME` |
| password | | `_AIRFLOW_WWW_USER_PASSWORD_CMD` or `_AIRFLOW_WWW_USER_PASSWORD` |
| firstname | Airflow | `_AIRFLOW_WWW_USER_FIRSTNAME` |
| lastname | Admin | `_AIRFLOW_WWW_USER_LASTNAME` |
| email | airflowadmin@example.com | `_AIRFLOW_WWW_USER_EMAIL` |
| role | Admin | `_AIRFLOW_WWW_USER_ROLE` |
If a password is provided, user creation will be attempted, but the entrypoint will not fail if the attempt does not succeed (this accounts for the case where the user has already been created).
For example, you can run the webserver in the production image, initializing the internal SQLite database and creating an `admin/admin` admin user, with the following command:
```
docker run -it -p 8080:8080 \
  --env "_AIRFLOW_DB_MIGRATE=true" \
  --env "_AIRFLOW_WWW_USER_CREATE=true" \
  --env "_AIRFLOW_WWW_USER_PASSWORD=admin" \
    apache/airflow:2.7.0-python3.8 webserver
```
```
docker run -it -p 8080:8080 \
  --env "_AIRFLOW_DB_MIGRATE=true" \
  --env "_AIRFLOW_WWW_USER_CREATE=true" \
  --env "_AIRFLOW_WWW_USER_PASSWORD_CMD=echo admin" \
    apache/airflow:2.7.0-python3.8 webserver
```
The commands above will initialize the SQLite database and create an admin user with the password admin and the Admin role. They also forward local port 8080 to the webserver port and finally start the webserver.
Installing additional requirements¶
Warning
Installing requirements this way is a very convenient way to run Airflow, very useful for testing and debugging. However, don't be fooled by its convenience. You should never use it in a production environment. We have deliberately chosen to make it a development/test feature and we print a warning when it is used. There is an inherent security issue when this method is used in production. Installing the requirements this way can happen literally at any time: when your containers are restarted, or when the machines in your K8S cluster are restarted. In a K8S cluster those events can happen at any moment. This exposes you to a serious vulnerability where your production environment could be brought down by the removal of a single PyPI dependency - or even a dependency of your dependency. This means that you put the availability of your production service in the hands of third-party developers. At any time, including weekends and holidays, those third-party developers could bring down your production Airflow instance without your knowledge. This is a serious vulnerability, similar to the infamous left-pad issue. You can fully protect yourself against this by building your own immutable custom image with the dependencies built in. You have been warned.
Additional requirements can be installed by specifying the `_PIP_ADDITIONAL_REQUIREMENTS` variable. It should contain a list of requirements that are additionally installed when entering the containers. Note that this option slows down Airflow's startup, as it has to install new packages every time the container starts, and it opens up a big potential security vulnerability when used in production (see below). Therefore, this option should only be used for testing. When testing is complete, you should build your custom image with the dependencies built in.
Example:
```
docker run -it -p 8080:8080 \
  --env "_PIP_ADDITIONAL_REQUIREMENTS=lxml==4.6.3 charset-normalizer==1.4.1" \
  --env "_AIRFLOW_DB_MIGRATE=true" \
  --env "_AIRFLOW_WWW_USER_CREATE=true" \
  --env "_AIRFLOW_WWW_USER_PASSWORD_CMD=echo admin" \
    apache/airflow:2.7.0-python3.8 webserver
```
This method is only available from the Airflow 2.1.1 Docker image and later.