DAGs go missing after a while
Apache Airflow version
2.9.3
If "Other Airflow 2 version" selected, which one?
2.9.3
What happened?
I am running Airflow locally using a custom Dockerfile and the docker-compose file from the official URL with some small customizations. My workflows are usually split into separate Extract, Transform and Load DAGs, and the last task of the Extract and Transform DAGs triggers the next DAG in the flow.
My issue is that when I start developing new DAGs locally, random DAGs start to go missing from the Webserver UI. When I go into the container and run "airflow dags list", my DAGs are shown there (same with "airflow dags report"), but they are not present in the UI. If I run "airflow db init" or "airflow db migrate", the DAGs show up in the Webserver UI again for a short time (around 30 seconds) and then go missing again.
What you think should happen instead?
The DAGs should be showing in the Webserver UI.
How to reproduce
Honestly, I have no idea how to reproduce the errors, since I can't find anything in the logs.
Operating System
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)" NAME="Debian GNU/Linux" VERSION_ID="12" VERSION="12 (bookworm)" VERSION_CODENAME=bookworm ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/"
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else?
This problem seems to happen when I run "docker compose down && docker compose up -d" frequently while developing.
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Thank you for reporting this issue. To help us diagnose and reproduce the problem, could you please provide:
- Example DAGs that you are using when the issue occurs.
- The custom Dockerfile you're using.
- The docker-compose.yml file with your customizations.
- Any specific steps or operations that lead to the issue, it would be helpful if some attached screenshots are possible.
This information will help to better understand and address the problem. Thanks!
I noticed a similar thing on my installation with version 2.9.2. It's possible that the problem has been present for some time. It definitely doesn't sound like expected behaviour. I suspect there's some kind of race condition caused by very long parsing, since in my case the setup has over 2k DAGs. I have been unable to pin down the exact conditions that cause this.
The DAGs may suddenly reappear and then disappear again over the course of a day. It almost seems as if the data gets removed for a short period instead of being updated in place, which creates a window in which the DAG isn't in the database, so the webserver doesn't retrieve it. From the user's point of view, on a setup with over 2k DAGs and a frequently running scheduler, if you spam F5 while looking at the webserver dashboard, the number of visible DAGs changes each time.
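To put numbers on the "spam F5" observation, here is a minimal sketch that samples the DAG count over time and records every change. It assumes the stable REST API is reachable with basic auth (the compose file in this thread enables `basic_auth`) and that `GET /api/v1/dags` reports a `total_entries` field; `fetch_active_dag_count` and `sample_counts` are hypothetical helper names, not part of Airflow.

```python
import base64
import json
import time
import urllib.request
from typing import Callable, List, Optional, Tuple


def fetch_active_dag_count(base_url: str, user: str, password: str) -> int:
    """Ask the REST API how many DAGs it currently lists (assumed endpoint)."""
    request = urllib.request.Request(f"{base_url}/api/v1/dags?limit=1")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["total_entries"]


def sample_counts(
    fetch: Callable[[], int],
    samples: int,
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, int]]:
    """Call fetch repeatedly and return (sample index, count) at each change."""
    changes: List[Tuple[int, int]] = []
    previous: Optional[int] = None
    for index in range(samples):
        count = fetch()
        if count != previous:
            changes.append((index, count))
            previous = count
        if index < samples - 1:
            sleep(delay_s)
    return changes


# Example call against a local deployment (URL and credentials are assumptions):
# sample_counts(
#     lambda: fetch_active_dag_count("http://localhost:8080", "airflow", "airflow"),
#     samples=60,
#     delay_s=5,
# )
```

A result with more than one entry means the visible count changed during the sampling window, which can then be lined up against the scheduler and DAG processor logs by timestamp.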
@josix:
1 - The DAGs don't really matter, since they disappear randomly. But here is one example:
import logging

from airflow.decorators import dag, task
from airflow.models.param import Param
from airflow.operators.python import get_current_context
from common.tasks.general import trigger_another_dag
from common.tasks.teams import notify_failure
from common.services.etl import ETLService
from common.settings.car import *
from common.settings.dags import default_args
from common.settings.monitoring import UPDATE_RUNNING, WEEKLY
from common.settings.envs import START_DATE

logger = logging.getLogger("airflow.task")
etl_service = ETLService()


@dag(
    default_args=default_args,
    schedule_interval="@weekly",
    start_date=START_DATE,
    catchup=False,
    tags=["public"],
    params={
        "ignore_discrepancy": Param(False, type="boolean"),
        "emergency_mode": Param(None, type=["null", "string"])
    },
)
def update_car_extract():
    @task(on_failure_callback=notify_failure)
    def main_run_attrs() -> dict:
        """Core function Create monitoring instance
        :return: dict with data to monitoring this project
        """
        from common.models.car import DataModel
        context = get_current_context()
        return etl_service.start_monitoring(
            DataModel, context, UPDATE_RUNNING, WEEKLY
        )

    @task(on_failure_callback=notify_failure)
    def download_file_to_s3() -> str:
        """Core function to downloads files from source directly to S3.
        It can be set to run in emergency mode by a DAG conf.
        :return: string with s3 path to raw data
        """
        from common.models.sema_mt_car import DataModel
        context = get_current_context()
        # Set the url to download
        source_urls = {"zip": SOURCE}
        logger.info(f"The Source URLS: {source_urls}")
        result_download = etl_service.download_file(
            context, DataModel, source_urls, use_raw=False
        )
        return result_download

    dag_conf = {
        "main_run_attrs": main_run_attrs(),
        "raw_data_zip_path": download_file_to_s3(),
    }
    trigger_another_dag(
        dag_conf,
        "trigger_transform_dag",
        "update_car_transform",
    )


update_car_extract()
2 - Dockerfile
FROM apache/airflow:2.9.3-python3.11
COPY requirements.txt /requirements.txt
RUN pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org
RUN pip install --no-cache-dir -r /requirements.txt --trusted-host pypi.org --trusted-host files.pythonhosted.org
USER root
RUN apt-get update && \
    apt-get install --allow-downgrades -y libpq5=15.6-0+deb12u1 libmariadb3=1:10.11.6-0+deb12u1
RUN apt-get install -y libgdal-dev \
    gdal-bin \
    gcc \
    g++
RUN sudo apt-get install unrar-free -y
RUN sudo pip install geopandas --trusted-host pypi.org --trusted-host files.pythonhosted.org
RUN sudo pip install --global-option=build_ext --global-option="-I/usr/include/gdal" GDAL==`gdal-config --version` --trusted-host pypi.org --trusted-host files.pythonhosted.org
RUN sudo pip install --no-cache-dir rasterio --trusted-host pypi.org --trusted-host files.pythonhosted.org
RUN apt-get clean
USER airflow
3 - docker-compose.yaml
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
  image: my-tag:latest
  # build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
    AIRFLOW__WEBSERVER__SHOW_TRIGGER_FORM_IF_NO_PARAMS: 'true'
    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: 'true'
    AIRFLOW__CORE__DEFAULT_TIMEZONE: 'America/Sao_Paulo'
    AIRFLOW__WEBSERVER__DAG_ORIENTATION: 'TB'
    AIRFLOW__LOGGING__COLORED_CONSOLE_LOG: 'true'
    AIRFLOW__SCHEDULER__SCHEDULER_ZOMBIE_TASK_THRESHOLD: 600
    # yamllint disable rule:line-length
    # Use simple http server on scheduler for health checks
    # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server
    # yamllint enable rule:line-length
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
    # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
    # for other purpose (development, test and especially production usage) build/extend Airflow image.
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    # The following line can be used to set a custom config file, stored in the local config folder
    # If you want to use it, outcomment it and replace airflow.cfg with the name of your config file
    # AIRFLOW_CONFIG: '/opt/airflow/config/airflow.cfg'
  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
    - ${AIRFLOW_PROJ_DIR:-.}/common:/opt/airflow/plugins/common
    - $HOME/.aws:/home/airflow/.aws
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    postgres:
      condition: service_healthy
services:
  postgres:
    image: postgis/postgis:13-3.4
    platform: linux/amd64
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always
  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-triggerer:
    <<: *airflow-common
    command: triggerer
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    # yamllint disable rule:line-length
    command:
      - -c
      - |
        if [[ -z "${AIRFLOW_UID}" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
          echo "If you are on Linux, you SHOULD follow the instructions below to set "
          echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
          echo "For other operating systems you can get rid of the warning with manually created .env file:"
          echo " See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
          echo
        fi
        one_meg=1048576
        mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
        cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
        disk_available=$$(df / | tail -1 | awk '{print $$4}')
        warning_resources="false"
        if (( mem_available < 4000 )) ; then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
          echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
          echo
          warning_resources="true"
        fi
        if (( cpus_available < 2 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
          echo "At least 2 CPUs recommended. You have $${cpus_available}"
          echo
          warning_resources="true"
        fi
        if (( disk_available < one_meg * 10 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
          echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
          echo
          warning_resources="true"
        fi
        if [[ $${warning_resources} == "true" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
          echo "Please follow the instructions to increase amount of resources available:"
          echo " https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
          echo
        fi
        mkdir -p /sources/logs /sources/dags /sources/plugins /sources/common
        chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins,common}
        exec /entrypoint airflow version
    # yamllint enable rule:line-length
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_MIGRATE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
      _PIP_ADDITIONAL_REQUIREMENTS: ''
    user: "0:0"
    volumes:
      - ${AIRFLOW_PROJ_DIR:-.}:/sources
  airflow-cli:
    <<: *airflow-common
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
    command:
      - bash
      - -c
      - airflow
  # You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
  # or by explicitly targeted on the command line e.g. docker-compose up flower.
  # See: https://docs.docker.com/compose/profiles/
volumes:
  postgres-db-volume:
4 - There are no specific conditions under which the DAGs go missing. I do suspect, though, that it is related to running docker compose down and up too frequently.
At the moment I do not have screenshots showing how the DAGs go missing in the webserver. But they literally just go missing: out of 16 DAGs, for example, I refresh the page (F5) and now there are only 14.
For context: this does not occur in our dev and production environments, only in the local environment. Locally I usually have around 30 DAGs, and in production we have 300+, with code running to 2k lines.
Here I have the problem again. I have 19 DAGs in my local Airflow at the moment, and 7 just went missing.
But if I run "airflow dags list" I can see all the 19 DAGs:
After running "airflow db migrate" the DAGs show up in the Webserver again:
I have no idea how to reproduce it, but it seems it's always after I stop the containers and start them again.
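To name exactly which DAGs vanish each time, the CLI listing can be diffed against what the UI shows. The sketch below assumes the JSON comes from `airflow dags list --output json` and that each row carries a `dag_id` key; `dags_missing_from_ui` is a hypothetical helper, and the UI ids are collected by hand or from the REST API. My understanding, which I have not verified against the 2.9.3 source, is that the CLI list is built by parsing the DAG files while the UI reads rows from the metadata database, which would explain why the two can disagree.

```python
import json
from typing import Iterable, List


def dags_missing_from_ui(cli_json: str, ui_dag_ids: Iterable[str]) -> List[str]:
    """Return DAG ids that the CLI reports but the UI listing does not contain.

    cli_json is assumed to be a JSON array of objects with a "dag_id" key.
    """
    cli_ids = {row["dag_id"] for row in json.loads(cli_json)}
    return sorted(cli_ids - set(ui_dag_ids))


# Illustrative input only; the second DAG id is taken from the example DAG above.
example_cli_output = json.dumps(
    [{"dag_id": "update_car_extract"}, {"dag_id": "update_car_transform"}]
)
print(dags_missing_from_ui(example_cli_output, ["update_car_transform"]))
```

If the same ids turn up as missing after every restart, that points at those specific DAG files; if the set changes each time, a timing problem between the DAG processor and the database looks more likely.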
Hi @gabriel-attie, do you find any error in the docker container logs?
Nothing related to any missing DAGs. (2 import errors which I'm aware of - not an issue in local).
This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.
This issue has been closed because it has not received response from the issue author.