Airflow 2.5.0 with CeleryExecutor: the scheduler has been crashing frequently, recently every few days. I noticed many sockets in the TIME_WAIT state in the background, for example 750 in TIME_WAIT out of 800, with only 50 ESTABLISHED. Has anyone encountered a similar situation? #46581
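For anyone who wants to reproduce the socket counts described above, here is a minimal sketch that tallies TCP states from the text of `/proc/net/tcp`. It assumes a Linux host and IPv4 connections (IPv6 sockets are listed in `/proc/net/tcp6`). The helper name `count_states` is hypothetical, and port 3306 in the usage note is an assumption, since the post does not state which port MySQL listens on.

```python
from collections import Counter

# Linux kernel TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, remote_port=None):
    """Count sockets per TCP state, optionally only those with the given remote port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        remote, state = fields[2], fields[3]
        if remote_port is not None:
            # Addresses are formatted as HEXIP:HEXPORT.
            port = int(remote.rsplit(":", 1)[1], 16)
            if port != remote_port:
                continue
        counts[TCP_STATES.get(state.upper(), state)] += 1
    return counts
```

On the scheduler host this could be called as `count_states(open("/proc/net/tcp").read(), remote_port=3306)` to see how many connections toward the database are in TIME_WAIT versus ESTABLISHED.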
Replies: 2 comments
It looks like you have connectivity problems with your MySQL: proxies, networking, etc. Look into that. This looks like an environmental problem with your installation.
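One way to check the connectivity suggested here, independent of Airflow and SQLAlchemy, is to time a plain TCP connect to the database endpoint. The following is a minimal sketch using only the standard library; `probe` is a hypothetical helper, and the host and port you pass in are whatever your `sql_alchemy_conn` points at.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try a single TCP connect; return (ok, elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:
        return False, time.monotonic() - start, exc
```

Running this in a loop from the scheduler host while the problem occurs should show whether connects fail quickly (refused) or only after the timeout expires, which helps separate a stopped server from a network or proxy issue.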
In addition, I found the following information when restarting the scheduler. It looks like the executor has gone down? The scheduler log is below, and an airflow-worker.err file also appeared.
2025-02-07 07:59:44,195 ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3361, in _wrap_pool_connect
return fn()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 325, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 888, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 491, in checkout
rec = pool._do_get()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 256, in _do_get
return self._create_connection()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 271, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 386, in __init__
self.__connect()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 685, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 210, in raise_
raise exception
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 680, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/create.py", line 578, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 598, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/local/python3/lib/python3.9/site-packages/MySQLdb/__init__.py", line 123, in Connect
return Connection(*args, **kwargs)
File "/usr/local/python3/lib/python3.9/site-packages/MySQLdb/connections.py", line 185, in __init__
super().__init__(*args, **kwargs2)
MySQLdb.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' (110)")
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 759, in _execute
self._run_scheduler_loop()
File "/usr/local/python3/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line 887, in _run_scheduler_loop
self.executor.heartbeat()
File "/usr/local/python3/lib/python3.9/site-packages/airflow/executors/base_executor.py", line 175, in heartbeat
self.sync()
File "/usr/local/python3/lib/python3.9/site-packages/airflow/executors/celery_executor.py", line 331, in sync
self.update_all_task_states()
File "/usr/local/python3/lib/python3.9/site-packages/airflow/executors/celery_executor.py", line 443, in update_all_task_states
state_and_info_by_celery_task_id = self.bulk_state_fetcher.get_many(self.tasks.values())
File "/usr/local/python3/lib/python3.9/site-packages/airflow/executors/celery_executor.py", line 597, in get_many
result = self._get_many_from_db_backend(async_results)
File "/usr/local/python3/lib/python3.9/site-packages/airflow/executors/celery_executor.py", line 617, in _get_many_from_db_backend
tasks = session.query(task_cls).filter(task_cls.task_id.in_(task_ids)).all()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2772, in all
return self._iter().all()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2915, in _iter
result = self.session.execute(
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1713, in execute
conn = self._connection_for_bind(bind)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 747, in _connection_for_bind
conn = bind.connect()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3315, in connect
return self._connection_cls(self, close_with_result=close_with_result)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
else engine.raw_connection()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3394, in raw_connection
return self._wrap_pool_connect(self.pool.connect, _connection)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3364, in _wrap_pool_connect
Connection._handle_dbapi_exception_noconnection(
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2198, in _handle_dbapi_exception_noconnection
util.raise_(
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 210, in raise_
raise exception
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 3361, in _wrap_pool_connect
return fn()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 325, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 888, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 491, in checkout
rec = pool._do_get()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 256, in _do_get
return self._create_connection()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 271, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 386, in __init__
self.__connect()
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 685, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 210, in raise_
raise exception
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 680, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/create.py", line 578, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/python3/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 598, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/local/python3/lib/python3.9/site-packages/MySQLdb/__init__.py", line 123, in Connect
return Connection(*args, **kwargs)
File "/usr/local/python3/lib/python3.9/site-packages/MySQLdb/connections.py", line 185, in __init__
super().__init__(*args, **kwargs2)
sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (2003, "Can't connect to MySQL server on '127.0.0.1' (110)")
(Background on this error at: https://sqlalche.me/e/14/e3q8)
2025-02-07 07:59:45,235 INFO - Sending Signals.SIGTERM to group 78299. PIDs of all processes in the group: [121567, 121569, 78299]
2025-02-07 07:59:45,236 INFO - Sending the signal Signals.SIGTERM to group 78299
2025-02-07 07:59:45,609 INFO - Process psutil.Process(pid=121567, status='terminated', started='07:59:43') (121567) terminated with exit code None
2025-02-07 07:59:45,942 INFO - Process psutil.Process(pid=78299, status='terminated', exitcode=0, started='11:50:03') (78299) terminated with exit code 0
2025-02-07 07:59:45,942 INFO - Process psutil.Process(pid=121569, status='terminated', started='07:59:43') (121569) terminated with exit code None
2025-02-07 07:59:45,943 INFO - Exited execute loop
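A note on reading the error in this log: in `(2003, "Can't connect to MySQL server on '127.0.0.1' (110)")`, 2003 is the MySQL client error code and the trailing number in parentheses is the operating system errno. On Linux, errno 110 is ETIMEDOUT, meaning the connection attempt timed out rather than being refused. The sketch below pulls both numbers out of such a message; `parse_mysql_connect_error` is a hypothetical helper, and the errno table assumes Linux values.

```python
import re

# Assumed Linux errno values; other platforms number these differently.
LINUX_ERRNO_NAMES = {
    110: "ETIMEDOUT",     # connect attempt timed out
    111: "ECONNREFUSED",  # nothing listening on the port
    113: "EHOSTUNREACH",  # no route to host
}

# Matches e.g. (2003, "Can't connect to MySQL server on '127.0.0.1' (110)")
_PATTERN = re.compile(r'\((\d+),\s*"[^"]*\((\d+)\)"\)')


def parse_mysql_connect_error(message):
    """Return (mysql_client_code, os_errno), or None if the message does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe(message):
    """Return a short human-readable summary of a connect error message."""
    parsed = parse_mysql_connect_error(message)
    if parsed is None:
        return "unrecognized message"
    code, os_errno = parsed
    name = LINUX_ERRNO_NAMES.get(os_errno, "unknown errno")
    return f"MySQL client error {code}, OS errno {os_errno} ({name})"
```

A timeout against 127.0.0.1 is consistent with the networking or proxy diagnosis given in the first reply, although the log alone does not show which component failed to answer.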