**Background**

A task id within a DAG file can be duplicated, and this causes an error during DAG initialization in Airflow. In the Optimus context, a single DAG can contain several kinds of tasks, and among them both the hook task and the upstream sensor task can cause a duplicate task id error if configured in a certain way.
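For illustration, here is a minimal, Optimus-independent sketch of what Airflow does with a duplicate task id, assuming a recent Airflow 2.x installation (the `dag_id` and `task_id` values are made up):

```python
import pendulum
from airflow import DAG
from airflow.exceptions import DuplicateTaskIdFound
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="duplicate_task_id_demo",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
) as dag:
    EmptyOperator(task_id="hook_bq2bq")
    try:
        # Registering a second task with the same task_id fails immediately,
        # while the DAG file is being parsed, before anything runs.
        EmptyOperator(task_id="hook_bq2bq")
    except DuplicateTaskIdFound as err:
        print(f"Airflow rejected the duplicate task id: {err}")
```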
**Hook Task**

This happens when the user configures more than one hook of the same type, for example adding two hooks of the same type to a single job.

**Agreed Approach**

Currently, the hook task id is generated using the `hook_{type}_{name}` format. Example:
```python
# format: hook_{type}_{name}
hook_bq2bq_notify_slack = SuperKubernetesPodOperator(
    # ...
    task_id='hook_bq2bq_notify_slack'
)
hook_bq2bq_write_status_to_table = SuperKubernetesPodOperator(
    # ...
    task_id='hook_bq2bq_write_status_to_table'
)
```

**Upstream Sensor Task**

This happens when the user has more than one internal dependency job with the same name, regardless of project. The internal upstream sensor task id is generated from the job name only, without taking the project name into account. So if, say, the DAG depends on two jobs that share a name but belong to different projects, both sensors end up with the same task id.

**Agreed Approach**

The proper fix was actually implemented, but it was reverted due to some business requirements. The proper fix is to include the project name (along with the job name) of the dependency in the task id, much like what is already done for external dependencies. We just need to re-implement it once the business requirements allow it. A sensor migration will be needed.
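To make the agreed approach concrete, here is a minimal sketch of the intended naming scheme. The helper function and the exact `wait_{project}_{job}` format are illustrative assumptions, not the actual Optimus implementation:

```python
# Hypothetical helper, for illustration only: build the internal upstream
# sensor task id from the project name plus the job name, mirroring what is
# already done for external dependencies.
def upstream_sensor_task_id(project_name: str, job_name: str) -> str:
    # Assumed format: wait_{project}_{job}
    return f"wait_{project_name}_{job_name}"

# Two upstream jobs that share a name but live in different projects now map
# to distinct sensor task ids, so the DAG no longer hits a duplicate task id error.
print(upstream_sensor_task_id("project-a", "daily_revenue"))  # wait_project-a_daily_revenue
print(upstream_sensor_task_id("project-b", "daily_revenue"))  # wait_project-b_daily_revenue
```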
This discussion was started by the issue mentioned here. However, rather than solving it partially for that specific bug, it is better to solve it as a whole, end to end; that means the user's requirements need to be reconsidered.