Move DAG bundle config into config, not db #44924

Open
wants to merge 5 commits into
base: main

jedcunningham
Member

This moves the DAG bundle config into the Airflow config, instead of being in the db. This:

  • makes it much easier to configure a fresh Airflow instance - no api/cli calls required
  • avoids some security concerns by ensuring only deployment managers, with direct access to the instance, can configure these

The primary downside is this does mean you cannot reconfigure an existing bundle in a running Airflow instance.
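To make the "no api/cli calls required" point concrete, here is a minimal sketch of reading bundle definitions straight from a config file. It assumes each option in a `[dag_bundles]` section is a bundle name whose value is a JSON object; the `classpath` and `kwargs` keys, the class paths, and the repo URL are hypothetical and not taken from this PR.

```python
import configparser
import json

# Hypothetical airflow.cfg fragment. The JSON keys and values are
# illustrative assumptions, not the format defined by this PR.
EXAMPLE_CFG = """
[dag_bundles]
dags_folder = {"classpath": "example.bundles.LocalBundle", "kwargs": {}}
team_a = {"classpath": "example.bundles.GitBundle", "kwargs": {"repo": "https://example.invalid/team_a.git"}}
"""


def load_bundle_configs(cfg_text: str) -> dict[str, dict]:
    """Return {bundle_name: parsed JSON value} for every option in [dag_bundles]."""
    # Disable interpolation so "%" inside JSON values is left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("dag_bundles"):
        return {}
    # Every option in the section is treated as a bundle definition.
    return {name: json.loads(value) for name, value in parser.items("dag_bundles")}


bundles = load_bundle_configs(EXAMPLE_CFG)
```

Because the whole section is consumed, adding a bundle in this sketch is just adding a line to the file, which is also why changing it requires access to the deployment's config.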

@jedcunningham jedcunningham reopened this Dec 14, 2024
@@ -2654,3 +2654,16 @@ usage_data_collection:
example: ~
default: "True"
see_also: ":ref:`Usage data collection FAQ <usage-data-collection>`"
dag_bundles:
description: |
Configuration for the DAG bundles. This allows Airflow to load DAGs from different sources.
Member Author


Describe that the section is important, and that Airflow will consume any new option you add.

Add examples of how to define them.

Member Author


@dstandish updated



 class DagBundleModel(Base):
-    """A table for DAG Bundle config."""
+    """A table for DAG Bundle information."""
Member Author


Maybe add a few reasons why we have this in the first place:

  • it gives CLI/API commands a place to find the latest version Airflow knows about and when the bundle was last refreshed
  • it gives us an entity to FK against, and lets us keep some info about no-longer-configured bundles
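As a rough illustration of those two reasons, here is a toy schema using the standard-library `sqlite3` module rather than Airflow's SQLAlchemy models. The `dag_bundle.name` key and `bundle_name` foreign key mirror the diff in this PR; the `latest_version`, `last_refreshed`, and `active` columns and all row values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE dag_bundle (
        name TEXT PRIMARY KEY,
        latest_version TEXT,               -- assumed column
        last_refreshed TEXT,               -- assumed column
        active INTEGER NOT NULL DEFAULT 1  -- assumed flag for configured bundles
    );
    CREATE TABLE dag (
        dag_id TEXT PRIMARY KEY,
        bundle_name TEXT REFERENCES dag_bundle(name)
    );
    """
)

conn.execute(
    "INSERT INTO dag_bundle (name, latest_version, last_refreshed) VALUES (?, ?, ?)",
    ("dags_folder", "abc123", "2024-12-14T00:00:00Z"),
)
conn.execute(
    "INSERT INTO dag (dag_id, bundle_name) VALUES (?, ?)",
    ("example_dag", "dags_folder"),
)

# The bundle is dropped from config: keep the row, just flag it inactive,
# so the DAG's foreign key still resolves to something informative.
conn.execute("UPDATE dag_bundle SET active = 0 WHERE name = ?", ("dags_folder",))

row = conn.execute(
    "SELECT b.latest_version, b.last_refreshed, b.active "
    "FROM dag d JOIN dag_bundle b ON b.name = d.bundle_name "
    "WHERE d.dag_id = ?",
    ("example_dag",),
).fetchone()
```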

Member Author


@dstandish same here :)

airflow/config_templates/config.yml (outdated, resolved)
airflow/dag_processing/bundles/manager.py (resolved)
@@ -2028,7 +2027,7 @@ class DagModel(Base):
fileloc = Column(String(2000))
# The base directory used by Dag Processor that parsed this dag.
processor_subdir = Column(String(2000), nullable=True)
-    bundle_id = Column(UUIDType(binary=False), ForeignKey("dag_bundle.id"), nullable=True)
+    bundle_name = Column(StringID(), ForeignKey("dag_bundle.name"), nullable=True)
Contributor


To preserve history, I think we should use an association table, so that a DAG's earlier bundle is still recorded when the DAG is assigned a new bundle object. Example: DAG 'A' is in DAG bundle 'DA'. If 'DA' is no longer configured, or its name is changed, a new DAG bundle object is created, say 'DB', which now contains DAG 'A'. The DAG's bundle_name is updated to 'DB', and we lose the previous bundle name. With an association table, we can have an is_active column that tells whether the bundle has been removed. The queries will be more complex, however.

Another option I thought of is a history table like the one used for TIH, but DAGs change more often.
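A minimal sketch of the association-table idea, again using `sqlite3` rather than Airflow's models. The table name `dag_bundle_association` and the helper `assign_bundle` are hypothetical; only `is_active` and the 'A' / 'DA' / 'DB' example come from the comment above, and here `is_active` is interpreted as "this is the DAG's current bundle".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_bundle_association (  -- hypothetical table name
        dag_id TEXT NOT NULL,
        bundle_name TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (dag_id, bundle_name)
    );
    """
)


def assign_bundle(conn: sqlite3.Connection, dag_id: str, bundle_name: str) -> None:
    """Point dag_id at bundle_name, keeping earlier rows as inactive history."""
    # Deactivate whatever the DAG was associated with before.
    conn.execute(
        "UPDATE dag_bundle_association SET is_active = 0 WHERE dag_id = ?",
        (dag_id,),
    )
    # Add (or re-activate) the association with the new bundle.
    conn.execute(
        "INSERT OR REPLACE INTO dag_bundle_association "
        "(dag_id, bundle_name, is_active) VALUES (?, ?, 1)",
        (dag_id, bundle_name),
    )


assign_bundle(conn, "A", "DA")
assign_bundle(conn, "A", "DB")  # 'DA' was renamed or removed from config

history = conn.execute(
    "SELECT bundle_name, is_active FROM dag_bundle_association "
    "WHERE dag_id = ? ORDER BY bundle_name",
    ("A",),
).fetchall()
```

The previous bundle name survives as an inactive row, at the cost of every "current bundle" lookup needing an `is_active = 1` filter.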

Co-authored-by: Ephraim Anierobi <[email protected]>
Labels
AIP-66: DAG Bundle/Manifest area:db-migrations PRs with DB migration area:Scheduler including HA (high availability) scheduler kind:documentation

2 participants