Move DAG bundle config into config, not db #44924
base: main
This moves the DAG bundle config into the Airflow config instead of the db. This:

- makes it much easier to configure a fresh Airflow instance, since no API/CLI calls are required
- avoids some security concerns by ensuring that only deployment managers, who have direct access to the instance, can configure bundles

The primary downside is that you cannot reconfigure an existing bundle in a running Airflow instance.
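To illustrate what "bundle config lives in the Airflow config" can look like, here is a minimal sketch that reads every option of a `[dag_bundles]` section from an `airflow.cfg`-style INI file. It assumes that each option name is a bundle name and that its value is a JSON object; the `classpath`/`kwargs` keys, the `example.bundles.*` class paths and the `load_bundle_config` helper are all hypothetical, not the format or API confirmed by this PR. Because the file is read once when the process starts, the sketch also shows why a running instance cannot pick up a reconfigured bundle.

```python
import configparser
import json

# Hypothetical airflow.cfg fragment. Assumption: each option in [dag_bundles]
# is a bundle name whose value is a JSON object; the keys and class paths
# below are illustrative only.
SAMPLE_CFG = """
[dag_bundles]
dags_folder = {"classpath": "example.bundles.LocalDagBundle", "kwargs": {"local_folder": "/opt/airflow/dags"}}
team_a = {"classpath": "example.bundles.GitDagBundle", "kwargs": {"repo_url": "https://example.com/team-a.git"}}
"""


def load_bundle_config(cfg_text: str) -> dict[str, dict]:
    """Return {bundle_name: parsed_json} for every option in [dag_bundles].

    Hypothetical helper: it is evaluated once, at startup, so editing the
    file later has no effect on an already running process.
    """
    # Disable %-interpolation so JSON values are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("dag_bundles"):
        return {}
    # Every option in the section is treated as a bundle definition.
    return {name: json.loads(raw) for name, raw in parser.items("dag_bundles")}
```

Calling `load_bundle_config(SAMPLE_CFG)` yields one entry per option, keyed `dags_folder` and `team_a`, with no API or CLI call involved.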
```diff
@@ -2654,3 +2654,16 @@ usage_data_collection:
       example: ~
       default: "True"
       see_also: ":ref:`Usage data collection FAQ <usage-data-collection>`"
+dag_bundles:
+  description: |
+    Configuration for the DAG bundles. This allows Airflow to load DAGs from different sources.
```
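Airflow lets any config option be supplied as an environment variable named `AIRFLOW__{SECTION}__{KEY}`, which is a natural way for deployment managers to define bundles without touching the API. The sketch below is a hypothetical helper, not Airflow's own config loader: it assumes that every `AIRFLOW__DAG_BUNDLES__<NAME>` variable holds a JSON bundle definition and that env-defined bundles override file-defined ones. It also shows what "Airflow will consume any new option you add" means in practice, since nothing is filtered against a fixed list of names.

```python
import json
from typing import Mapping

# Airflow's env-var naming convention for the [dag_bundles] section.
PREFIX = "AIRFLOW__DAG_BUNDLES__"


def bundles_from_env(environ: Mapping[str, str]) -> dict[str, dict]:
    """Collect every AIRFLOW__DAG_BUNDLES__<NAME> variable as a bundle.

    Hypothetical helper: any new variable with the prefix is picked up;
    the bundle name is the lowercased suffix, the value is parsed as JSON.
    """
    bundles: dict[str, dict] = {}
    for key, raw in environ.items():
        if key.startswith(PREFIX):
            name = key[len(PREFIX):].lower()
            if name:  # ignore a bare prefix with no bundle name
                bundles[name] = json.loads(raw)
    return bundles


def merge_bundles(file_bundles: dict[str, dict], env_bundles: dict[str, dict]) -> dict[str, dict]:
    """Assumption: env-defined bundles take precedence over file-defined ones."""
    merged = dict(file_bundles)
    merged.update(env_bundles)
    return merged
```

In real use `environ` would be `os.environ`; passing a plain dict keeps the sketch easy to test.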
Describe that this section is important: Airflow will consume any new option you add to it.
Also add examples of how to define bundles.
@dstandish updated
airflow/models/dagbundle.py

```diff
 class DagBundleModel(Base):
-    """A table for DAG Bundle config."""
+    """A table for DAG Bundle information."""
```
Maybe add a few reasons why we have this table in the first place:
- a place for CLI/API commands to find the latest version Airflow knows about, and when it was last refreshed
- an entity to FK against, which also keeps some information about bundles that are no longer configured
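The two reasons above can be illustrated with a small reconciliation sketch: config stays the source of truth for which bundles exist, while the table keeps a row per bundle name, including bundles that have since been removed from config. The `BundleRow` class and its `active`, `latest_version` and `last_refreshed` fields are hypothetical stand-ins, not the actual `DagBundleModel` columns.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BundleRow:
    """Hypothetical stand-in for a dag_bundle row (field names are assumptions)."""

    name: str
    active: bool = True
    latest_version: Optional[str] = None
    last_refreshed: Optional[datetime] = None


def sync_bundle_rows(configured: set[str], rows: dict[str, BundleRow]) -> dict[str, BundleRow]:
    """Reconcile table rows with the bundle names found in config."""
    # Newly configured bundles get a row; re-added ones are reactivated.
    for name in configured:
        if name in rows:
            rows[name].active = True
        else:
            rows[name] = BundleRow(name=name)
    # Bundles dropped from config keep their row, so FKs still resolve,
    # but are flagged as no longer configured.
    for name, row in rows.items():
        if name not in configured:
            row.active = False
    return rows


def record_refresh(row: BundleRow, version: str) -> None:
    """Store what CLI/API commands would report as the latest known version."""
    row.latest_version = version
    row.last_refreshed = datetime.now(timezone.utc)
```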
@dstandish same here :)
```diff
@@ -2028,7 +2027,7 @@ class DagModel(Base):
     fileloc = Column(String(2000))
     # The base directory used by Dag Processor that parsed this dag.
     processor_subdir = Column(String(2000), nullable=True)
-    bundle_id = Column(UUIDType(binary=False), ForeignKey("dag_bundle.id"), nullable=True)
+    bundle_name = Column(StringID(), ForeignKey("dag_bundle.name"), nullable=True)
```
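With this change, `dag.bundle_name` references `dag_bundle.name` directly instead of a surrogate UUID. One reading is that, because bundle names now come from config, the name itself is a stable key. The sketch below reproduces that shape with the standard-library `sqlite3` module rather than SQLAlchemy; the two-column tables are a simplification, not the real schema. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys` is on.

```python
import sqlite3


def make_schema() -> sqlite3.Connection:
    """Simplified dag/dag_bundle tables with an FK on the bundle name."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE dag_bundle (name TEXT PRIMARY KEY)")
    # bundle_name is nullable, mirroring nullable=True in the diff.
    conn.execute(
        "CREATE TABLE dag ("
        " dag_id TEXT PRIMARY KEY,"
        " bundle_name TEXT REFERENCES dag_bundle(name))"
    )
    return conn
```

Inserting a DAG whose `bundle_name` has no matching `dag_bundle` row raises `sqlite3.IntegrityError`, while `NULL` is accepted.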
I think that, to preserve history, we should use an association table, so that when a DAG is assigned to a new bundle, the history is preserved. Example: DAG 'A' is in bundle 'DA'. If 'DA' is no longer configured, or its name is changed, a new bundle object is created, say 'DB', which now contains DAG 'A'. The DAG's bundle_name is then updated to 'DB', and we lose the previous bundle name. With an association table, we can have an is_active column that tells whether the bundle has been removed. The trade-off is more complex queries.
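A minimal sketch of the association-table idea, using the standard-library `sqlite3` module. The `dag_bundle_association` table name and the helper functions are hypothetical. The sketch also assumes one interpretation of `is_active`: it flags which DAG/bundle link is current, so moving DAG 'A' from 'DA' to 'DB' deactivates the old link instead of overwriting it.

```python
import sqlite3


def make_assoc_schema() -> sqlite3.Connection:
    """Hypothetical association table: one row per (dag, bundle) link."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_bundle_association ("
        " dag_id TEXT NOT NULL,"
        " bundle_name TEXT NOT NULL,"
        " is_active INTEGER NOT NULL DEFAULT 1,"
        " PRIMARY KEY (dag_id, bundle_name))"
    )
    return conn


def move_dag(conn: sqlite3.Connection, dag_id: str, new_bundle: str) -> None:
    """Deactivate the DAG's current link(s), then activate the new one."""
    with conn:  # commit both statements together
        conn.execute(
            "UPDATE dag_bundle_association SET is_active = 0 WHERE dag_id = ?",
            (dag_id,),
        )
        cur = conn.execute(
            "UPDATE dag_bundle_association SET is_active = 1"
            " WHERE dag_id = ? AND bundle_name = ?",
            (dag_id, new_bundle),
        )
        if cur.rowcount == 0:  # first time this DAG is linked to this bundle
            conn.execute(
                "INSERT INTO dag_bundle_association (dag_id, bundle_name) VALUES (?, ?)",
                (dag_id, new_bundle),
            )


def current_bundle(conn: sqlite3.Connection, dag_id: str):
    row = conn.execute(
        "SELECT bundle_name FROM dag_bundle_association"
        " WHERE dag_id = ? AND is_active = 1",
        (dag_id,),
    ).fetchone()
    return row[0] if row else None


def bundle_history(conn: sqlite3.Connection, dag_id: str) -> list[str]:
    # rowid reflects insertion order, i.e. the order links were first made.
    rows = conn.execute(
        "SELECT bundle_name FROM dag_bundle_association WHERE dag_id = ? ORDER BY rowid",
        (dag_id,),
    )
    return [r[0] for r in rows]
```

The extra `WHERE is_active = 1` filter in `current_bundle` is the kind of added query complexity the comment mentions.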
Another option I considered is a history table, as with TIH (TaskInstanceHistory), but DAGs change more often.
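For comparison, here is a sketch of the history-table alternative, again with standard-library `sqlite3` and hypothetical table and function names. One way to soften the "DAGs change more often" concern, assumed here, is to write a history row only when a DAG's `bundle_name` actually changes, rather than on every DAG update.

```python
import sqlite3
from datetime import datetime, timezone


def make_history_schema() -> sqlite3.Connection:
    """Simplified dag table plus a hypothetical bundle-history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, bundle_name TEXT)")
    conn.execute(
        "CREATE TABLE dag_bundle_history ("
        " dag_id TEXT NOT NULL, bundle_name TEXT, replaced_at TEXT NOT NULL)"
    )
    return conn


def set_bundle(conn: sqlite3.Connection, dag_id: str, bundle_name: str) -> None:
    """Update a DAG's bundle, archiving the old value only if it changed."""
    with conn:
        row = conn.execute(
            "SELECT bundle_name FROM dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO dag (dag_id, bundle_name) VALUES (?, ?)",
                (dag_id, bundle_name),
            )
        elif row[0] != bundle_name:
            # Copy the previous bundle into history before overwriting it.
            conn.execute(
                "INSERT INTO dag_bundle_history VALUES (?, ?, ?)",
                (dag_id, row[0], datetime.now(timezone.utc).isoformat()),
            )
            conn.execute(
                "UPDATE dag SET bundle_name = ? WHERE dag_id = ?",
                (bundle_name, dag_id),
            )
```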
Co-authored-by: Ephraim Anierobi <[email protected]>