Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add check to find if common.compat provider is changed in a non-compatible way #44913

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mobuchowski
Copy link
Contributor

This hook ensures API stability of common.compat provider by validating method signatures between staged and committed versions of Python files

  • detect removed methods
  • prevent removal of existing arguments
  • require default values for new arguments

Few things missing: how to handle imports? Important part of common.compat provider is handling different import paths for different airflow version - for example dataset->asset rename.
However, at some point we want to drop old version and remove old import.
Also, we can just import something for internal use in some method, it should not be an API surface.

Another is nested classes/methods: this wouldn't detect change in NoOpCollector:

def get_hook_lineage_collector():
    # Dataset has been renamed as Asset in 3.0
    if AIRFLOW_V_3_0_PLUS:
        from airflow.lineage.hook import get_hook_lineage_collector

        return get_hook_lineage_collector()

    # HookLineageCollector added in 2.10
    if AIRFLOW_V_2_10_PLUS:
        return _get_asset_compat_hook_lineage_collector()

    # For the case that airflow has not yet upgraded to 2.10 or higher,
    # but using the providers that already uses `get_hook_lineage_collector`
    class NoOpCollector:
        """
        NoOpCollector is a hook lineage collector that does nothing.

        It is used when you want to disable lineage collection.
        """

        # for providers that support asset rename
        def add_input_asset(self, *_, **__):
            pass

        def add_output_asset(self, *_, **__):
            pass

        # for providers that do not support asset rename
        def add_input_dataset(self, *_, **__):
            pass

        def add_output_dataset(self, *_, **__):
            pass

    return NoOpCollector()

@mobuchowski mobuchowski force-pushed the verify-method-signatures-common-compat branch from 6b819e1 to ed58588 Compare December 13, 2024 13:43
@potiuk
Copy link
Member

potiuk commented Dec 13, 2024

Nice .. I like the idea. Tried to do similar thing to common.sql with stubgen (after initial attempt of doing it similarly to you) - but this explicit approach is I think better - even if requires more code.

One other option I saw in Android OS (and maybe that is a good idea for you to explore) - was to turn the signatures into a plain-text file where you keep signatures of public methods - and keep it in the repository next to the code. Then you do not have to check-out the previous version of the code (Which is not always available - in CI we only retrieve single commit and we do not have parent commit by default) - you just generate a new version of such signature "dump" file - then pre-commit framework will automatically see that the dump has changed and will fail precommit (and you would have to commit such modifed file to accept changed signatures)

@potiuk
Copy link
Member

potiuk commented Dec 13, 2024

Smth like that:

method1(a:string, b:string) -> int
method2(a:string, b:int) -> bool

@potiuk
Copy link
Member

potiuk commented Dec 13, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants