BigQuery Materialized View will be recreated each time dataform project is run #1822

Open
p13rr0m opened this issue Aug 28, 2024 · 1 comment

p13rr0m commented Aug 28, 2024

BigQuery Materialized View Issue

We have a very large table in BigQuery and have created a smaller, filtered materialized view of it for analysts. Each day, new data is added to the large table and subsequently to the materialized view as well.

We use the Dataform CLI to run the models. However, even though we haven't changed the materialized view's definition, every run of the Dataform project recreates the materialized view, so we have to process all of the data in the large table again.

We would expect the materialized view to keep the previously processed data between runs.

This is how we create the materialized view:

config {
  type: "view",
  materialized: true,
  bigquery: {
    additionalOptions: {
      enable_refresh: "false"
    },
    partitionBy: "DATE(ingestion_time)",
    clusterBy: ["column_a"]
  }
}
SELECT
    *
FROM
    ${ref("large_table")}
WHERE
    column_b

Thanks for your help!

@justinaugust

Agreed, this somewhat defeats the purpose of the materialized view.
It would be great to see some smart checking, e.g.:

  • Does a materialized view with the same query, schema, partitioning, and clustering already exist? If so, do not recreate it.
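
To make the suggested check concrete, here is a minimal sketch of the comparison in Python. It is not Dataform's implementation. The `MaterializedViewSpec` type and the `normalize_sql` and `needs_recreate` helpers are hypothetical names introduced only for illustration, and how the existing view's definition would be fetched from BigQuery is left out. The whitespace normalization is deliberately simplistic and would also collapse whitespace inside string literals.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MaterializedViewSpec:
    """Hypothetical description of a materialized view (not a Dataform type)."""
    query: str
    partition_by: Optional[str] = None
    cluster_by: Tuple[str, ...] = ()


def normalize_sql(sql: str) -> str:
    """Collapse runs of whitespace so formatting-only changes are ignored."""
    return " ".join(sql.split())


def needs_recreate(existing: Optional[MaterializedViewSpec],
                   desired: MaterializedViewSpec) -> bool:
    """Return True only if the view is missing or its definition differs."""
    if existing is None:
        return True
    return (
        normalize_sql(existing.query) != normalize_sql(desired.query)
        or existing.partition_by != desired.partition_by
        or existing.cluster_by != desired.cluster_by
    )


# Example using the definition from the original report.
desired = MaterializedViewSpec(
    query="SELECT * FROM large_table WHERE column_b",
    partition_by="DATE(ingestion_time)",
    cluster_by=("column_a",),
)
unchanged = needs_recreate(desired, desired)  # same definition: keep the view
missing = needs_recreate(None, desired)       # no view yet: create it
```

With a check like this, a run would skip the view when nothing changed and only rebuild it when the query, partitioning, or clustering differs.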
