BigQuery final type suggestion should always allow null #153
I tested the fix in #154, but we have decided not to merge it because of the potential impact on existing pipelines: the loader after the fix is incompatible with the loader before the fix. I will use this schema to explain:
And imagine two Snowplow events using these two valid objects as the unstruct event:
If the BigQuery column was created with the old version of the loader, then both events would get loaded, i.e. we get two rows in the BigQuery table, but for the second event the unstruct event is silently dropped because of the bug. After upgrading the loader to the version with the fix (and assuming we do not alter the existing table), the first event would get loaded, but the second event would fail to load and would go to the failed inserts bucket. We will leave this issue unfixed until we have a better way to assess whether existing pipelines are affected by this change.
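The schema and event snippets referred to above are not reproduced here, so the following is only an assumed illustration of the kind of schema involved: a required property whose union type forces the string fallback while still allowing null (the vendor, name, and field names are invented):

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "self": {
    "vendor": "com.acme",
    "name": "example_event",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "field_a": { "type": ["integer", "string", "null"] }
  },
  "required": ["field_a"]
}
```

Under this assumed schema, two valid unstruct event payloads would be `{"field_a": "some value"}` and `{"field_a": null}`. Because `field_a` is listed in `required`, the fallback string column is created as REQUIRED, and the behaviour described above follows: the old loader silently drops the unstruct column for the null payload, while the fixed loader would try to insert NULL into the REQUIRED column and the row would end up in the failed inserts bucket.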
When schema-ddl cannot find a BigQuery type for a field, it falls back to a string type. This basically means we stringify whatever value we get (number, object, array, etc.). Currently, the nullability of the string field is determined by whether the field is listed as required. However, there are schemas where a field is listed as required but its value can also be null.
Example 1:
Example 2:
Example 3:
schema-ddl should suggest a nullable string for these examples; otherwise, Snowplow events with these unusual types can fail to get loaded.
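The three examples are not shown above, but as a rough, assumed illustration, properties like the following are listed as required yet still accept null, and none of them maps cleanly onto a single BigQuery type, so they may trigger the string fallback described above (the field names are invented):

```json
{
  "type": "object",
  "properties": {
    "union_field": { "type": ["integer", "string", "null"] },
    "enum_field": { "enum": ["red", "green", null] },
    "anything_field": {}
  },
  "required": ["union_field", "enum_field", "anything_field"]
}
```

In JSON Schema, `required` only demands that the key is present, so `null` can still be a valid value for each of these properties; a REQUIRED string column therefore cannot hold every event such a schema accepts, which is why a nullable string is the safer suggestion.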