Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release R137 #1238

Merged
merged 4 commits into from
Nov 7, 2022
Merged

Release R137 #1238

merged 4 commits into from
Nov 7, 2022

Conversation

istreeter
Copy link
Contributor

No description provided.

@snowplowcla
Copy link

Thanks for your pull request. Is this your first contribution to a Snowplow open source project? Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://docs.snowplowanalytics.com/docs/contributing/contributor-license-agreement/ to learn more and sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

@istreeter
Copy link
Contributor Author

The yauaa schema already got thoroughly reviewed in a previous PR. It was not included in the R136 release, because of concerns about breaking RDB loader migrations.

Since then, RDB Loader 4.3.0 was released, which fixes the bug in which Redshift columns were not altered for the new field lengths. So we are safe to release a new version of yauaa schema with longer field lengths.

We have not, however, fixed a different RDB Loader bug which causes problems when evolving a schema to have new fields. This bug is going to take longer to fix. It is a race condition, so it will affect some pipelines, but not others. Given the high number of pipelines that use the yauaa schema, I think it is dangerous to add a new field to this schema.

This new PR therefore drops the idea of adding a new field webviewAppNameVersion. This field is generated by newer versions of the yauaa lib, but we will configure Enrich so it filters out this field when we update the yauaa version.

VALIDATOR_URI="https://github.com/snowplow-incubator/igluctl/releases/download/${VALIDATOR_V}/igluctl"
VALIDATOR_V="0.10.1"
VALIDATOR_ZIP="igluctl_${VALIDATOR_V}.zip"
VALIDATOR_URI="https://github.com/snowplow-incubator/igluctl/releases/download/${VALIDATOR_V}/$VALIDATOR_ZIP"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how long GitHub maintains redirecting old repo addresses, it works now but the repo lives under snowplow now, you might want to change that to

VALIDATOR_URI="https://github.com/snowplow/igluctl/releases/download/${VALIDATOR_V}/$VALIDATOR_ZIP"

@voropaevp
Copy link
Contributor

LGTM

@istreeter istreeter force-pushed the feature/yauaa-1-0-4 branch from 1e0ada7 to 60e8667 Compare October 25, 2022 14:01
@istreeter
Copy link
Contributor Author

Update: I also removed format: email from the field agentInformationEmail in the yauaa schema. Because the new yauaa lib outputs "Unknown" for this field. I can't imagine this will upset any downstream consumers of this data.

@paulboocock
Copy link
Contributor

LGTM, one note, I think we should make it very clear in the Enrich release which includes this new schema that they should be running RDB Loader 4.3+ to ensure they don't hit the length issue.

@istreeter
Copy link
Contributor Author

@paulboocock it's a bit more complicated: The RDB Loader migrates tables immediately when the schema is published. So if a pipeline has events containing 1-0-3 and tomorrow we release 1-0-4 then the loader would tomorrow migrate the table. It's always been this way, and it's a horrible design decision.

So for anyone running a version of RDB Loader released between April 2022 and 6th September 2022, the migration will "run" but won't actually change the column lengths. Nothing will break, and open source users won't immediately notice that anything happened.

It could only affect pipelines later, when they upgrade to an Enrich with a newer version of yauaa. Some events might exceed the maximum lengths in their warehouse, and then loading would break. When it gets this far, it will not even help to update their RDB loader version.

It's a horrible mess. My only suggestions to alleviate this problem are:

  • Put big warnings on the next Enrich release, so Redshift users know to check their warehouse columns before updating
  • Or release this schema as version 2-0-0. But that sets precedent for using major bump for all iglu-central schema patches.

@paulboocock
Copy link
Contributor

I think:

Put big warnings on the next Enrich release, so Redshift users know to check their warehouse columns before updating

is likely the best result we've got here in terms of balancing the situation. Bumping to 2-0-0 is also going to be extremely disruptive for users (data models everywhere) and I'd hope that anyone who is running an upgraded RDB Loader since April 2022 is reasonably on top of their upgrades.

@istreeter istreeter changed the title Add yauaa 1-0-4 Release R137 Oct 31, 2022
@istreeter
Copy link
Contributor Author

This is now tested against RDB Loader 4.3.0 and against the pending Enrich PR

Copy link

@mharvey83 mharvey83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the changes made.

@istreeter istreeter force-pushed the feature/yauaa-1-0-4 branch from 4272b14 to 85ca48c Compare November 2, 2022 08:10
Release 137 (2022-11-01)
------------------------
Add nl.basjes/yauaa_context/jsonschema/1-0-4 (#1216)
Bump igluctl version to allow schema lists in dev registry (#1236)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add ?
Add dev.amp.snowplow/amp_session/jsonschema/1-0-0 (#1235)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @igneel64 -- I had changed it locally and then I forgot to push it!! Fixed now.

@istreeter istreeter force-pushed the feature/yauaa-1-0-4 branch from 85ca48c to 0ce84c7 Compare November 2, 2022 14:21
Copy link
Member

@jbeemster jbeemster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this one but going to wait to merge it until closer to EMEA coming online.

On top of the warning message can we also prepare some SQL statements for Redshift that would allow a user to fix their YAUAA table in case they do fall in that awkward loader version window that doesn't perform the update correctly?

That means that when we merge this in we can post a message on Discourse with an advisory and a way to fix it if they are unlucky enough.

Wdyr @istreeter ?

@istreeter
Copy link
Contributor Author

To be clear... merging this PR is not the dangerous merge. The dangerous step (for some pipelines) will be upgrading enrich at a later date.

Having said that, it's sensible to merge this at a time of day when people are on hand to trouble shoot something unexpected. And it's courteous to put a notice on discourse just in case.

Bear with me until I prepare a notice. Then hopefully you can merge this either Friday morning, or a morning early next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants