Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add load_tstamp field #572

Closed
wants to merge 5 commits into from
Closed

Add load_tstamp field #572

wants to merge 5 commits into from

Conversation

spenes
Copy link
Contributor

@spenes spenes commented Sep 9, 2021

No description provided.

@spenes spenes changed the title RDB Loader: Add load_tstamp field (close #571) Add load_tstamp field Sep 9, 2021
Copy link
Contributor

@chuwy chuwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@spenes spenes force-pushed the enhancement/load-tstamp branch from 5024381 to 175f811 Compare September 14, 2021 11:04
case s: DSchemaList.Single =>
val message = s"Illegal State: updateTable called for a table with known single schema [${s.schema.self.schemaKey.toSchemaUri}]\ncolumns: ${columns.mkString(", ")}\nstate: $state"
case _: DSchemaList.Single =>
val message = s"Illegal State: updateTable called for a table with known single schema [${current.toSchemaUri}]\ncolumns: ${columns.mkString(", ")}\nstate: $state"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chuwy I just changed the s.schema.self.schemaKey to current as you described in the issue. Couldn't be sure what to do improve it further. If you have any idea, just let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can change Schema case class there to JSON as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find where Schema case class is used in here. Could you show me, please ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We realized that Schema case class of SchemaDDL has derived encoder and it outputs lots of null values. We decided with @chuwy that best way is updating encoder in SchemaDDL initially then use it in here. Related issue on SchemaDDL can be found in here: snowplow/schema-ddl#105

@spenes spenes requested a review from chuwy September 14, 2021 11:23
@spenes
Copy link
Contributor Author

spenes commented Sep 14, 2021

Hey @chuwy! Could you give it a look one more time ?

Comment on lines 112 to 115
.fold(_ => (f::l, r), {
case time if since.fold(true)(_.isBefore(time)) && until.fold(true)(_.isAfter(time)) => (l, f::r)
case _ => (l, r)
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think pattern match would work better here:

Common.parseFolderTime(time) match {
  case Right(time) if since.fold(true)(_.isBefore(time)) && until.fold(true)(_.isAfter(time)) => (l, f :: r)
  case Right(_) => (l, r)
  case Left(_) => (f :: l, r)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized using filter is better than foldLeft therefore change the function a bit. I think fold is going better in here with the new version of this function but let me know if you still think otherwise.

})
} else (f::l, r)
}
inInterval.reverse ++ nonTimeFormatted.reverse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points on performance:

  1. My hunch is that first concatenating, then reversing would be faster, given how bad the concatenation of two linked lists?
  2. Also IIRC (can be wrong - worth to check) list concatenation is much faster if your first list is smaller - it does a lot less iterations.

I wouldn't be usually concerned about it, but these lists could have tens of thousands of elements and it can cause big unnecessary delays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, definitely agree. I will look into it 👍 It will look like this in the end: (nonTimeFormatted ++ inInterval).reverse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized using filter is much more better than foldLeft therefore change the function a bit. We don't have to use list concat or reverse with this way which is a big plus I guess.

case s: DSchemaList.Single =>
val message = s"Illegal State: updateTable called for a table with known single schema [${s.schema.self.schemaKey.toSchemaUri}]\ncolumns: ${columns.mkString(", ")}\nstate: $state"
case _: DSchemaList.Single =>
val message = s"Illegal State: updateTable called for a table with known single schema [${current.toSchemaUri}]\ncolumns: ${columns.mkString(", ")}\nstate: $state"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can change Schema case class there to JSON as well?

@spenes spenes changed the base branch from master to develop October 13, 2021 09:10
@spenes spenes changed the base branch from develop to master October 13, 2021 09:25
@spenes
Copy link
Contributor Author

spenes commented Oct 28, 2021

Moved to #604, closing.

@spenes spenes closed this Oct 28, 2021
@spenes
Copy link
Contributor Author

spenes commented Dec 9, 2021

Reopened since it is removed from 2.1.0 release.

@spenes spenes reopened this Dec 9, 2021
@spenes spenes closed this Apr 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants