[Feature] Support query deletion vector for delta lake #53766
Signed-off-by: Youngwb <[email protected]>
@@ -315,6 +315,15 @@ struct TPaimonDeletionFile {
    3: optional i64 length
}

// refer to https://github.com/delta-io/delta/blob/master/PROTOCOL.md#deletion-vector-descriptor-schema
struct TDeletionVectorDescriptor {
Is the Iceberg DV going to reuse this field?
Yes
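For readers following the thread, here is a rough C++ sketch of the descriptor the Thrift struct mirrors. The field meanings follow the deletion vector descriptor schema in the Delta PROTOCOL.md linked above. The struct name, the helper names, and the C++ types are assumptions made for illustration and are not taken from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical C++ mirror of the Delta deletion vector descriptor.
// Field meanings follow the Delta protocol; names and types are illustrative.
struct DeletionVectorDescriptorSketch {
    std::string storage_type;       // "u" (relative path), "p" (absolute path), "i" (inline)
    std::string path_or_inline_dv;  // encoded path, or the inline DV payload
    std::optional<int32_t> offset;  // start of the DV inside its file; absent for inline DVs
    int32_t size_in_bytes = 0;      // size of the serialized DV
    int64_t cardinality = 0;        // number of rows the DV logically removes
};

// True when the DV payload is embedded in the descriptor itself.
inline bool is_inline(const DeletionVectorDescriptorSketch& dv) {
    return dv.storage_type == "i";
}

// Byte offset of a file-backed DV, defaulting to 0 when the offset is absent.
inline int32_t file_offset(const DeletionVectorDescriptorSketch& dv) {
    assert(!is_inline(dv));  // inline DVs have no file offset
    return dv.offset.value_or(0);
}
```

A reader would typically check `is_inline` first and only then use the path and `file_offset` to locate the serialized bitmap.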
uint64_t bitmap_cardinality = roaring64_bitmap_get_cardinality(bitmap);
std::unique_ptr<uint64_t[]> bitmap_array(new uint64_t[bitmap_cardinality]);
roaring64_bitmap_to_uint64_array(bitmap, bitmap_array.get());
need_skip_rowids->insert(bitmap_array.get(), bitmap_array.get() + bitmap_cardinality);
Since need_skip_rowids will eventually be converted to a flat bitmap, we should construct the flat bitmap directly from roaring64_bitmap_t.
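A minimal sketch of that suggestion, assuming a simple word-based flat bitmap. `FlatBitmapSketch` and `build_skip_bitmap` are hypothetical names, and the `for_each_value` callback is a stand-in for iterating a `roaring64_bitmap_t`, so the example compiles without CRoaring. The point is only that each row id sets one bit directly, with no intermediate array or set.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flat bitmap: one bit per row id in [0, num_rows).
class FlatBitmapSketch {
public:
    explicit FlatBitmapSketch(uint64_t num_rows)
            : _num_rows(num_rows), _words((num_rows + 63) / 64, 0) {}

    void set(uint64_t row_id) {
        if (row_id >= _num_rows) return;  // ignore ids outside the file's row range
        assert(row_id / 64 < _words.size());
        _words[row_id / 64] |= (uint64_t{1} << (row_id % 64));
    }

    bool test(uint64_t row_id) const {
        return row_id < _num_rows && ((_words[row_id / 64] >> (row_id % 64)) & 1) != 0;
    }

private:
    uint64_t _num_rows;
    std::vector<uint64_t> _words;
};

// Fills the flat bitmap straight from any source that can visit its values.
// `for_each_value` is an assumed stand-in for walking a roaring64_bitmap_t.
template <typename ForEach>
FlatBitmapSketch build_skip_bitmap(uint64_t num_rows, ForEach&& for_each_value) {
    FlatBitmapSketch bitmap(num_rows);
    for_each_value([&bitmap](uint64_t row_id) { bitmap.set(row_id); });
    return bitmap;
}
```

Compared with the snippet under review, this avoids allocating a `uint64_t` array of size `bitmap_cardinality` and inserting it into `need_skip_rowids` before the later conversion.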
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 14 / 14 (100.00%)
[BE Incremental Coverage Report] ✅ pass : 130 / 145 (89.66%)
offset += MAGIC_NUMBER_LENGTH;

int64_t serialized_bitmap_length = length - MAGIC_NUMBER_LENGTH;
std::unique_ptr<char[]> deletion_vector(new char[serialized_bitmap_length]);
You can just use std::vector here. unique_ptr<char[]> is a little weird.
OK. I will change this in the next PR.
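For reference, the std::vector variant could look roughly like the sketch below. It assumes the magic number is 4 bytes long and uses an in-memory `file_bytes` buffer in place of the real file read. `read_serialized_bitmap` and `kMagicNumberLength` are hypothetical names, not code from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int64_t kMagicNumberLength = 4;  // assumed size of the magic number prefix

// Copies the serialized bitmap that follows the magic number into a std::vector.
// `file_bytes` stands in for the deletion vector file; `offset` and `length`
// describe the region that starts at the magic number.
// Returns an empty vector when the region does not fit inside the buffer.
std::vector<char> read_serialized_bitmap(const std::vector<char>& file_bytes,
                                         int64_t offset, int64_t length) {
    const int64_t file_size = static_cast<int64_t>(file_bytes.size());
    if (offset < 0 || length < kMagicNumberLength) return {};
    if (offset > file_size || length > file_size - offset) return {};

    offset += kMagicNumberLength;  // skip the magic number
    const int64_t serialized_bitmap_length = length - kMagicNumberLength;

    // The vector owns the buffer and carries its own size.
    std::vector<char> deletion_vector(file_bytes.begin() + offset,
                                      file_bytes.begin() + offset + serialized_bitmap_length);
    assert(static_cast<int64_t>(deletion_vector.size()) == serialized_bitmap_length);
    return deletion_vector;
}
```

The resulting `deletion_vector.data()` and `deletion_vector.size()` can then be handed to whatever deserialization call the reader uses, which removes the need to track `serialized_bitmap_length` separately.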
LGTM.
@Youngwb great job. We can finally support deletion vectors on Delta Lake!
LGTM
@mergify backport branch-3.4
✅ Backports have been created
Signed-off-by: Youngwb <[email protected]> (cherry picked from commit 66b78c4)
…) (#53944) Co-authored-by: Youngwb <[email protected]>
Why I'm doing:
What I'm doing:
Support query deletion vector for delta lake
Fixes #53767