Fixed ArrayV2Metadata parameter names #2270

TomAugspurger · 2024-09-30T01:45:16Z

For things like dataclasses.replace to work, we need the parameter names to match the attribute names.

All the name changes between what we have in memory and what the spec requires should happen during serialization / deserialization.

Closes #2269

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/tutorial.rst
Changes documented in docs/release.rst
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)


d-v-b · 2024-09-30T10:17:09Z

Can you show the problem that this PR solves? From my POV it's a good thing that the metadata classes have the same structure as the underlying zarr metadata document(s).

TomAugspurger · 2024-09-30T11:40:48Z

#2269 has the issue. Doing something like a.attrs.put({"key": 0}) or anything that uses dataclasses.replace will fail with

TypeError: ArrayV2Metadata.__init__() got an unexpected keyword argument 'chunk_grid'
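A minimal sketch of why this fails, using hypothetical stand-in classes rather than the real ArrayV2Metadata: dataclasses.replace collects the current value of every field and calls the constructor with the field names as keyword arguments, so a hand-written __init__ whose parameters are named differently from the fields rejects the call.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MismatchedMetadata:
    # Hypothetical stand-in: field names differ from the __init__ parameters.
    chunk_grid: tuple
    attributes: dict

    # A hand-written __init__ is kept by @dataclass instead of being generated.
    def __init__(self, *, chunks: tuple, attributes: Optional[dict] = None) -> None:
        object.__setattr__(self, "chunk_grid", tuple(chunks))
        object.__setattr__(self, "attributes", dict(attributes or {}))


@dataclass(frozen=True)
class MatchedMetadata:
    # Hypothetical stand-in: __init__ parameters use the field names.
    chunk_grid: tuple
    attributes: dict

    def __init__(self, *, chunk_grid: tuple, attributes: Optional[dict] = None) -> None:
        object.__setattr__(self, "chunk_grid", tuple(chunk_grid))
        object.__setattr__(self, "attributes", dict(attributes or {}))


broken = MismatchedMetadata(chunks=(2, 2))
try:
    # replace() calls MismatchedMetadata(chunk_grid=..., attributes=...)
    replace(broken, attributes={"key": 0})
    replace_failed = False
except TypeError:
    replace_failed = True

# With matching names, the same call succeeds.
fixed = replace(MatchedMetadata(chunk_grid=(2, 2)), attributes={"key": 0})
```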

From my POV it's a good thing that the metadata classes have the same structure as the underlying zarr metadata document

Would that mean changing the dataclass fields on ArrayV2Metadata to chunks and dtype? I'm guessing the reason we have it this way currently is to support writing code that's generic over ArrayV2Metadata and ArrayMetadata. IMO, that's more important than having the Python data model exactly match the on-disk representation, since the translation should be isolated to just to_dict and from_dict.
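As a rough illustration of keeping the translation inside to_dict and from_dict, here is a sketch with a hypothetical V2MetadataSketch class and a hypothetical rename table. It is not the actual zarr-python implementation and covers only three keys of a .zarray-style document.

```python
from dataclasses import dataclass, fields, replace
from typing import Any

# Hypothetical rename table: on-disk key -> in-memory field name.
DISK_TO_MEMORY = {"chunks": "chunk_grid", "dtype": "data_type"}
MEMORY_TO_DISK = {memory: disk for disk, memory in DISK_TO_MEMORY.items()}


@dataclass(frozen=True)
class V2MetadataSketch:
    # Generated __init__ uses the field names, so dataclasses.replace works.
    shape: tuple
    chunk_grid: tuple
    data_type: str

    @classmethod
    def from_dict(cls, doc: dict) -> "V2MetadataSketch":
        # Rename on-disk keys to field names; this is the only place names change on read.
        kwargs: dict = {DISK_TO_MEMORY.get(key, key): value for key, value in doc.items()}
        kwargs["shape"] = tuple(kwargs["shape"])
        kwargs["chunk_grid"] = tuple(kwargs["chunk_grid"])
        return cls(**kwargs)

    def to_dict(self) -> dict:
        # Rename field names back to on-disk keys; tuples become JSON-style lists.
        out: dict = {}
        for f in fields(self):
            value: Any = getattr(self, f.name)
            out[MEMORY_TO_DISK.get(f.name, f.name)] = (
                list(value) if isinstance(value, tuple) else value
            )
        return out


doc = {"shape": [4, 4], "chunks": [2, 2], "dtype": "<i4"}
meta = V2MetadataSketch.from_dict(doc)
resized = replace(meta, shape=(8, 8))
```

Round-tripping meta.to_dict() gives back the original document keys, while code that works on the in-memory object only ever sees chunk_grid and data_type.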

d-v-b · 2024-09-30T12:17:21Z

Would that mean changing the dataclass fields on ArrayV2Metadata to chunks and dtype? I'm guessing the reason we have it this way currently is to support writing code that's generic over ArrayV2Metadata and ArrayMetadata. IMO, that's more important than having the Python data model exactly match the on-disk representation, since the translation should be isolated to just to_dict and from_dict.

oof, thanks for reminding me that ArrayV2Metadata doesn't model the .zarray document completely. TBH I think we should fix that, i.e. change data_type to dtype and chunk_grid to chunks. I will open a PR shortly to do this.

My preference would be that ArrayV3Metadata and ArrayV2Metadata exist for one purpose: to model the metadata documents. It is the job of classes (like AsyncArray) that wrap those metadata documents to provide the generic array-like API. For this reason I'm not a fan of to_dict or from_dict doing any attribute name remapping. Also, remapping the v2 .zarray dtype key to data_type doesn't help the bigger semantic friction between compressor / filters in v2 and codecs in v3. So IMO we should normalize the v2 and v3 data models completely, in one place, and that's exactly what AsyncArray is for.

d-v-b · 2024-09-30T12:27:35Z

I think a design goal for ArrayV2Metadata and ArrayV3Metadata is that they should be as independent as possible from whatever array API we layer on top of them. This ensures that zarr-python can evolve that array API in just one place (the array classes), and also ensures that zarr-python is more useful to libraries that want to work with the Zarr data model, but don't want to use the zarr-python array runtime classes. IMO both of these are weakened if our in-memory array abstraction leaks onto the metadata classes.

TomAugspurger · 2024-09-30T13:49:22Z

I think a design goal for ArrayV2Metadata and ArrayV3Metadata is that they should be as independent as possible from whatever array API we layer on top of them

Mmm, I'm not sure where I stand on that :) I think I agree that it's nice to have something that directly models the stored representation of the data, or is at least trivially convertible to it (and I'd count renaming the key as trivial). But IMO the most important thing is the ability to write code that's generic over v2 and v3. I don't have a strong preference for whether that's achieved by having subtypes with mostly the same interface (ArrayV2Metadata and ArrayV3Metadata using the same field names) or some wrapper class.

I'll leave this on hold for now.

d-v-b · 2024-09-30T13:57:43Z

But IMO the most important thing is the ability to write code that's generic over v2 and v3.

I'm hoping we can both have close models of the metadata documents and achieve this goal. The basic problem is that v2 and v3 are fundamentally different data models, largely due to the filters / compressor vs codecs divide. Because there is no unambiguous mapping from codecs to filters / compressor, I think we can never have a metadata model that offers a truly generic API. But that's OK -- we only have 2 cases to handle, so "if metadata is v2 do v2 stuff, else do v3 stuff" in the array class seems fine to me. If we had 100 different zarr versions to handle then it would be a different story.
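A sketch of what that two-case dispatch could look like, assuming hypothetical V2Doc, V3Doc, and ArraySketch classes (these are not zarr-python's actual classes, and only a few document keys are modelled). Each metadata class keeps the key names of its own document, and the wrapper is the single place that branches on the version.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class V2Doc:
    # Hypothetical model using .zarray key names.
    shape: tuple
    chunks: tuple
    dtype: str
    compressor: Optional[dict] = None
    filters: tuple = ()


@dataclass(frozen=True)
class V3Doc:
    # Hypothetical model using zarr.json key names.
    shape: tuple
    chunk_grid: dict
    data_type: str
    codecs: tuple = ()


class ArraySketch:
    """Hypothetical wrapper that hides the v2 / v3 differences."""

    def __init__(self, metadata: Union[V2Doc, V3Doc]) -> None:
        self.metadata = metadata

    @property
    def chunks(self) -> tuple:
        if isinstance(self.metadata, V2Doc):
            return tuple(self.metadata.chunks)
        # v3 regular chunk grid: {"name": "regular", "configuration": {"chunk_shape": [...]}}
        return tuple(self.metadata.chunk_grid["configuration"]["chunk_shape"])

    @property
    def codec_names(self) -> list:
        if isinstance(self.metadata, V2Doc):
            # v2 splits the pipeline into filters plus one optional compressor.
            names = [f["id"] for f in self.metadata.filters]
            if self.metadata.compressor is not None:
                names.append(self.metadata.compressor["id"])
            return names
        # v3 has a single ordered list of codecs.
        return [c["name"] for c in self.metadata.codecs]


v2_array = ArraySketch(
    V2Doc(shape=(4, 4), chunks=(2, 2), dtype="<i4", compressor={"id": "zlib", "level": 1})
)
v3_array = ArraySketch(
    V3Doc(
        shape=(4, 4),
        chunk_grid={"name": "regular", "configuration": {"chunk_shape": [2, 2]}},
        data_type="int32",
        codecs=({"name": "bytes"}, {"name": "gzip"}),
    )
)
```

Callers read chunks or codec_names from the wrapper without knowing which metadata version sits underneath.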

TomAugspurger mentioned this pull request Sep 30, 2024

Zarr Python 3 tracking issue pydata/xarray#9515

Open

4 tasks

jhamman requested a review from d-v-b September 30, 2024 05:29

TomAugspurger marked this pull request as ready for review September 30, 2024 11:40

TomAugspurger mentioned this pull request Sep 30, 2024

Use implicit fill values for zarr v2 #2274

Draft

6 tasks
