Should `ak.unzip` return both the field names and the values? #1043

agoose77 · 2021-08-02T14:59:45Z

agoose77
Aug 2, 2021
Maintainer

Description of new feature

By name, ak.unzip is the logical inverse of ak.zip. However, at the moment, the API is not symmetrical; ak.zip can produce tuples or named records from the input argument, whereas ak.unzip always returns a tuple of arrays.

If this can be improved upon, then one option is to return a mapping or a tuple according to the array type. This would work, but I'm not convinced that returning different types is the best design.

Perhaps a better solution would be to return two tuples; one for the keys, and one for values?

fields, values = ak.unzip(array)

Although users would still need to handle the two cases differently (named records vs tuples), this would be as simple as testing len(fields) (or just bool(fields) if one is being Pythonic). This is also true of using ak.fields with ak.unzip, so I don't think it's a strong counterargument.

This issue proposes to clean up ak.unzip, but clearly it would entail a breaking change.

jpivarski · 2021-08-02T15:58:15Z

jpivarski
Aug 2, 2021
Maintainer

ak.zip and ak.unzip were not intended to be perfect inverses of each other. ak.flatten and ak.unflatten are not perfect opposites either, and for the same reason: to invert ak.zip, you need both ak.fields and ak.unzip; to invert ak.unflatten, you need both ak.num and ak.flatten.

Personally, I feel like it's enough to document this. As a user, I wouldn't expect perfect symmetry. (Though I'm not a user; actual users are invited to chime in below!) I would, however, want convenience, and having to unpack a pair of fields and values from ak.unzip when I usually only want the values would be a stumbling block (and at this point, a breaking change). Interfaces that return a tuple to unpack presuppose that it's going to be used as a separate line of code, as a statement, since in Python 3 only assignment statements can unpack tuples, which is great for a demonstrative example, but it interferes with using a function in an expression. (Note: Python 2 could unpack tuples in function arguments!)

The main use-case for ak.unzip is to turn the "everything is in one array" result from something like ak.cartesian or ak.combinations into "parts of that calculation are in different arrays so that I can use them in separate formulas." At times, I've thought that ak.cartesian and ak.combinations had the wrong interface—that they should return results in split form—but adding ak.unzip eliminated that need, since it can be a two-part idiom. Unlike an extra function argument in ak.cartesian and ak.combinations, a separate ak.unzip function can be used in other contexts as well. That, anyway, was the motivation.

1 reply

agoose77 Dec 23, 2021
Maintainer Author

I'm just adding to this after some time has elapsed. It occurs to me now that we have ak.fields which is a dict-like metaphor for the keys of the RecordArray. I wonder whether ak.unzip should really be ak.values and ak.unzip. However, at this juncture, this would be a major API breakage for "aesthetics", which is clearly poor motivation. I think I was using ak.unzip with ak.fields in a particularly niche case of re-zipping some fields, e.g. with jagged RecordArrays from uproot. This is the only use-case that I have encountered thus far.

So, I think I'm happy with the current API for now!

Additionally, I do miss the tuple unpacking in function arguments that Py2 supported, although I can see the motivations for removing it.

nikoladze · 2024-05-31T14:30:08Z

nikoladze
May 31, 2024

From time to time I end up with arrays where I want to turn a RecordArray(ListOffsetArray) into a ListOffsetArray(RecordArray). I usually try to avoid creating them like that, but as an example:

>>> import awkward as ak
>>> array = ak.Array([{"a": [1, 2], "b": [3, 4]}, {"a": [5], "b": [6]}])
>>> array.typestr
'2 * {a: var * int64, b: var * int64}'

To turn it into a '2 * var * {a: int64, b: int64}', I have to resort to something like

>>> ak.zip(dict(zip(array.fields, ak.unzip(array))))

which is a bit clumsy.

In case others have that use case as well - would it make sense for ak.zip to support having an awkward array as input where it would attempt to zip the top level fields?

3 replies

jpivarski May 31, 2024
Maintainer

It's too late to change ak.unzip, but if you can think of a memorable name for this operation, it would be a nice addition. I've recommended the idiom that you described as well. Since it has to be done fairly often, a helper function (that propagates depth_limit) would be nice. This kind of function would not be too complicated to implement and add to src/awkward/operations.

With ak.transform, you could even make it apply at some axis depth, not always the top level.

Perhaps one function or a pair of similarly named functions could convert in both directions between records-of-lists and lists-of-records. Names starting with ak.from_ and ak.to_ are very common in the codebase. Perhaps

ak.to_lists_of_records
ak.from_lists_of_records

would be a good pair? The ak.to_lists_of_records direction is not guaranteed: it depends on all fields in the records-of-lists having the same list lengths (up to depth_limit). The other direction always works.

nikoladze Jun 4, 2024

I like this idea, so something like

import awkward as ak

def to_lists_of_records(array, axis=0, depth_limit=None):
    def transform(layout, depth, **kwargs):
        if depth == axis + 1:
            if not layout.is_record:
                raise ValueError(f"No record at axis={axis}")
            return ak.zip(
                dict(zip(layout.fields, ak.unzip(layout, highlevel=False))),
                depth_limit=depth_limit,
                highlevel=False,
            )

    return ak.transform(transform, array)

def from_lists_of_records(array, axis=0):
    def transform(layout, depth, **kwargs):
        if depth == axis + 1:
            if not layout.is_list:
                raise ValueError(f"No list at axis={axis}")
            return ak.contents.RecordArray(
                ak.unzip(layout, highlevel=False),
                layout.fields,
            )

    return ak.transform(transform, array)

a = ak.Array([{'a': [1, 2, 3], 'b': [4, 5, 6]}, {'a': [7, 8], 'b': [9, 10]}, {'a': [], 'b': []}])
b = ak.Array([[{'a': [1, 2, 3], 'b': [4, 5, 6]}], [{'a': [7, 8], 'b': [9, 10]}, {'a': [], 'b': []}]])

>>> to_lists_of_records(a)
<Array [[{a: 1, b: 4}, {...}, {...}], ..., []] type='3 * var * {a: int64, b...'>
>>> to_lists_of_records(b, axis=1)
<Array [[[{a: 1, b: 4}, ..., {...}]], ...] type='2 * var * var * {a: int64,...'>
>>> from_lists_of_records(to_lists_of_records(a))
<Array [{a: [1, ..., 3], b: [...]}, ..., {...}] type='3 * {a: var * int64, ...'>
>>> from_lists_of_records(to_lists_of_records(b, axis=1))
<Array [{a: [[1, ...]], b: [[...]]}, ...] type='2 * {a: var * var * int64, ...'>
>>> from_lists_of_records(to_lists_of_records(b, axis=1), axis=1)
<Array [[{a: [1, ..., 3], b: [...]}], ...] type='2 * var * {a: var * int64,...'>

This would also need handling for tuple-type records, and a test that depth_limit works.

If you think that would make sense, I can submit a PR working this out.

jpivarski Jun 5, 2024
Maintainer

Yes, that's it exactly! The depth_limit would take on a new meaning: it would become a limit on the depth after the chosen axis, so if axis=2 and depth_limit=3, then it would apply to axes 2, 3, and 4. That's still intuitive/sensible, though it might need to be explicitly documented.
