Set the name attribute for derived data variables #321

Open
niksirbi opened this issue Oct 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@niksirbi
Member

Describe the bug

When a derived array (e.g. velocity) is assigned as a dataset variable, its name attribute is automatically set to the name of that variable.

However, if we keep it as a standalone array, its name stays the same as that of the input array from which it was derived.

To Reproduce

>>> from movement import sample_data
>>> from movement.analysis import kinematics as kin
>>> ds = sample_data.fetch_dataset("DLC_single-mouse_EPM.predictions.h5")  # any poses dataset works
>>> ds["velocity"] = kin.compute_velocity(ds.position)
>>> ds.velocity.name
'velocity'
>>> velocity = kin.compute_velocity(ds.position)
>>> velocity.name
'position'
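The same behaviour can be reproduced with plain xarray, without downloading any sample data. This is a minimal sketch assuming that movement's kinematics functions are built on name-preserving xarray operations; the synthetic `position` array and its values are made up, and xarray's `differentiate` stands in for `kin.compute_velocity`.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for ds.position: 5 time points, 2 spatial dims (x, y).
# Values, dims and coords are made up for illustration only.
time = np.arange(5, dtype=float)
position = xr.DataArray(
    np.column_stack([time**2, 2 * time]),
    dims=("time", "space"),
    coords={"time": time, "space": ["x", "y"]},
    name="position",
)
ds = xr.Dataset({"position": position})

# A derived array computed with plain xarray ops inherits the input's name
velocity = ds.position.differentiate("time")
print(velocity.name)  # position

# Assigning into the dataset sets the name only for the dataset variable
ds["velocity"] = velocity
print(ds.velocity.name)  # velocity
print(velocity.name)  # position (the standalone array is unchanged)
```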

Expected behaviour
Both methods in the above example should return 'velocity'. This also affects every other derived variable, such as displacement, acceleration, head_direction etc.

The matter can be easily fixed by setting the name attribute inside the function that computes the variable.
Having an appropriate name is quite handy for printing, plotting, etc.
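A minimal sketch of that fix, using a hypothetical stand-in for `compute_velocity` (not movement's actual implementation): the derivative is computed with xarray's `differentiate` and the result is renamed before being returned.

```python
import numpy as np
import xarray as xr


def compute_velocity(data: xr.DataArray) -> xr.DataArray:
    """Hypothetical sketch: differentiate w.r.t. time and name the result.

    Not movement's actual implementation; it only illustrates setting
    the name attribute inside the function that computes the variable.
    """
    result = data.differentiate("time")
    # rename() with a string returns a new DataArray carrying that name
    return result.rename("velocity")


time = np.arange(5, dtype=float)
position = xr.DataArray(
    np.column_stack([time**2, 2 * time]),
    dims=("time", "space"),
    coords={"time": time, "space": ["x", "y"]},
    name="position",
)
velocity = compute_velocity(position)
print(velocity.name)  # velocity
print(position.name)  # position (input left untouched)
```

Because `rename` returns a new object, the input array keeps its original name, and the result is named `'velocity'` whether or not it is later assigned into a dataset.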

@niksirbi niksirbi added the bug Something isn't working label Oct 10, 2024
@sfmig
Contributor

sfmig commented Oct 18, 2024

From dev meeting today:

  • we could check what xarray does, and do something similar?
  • or should we simply set it to empty? (Does a variable need a name attribute at all? This would mean less maintenance.)
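As a data point for the first question: as far as I can tell, xarray's own arithmetic keeps the input's name when it is unambiguous and drops it (sets it to `None`) when two `DataArray` operands carry different names. A small sketch with made-up arrays:

```python
import xarray as xr

# Made-up 1-D arrays; only their names matter here
a = xr.DataArray([1.0, 2.0, 3.0], dims="time", name="position")
b = xr.DataArray([0.5, 0.5, 0.5], dims="time", name="offset")
c = xr.DataArray([1.0, 1.0, 1.0], dims="time", name="position")

print((a * 2).name)   # position (scalar ops keep the name)
print(a.mean().name)  # position (reductions keep the name)
print((a + c).name)   # position (both operands share the name)
print((a - b).name)   # None     (names differ, so xarray drops it)
```

So xarray never invents a new name such as `'velocity'`; it only propagates or drops existing ones, which is why derived variables currently end up named after their input.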

@niksirbi
Member Author

niksirbi commented Oct 18, 2024

Arguments in favour of setting an appropriate name for every derived variable:

  • The name will appear when using built-in xarray plots
  • We reduce the probability of conflicts when using xr.merge (merging objects that have the same name may cause issues).
  • For some built-in saving functions, like to_netcdf() (an HDF5-based format), the variable names in the saved file will be meaningful (though we don't really use that file format)
  • We get the chance to nudge users towards "standardised" naming conventions (users may be inclined to stick with these when adding arrays to datasets)
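The `xr.merge` point can be illustrated with a small sketch (synthetic data; `differentiate` again stands in for `kin.compute_velocity`). A velocity array that still carries the name `"position"` clashes with the real position array, whereas a correctly named one merges cleanly. `compat="no_conflicts"` is passed explicitly because xarray's default for that parameter may differ between versions.

```python
import numpy as np
import xarray as xr

time = np.arange(5, dtype=float)
position = xr.DataArray(time**2, dims="time", coords={"time": time}, name="position")

# Derived array that inherited the input's name (the current behaviour)
velocity_misnamed = position.differentiate("time")

try:
    xr.merge([position, velocity_misnamed], compat="no_conflicts")
    merged_ok = True
except xr.MergeError as err:
    # Both arrays map to a "position" variable with different values
    merged_ok = False
    print("merge failed:", err)

# With a meaningful name, both arrays become separate data variables
velocity = velocity_misnamed.rename("velocity")
merged = xr.merge([position, velocity], compat="no_conflicts")
print(sorted(merged.data_vars))  # ['position', 'velocity']
```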

Arguments in favour of setting an empty name for every derived variable:

  • users have full freedom to set that to whatever they like later, we make no assumed choices for them (though they could always override our choice anyway)
  • we reduce the risk of setting the "wrong" name, or these names becoming outdated after downstream operations. No plots with inadvertently wrong names.
  • we don't have to decide what that name should be when writing a new function
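For comparison, the empty-name option could look like the following hypothetical variant (`compute_velocity_unnamed` is an illustrative name, not an existing movement function). It shows both sides of the trade-off: users are free to name the result however they like, but anything that needs a name, such as `to_dataset()` or `xr.merge`, then requires one to be provided explicitly.

```python
import numpy as np
import xarray as xr


def compute_velocity_unnamed(data: xr.DataArray) -> xr.DataArray:
    """Hypothetical variant: return the derivative with no name at all."""
    result = data.differentiate("time")
    result.name = None  # explicitly drop the name inherited from the input
    return result


time = np.arange(5, dtype=float)
position = xr.DataArray(time**2, dims="time", coords={"time": time}, name="position")
velocity = compute_velocity_unnamed(position)
print(velocity.name)  # None

# Users can still name it themselves, e.g. by assigning into a dataset...
ds = xr.Dataset({"position": position})
ds["velocity"] = velocity
print(ds["velocity"].name)  # velocity

# ...but operations that need a name now require an explicit one
try:
    velocity.to_dataset()
    to_dataset_failed = False
except ValueError as err:
    to_dataset_failed = True
    print("to_dataset() failed:", err)
print(list(velocity.to_dataset(name="my_velocity").data_vars))  # ['my_velocity']
```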

Full disclosure:
I currently favour the first approach, but either is better than the status quo.

Projects
Status: 🤔 Triage
Development

No branches or pull requests

2 participants