-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clashes in code terminology #293
Comments
@jkbhagatio I'm not sure I follow the rationale and unless this is causing some fundamental issues I would avoid introducing breaking changes, since all the examples we have so far would stop working. Do you have maybe a specific example of how this can be a problem? Maybe looking over some concrete code would help me to better understand the issue. |
I mean, it is very typical in object oriented design if you have a property that is a collection of "Stream" objects, to call that property "streams". Same for modules that manipulate entities with the same name as the module, e.g. from datetime import datetime, from scipy.fft import fft, etc. I feel it will make the code more obscure and harder to learn if we introduce "catch-all" terms such as "registry" or "binder" which don't immediately refer to anything in Aeon terminology. |
I've noticed when explaining Aeon data architecture conventions to others, current terminology has caused some confusion - specifically when the same name refers to multiple meanings, as mentioned here. I've been writing a tutorial going over Aeon data architecture which will be useful
Happy to go over this to show how using distinct terminology makes things a bit clearer for at least myself and a couple others. To address a couple comments:
Actually the refactoring for this in aeon_mecha and aeon_analysis was quite quick. For any other floating scripts, I imagine it wouldn't be hard to make the switch - it's just a name change on an import typically.
Indeed, but in this case there is conflation between a software object and a file / data source. We typically refer to the former as a 'Reader' object, and the latter as a 'stream.' A |
I don't understand this issue. Files / data sources are always represented as software objects in every API that I know of. What exactly do you mean here by "software object"? I don't think we represent the file streams in conflicting ways anywhere in the aeon_mecha api, so I'm still not sure where the confusion comes from. Access to individual file streams is always performed via the streams property in the We could in theory rename
Sure, I would also be happy to go over this, if anything to ensure that everyone understands how the object model was designed and meant to be used. I feel we need to be careful of refactoring the whole API for the purposes of explaining. It's not clear to me that "streams" should be restricted to just files on disk. A video stream being acquired in Bonsai is still a "stream", even though it's not a file yet. Maybe the confusion here is coming from conflating storage details with the concept of a stream. A stream is a sequence of data, and it can be live, frozen in a file, stored in a DB, or generated procedurally, any of these would still be a stream, it's not really a problem, and in fact being storage-independent is usually one of the most important properties of streams whenever I see them included in software data architectures.
I think breaking old scripts is always problematic, no matter how tiny or quick the change is, and especially in a dynamic language like python. We designed this API together over quite some time and with a lot of discussion about exactly these points, so I think we should probably sit and walk through that design process again carefully to review and understand what exactly the problem is. |
Linking this notebook here as relevant for this convo: https://github.com/SainsburyWellcomeCentre/aeon_mecha/blob/social_schema/docs/examples/understanding_aeon_data_architecture.ipynb |
Currently in Aeon terminology, I believe all of the following are referred to as 'streams':
A single file type on ceph.
(e.g. each of the
Patch1_90*
,CameraTop_200*
, andEnvironment_SubjectVisits*
files)A dictionary pairing a name and a
Reader
object.(e.g. the
Device
class has astream
attribute that contains a dict of this type of pairing that could look something like{'EnvironmentState': <aeon.io.reader.Csv object at 0x7f22a95ee650>}
)A function that returns the above dictionary.
(e.g. the functions in
aeon/schema/core.py
https://github.com/SainsburyWellcomeCentre/aeon_mecha/blob/main/aeon/schema/core.py)Similarly, 'schema' refers to both:
A set of 'streams' (2nd definition above) grouped into a list of
Device
objects within aDotMap
object.A Python module that contain 'streams' (3rd definition above) (e.g.
aeon/schema/core.py
, the file itself)Due to this confusion, I'd suggest renaming some of these terms:
I'd call the 2nd definition of 'stream' a 'registry' (it implies a registry of
Reader
objects that can be looked up by name).I'd call the 3rd definition of 'stream' a 'binder' function (it implies creating the dictionary that binds a name to a
Reader
object)I'd call the 2nd definition of 'schema' a 'reader_binder' module (it implies that the module contains 'binder' functions for
Reader
objects).Subsequently, I'd move
aeon/schema/
toaeon/io/reader_binder/
The text was updated successfully, but these errors were encountered: