Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add table & reader for Environment Active Configuration #397

Conversation

ttngu207
Copy link
Contributor

Fix #391

@@ -6,3 +8,19 @@ class Pose(Stream):

def __init__(self, path):
super().__init__(_reader.Pose(f"{path}_202_*"))


class EnvActiveConfigReader(_reader.JsonList):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling out specific attributes from the JSON into columns might be a common enough operation that we could provide it at the level of the general reader, i.e. provide a columns property to specify which values to pull out from each record.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a common operation that we should support more generally.

The tricky part is from where do we pull these additional columns out?
In the ActiveConfig jsonl file, we know we want to pull from the column named value from the file, but more generally, we wouldn't necessarily know this.

Perhaps we would then pass instead of just columns, but a tuple of (col_name_to_pull, attr_to_pull_from)
But my concern is that this logic is not longer at the level of general reader.

Any other thoughts/suggestions would be great

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, we can do something like this:

class JsonList(Reader):
    """Extracts data from json list (.jsonl) files, where the key "seconds"
    stores the Aeon timestamp, in seconds.
    """

    def __init__(self, pattern, columns=None, col_tuples=[], extension="jsonl"):
        super().__init__(pattern, columns, extension)
        self.col_tuples = col_tuples

    def read(self, file):
        """Reads data from the specified jsonl file."""
        with open(file, "r") as f:
            df = pd.read_json(f, lines=True)
        df.set_index("seconds", inplace=True)
        for new_col, from_attr in self.col_tuples:
            df[new_col] = df.apply(lambda x: x[from_attr])
        return df

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(with better naming... col_tuples is a horrible name)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glopesdev I don't have a strong preference on whether the pulling out of the column names occurs (whether in JsonList or in the lower-level Reader), so I guess you can make a final call here and then we can just go with that?

@@ -6,3 +8,19 @@ class Pose(Stream):

def __init__(self, path):
super().__init__(_reader.Pose(f"{path}_202_*"))


class EnvActiveConfigReader(_reader.JsonList):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class EnvActiveConfigReader(_reader.JsonList):
class ActiveConfigurationReader(_reader.JsonList):

The names of the reader and stream don't line up, since one is EnvActiveConfig and the other is ActiveConfiguration. Why not just rename the reader as ActiveConfigurationReader? I think the Env prefix might be unnecessary in this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'll rename

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I actually like the Env prefix because it's more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed to EnvActiveConfigurationReader - keeping the Env here to communicate more explicitly that this is for the Environment Active Configuration, just ActiveConfiguration may be too vague or too general

@@ -563,6 +563,47 @@ def make(self, key):
)


@schema
class EnvironmentActiveConfig(dj.Imported):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if we can agree on whether to use Config or Configuration as a common suffix for all classes dealing with these files. I think I am fine either way.

return data


class ActiveConfiguration(Stream):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would actually add Env here too

@@ -6,3 +8,9 @@ class Pose(Stream):

def __init__(self, path):
super().__init__(_reader.Pose(f"{path}_202_*"))


class EnvActiveConfiguration(Stream):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks great, I would just rename this stream to be called EnvironmentActiveConfiguration rather than the shorthand version, but not critical either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ttngu207
Copy link
Contributor Author

@jkbhagatio should be ready to merge now

@jkbhagatio jkbhagatio merged commit 9fedabc into SainsburyWellcomeCentre:datajoint_pipeline Sep 20, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants