`df` property of `AtlasMapTopics` does not include topic IDs, only topic labels #188
You can access the IDs through the metadata component; labels are unique across an entire map.
This raises a good question: would you prefer the IDs or the labels in the data frame representation?
Also, what do you think of the new state access patterns? Do they make sense?
On Thu, Jun 29, 2023, 12:02 PM Michael Robinson wrote:
```python
import pandas
import pyarrow as pa


class AtlasMapTopics:
    # ... other members omitted ...

    @property
    def df(self) -> pandas.DataFrame:
        """
        A pandas dataframe associating each datapoint on your map to their topics at each topic depth.
        """
        return self.tb.to_pandas()

    @property
    def tb(self) -> pa.Table:
        """
        Pyarrow table associating each datapoint on the map to their Atlas assigned topics.
        This table is memmapped from the underlying files and is the most efficient way to
        access topic information.
        """
        return self._tb
```
```
>>> print(topic_data.df[0:1])
id topic_depth_1 topic_depth_2 topic_depth_3
0 18963 Music videos Youtube, bitchute, youtube video
```
It depends on the use case. If you want the labels for a given datum, you want the labels in the data frame. If you want the datums assigned to a given topic, you want the IDs. Since it is a data frame, there shouldn't be any obstacle to simply adding a topic ID column to the current representation. That way a single representation serves both use cases.
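A minimal sketch of the "one representation, both use cases" idea, assuming a hypothetical `topic_id_depth_1` column sitting next to the existing label column. The column name and the toy values are illustrative only, not nomic's actual schema.

```python
import pandas as pd

# Hypothetical layout: a label column and an ID column for the same topic depth.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "topic_depth_1": ["Music videos", "Music videos", "Sports"],
        "topic_id_depth_1": [10, 10, 20],  # hypothetical topic ID column
    }
)


def labels_for_datum(frame: pd.DataFrame, datum_id: int) -> list:
    """Use case 1: the topic label(s) assigned to one datum."""
    return frame.loc[frame["id"] == datum_id, "topic_depth_1"].tolist()


def datums_for_topic(frame: pd.DataFrame, topic_id: int) -> list:
    """Use case 2: every datum assigned to a given topic ID."""
    return frame.loc[frame["topic_id_depth_1"] == topic_id, "id"].tolist()


print(labels_for_datum(df, 2))   # ['Music videos']
print(datums_for_topic(df, 10))  # [1, 2]
```

Both lookups are plain boolean-mask selections on the same frame, so no second data structure is needed.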
There was a fair amount of inconsistency in the previous API. Exposing the new data frames as properties has the advantage of consistency and of leveraging the pandas ecosystem. Its disadvantage is an additional learning hurdle for developers coming from plain Python without pandas experience. I think this can be mitigated with good step-by-step documentation, and I like the idea of re-doing the documentation as …
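One step that such documentation could show plain-Python users is how to drop out of pandas immediately. The sketch below uses a toy frame with hypothetical column names and only standard pandas calls (`DataFrame.to_dict(orient="records")` and `Series.tolist()`).

```python
import pandas as pd

# Toy stand-in for a topics data frame; column names are hypothetical.
df = pd.DataFrame({"id": [1, 2], "topic_depth_1": ["Music videos", "Sports"]})

# A list of plain dicts, one per row, for code that never touches pandas.
records = df.to_dict(orient="records")

# A plain dict from datapoint ID to its depth-1 topic label.
label_by_id = dict(zip(df["id"].tolist(), df["topic_depth_1"].tolist()))

for row in records:
    print(row["id"], row["topic_depth_1"])
```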
You left a previous issue on 1.x about accessing datapoints by topics (#183). Are you still facing this with the following?

```python
from nomic import AtlasProject

project = AtlasProject(name='My Project')
atlas_map = project.maps[0]  # first map in the project
print(atlas_map.topics.group_by_topic(3))  # group datapoints by their depth-3 topic
```
No, it is still not working: I get the same error as #183 with version 2.0.0 of the library:
However, with the topic ID included in the data frame, as discussed above, the …
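If the data frame carried a topic ID column, grouping datapoints by topic could be done with a pandas `groupby` on the client side. This is a sketch under that assumption: `topic_id_depth_1` is a hypothetical column name and the values are toy data; it is not a description of how `group_by_topic` works internally.

```python
import pandas as pd

# Hypothetical frame with a topic ID column beside the label column.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "topic_id_depth_1": [10, 20, 10, 20],
        "topic_depth_1": ["Music videos", "Sports", "Music videos", "Sports"],
    }
)

# Map each topic ID to the list of datapoint IDs assigned to it.
datums_by_topic = (
    df.groupby("topic_id_depth_1")["id"].apply(lambda s: s.tolist()).to_dict()
)

# The same grouping keyed by label, since labels are unique across a map.
datums_by_label = (
    df.groupby("topic_depth_1")["id"].apply(lambda s: s.tolist()).to_dict()
)

print(datums_by_topic)
```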