Non-unique gene names on AnnData affects some usages #278

Open
pcm32 opened this issue Jan 12, 2023 · 3 comments
Labels
persist-seq Requests from Persist-Seq

Comments

Member

pcm32 commented Jan 12, 2023

Some annotations yield non-unique gene names in our AnnData files. It should be easy to add an option in https://github.com/ebi-gene-expression-group/container-galaxy-sc-tertiary/blob/develop/tools/tertiary-analysis/scanpy/anndata_operations.xml that makes the gene names unique in a specified field of the var data frame, either in the column where they already are or in a new column. This would let our AnnData files work better with viewers like cellxgene, which require a unique index for genes (and show that index as the gene symbols).
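
To check whether a given file is affected, something along these lines could be used. It is only a sketch on a toy pandas data frame standing in for adata.var; the gene identifiers in the index are made up, and the "gene_symbols" column name is assumed to match the field used in our files.

```python
import pandas as pd

# Toy stand-in for adata.var; the index values are hypothetical identifiers.
var = pd.DataFrame(
    {"gene_symbols": ["TP53", "ACTB", "TP53", "GAPDH"]},
    index=["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"],
)

# Flag every row whose symbol occurs more than once (keep=False marks all copies).
dup_mask = var["gene_symbols"].duplicated(keep=False)
duplicated_symbols = sorted(var.loc[dup_mask, "gene_symbols"].unique())

print(var["gene_symbols"].is_unique)  # False
print(duplicated_symbols)             # ['TP53']
```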

@pcm32 pcm32 added the persist-seq Requests from Persist-Seq label Jan 12, 2023
Member Author

pcm32 commented Jan 12, 2023

This could be done with:

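# DataFrame.set_index returns a new frame and leaves adata.var untouched,
# so assign the symbols to var_names directly before deduplicating.
adata.var_names = adata.var["gene_symbols"].astype(str)
adata.var_names_make_unique()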

Member Author

pcm32 commented Jan 12, 2023

or, if we don't want to mess with the index:

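def rn(df, field, suffix="-duplicate-"):
    # Number the repeats of each value in `field`: 0 for the first occurrence, 1 for the second, ...
    counts = df.groupby(field).cumcount()
    # Leave first occurrences unchanged; append suffix + count to later ones.
    appendents = (suffix + counts.astype(str)).where(counts > 0, "")
    # Note: this adds the new column to the data frame that was passed in.
    df[f"{field}_u"] = df[field].astype(str) + appendents
    return df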

g_symbols_field = "gene_symbols"
adata.var = rn(adata.var, g_symbols_field, suffix = "_d")
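
For illustration, here is what the column-based approach would produce on a small example. This is a sketch using plain pandas on a toy data frame rather than a real AnnData object; the helper name add_unique_column and the index values are made up for the example, and unlike rn above it works on a copy so the input is not modified.

```python
import pandas as pd

def add_unique_column(df, field, suffix="_d"):
    """Return a copy of df with a `<field>_u` column where repeated values get a suffix."""
    out = df.copy()
    # 0 for the first occurrence of each value, 1 for the second, and so on.
    counts = out.groupby(field).cumcount()
    # Empty tail for first occurrences, suffix + count for the rest.
    tails = (suffix + counts.astype(str)).where(counts > 0, "")
    out[f"{field}_u"] = out[field].astype(str) + tails
    return out

# Toy stand-in for adata.var; the index values are hypothetical identifiers.
var = pd.DataFrame(
    {"gene_symbols": ["TP53", "ACTB", "TP53", "GAPDH", "TP53"]},
    index=["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E"],
)

result = add_unique_column(var, "gene_symbols")
unique_symbols = result["gene_symbols_u"].tolist()
print(unique_symbols)  # ['TP53', 'ACTB', 'TP53_d1', 'GAPDH', 'TP53_d2']
```

The index is left as it was, so anything keyed on the original gene identifiers keeps working. One caveat: a generated name could still clash with a symbol that already exists in the column (for example a real gene literally named TP53_d1), which this sketch does not check for.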

Member Author

pcm32 commented Feb 15, 2023

This is going to be resolved as part of #266
