Clean up annotations without ids #2

Open
wbazant opened this issue Nov 2, 2017 · 0 comments

Comments

wbazant commented Nov 2, 2017

Sometimes BioMart returns annotations of the form

id<tab><empty space>

We want to remove these in almost all cases. The only places where we don't are:

  • array designs
  • gene id to gene name files
    The reason we don't: we use them for decorating files, and we assume that they'll be complete.

The benefit is operational efficiency: about 30% less space, and quite a few processes should run faster by roughly the same amount. The resulting files will also be slightly more "correct" in the abstract sense of representing the annotations we want.

To implement this you will need to add a new step in the file Transform.sc, test it, and then take the annotation update part of atlasprod for a run with the resulting annotations. I am fairly certain that array design files and gene id to gene name files are the only ones where we want the blanks, but this needs verifying.
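
To make the intended filtering concrete, here is a minimal sketch of the rule, written in Python purely for illustration. It is not the Transform.sc change. The function name `drop_blank_annotations` and the sample ids are hypothetical. It assumes a two-column, tab-separated file, and it also drops lines with no tab at all, which is an assumption that would need checking against real BioMart output.

```python
def drop_blank_annotations(lines):
    """Yield only lines whose value column (after the first tab) is non-blank.

    Assumption: each line is `id<tab>value`; lines without a tab are dropped too.
    """
    for line in lines:
        row = line.rstrip("\n")
        _key, sep, value = row.partition("\t")
        # Keep the line only if a tab was found and the value is not just whitespace.
        if sep and value.strip():
            yield line


# Hypothetical sample rows: one real annotation, two blanks, one id with no tab.
sample = ["ENSG01\tkinase\n", "ENSG02\t \n", "ENSG03\t\n", "ENSG04\n"]
kept = list(drop_blank_annotations(sample))
```

The caller would skip this filter for array design files and gene id to gene name files, since those are expected to stay complete.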
