Make example data sets public #9

Open
armish opened this issue Jul 18, 2018 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed question Further information is requested

Comments

@armish
Member

armish commented Jul 18, 2018

Here are the ones that we are familiar with and trust the most:

$ gsutil ls gs://musc-codex/datasets/ \
     | grep "20180706\|20180614" \
     | xargs -I@ -P1 bash -c "gsutil du -sh @"

6.65 GiB    gs://musc-codex/datasets/20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5
9.88 GiB    gs://musc-codex/datasets/20180614_D22_RepB_Tcell_CD4-CD8-DAPI_5by5
9.4 GiB     gs://musc-codex/datasets/20180614_D23_RepA_Tcell_CD4-CD8-DAPI_5by5
8.81 GiB    gs://musc-codex/datasets/20180614_D23_RepB_Tcell_CD4-CD8-DAPI_5by5
5.55 GiB    gs://musc-codex/datasets/20180706-Donor22-R2-Tcell-CODEX_CD3CD4CD85BY5
5.38 GiB    gs://musc-codex/datasets/20180706-Donor23-R2-Tcell-CODEX_CD3CD4CD85BY5

These would also make testing the framework easier for everybody. The only thing we have to make sure of is that making these data sets public won't be too costly. We could go with a service like figshare, but I'm not sure how their download bandwidth scales if we need to download the data over and over again.
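
For a rough sense of scale, the sizes in the listing above can be summed and multiplied by a per-GiB transfer price. This is only a sketch: `PRICE_PER_GIB` is a hypothetical placeholder, not a quoted rate from any provider, and the variable names are illustrative.

```shell
# Sizes in GiB, copied from the `gsutil du -sh` output above.
sizes="6.65 9.88 9.4 8.81 5.55 5.38"

# ASSUMPTION: placeholder price per GiB downloaded; replace with the real rate.
PRICE_PER_GIB=0.10

# Sum the sizes, one value per line, and format to two decimals.
total_gib=$(echo "$sizes" | tr ' ' '\n' | awk '{ s += $1 } END { printf "%.2f", s }')

# Cost of downloading all six data sets once at the assumed price.
cost_per_download=$(awk -v t="$total_gib" -v p="$PRICE_PER_GIB" 'BEGIN { printf "%.2f", t * p }')

echo "Total size: ${total_gib} GiB"
echo "Cost per full download at the assumed price: ${cost_per_download}"
```

Multiplying the per-download figure by the expected number of repeated downloads (CI runs, new users) gives the number that matters for the hosting decision.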

@hammer: any suggestions?

@armish armish added enhancement New feature or request help wanted Extra attention is needed question Further information is requested labels Jul 18, 2018
@armish armish self-assigned this Jul 18, 2018
@hammer
Member

hammer commented Aug 3, 2018

Those are a bit large for Google Drive.

Nature recommends the IDR: cf. their submission guidelines.

If IDR doesn't want our images, maybe Dryad? I dunno. We should ask Anne Carpenter.

@hammer
Member

hammer commented Aug 3, 2018

The Cell Image Library is another alternative.

@armish
Member Author

armish commented Nov 1, 2018

We now have a dedicated public bucket: gs://cytokit. Added data sets that are relevant to the manuscript:

$ gsutil ls gs://cytokit/datasets/*
gs://cytokit/datasets/cellsize/:
gs://cytokit/datasets/cellsize/20181024-d38-act-20X-5by5/
gs://cytokit/datasets/cellsize/20181024-d38-unstim-20X-5by5/
gs://cytokit/datasets/cellsize/20181024-d39-act-20x-5by5/
gs://cytokit/datasets/cellsize/20181024-d39-unstim-20x-5by5/
gs://cytokit/datasets/cellsize/20181024-jurkat-20X-5by5/
gs://cytokit/datasets/cellsize/20181024-jurkat2-20X-5by5/
gs://cytokit/datasets/cellsize/20181026-pmel-act-20x-5by5/
gs://cytokit/datasets/cellsize/20181026-pmel-act-60x-1by1/
gs://cytokit/datasets/cellsize/20181026-pmel-act-60x-5b5/
gs://cytokit/datasets/cellsize/20181026-pmel-us-20x-5by5/
gs://cytokit/datasets/cellsize/20181026-pmel-us-60x-1by1/

gs://cytokit/datasets/cellular-marker/:
gs://cytokit/datasets/cellular-marker/20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5/
gs://cytokit/datasets/cellular-marker/20180614_D22_RepB_Tcell_CD4-CD8-DAPI_5by5/
gs://cytokit/datasets/cellular-marker/20180614_D23_RepA_Tcell_CD4-CD8-DAPI_5by5/
gs://cytokit/datasets/cellular-marker/20180614_D23_RepB_Tcell_CD4-CD8-DAPI_5by5/
gs://cytokit/datasets/cellular-marker/20180927-Tcell-CD3_CD4_CD8_DAPI-20X-5by5/

Will add pointers from the README before closing this issue.
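
As a sketch of how one of the data sets listed above might be fetched, assuming the bucket is still publicly readable and `gsutil` is installed: the destination directory `DEST` and the `DRY_RUN` switch are hypothetical and not part of the project.

```shell
# Dataset path taken from the listing above.
DATASET="gs://cytokit/datasets/cellular-marker/20180614_D22_RepA_Tcell_CD4-CD8-DAPI_5by5"

# ASSUMPTION: hypothetical local destination directory.
DEST="./cytokit-data"

# ASSUMPTION: dry run by default so nothing is downloaded by accident; set DRY_RUN=0 to transfer.
DRY_RUN="${DRY_RUN:-1}"

# -m runs transfers in parallel, -r copies the directory recursively.
CMD="gsutil -m cp -r $DATASET $DEST"

if [ "$DRY_RUN" = "1" ]; then
  echo "Would run: $CMD"
else
  mkdir -p "$DEST"
  $CMD
fi
```

The data sets are several GiB each, so the dry-run default makes it possible to check the command before starting a transfer.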


2 participants