Dealing with more data #16

petebachant · 2016-02-05T00:11:00Z

Very cool and helpful project! I have created similar ones that I don't make installable since they contain more data, which I set up to be downloaded from Figshare as needed within the code; most analyses only require the smaller processed data CSVs. I'm still not totally happy with this, however, since users can't really use the package without being in the project root directory.

Do you think maybe the data should be put in each user's home directory (assuming the data doesn't change) under a folder like $HOME/.shablona/data? This would help save space if users are using the package in conda or virtual envs, right?

I was also considering having users install such that the code is used in place, i.e., python setup.py develop or pip install -e shablona. This way, the data directory would always be known relative to the package directory (I see you've already implemented something similar), and the Python directory won't become bloated with data.

Any thoughts on how to effectively work with more data?
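
For reference, the in-place install idea can be sketched roughly as follows. This is only an illustration, not shablona's actual code: the helper name `package_data_path` and the `data` folder name are assumptions. With `pip install -e`, the package is imported from the checkout, so a path built from the module's own location points into the project tree.

```python
import os.path as op


def package_data_path(module_file, *parts):
    """Build a path inside a `data` folder that sits next to a module.

    `module_file` is normally the `__file__` of the package's `__init__.py`.
    The folder name `data` is an assumption made for this sketch.
    """
    package_dir = op.dirname(op.abspath(module_file))
    return op.join(package_dir, "data", *parts)


# Hypothetical use inside a package's __init__.py:
#     processed_csv = package_data_path(__file__, "processed.csv")
```

The drawback is that a regular (non-editable) install copies the package into site-packages, so any data would have to be shipped with it.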


arokem · 2016-02-05T03:38:20Z

Thanks for taking a look and for the question. Yes, putting the data under the user's home directory is a good idea. On other projects, we've developed systems for fetching large(ish) data from URLs, validating the hash, and storing it in the user's home directory. That does seem to work well. For details, see: https://github.com/nipy/dipy/blob/master/dipy/data/fetcher.py. It would actually be a good idea to refactor the data part here to do that, with the data hosted in our library repository or, even better, on Figshare. Maybe I will leave this issue open until we get around to doing that.
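
A minimal sketch of the fetch, validate, and store pattern described above. This is not the dipy fetcher itself. The function names, the `SHABLONA_HOME` environment variable, and the choice of SHA-256 are assumptions made for illustration; the `.shablona/data` folder follows the suggestion earlier in the thread.

```python
import hashlib
import os
from pathlib import Path
from urllib.request import urlretrieve


def data_home(env_var="SHABLONA_HOME"):
    """Return the per-user data folder, creating it if needed.

    Defaults to ~/.shablona/data; `env_var` is a hypothetical override.
    """
    root = os.environ.get(env_var) or os.path.join(
        os.path.expanduser("~"), ".shablona")
    path = Path(root) / "data"
    path.mkdir(parents=True, exist_ok=True)
    return path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url, filename, expected_sha256, folder=None):
    """Download `url` to `folder/filename` unless a valid copy exists.

    Raises ValueError (and deletes the file) if the hash does not match.
    """
    folder = Path(folder) if folder is not None else data_home()
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / filename
    # Reuse a previously downloaded file if its hash still checks out.
    if target.exists() and file_sha256(target) == expected_sha256:
        return target
    urlretrieve(url, str(target))
    actual = file_sha256(target)
    if actual != expected_sha256:
        target.unlink()
        raise ValueError(
            "Hash mismatch for %s: expected %s, got %s"
            % (filename, expected_sha256, actual))
    return target
```

Because the hash is checked before any download, repeated calls from different conda or virtual environments share one copy of the data and do not hit the network again.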

arokem mentioned this issue Feb 11, 2016

Added data path example. #3

Open

arokem mentioned this issue Feb 22, 2016

WIP: Started a data module, to eventually replace the data folder. #25

Open
