Very cool and helpful project! I have created similar ones that I don't make installable since they contain more data, which I set up to be downloaded from Figshare as needed within the code; most analyses only require the smaller processed data CSVs. I'm still not totally happy with this, however, since users can't really use the package without being in the project root directory.
Do you think maybe the data should be put in each user's home directory (assuming the data doesn't change) under a folder like $HOME/.shablona/data? This would help save space if users are using the package in conda or virtual envs, right?
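For concreteness, a minimal sketch of resolving such a per-user data directory. The function name `get_data_home` and the `SHABLONA_DATA` environment variable override are assumptions made for illustration, not anything shablona currently provides.

```python
import os
from pathlib import Path


def get_data_home(env_var="SHABLONA_DATA"):
    """Return the per-user data directory, creating it if needed.

    Defaults to ~/.shablona/data. The environment variable override is a
    hypothetical convenience, not an existing shablona feature.
    """
    override = os.environ.get(env_var)
    if override:
        data_home = Path(override).expanduser()
    else:
        data_home = Path.home() / ".shablona" / "data"
    # Create the directory (and parents) on first use.
    data_home.mkdir(parents=True, exist_ok=True)
    return data_home
```

Because the directory lives outside any environment, every conda or virtual env on the machine would share one copy of the data.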
I was also considering having users install such that the code is used in place, i.e., python setup.py develop or pip install -e shablona. This way, the data directory would always be known relative to the package directory (I see you've already implemented something similar), and the Python directory won't become bloated with data.
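As a rough sketch of the in-place idea, the data directory can be located relative to the package's own files. The helper name `package_data_dir` and the `data` subdirectory name are assumptions for illustration.

```python
from pathlib import Path


def package_data_dir(module_file, subdir="data"):
    """Return the data directory that sits next to the given module file.

    Hypothetical helper: pass ``__file__`` from inside the package.
    """
    return Path(module_file).resolve().parent / subdir
```

Inside the package this would be called as `package_data_dir(__file__)`. With an editable install the result points into the source checkout, so the data never gets copied into `site-packages`.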
Any thoughts on how to effectively work with more data?
Thanks for taking a look and for the question. Yes - putting the data under the user home directory is a good idea. On other projects, we've developed systems for fetching large(ish) data from URLs, validating the hash, and storing it in the user's home directory. That does seem to work well. For details see: https://github.com/nipy/dipy/blob/master/dipy/data/fetcher.py. It would actually be a good idea to refactor the data part here to do that, with the data repository on our library repository, or even better on Figshare. Maybe I will leave this issue open until we get around to doing that.
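The fetch, validate, and store pattern described above could look roughly like the following. This is a standard-library sketch, not the dipy fetcher's actual API; the names `fetch` and `file_sha256`, the choice of SHA-256, and the `.part` temporary suffix are all assumptions.

```python
import hashlib
import shutil
from pathlib import Path
from urllib.request import urlopen


def file_sha256(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url, filename, expected_sha256, data_home):
    """Download url into data_home/filename unless a valid copy exists.

    Raises ValueError if the downloaded file does not match the hash.
    """
    data_home = Path(data_home)
    data_home.mkdir(parents=True, exist_ok=True)
    target = data_home / filename

    # Reuse a previously downloaded file if its hash still matches.
    if target.exists() and file_sha256(target) == expected_sha256:
        return target

    # Download to a temporary name so a failed or corrupt download
    # never replaces a good file.
    tmp = target.with_name(target.name + ".part")
    with urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)

    actual = file_sha256(tmp)
    if actual != expected_sha256:
        tmp.unlink()
        raise ValueError(
            f"Hash mismatch for {url}: expected {expected_sha256}, got {actual}"
        )
    tmp.replace(target)
    return target
```

Checking the hash before downloading means repeated calls are cheap once the data is in place, and the same check catches a truncated download.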