Fix pooch retrieval of file registry #260
Merged
Description
What is this PR?
Why is this PR needed?
The file registry file is not overwritten every time we run tests, as we intend.
Right now the files-registry.txt file is not always downloaded fresh from GIN. The current comment next to the pooch.retrieve call incorrectly states that if known_hash=None the file is always downloaded. In fact, when known_hash=None and the file already exists, the file is NOT downloaded. See the chain of events below:
If known_hash=None and the file already exists:
-> hash_matches returns True (see here)
-> download_action returns "fetch, Fetching" (see here)
-> pooch.retrieve does nothing (see here)
What does this PR do?
To force the download of this file every time, we need to ensure that the file does not exist before downloading it from GIN. So this PR deletes any file with the expected name at the expected location.
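The chain of events above can be sketched with a simplified, standard-library-only model. This is not pooch's source code: the function names mirror the ones mentioned in the chain, but the bodies are assumptions that only reproduce the behaviour described in this PR (no known hash means the check passes, and an existing file that passes the check is not downloaded again).

```python
import hashlib
import tempfile
from pathlib import Path


def hash_matches(fname, known_hash):
    """Simplified model: with no known hash there is nothing to compare."""
    if known_hash is None:
        return True
    digest = hashlib.sha256(Path(fname).read_bytes()).hexdigest()
    return digest == known_hash.lower()


def download_action(path, known_hash):
    """Decide what a retrieve call would do for the given local path."""
    if not path.exists():
        return ("download", "Downloading")
    if not hash_matches(path, known_hash):
        return ("update", "Updating")
    return ("fetch", "Fetching")  # file exists and "matches": no download


with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp) / "files-registry.txt"

    # File missing: a download is triggered.
    first = download_action(registry, known_hash=None)

    # File present and known_hash=None: the stale copy is reused.
    registry.write_text("stale registry\n")
    second = download_action(registry, known_hash=None)

    # Deleting the file first restores the download.
    registry.unlink()
    third = download_action(registry, known_hash=None)

print(first, second, third)
```

In this model only the "download" and "update" actions would fetch anything, which is why removing the file beforehand is enough to force a fresh copy.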
Additionally, this PR specifies the filename in the pooch.retrieve call for the file. Otherwise, if filename is None, it is set as <hash-of-the-url> + <last-part-of-url>. It also adds wheel as an additional dependency to the check-manifest pre-commit hook, so that it runs correctly in an environment with Python 3.12.
MWE to reproduce
I assume initially (Path.home() / ".crabs-exploration-test-data" / "files-registry.txt").is_file() is False.
In a conda environment with pooch and ipython (for convenience), start an interactive Python session and run the snippet below.
Snippet 1
When we run Snippet 1 several times, the file is not re-downloaded. We can verify this because we get the following message only the first time we run the snippet, but not on any subsequent run:
If we instead add a few lines to remove the "files-registry.txt" file before fetching it from GIN, we see the expected download message every time.
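A minimal sketch of the delete-before-fetch pattern, runnable without network access. Everything named here is an assumption for illustration: fetch_fresh is a hypothetical helper, fake_retrieve is a stand-in that mimics the skip-if-exists behaviour described above, and EXAMPLE_URL is a placeholder rather than the real GIN address. The helper passes url, known_hash, fname and path as keyword arguments, so pooch.retrieve could in principle be supplied in place of the stand-in.

```python
import tempfile
from pathlib import Path

# Placeholder URL for illustration only; not the real GIN address.
EXAMPLE_URL = "https://example.invalid/files-registry.txt"


def fetch_fresh(retrieve, url, fname, path):
    """Delete any cached copy, then call the given retrieve function."""
    target = Path(path) / fname
    target.unlink(missing_ok=True)  # no error if the file is absent
    return retrieve(url=url, known_hash=None, fname=fname, path=path)


def make_fake_retrieve(log):
    """Build a stand-in retriever that skips the download if the file exists."""

    def fake_retrieve(url, known_hash, fname, path):
        target = Path(path) / fname
        if not target.exists():
            target.write_text(f"registry fetched from {url}\n")
            log.append(url)  # record each simulated download
        return str(target)

    return fake_retrieve


downloads = []
fake_retrieve = make_fake_retrieve(downloads)

with tempfile.TemporaryDirectory() as tmp:
    # Plain calls: only the first one downloads.
    for _ in range(3):
        fake_retrieve(EXAMPLE_URL, None, "files-registry.txt", tmp)
    plain_count = len(downloads)

    # Delete-before-fetch: every call downloads.
    downloads.clear()
    for _ in range(3):
        fetch_fresh(fake_retrieve, EXAMPLE_URL, "files-registry.txt", tmp)
    fresh_count = len(downloads)

print(plain_count, fresh_count)
```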
Snippet 2
References
How has this PR been tested?
Tests pass locally and in CI.
Does this PR require an update to the documentation?
Checklist: