Risk of losing ephys data on robocopy race condition #547

Open
glopesdev opened this issue Jun 5, 2024 · 0 comments
Labels
bug Something isn't working critical Urgent, high-priority issues

Comments

@glopesdev
Collaborator

We observed an unexpected mismatch in the size of one of the ephys chunks, as shown below:

[image: listing of ephys chunk file sizes showing the mismatched chunk]

Such a mismatch should never happen at the logging level, even if ephys data were dropped, since chunking is based on the number of samples and not on time.
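To make the size argument concrete, here is a minimal sketch of the check, assuming the chunk file is a headerless stream of fixed-width samples. The channel count, sample rate, sample width and samples-per-chunk values are illustrative assumptions, not values taken from the acquisition workflow.

```python
def chunk_bytes(samples_per_chunk, n_channels, bytes_per_sample=2):
    """Expected size of a chunk that always holds a fixed number of samples."""
    return samples_per_chunk * n_channels * bytes_per_sample


def missing_seconds(actual_bytes, expected_bytes, n_channels,
                    sample_rate_hz, bytes_per_sample=2):
    """Duration of data missing from a chunk that is smaller than expected."""
    bytes_per_frame = n_channels * bytes_per_sample
    missing_samples = (expected_bytes - actual_bytes) / bytes_per_frame
    return missing_samples / sample_rate_hz


# Hypothetical numbers: 384 channels, 16-bit samples, 30 kHz, one hour per chunk.
expected = chunk_bytes(samples_per_chunk=30_000 * 3600, n_channels=384)
```

Under these assumptions every complete chunk has the same size, so any file that is smaller than `expected` was truncated after logging rather than by dropped samples.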

The most likely explanation is a race condition when running robocopy with the /MOVE option. As described in the docs, this option "Moves files and directories, and deletes them from the source after they're copied."

From the size of the partial file, we estimate that around 4 minutes of data at the end of the chunk were lost. If robocopy started copying the file close to the end of the chunk, the partial ~75 GB of data could have taken 4 minutes to copy (taking into account all the other experimental files potentially in the folder). If the file was closed while this copy was in progress, then by the time robocopy tried to delete the source file the handle would already have been released, and the final, fully flushed file would have been deleted without being copied again.
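The suspected sequence can be reproduced in miniature. The sketch below is not robocopy and does not model Windows file-sharing semantics; it only replays the hypothesised order of events (copy a snapshot, writer appends and closes, source is deleted) with plain Python file operations. The function name and file names are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def simulate_move_race(head: bytes, tail: bytes):
    """Return (bytes written by the logger, bytes that reached the destination)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "chunk.bin"
        dst = Path(tmp) / "copied_chunk.bin"

        writer = open(src, "ab")
        writer.write(head)
        writer.flush()              # data so far is visible to readers

        shutil.copyfile(src, dst)   # the "copy" step sees only the partial file

        writer.write(tail)          # logger writes the end of the chunk...
        writer.close()              # ...and releases the handle

        os.remove(src)              # the "move" step now deletes the full file
        return len(head) + len(tail), dst.stat().st_size


written, copied = simulate_move_race(b"a" * 1000, b"b" * 40)
print(written - copied, "bytes lost")
```

The destination ends up with only the bytes that existed when the copy started, and nothing is left at the source to recover the tail from.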

This does not happen with other AEON data, since chunking is aligned to hourly boundaries while copying happens on the half-hour, which prevents this kind of race condition. For ephys data we have no such guarantee, because the current grouping is exclusively by sample count.
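A small sketch of why the alignment matters, assuming copies start on the half-hour and measuring time in seconds from an arbitrary hour boundary. The 3500 s chunk period is a made-up value standing in for any chunk duration that is not locked to the wall clock.

```python
def gap_to_nearest_copy(close_time_s, copy_period_s=3600, copy_offset_s=1800):
    """Seconds between a chunk closing and the nearest scheduled copy start."""
    phase = (close_time_s - copy_offset_s) % copy_period_s
    return min(phase, copy_period_s - phase)


def min_gap(chunk_period_s, n_chunks):
    """Smallest gap to a copy start over the first n_chunks chunk closures."""
    return min(gap_to_nearest_copy(k * chunk_period_s)
               for k in range(1, n_chunks + 1))


hourly = min_gap(3600, 48)       # chunks closed exactly on the hour
unaligned = min_gap(3500, 48)    # hypothetical sample-count-based period
```

With hourly chunks the gap stays at 30 minutes, while with an unaligned period the closing times drift relative to the copy schedule and eventually land on it.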

To prevent this, we would need to find a way to guarantee that the closing of ephys files and robocopy transfers never overlap in time.

@glopesdev glopesdev added bug Something isn't working critical Urgent, high-priority issues labels Jun 5, 2024
@glopesdev glopesdev added this to the Neuropixels probe recordings milestone Jun 5, 2024