Currently, on the main EBI SC Expression Atlas tertiary pipeline, we run Scrublet as a single process over all samples. When we define a batch variable, Scrublet is instead run per batch (which is more correct). The batch case may be acceptable when it happens, though it is not the usual case for most SC Expression Atlas datasets; the base case, where the whole dataset is processed at once, is certainly not ideal.
As a first approach, this could be fixed by having the Galaxy wrapper receive both a sample_variable (the header in obs where the samples are defined) and a batch_variable; when the latter is given, it overrides the former. If the batch variable is not given, Scrublet is then run per sample by default. If neither is given, Scrublet should of course run as it does now, since it has no way of knowing how to partition the dataset. In this setup, scanpy will run Scrublet serially (it would be great if scanpy could do this in parallel, but that would require changes to upstream code we don't control). A minimal sketch of the proposed wrapper logic is shown below.
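A minimal sketch of how the wrapper could resolve the partition key, assuming scanpy's Scrublet wrapper and its batch_key parameter (present in recent scanpy releases); the run_scrublet helper and its argument names mirror the proposed sample_variable/batch_variable options and are hypothetical:

```python
import scanpy as sc


def run_scrublet(adata, sample_variable=None, batch_variable=None):
    """Run Scrublet, partitioning the dataset if a grouping variable is set.

    batch_variable overrides sample_variable; if neither is given, Scrublet
    runs once over the whole dataset (the current behaviour).
    """
    # batch_variable takes precedence over sample_variable when both are set
    key = batch_variable or sample_variable
    # scanpy's Scrublet wrapper accepts a batch_key and processes each group
    # of obs serially, which covers both the per-batch and per-sample cases;
    # with key=None it falls back to a single run over the whole dataset
    sc.pp.scrublet(adata, batch_key=key)
    return adata
```

With this, a wrapper invoked with only sample_variable would partition per sample, while passing batch_variable as well would switch the partitioning to batches.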