Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Seurat divided in several modules #2195

Closed
wants to merge 11 commits into from
Closed

[WIP] Seurat divided in several modules #2195

wants to merge 11 commits into from

Conversation

bellenger-l
Copy link

@bellenger-l bellenger-l commented Dec 3, 2018

As discussed in the Galaxy Europeans days, ARTbio divided the Seurat tool in different modules :
* seurat-filtering : Filter out low quality cells and genes
* seurat-normalization : Cell level normalization
* seurat-feature-selection : Find variable genes
* seurat-confounder-removal : Scale data and confounder removal
* seurat-dim-reduction : Dimensional reduction (PCA and tSNE)
* seurat-clustering : Cluster cells
* seurat-diff-expression : Differential expression analysis

The first tool seurat-filtering takes an expression matrix as input, the other tools take a rdata file containing a single Seurat R object. All return at least a rdata file containing the updated Seurat object and if requested the Rscript used by the tool, some return a PDF file containing plots (Filtering, Normalization, Feature selection, Dimensional reduction, Clustering and Differential expression module) and one a tabular file (seurat-diff-expression).

FOR CONTRIBUTOR:

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection
  • - This PR updates an existing tool or tool collection
  • - This PR does something else (explain below)

FOR REVIEWER:

  • .shed.yml file ok
    • Toolshed user iuc has access to associated toolshed repo(s)
  • Indentation is correct (4 spaces)
  • Tool version/build ok
  • <command/>
    • Text parameters, input and output files 'single quoted'
    • Use of <![CDATA[ ... ]]> tags
    • Parameters of type text or having optional="true" attribute are checked with if str($param) before being used
  • Data parameters have a format attribute containing datatypes recognised by Galaxy
  • Tests
    • Parameters are reasonably covered
    • Test files are appropriate
  • Help
    • Valid restructuredText and uses <![CDATA[ ... ]]> tags
  • Complies with other best practice in Best Practices Doc

@mblue9
Copy link
Contributor

mblue9 commented Dec 3, 2018

Hi @bellenger-l I wasn't at the Galaxy Europeans days and hadn't heard there was any update on Seurat. There was already some work on creating Seurat modules here: #2057 (comment)
What was the discussion?

@bellenger-l
Copy link
Author

Hi @mblue9
Following up to the discussion #2057, we decided to split the main scRNAseq analysis tools (scater, Seurat, scanpy, raceID) in 5 modules (Filtering, normalization, dimensional reduction, confounder removal and clustering) with the aim that there are interoperable.

So I contributed to the existing seurat folder in order to follow these guidelines. You are of course welcome to contribute too. In addition, your seurat_scripts tool suite lives in a separate folder so we can also develop this even more modular approache.

Best,
Léa

@mblue9
Copy link
Contributor

mblue9 commented Dec 4, 2018

Ok, communication on that from someone in IUC would have been nice but thanks to you for the update.

@mvdbeek
Copy link
Member

mvdbeek commented Dec 4, 2018

I agree with @mblue9 , deciding (as opposed to discussing) things in a in-person meeting is something we should avoid, to prevent frustration from everyone that couldn't be there. For meetings at the GCC we usually keep notes, I think we should have done that in this case as well (which is not your fault at all @bellenger-l ).

I'm not at all familiar with scRNA-seq, but I understand @pcm32 and @pinin4fjords (and maybe others) have worked on streamlining some seurat scripts in https://github.com/ebi-gene-expression-group/r-seurat-scripts, how much overlap is there with this PR ?

@pcm32
Copy link
Member

pcm32 commented Dec 4, 2018

Hi @bellenger-l we went to some lengths to try discuss this as much as possible with as many members of the community in #2057, we even had a large call with people in the US, Europe and in principle Australia (but I think @mblue9 couldn’t make it), where @bgruening and @bebatut were also present. We really wanted to avoid duplications and join forces. That is why instead of writing our own Galaxy wrappers, we contributed on top of @mblue9's for Seurat. We have made some progress, not only on Seurat, but also along the same lines with Scater (#2117 plus some things I haven't committed - https://github.com/ebi-gene-expression-group/bioconductor-scater-scripts), SC3 (https://github.com/ebi-gene-expression-group/bioconductor-sc3-scripts - we were about to PR things on IUC for this) and Scanpy (#2159 https://github.com/ebi-gene-expression-group/scanpy-scripts). We have not only made Galaxy wrappers, but also intermediate conda packages to expose similar granularity for all these tools outside of Galaxy. It would be great if you could add these additions on top of @mblue9 IUC pull request and we could all join forces and benefit from more man power. Here at the EBI and the Sanger we have 5 part time positions and 1 full time position working on this, so we’re keen to make sure we use the resource effectively.

We would love to benefit as well from your contributions on this, and I think that the best way would be if you could contribute to @mblue9 original PR as we are doing. We would be very happy as well to receive PRs on https://github.com/ebi-gene-expression-group/r-seurat-scripts where you could add any missing functionality and build upon the one we have. The idea of those intermediate packages is to give a coherent usage across different workflows environments and shell usage.

We believe that the best way forward towards making this work available not only to Galaxy but for other workflows environments is to leave the scripting in intermediate conda packages (r-seurat-scripts, scanpy-scripts, etc). If you follow an approach where you put the R scripting logic with the Galaxy wrappers, then to reuse this from other WF environments you need to duplicate that scripting layer. Contributing those additiional functionalities to the intermediate conda packages means your work is reusable by so many more people.

@drosofff
Copy link
Member

drosofff commented Dec 4, 2018

Dear fellows,

I am the boss of @bellenger-l and I feel I should say a few words to release a bit the pressure she is currently experiencing.

It is clear that there is still divergent opinions on the way the things should be done, and I also see that some discussions are still lacking to better develop and avoid duplication of efforts.

This said, I would advocate that a PR is just a Pull Request. In my view, PR is to expose work and to discuss it. Thus nothing that cannot be adjusted or reorganised any more. On a side note, where is the @mblue9 original PR you mentioned ? I missed it.
We do not intend at all to take the lead on anything: consider that we are not even members of the iuc and that we usually develop our tools in https://github.com/ARTbio/tools-artbio.

I would also appreciate if @bgruening who probably have a more general view on the debate as us could say a word on behalf on the group that discussed the scRNAseq tools in Freiburg.

Kind regards

@pcm32
Copy link
Member

pcm32 commented Dec 4, 2018

Esteemed @bellenger-l and @drosofff I apologise if somehow I appeared to be putting any pressure on you guys. I only pursued that we would all work together instead of competing, specially considering that our view at the level of granularity is so congruent. I tried to sound as inviting as I could, but sometimes maybe with the rush of the day I didn't have the time to find the best words. @mblue9 contributions are here #2047. We have some additional commits on top of her work (https://github.com/mblue9/tools-iuc/commits/seurat_scripts/tools/seurat_scripts) but I think that this got there after the PR got merged (which I wasn't aware it was).

@bgruening
Copy link
Member

Hi, I don't feel that well these days, I will try to solve this issue. I thought someone took notes, but if not I will provide a summary from my memories.

Sorrry, not my week.

@mvdbeek
Copy link
Member

mvdbeek commented Dec 4, 2018

To start tests you will need a .shed.yml file, like for instance this one https://github.com/galaxyproject/tools-iuc/blob/master/tool_collections/samtools/sam_to_bam/.shed.yml

@bellenger-l
Copy link
Author

There is the old .shed.yml that I didn't touch, do you suggest to move the different modules in a tool_collection or stay in tools but rearrange .shed.yml like this https://github.com/ARTbio/tools-iuc/blob/master/tools/biom_format/.shed.yml ?

@mvdbeek
Copy link
Member

mvdbeek commented Dec 4, 2018

Oh sorry, didn't notice this, it's fine.

@mblue9
Copy link
Contributor

mblue9 commented Dec 4, 2018

As @mvdbeek and @pcm32 have already covered, it's not good that a decision like this was made in a meeting that people from the community who had already invested time, work and energy weren't at, weren't informed was happening or what had been discussed.

@bellenger-l and @drosofff I'm sorry for any pressure you feel and that you got caught up in this. It must be very frustrating for your group to have put time into contributing this work and to get a response like this. Your work and effort is appreciated.

@bgruening
Copy link
Member

HI all,

first of all, let's clarify some points from my point of view.

  1. There is no reason, that there couldn't be two Galaxy toolsets of the same tool
  2. There is no reason, to have these tools under IUC
  3. It was a mistake to organize this under the IUC account, the general discussion and the organization has nothing to do with the IUC. There are a bunch of people interested in scRNA that are also part of IUC, but that's it imho.
  4. The meeting 2 weeks ago was not an IUC meeting. People at this conference wanted to talk about scRNA-seq and so people did.
  5. We have been traveling the last two weeks and failed to put a summary of this meeting online if someone wants to blame someone - blame me for being too slow and too sick to have done it earlier. @bellenger-l and the entire ART-Bio it is awesome how fast you are.

I will assemble a summary of what we discussed and what is available in various PRs during the day and attach it to #2057

@drosofff I think @mblue9 is pointing to a branch, not a PR, see here: https://github.com/mblue9/tools-iuc/tree/seurat_scripts/tools/seurat_scripts

I would also appreciate if @bgruening who probably have a more general view on the debate as us could say a word on behalf of the group that discussed the scRNAseq tools in Freiburg.

I'm not sure I have, but we will summarize everything today and put it online. Sorry for being later here.

@drosofff
Copy link
Member

drosofff commented Dec 5, 2018

Thanks @bgruening. I agree with your points.
I would also like to take the occasion to stress a difficulty inherent to tool development: even with a fair approach, which I have no doubt that all participants here have, it is difficult to follow or even be aware of efforts by many others in many branches of many repositories. Imho, it is where PRs come: literally, they are explicit invitations to review, comment, amend, and integrate works. Merge is deciding -- no rush ; PR is discussing. Without PRs I feel that issues are more difficult to share, discuss and eventually solve.
Best to all

@mblue9
Copy link
Contributor

mblue9 commented Dec 5, 2018

Thanks for the clarification @bgruening. I hope you get better soon.

@nsoranzo
Copy link
Member

nsoranzo commented Dec 5, 2018

The meeting 2 weeks ago was not an IUC meeting. People at this conference wanted to talk about scRNA-seq and so people did.

I also wanted to stress that this was not an IUC meeting, only 3 persons from IUC (out of 10 attendees) were there. It was organised on the spot and hold during a lunch break. Therefore nothing of what was discussed should be considered binding for the IUC.

The goals of the meeting were to discuss: 1) how to make the various scRNA toolkits interoperable in Galaxy and 2) to coordinate the tool development to not duplicate efforts. It seems we failed spectacularly on the second one, for which I apologise to all the persons involved.

In the future, I'd recommend to open WIP pull requests early in the tool development to make it easier to track who is working on what, and to get feedback on the approach before a lot of work is invested.

@bellenger-l bellenger-l changed the title Seurat divided in several modules [WIP] Seurat divided in several modules Dec 5, 2018
@mblue9
Copy link
Contributor

mblue9 commented Dec 6, 2018

Thanks for the comments @nsoranzo, appreciate it.

In the future, I'd recommend to open WIP pull requests early in the tool development to make it easier to track who is working on what, and to get feedback on the approach before a lot of work is invested.

Just to clarify, that was a reason why the issue here was opened: #2057, as a way to discuss and keep track of the various efforts on Galaxy scRNA work (there are many as we can see), and why it's a shame that it wasn't updated, that could have avoided this kerfuffle. And the reason the Seurat module work was a branch and not a PR was that it was unclear whether part of the wrapper generation could be automated or not. But I agree a PR would have been better, I should have done that.

Just to add, there's already a merged Seurat tool in IUC here: #2047, which made this PR extra surprising and confusing. But hopefully the communication all round on these things (PRs, updates etc) gets better.

@bgruening
Copy link
Member

Please see my comment here: #2057 (comment) with a link to a GDoc, please leave comments in the Doc and keep discussing there.

Just to add, there's already a merged Seurat tool in IUC here: #2047, which made this PR extra surprising and confusing. But hopefully the communication all round on these things (PRs, updates etc) gets better.

Afaik this PR is building on-top of your work. Its in your seurat folder.

@pcm32
Copy link
Member

pcm32 commented Dec 7, 2018

I think that we need to clarify something, our current (@mblue9 + EBI/Sanger) work of Seurat is still on this branch of @mblue9:

https://github.com/mblue9/tools-iuc/tree/seurat_scripts/tools/seurat_scripts

As you can see from those files, these are way anterior to ARTbio's, and it was mostly a lack of coordination that that wasn't PR'ed here after we started using it. As you will see many of the proposed modules by ARTbio where already present there and making use of our r-seurat-script intermediate layer.

I would suggest then that we put the complentary work from ARTbio on top of those scripts; we will be happy to add the extra needed method on r-seurat-scripts (or if @bellenger-l wants to PR it to keep authorship of those modules, that is fine as well). I can take care of merging these two efforts in a single branch and PR it again, keeping the commits of each one of the involved parties. Would that work?

@mblue9
Copy link
Contributor

mblue9 commented Dec 10, 2018

@pcm32 I am fine with your idea or accepting this PR (it's nice work) or trying both. But let's see what the decision makers say.

@pinin4fjords
Copy link

Hi all- I'm a colleague of @pcm32 . Despite a few crossed wires along the way I think this is quite positive, we're clearly converging on a strategy. We all have separate R scripts (rather than embedded code), we're talking about interoperability via Loom, etc etc. I don't think we at the EBI would mind adopting the general granularity decisions made here if it facilitated progress.

The only point at which our approach diverges is that we have the base scripts in external Bioconda packages (so that we can also use them in non-Galaxy ways, which we feel is a great bonus for consistency across platforms). Is that something everyone could get on board with? If so, I think with a bit of work we could converge the approaches fairly quickly in the new year and get our 'many hands' working together.

@bellenger-l
Copy link
Author

We are open to any option that can be tested. I suggest that @mblue9 makes a PR of her work. Then we can peacefully continue the discussion that has been initiated here and in the google doc. There is nothing that can be done with the previous effort duplication. However it will probably easy to merge two PRs in a single one at the end without too much coding effort.

We do not really care of the commit authorships in this particular case. Our main motivation is to get galaxy tools available for better analyses and we will work in line with what was decided by the community.

@pcm32
Copy link
Member

pcm32 commented Dec 12, 2018

Excellent, in that case I'll open a new PR which will include these changes and @mblue9 ones (all original commits) and then we can work on that branch to merge scripts.

@mblue9
Copy link
Contributor

mblue9 commented Dec 13, 2018

Thanks a lot @pcm32.

@bellenger-l I sincerely apologise to you for this very non-peaceful experience. I was quite angry here but not with you, you weren't aware of what had already been discussed. I just really didn't like what had happened - the unilateral decision making and lack of communication in light of the prior discussion, as I don't think it's good in a community. But this is usually an excellent community, and also the omission of making a PR was indeed a major oversight on my part. So I'm sorry if I made this an unpleasant experience for you, I think your work is great and I hope you make more PRs here.

@pcm32
Copy link
Member

pcm32 commented Jan 9, 2019

Hi all, now #2231 includes all known efforts for modular Seurat, I hope that we can keep on working there to merge the functionalities. I will start to add @bellenger-l functionality to our scripts layer, so that we can use what they have in addition from there. I would suggest that work ceases here and is continued at the other branch. I'm happy to give anyone who needs it write access to our iuc checkout or work on a pull request basis.

@pcm32
Copy link
Member

pcm32 commented May 22, 2023

We have a much newer set of tools for this if there is an interest.

@bernt-matthias
Copy link
Contributor

:) .. sure, if they fit for IUC .. closing this for now .. reopen if needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants