Add Seurat first version #2047
This is another one that passes fine locally but Travis is failing :( The error is
|
Thanks for the new conda package @bgruening! Now there's the
|
Yeah, got stuck with this yesterday. The problem is again the solver :( I need more time for this. |
Ok no worries, I have plenty of other work to keep me busy in the meantime! |
until we can use a newer conda version
Sorry, took me a while and some upstream work. There is currently a huge solver rework in progress from the conda people so things will improve in the next weeks. Thanks @mblue9! |
Great! Thanks a lot for working on it @bgruening !! |
@mblue9 can you check this out https://github.com/ebi-gene-expression-group/r-seurat-scripts (Bjoern, hijacked @pcm32 notebook) |
Nice! Thanks @pcm32 ! Do you mean I should be wrapping these wrapper scripts instead of seurat itself, or using those scripts as reference? At the moment I'm working on adding the method for aligning 2 samples (RunCCA function), which isn't covered in those wrappers yet from what I can see. |
Well, this more modular approach of separate scripts could be better for more flexibility and less ugliness. (The way I've added the RunCCA function into the script I wrote is currently ugly, I think: https://github.com/mblue9/tools-iuc/blob/seurat_add_cca/tools/seurat/seurat.R) It'd probably be good to add RunMultiCCA for multiple datasets as well. |
Hi @mblue9! Great that you're also working on this! We are aiming to decompose several tertiary analysis tools (Seurat, sc3, scanpy, scatter, etc) into their main separate methods so that in the long term users can cherry-pick and match different analysis steps from the different packages. The work that @bgruening pointed to is the substrate for the bioconda package for the proposed seurat scripts (where we could add the missing step that you propose). So yes, we aim to work on writing the Galaxy wrappers for each individual script, but since you are already on it, it would be great to join efforts and avoid duplication. Having functionality modularized like this would also mean that we can easily divide up the wrappers to write 😀. We are of course open to reshaping the modularisation if you have any thoughts on that as well. I'll redirect the other people that are working on this with me to this thread. |
@pcm32 that sounds like a great plan! Definitely would be keen to join forces and avoid duplicating effort. And working on the wrappers together would be great. Count me in! 👍 |
Hi all! I'm Jon, @pcm32 's colleague, and I've started to publish this work via PRs (thanks @bgruening for the review). See in particular bioconda/bioconda-recipes#10747. We also have some draft documentation at https://tertiary-workflows-docs.readthedocs.io/en/latest/ if you're interested in our general strategy. We're trying to be very consistent in our approach, so we're making some guidelines- e.g. for the r package wrappers https://tertiary-workflows-docs.readthedocs.io/en/latest/scripts_for_r_packages.html. Very happy to receive feedback and work with people on this. |
@pinin4fjords this is awesome! Just saying :) |
Was wondering... if the aim for these seurat tools, and other single-cell tools, is to create one Galaxy wrapper per R function, then should we try using @blankenberg's r2g2 tool to help automate creating the wrappers and for consistency? I've just tried it out and now have >300 Seurat tools! 😄 I've put just the ones that are currently in the r-seurat-scripts in my repo here if you want to see: https://github.com/mblue9/tools-iuc/tree/seurat_r2g2/tools/seurat_r2g2. They need some cleanup, e.g. removing options not available in the r-seurat-scripts, and every argument currently requires the input type to be specified, e.g. integer, see below. But what do people think, should we pursue this automated way to create the wrappers? It could maybe be used for seurat, scater, SC3 etc. |
Wow I had no idea that tool existed! I was toying with the idea of scripting the process too after spending too much time manually copying and pasting help text and default parameters into the XML. I am 100% for this. |
The tool might need to be extended to parse a script rather than an entire library though. I made a feature request with this in mind. I will play with this today to see how much it speeds up filling param labels, help text, and default values. |
Great! Thanks @mtekman ! It would be really good to know how it compares to what you've already done. I tried r2g2 because I'd been looking at how many functions I'd already included in Seurat, and if I include them all, it would be a lot of wrappers: ~12 currently in r-seurat-scripts plus another ~20 here, and that's just for Seurat, not even other single-cell tools. So if you've any ideas to automate or speed up the creation of wrappers for these tools that would be great! |
Thanks @mtekman @mblue9, I had no idea about r2g2 either. I'm delighted with any process that makes this easier, and I was wondering if there was a way I could automate the wrapper script creation. r-seurat-scripts and related packages we're working on are designed to be stand-alone, i.e. independent of Galaxy, so the scripts are available as components of any workflow system. From our point of view we'd have to think about the best way of achieving that alongside the Galaxy wrappers made by r2g2, and consistent with them, but it seems there must be a way of doing that. Obviously we're not married to the way we're doing this now; if there is a way of doing it in an automated way I'll be very happy. |
@mblue9 The r2g2 tool at the moment does not seem to autofill labels or help text or defaults, and so currently is not so helpful in expediting the XML process, which for me is the main time consuming process. I propose the development of a new tool which would parse |
Using r2g2 sounds like a great addition! Besides stressing the fact that we want bioconda packages being Galaxy agnostic (I'm sure we can reuse results from r2g2 for the bioconda packages), so that they can be used with other workflow environments, I would add the following:
So I would harness the power given by r2g2 to produce scripts that we can add to the bioconda packages, but then work on top of that to trim to relevant functions and cherry pick Galaxy wrappers for important functionality instead of just any method available inside. |
What is your opinion on this @bgruening? |
@pcm32 If I can jump in on your comment -- I have also come across this issue of having too many methods floating around for different parts of the analyses. Here #1841 I suggested binding all methods into one of the 4 main stages of processing (Filtering, Normalisation, Confounder Removal, and Clustering). Users would then choose a method via dropdown box. This has the advantage of keeping all tools of the same type within the same wrapper and not cluttering the Galaxy tool search. At the time I was trying to put these 4 stages into one single tool (i.e. the user chooses the filtering method, the normalisation method, the confounder removal method, and the clustering method -- each stage being optional so that the user could skip a stage if they wanted to re-analyse the output RDS), in order to circumvent the problem of creating an exchange format between stages. |
Agreed, those functionalities (give or take a few) are what we should be going after (and then in the same coherent way for other tools, making it easier for users; so you have F, N, CR and Clustering for Seurat, SC3, scanpy, etc where available). I would avoid putting all functionalities together because then you cannot cherry-pick the different functionalities from the different tools and use them together (i.e. use filtering from seurat with normalization from sc3 and clustering from scanpy, to name an example). |
@mtekman I think that #1841 is great and really aligns with the view that we have. The main difference would be to add the R scripts to the bioconda -scripts package instead of having them right next to the Galaxy wrapper, so that the work can be re-used in a similar way in other workflow environments. Would you be happy to move in that direction and collaborate both at the level of Galaxy wrappers and bioconda packages? Here at the Genome campus we are 3 people, possibly 4, working part time on this project, and we will get two more people at the Sanger. |
Besides seurat, I think we have made some progress on scater, and maybe on some other package. Maybe @pinin4fjords can fill in more details. |
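To make the mix-and-match idea above concrete, here is a minimal sketch of one way a per-stage method registry could look. Everything in it is an assumption for illustration: the package names `package_a` and `package_b`, the toy functions, and the two-stage order are hypothetical stand-ins, not code from Seurat, SC3 or scanpy.

```python
import math

# Toy stand-ins for per-package methods (hypothetical, for illustration only).
def filter_min_count(cells, threshold=2):
    # Keep cells whose total count reaches the threshold
    return [c for c in cells if sum(c) >= threshold]

def normalise_by_total(cells):
    # Divide each value by the cell total; all-zero cells are left as zeros
    return [[v / sum(c) if sum(c) else 0.0 for v in c] for c in cells]

def normalise_log(cells):
    # log(1 + x) transform of every value
    return [[math.log1p(v) for v in c] for c in cells]

# stage -> method name -> implementation
REGISTRY = {
    "filtering": {"package_a": filter_min_count},
    "normalisation": {"package_a": normalise_by_total, "package_b": normalise_log},
}
STAGE_ORDER = ["filtering", "normalisation"]

def run_pipeline(cells, choices):
    # Apply each stage with the chosen method; a stage with no choice is skipped
    for stage in STAGE_ORDER:
        method = choices.get(stage)
        if method is None:
            continue
        cells = REGISTRY[stage][method](cells)
    return cells

cells = [[1, 1], [0, 1], [3, 1]]
result = run_pipeline(cells, {"filtering": "package_a", "normalisation": "package_a"})
```

With separate entries per stage, a user could pick filtering from one package and normalisation from another just by changing the `choices` mapping, which is the flexibility a single all-in-one tool would lose.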
Yep, we've got most of the way through Scater too: https://github.com/ebi-gene-expression-group/bioconductor-scater-scripts/tree/devel. There are differences with @mtekman 's efforts, e.g. we have more content to do with argument parsing etc (since the scripts are intended to be standalone, so need more UI), but hopefully we could converge. |
@mblue9 I have started a small skeleton repo for what an Rscript2Galaxy tool should ideally look like https://github.com/mtekman/rscript2galaxy @pcm32 it makes sense to keep the core R scripts wrapper independent. I would most certainly be happy to collaborate in that direction, and definitely agree that having things set in bioconda first before wrapping them in Galaxy makes the most sense. @pinin4fjords I have also been in discussion with @ethering on continuing the scater modules, but I agree it would be better if we built upon your scripts before resuming any work. |
Hi Guys, interesting things going on here :) With R2G2 the idea was to create a generic tool that creates Galaxy tools that are fully functional. This does create a situation where for each formally declared function option, we can determine a default value and type, but due to overloading we need to allow setting other options; e.g. a Now if you have a priori knowledge of the actual tools and packages that you want to create, it makes sense to do things in a more focused fashion. For example, I have a Galaxy tool generator focused directly on anvi'o (50 or so python-based tools), and because of the focused nature, the tools it creates are more-or-less 'production quality', whereas, as noted, having 300 r2g2 tools for a single R package might not have the same user-friendliness factor. They can often make great starting points, however. I am a huge fan of automated approaches for many reasons, and I think it makes lots of sense to use them when the expected number of tools exceeds a certain count. Arbitrarily, let's just say 10 tools or so: spend an hour per tool manually creating Galaxy XML files, or spend 10 hours writing a conversion script? If you know what you want the tools to end up looking like, and are familiar with the specific use-cases, writing the script almost certainly wins out -- a huge bonus here is dealing with version updates: simply rerun the script and have new updated versions of the tools in just a few seconds. That being said, let me know what I can do to help. |
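As a rough illustration of determining a default value and type for each declared option, the sketch below turns a mapping of option names to defaults into Galaxy `<param>` elements. It is not how r2g2 works internally; the option names and defaults are invented for the example, and the type inference is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

def galaxy_type(default):
    # Map a Python default value to a Galaxy param type.
    # bool is tested first because bool is a subclass of int.
    if isinstance(default, bool):
        return "boolean"
    if isinstance(default, int):
        return "integer"
    if isinstance(default, float):
        return "float"
    return "text"

def build_params(options):
    # Build an <inputs> element with one <param> per declared option
    inputs = ET.Element("inputs")
    for name, default in options.items():
        ptype = galaxy_type(default)
        attrs = {"name": name, "type": ptype, "label": name}
        if ptype == "boolean":
            attrs["checked"] = "true" if default else "false"
        else:
            attrs["value"] = str(default)
        ET.SubElement(inputs, "param", attrs)
    return inputs

# Hypothetical option names and defaults, not taken from any real package
example_options = {"min_cells": 3, "scale_factor": 10000.0, "do_plot": False, "assay": "RNA"}
inputs = build_params(example_options)
xml_text = ET.tostring(inputs, encoding="unicode")
```

Labels and help text are the part this sketch does not fill in (it just reuses the option name as the label), which matches the gap noted earlier in the thread.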
Hi all, sorry for being late to the game here, had an unpleasant day. But I'm glad you are all now here! Really awesome!
Totally agree. The gain that Galaxy can bring with really well-designed tools is a great UX and fewer failures in data analysis, if it's done well. We have been using scripts to automatically generate Galaxy integrations for years now, for example in OpenMS or with our python-argparse converter. My experience is very mixed. For OpenMS, with over 100 tools, it's really the only viable option if you don't have a huge community behind it - but the wrappers are not perfect, the UX is bad and the upgrade process is even worse, especially if you start fixing bugs. But as @blankenberg already said The ideal solution for me would be, as Mehmet already pointed out, one sc-filtering tool, one sc-normalization tool .... the user should not care about the underlying method, but the user should be able to pick a method out of many. The big picture is what counts I think - and this is That said, I'm happy to support whatever we decide here - I'm just super excited about the number of people that would like to push scRNA in Galaxy - this is exciting. For people that are interested in how we can create a matrix, @mtekman has created a pipeline for this and has described the UMI handling and much more in a Galaxy training material: galaxyproject/training-material#969. Comments welcome! |
@bgruening that's quite exciting. Our project actually concerns intermediate formats for tertiary analysis tools as a key objective, but it's a complex issue and I can see it providing quite a few challenges, so a community-driven solution would be really awesome. What you're talking about could dovetail perfectly- I can envisage a 'filtering' galaxy tool that simply calls the appropriate script from the relevant *-scripts bioconda package. Even if we ended up using different single-tool Galaxy wrappers for our own internal objectives, just getting those intermediate formats agreed and working would be a big achievement- and yes, hdf5 had crossed our minds. Should we move this discussion somewhere away from this random PR thread :-P. Slack anyone? |
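A 'filtering' tool that simply calls the appropriate script could, as a sketch, reduce to a lookup plus a quoted command line. The script names and the `--input`/`--output` flags below are hypothetical placeholders; the real scripts in the *-scripts packages may be named and parameterised differently.

```python
import shlex

# Hypothetical script names; real names in the *-scripts packages may differ.
FILTER_SCRIPTS = {
    "seurat": "seurat-filter.R",
    "scater": "scater-filter.R",
}

def build_filter_command(method, input_path, output_path):
    # Look up the script for the chosen method and quote every argument
    script = FILTER_SCRIPTS[method]
    argv = [script, "--input", input_path, "--output", output_path]
    return " ".join(shlex.quote(arg) for arg in argv)

command = build_filter_command("seurat", "in put.rds", "out.rds")
```

Quoting each argument keeps paths with spaces intact, and adding another package would only mean adding an entry to the lookup table.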
@pinin4fjords I think a separate issue in this repository would be the best place to keep track of the various ongoing efforts. For quick chats, we use https://gitter.im/galaxy-iuc/iuc |
I've created an issue in the IUC repo here if you want to use that to continue the discussion. I tried to summarise the main points but not sure I got them all so feel free to edit! |
I've created that other issue but in case you don't want to use that I'll respond here to some of the points raised. I was not proposing to add 300 Seurat functions, a few done well is definitely better! With the automated approach I was seeing it as a starting point.
That tool looks cool!
Yes, I was also thinking mainly of Bioconductor in terms of automation here. Having worked on a few Bioconductor tools now, copying/pasting info that's already provided by Bioconductor just doesn't seem efficient. Parsing their packages to create a starting point for wrappers could be great for a number of reasons imo. Speed/time-saving is one thing, but consistency and standardisation are more important I think, and that could help with both. It could potentially also help any interested Bioconductor tool authors create Galaxy wrappers more easily. Their focus is R, and while creating Galaxy wrappers is 'easy', auto-creating a wrapper to start from could lower the entry barrier for them. |
Hi all, I'm Suhaib (@suhaibMo), working with @pcm32 and @pinin4fjords. Previously I wrote a few R scater wrappers for data processing functions (https://github.com/ebi-gene-expression-group/bioconductor-scater-scripts) that have been integrated into Bioconda recipes (https://github.com/bioconda/bioconda-recipes/tree/7d1f13c7f91fc65ed235eb4b860cfdb0287ab082/recipes/bioconductor-scater-scripts). I'm now moving on to writing Galaxy wrappers for Scater (as a newbie), and am familiarising myself with the process and the XML schema. However, I understand @mtekman is planning to write a Galaxy wrapper for scater? I aim to have the following wrappers for scater
|
This PR adds Seurat, a tool for exploring single-cell data. It takes a gene count matrix as input and outputs a PDF of plots.
FOR CONTRIBUTOR:
FOR REVIEWER:
iuc has access to associated toolshed repo(s)
<command/>
'single quoted'
<![CDATA[ ... ]]> tags
text or having optional="true" attribute are checked with if str($param) before being used
format attribute containing datatypes recognised by Galaxy
<![CDATA[ ... ]]> tags