I have been trying to run the DESeq package on some NGS data I have from an experiment. I have the raw data in the form of .fastq files as well as RPKM values stored in an Excel sheet. I have two cell types, two treatments, and two replicates for each group, and I am trying to get a DE result file and generate a volcano plot. I have the end of the process figured out; I just can't get the program to run and produce a DDS.
I have gone over the workflow a couple of times but I am getting stuck on the implementation of Salmon. Do I need to run that program in Python, or is there a way to skip that step and do the analysis exclusively in R?
So you already know that you cannot use RPKM as input to DESeq2.
There are a number of ways to generate counts, but Salmon is very fast and easy. You also know how to run Salmon because it's part of the workflow.
Alternatively, you could collaborate with someone who can help you with quantification from FASTQ, and then load into R using the various importers described in the workflow.
But we can't do all this for you on the support site. The site is for specific questions about Bioconductor software, not a way to avoid the work of trying things out yourself (or finding collaborators to help).
I don't think Salmon is a Python program; you just run it on the command line in Unix. I don't think there is an R implementation. There might be some aligners that are implemented in R, but the most popular ones, the ones that more people use and can help you with, are not.
thanks! I haven't used Anaconda in a while so I forgot "conda..." things go in the command window. I'm going to see if I can knock the rust off.
Once you get salmon running in a conda environment (see first section HERE), there is then useful information HERE (see section 7.2.4 Salmon quantification) about how to index the reference transcriptome and run the count abundance step. After all of that, you would then go back to the DESeq2 vignette about how to import the count abundances to DESeq2 for normalisation.
As per Michael, the RPKM data cannot be used as input to DESeq2.
It is important to follow the vignettes / tutorials provided by the authors of these programs so that you can learn how they work and, ultimately, how to apply these programs to your own data.
We also have Salmon indexing + quantification including a Snakemake example for looping over samples in the workflow now:
https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#quantifying-with-salmon
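For anyone reading along, the indexing and quantification steps discussed above might look roughly like the sketch below when run as a plain shell loop rather than through Snakemake. This is not copied from the workflow. It assumes paired-end reads named fastq/<sample>_1.fastq.gz and fastq/<sample>_2.fastq.gz, a reference transcriptome called transcripts.fa.gz, and 4 threads; those names and the function names are all hypothetical and would need adjusting to your own files.

```shell
#!/usr/bin/env bash
# Sketch only: file names, directory layout and thread count are assumptions.

TRANSCRIPTS="transcripts.fa.gz"   # hypothetical reference transcriptome FASTA
INDEX_DIR="salmon_index"          # hypothetical directory for the Salmon index
THREADS=4                         # assumed number of threads

# Build the index once per reference transcriptome.
build_index() {
  salmon index -t "$TRANSCRIPTS" -i "$INDEX_DIR"
}

# Quantify one paired-end sample; "-l A" lets Salmon infer the library type.
quantify_sample() {
  local sample="$1"
  salmon quant -i "$INDEX_DIR" -l A \
    -1 "fastq/${sample}_1.fastq.gz" \
    -2 "fastq/${sample}_2.fastq.gz" \
    -p "$THREADS" --validateMappings \
    -o "quants/${sample}"
}

# Loop over every sample that has a read-1 file in fastq/.
quantify_all() {
  local r1 sample
  for r1 in fastq/*_1.fastq.gz; do
    [ -e "$r1" ] || continue      # skip if the pattern matched nothing
    sample="$(basename "$r1" _1.fastq.gz)"
    quantify_sample "$sample"
  done
}
```

Running build_index followed by quantify_all should leave one quants/<sample>/quant.sf file per sample, which is the file the importers in the workflow read. For single-end data, the -1/-2 pair would be replaced by a single -r argument.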
Thanks to both of you for your help! I am usually more of a lab tech than a coder, but covid has me trying to find new ways to help our lab. Unfortunately, I still can't seem to get the Salmon program working. I have uninstalled and reinstalled the Anaconda software, installed the binary file from GitHub, and tried several variations of conda install salmon, and I keep getting an error about the channels I have, or about the absence of salmon from the directory. This is despite the fact that conda-forge and bioconda are at the top of my channels list, and the salmon binary file is listed when I give the dir command. I just did my third un/reinstall of Anaconda for today and these steps are still not working. If there is something trivial or subtle that I may have missed, please let me know if you have the chance. Thanks so much!
I’m actually really swamped now unfortunately so won’t be able to follow up here.
Again, a good approach is to ask someone local to your institute for help getting started beyond the online documentation. This site is really for reporting specific issues or questions to Bioconductor software maintainers.
zal4002, why not just try to download the pre-compiled executables ('binaries') from here and try to run those outside of conda? - https://github.com/COMBINE-lab/salmon/releases (scroll down to find the filename 'salmon-1.2.1_linux_x86_64.tar.gz').
As per Michael, though, this website is for support with Bioconductor packages. You could try to make a post on Biostars, but please link back here when / if you do.
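On a Linux machine, fetching and unpacking the pre-compiled release might look something like the sketch below. The version number and download URL are assumptions pieced together from the file name on the releases page, so check that page for the current ones. The function names are hypothetical, and the search for the executable is deliberately loose because the archive's internal layout is not guaranteed here.

```shell
#!/usr/bin/env bash
# Sketch only: the version and URL are assumptions; verify them on the
# Salmon releases page before use.

SALMON_VERSION="1.2.1"
TARBALL="salmon-${SALMON_VERSION}_linux_x86_64.tar.gz"
URL="https://github.com/COMBINE-lab/salmon/releases/download/v${SALMON_VERSION}/${TARBALL}"

# Download the tarball and unpack it into ./salmon.
fetch_salmon() {
  # -L follows the redirect GitHub issues for release assets.
  curl -L -o "$TARBALL" "$URL"
  mkdir -p salmon
  # Drop the archive's top-level directory so the contents land in ./salmon.
  tar -xzf "$TARBALL" -C salmon --strip-components=1
}

# Print the path of the unpacked executable, wherever it landed.
find_salmon() {
  find salmon -type f -name salmon | head -n 1
}
```

After fetch_salmon, running "$(find_salmon)" --version is a quick check that the executable starts at all, with no conda involved.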
Thanks for the tip Re: Biostars. I posted on there and someone told me that I cannot run Salmon on Windows (didn't expect that to be an issue). I'm trying to find a workaround but at least now I'm going in the right direction.
Lots of bioinformatics software is primarily designed for Linux, and usually also happens to work on Mac (or can be made to work on Mac with some effort) because macOS is Unix-like.
While trying to get some experience doing typical bioinformatics tasks, you end up spending a lot of time and effort dealing with issues that bioinformatics users on a Linux cluster wouldn't encounter, and having to post questions on forums, etc.
The typical experience, by contrast, is to just download the linux_x86_64 executable and have it run immediately. You end up having a rough experience, but that is because no one is aligning or quantifying reads on a Windows machine; they are doing it on Linux clusters or Mac laptops, so there is little to no support. If you are trying to recreate this experience on a Windows machine, you could use a virtual machine, or you could just ask someone with access to a Linux cluster / laptop / Mac laptop to do this step for you.