Hi Community,
I apologise if this question is too simple and the answer is found elsewhere. I am a biologist and new to Bioconductor. I would like to do a start-to-end RNA-Seq analysis using Bioconductor and R, in an RMarkdown notebook, so that it is reproducible for other lab members and for publication. We have deposited our RNA-Seq reads, as obtained from the sequencing machine, directly in GEO.
Most of the tutorials I've found start from a table of counts and explain differential expression packages/techniques. But the previous steps are not explained (reads to counts, QC, read trimming, mapping with HISAT or similar, counting features from BAM/SAM files, counts to genes). I understand that there are other tools, such as command-line or bash tools, that help with those previous steps. My questions are:
1. Is there a tutorial in Bioconductor that starts at the very beginning, reading the data from GEO/SRA and processing the reads to get to gene counts? Ideally using entirely Bioconductor packages.
2. If not, is there a tutorial (in R or RMarkdown) that calls all the necessary outside packages? (Mostly using Bioconductor, but where that is not available, calling other packages from R.)
3. If not, is there a tutorial on reproducible RNA-Seq using bash and R, from sequencing reads to differential expression? (Using a mix of command-line tools and then switching to Bioconductor at the end.)
4. Are there similar tutorials / a reference book / recipes for other techniques (scRNAseq, ChIPseq, etc.)? I am always looking for everything explained in the same tutorial, from the very beginning of reading and counting reads to performing differential expression.
As an example, I am looking for something similar to this:
The most newbie-friendly tutorial I found is the Galaxy Training Material, which specifies the three steps (1. reads to counts, 2. counts to genes and 3. genes to pathways) and develops them in detail, but obviously this is done in the Galaxy GUI. I would like something like this, step by step, but using R/Bioconductor (or other packages where Bioconductor ones are not available).
Thank you very much!!
Just my two cents, but don't get into the habit of doing NGS preprocessing in R. There might be a few packages that wrap existing command-line tools, but this approach a) is not available for all kinds of required work (such as trimming, FastQC, aligners other than Subread...), and b) does not scale well, as you cannot easily orchestrate whole workflows from inside R. You also still have to set up and compile software outside of R, so I would definitely just do the whole preprocessing outside of it. I recommend using existing pipelines, be it SnakePipes or the Nextflow/nf-core ones, and then reading the required data (usually just the counts) into R for the downstream analysis. The RNA-seq workflow from Bioc, https://bioconductor.org/packages/release/workflows/html/rnaseqGene.html, is (if you ask me) up to date and also covers quantification with tools other than traditional aligners, so it is worth taking a look.
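To give a rough idea of what "preprocessing outside of R" involves, here is a minimal dry-run sketch in plain shell. It only prints the commands for a typical reads-to-counts chain (download from SRA, QC, alignment, sorting, counting) and does not execute them. The run accessions, directory names, index prefix and annotation path are hypothetical placeholders, the reads are assumed to be paired-end, and trimming is left out for brevity. Treat it as an illustration of the steps a pipeline automates, not as a recommended workflow.

```shell
#!/bin/sh
# Dry-run sketch: prints preprocessing commands, it does not run them.
# All accessions and paths below are hypothetical placeholders.
set -eu

THREADS=4
INDEX="ref/genome_index"   # assumed HISAT2 index prefix (built beforehand)
GTF="ref/annotation.gtf"   # assumed gene annotation in GTF format

# Print the per-sample steps for one paired-end SRA run accession.
plan_sample() {
  srr="$1"
  echo "fasterq-dump --split-files -e $THREADS -O fastq $srr"
  echo "fastqc -o qc fastq/${srr}_1.fastq fastq/${srr}_2.fastq"
  echo "hisat2 -p $THREADS -x $INDEX -1 fastq/${srr}_1.fastq -2 fastq/${srr}_2.fastq -S sam/${srr}.sam"
  echo "samtools sort -@ $THREADS -o bam/${srr}.bam sam/${srr}.sam"
}

# Print the whole plan: output directories, per-sample steps,
# then a single counting step across all sorted BAM files.
plan_all() {
  bams=""
  echo "mkdir -p fastq qc sam bam counts"
  for srr in "$@"; do
    plan_sample "$srr"
    bams="$bams bam/${srr}.bam"
  done
  # -p marks the data as paired-end; recent Subread releases also
  # need --countReadPairs to count fragments instead of reads.
  echo "featureCounts -p -T $THREADS -a $GTF -o counts/gene_counts.txt$bams"
}

plan_all SRR0000001 SRR0000002
```

Once the tools are installed, the printed plan could be reviewed and piped into `sh`, but an existing workflow such as nf-core/rnaseq handles these steps (plus trimming, logging and resuming after failures) far more robustly. Either way, the resulting gene-level count table is what you would then read into R for the downstream analysis.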