Question

Start to End RNA-Seq with R Bioconductor

0

sarastew1994 • 0

@6064892b

Last seen 3.3 years ago

Germany

Hi Community,

I apologise if this question is too simple and the answer is found elsewhere. I am a biologist and new to Bioconductor. I would like to do a start to end RNA-Seq analysis using Bioconductor and R, in an RMarkdown notebook, so that it can be reproducible for other lab members and for publication. We have deposited our RNA-Seq reads, as obtained from the sequencing machine, directly in GEO.

Most of the tutorials I've found start from a table with counts and explain differential expression packages/techniques. But the previous steps are not explained (reads to counts, QC, trim reads, mapping with HISAT or similar, BAM/SAM feature counts, counts to genes). I understand that there are some other tools, such as those used in command line or bash, that help in those previous steps. My questions are:

Is there a tutorial in Bioconductor that starts from the very beginning, reading the data from GEO/SRA and processing the reads to get gene counts? Ideally using only Bioconductor packages.
If not, is there a tutorial (in R or RMarkdown) that calls all the necessary outside packages? (Mostly Bioconductor, but where a step is not available, calling other packages from R.)
If not, is there a tutorial on reproducible RNA-Seq analysis using bash and R, from sequencing reads to differential expression? (Using a mix of command-line tools and then switching to Bioconductor at the end.)
Are there similar tutorials, a reference book, or recipes for other techniques (scRNA-seq, ChIP-seq, etc.)? I am always looking for everything explained in the same tutorial, from the very beginning of reading and counting reads to performing differential expression.

As an example, I am looking for something similar to this:

The most newbie-friendly tutorial I found is this: Galaxy Training Material, which specifies three steps (1. reads to counts, 2. counts to genes, 3. genes to pathways) and develops them in detail, but it is done in the Galaxy GUI. I would like something like this, step by step, but using R/Bioconductor (or other packages where Bioconductor has none).

Thank you very much!!

Bioconductor Workflow • 4.7k views

updated 3.3 years ago by Gordon Smyth 52k • written 3.3 years ago by sarastew1994 • 0

1

Just my two cents, but don't get into the habit of doing NGS preprocessing in R. There are a few packages that wrap existing command-line tools, but a) they do not cover every required step (such as trimming, FastQC, or aligners other than Subread), and b) the approach does not scale well, as you cannot easily orchestrate whole workflows from inside R. You also still have to set up and compile software outside of R, so I would definitely just do the whole preprocessing outside of it. I recommend using existing pipelines, be it SnakePipes or the Nextflow/nf-core ones, and then reading the required data (usually just the counts) into R for the downstream analysis. The RNA-seq workflow from Bioc, https://bioconductor.org/packages/release/workflows/html/rnaseqGene.html, is (if you ask me) up to date and also covers quantification with tools other than traditional aligners; it is worth taking a look.

3.3 years ago ATpoint ★ 4.8k

2

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 7 days ago

San Diego

I'm not sure what alignment options there are in pure R. Rsubread might be the only one.

But once you have gene counts, edgeR or DESeq2 are what everyone uses for differential gene expression.

3.3 years ago swbarnes2 ★ 1.4k

score 3 · Accepted Answer · 2022-01-05

3

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 hour ago

United States

There are a couple. Here is one based on DESeq2, which is a bit dated now, and one based on edgeR, which might be a bit dated as well. There is also systemPipeR, which might be of interest to you.

3.3 years ago James W. MacDonald 68k

1

I consider the edgeR one to still be current, except that it uses mm10 instead of the new mm39 mouse genome build released this year. It is the only one of the workflows that is entirely R, including the alignment. The same edgeR workflow run on the latest R and Bioconductor packages is available here.

3.3 years ago Gordon Smyth 52k