Question

Differential gene expression using R

I am working on an RNA-seq data analysis to get differential gene expression between 2 conditions. I am using the ballgown package in R and have successfully loaded the data. However, I have these questions after my progress:

Is it necessary to remove low-variance transcripts when doing a differential gene expression analysis, and why?
Why do we need to remove low-abundance and low-variance transcripts?
How do I get the gene name and gene ID without the stattest() function in ballgown?

Thanks in advance!

Tags: deseq2, edger, normalization, limma

Written 4.6 years ago by lakshmi9c; updated 4.6 years ago by Gordon Smyth.

This script is based on the DESeq2 and edgeR Bioconductor packages, if your aim is DE at the gene level.

Reply by thind.amarinder, 4.6 years ago

Answer 1 · 2020-08-14

Gordon Smyth (WEHI, Melbourne, Australia)

I'm a bit puzzled about what analysis you're planning to do. Is this question a follow-up to your previous question about Tuxedo and StringTie?

If you're using ballgown, why not follow the ballgown documentation and ballgown functions?

You've tagged your question to get help from the authors of several DE packages (limma, edgeR and DESeq2), but it isn't clear what relevance these packages have to your question. Ballgown is specifically designed for isoform-level DE, whereas limma, edgeR and DESeq2 are specifically not designed for isoform-level DE, so your plan is unclear.

None of the four packages (ballgown, limma, edgeR or DESeq2) makes any use of variance filtering, so why the questions about it?

If you do want to do a DE analysis using one particular package, just follow the DE workflows that are provided for that package. Many of these workflows explain the recommended filtering in detail.

The edgeR function filterByExpr implements the recommended filtering approach for limma and edgeR.
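To make the idea behind filterByExpr concrete, here is a simplified base-R sketch of the rule as we understand it from the edgeR documentation: keep a gene if its counts-per-million (CPM) reach a cutoff equivalent to min.count reads at the median library size in at least as many samples as the smallest group, and if its total count is at least min.total.count. The function name simple_expr_filter and the toy counts are hypothetical, and the real filterByExpr handles further details (for example large groups and design matrices), so in practice call filterByExpr itself rather than this sketch.

```r
# Simplified, hypothetical re-statement of the filterByExpr idea (not edgeR's code)
simple_expr_filter <- function(counts, group, min.count = 10, min.total.count = 15) {
  lib.size <- colSums(counts)                              # total reads per sample
  cpm <- sweep(counts, 2, lib.size, "/") * 1e6             # counts-per-million
  cpm.cutoff <- min.count / median(lib.size) * 1e6         # min.count reads at median depth
  min.samples <- min(table(group))                         # size of the smallest group
  keep.cpm <- rowSums(cpm >= cpm.cutoff) >= min.samples    # expressed in enough samples
  keep.total <- rowSums(counts) >= min.total.count         # enough reads overall
  keep.cpm & keep.total
}

# Toy count matrix: 4 genes x 6 samples (hypothetical data)
counts <- matrix(c(
   0,  0,  1,   0,   0,   2,   # geneA: essentially unexpressed
  50, 60, 55,  52,  58,  61,   # geneB: steadily expressed
   5,  8,  6, 300, 280, 310,   # geneC: strongly condition-dependent
   0,  0,  0,  40,  35,  45    # geneD: expressed in one group only
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:6)))
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))

keep <- simple_expr_filter(counts, group)
filtered <- counts[keep, , drop = FALSE]
```

Note that geneD is kept even though it is zero in every control sample: the rule only asks for expression in as many samples as the smallest group, so genes switched on in one condition survive the filter.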

Indeed, if you want help with the Ballgown-specific question, then please show a reproducible example. First, be sure that you have followed the Ballgown vignette(s) and that all of your commands have run as expected.

Reply by Kevin Blighe, 4.6 years ago

Thank you for the fast reply! I will keep proper tagging in mind for my future questions. However, my concern is about using the ballgown package for differential expression analysis of my RNA-seq data: I need to find the differentially expressed genes between two sample conditions. As you suggested, I have looked up manuals/protocols for running Ballgown in R, but unfortunately I can't find a standard protocol. I have run all commands correctly as per the paper https://www.nature.com/articles/nprot.2016.095, but I don't understand the need to remove low-variance transcripts. I'm also unable to proceed after getting the list of gene names and am unsure what the next step is. How should I do normalisation for Ballgown? I'm stuck at this step and would much appreciate any help on how to proceed in order to get DE genes. Thanks in advance!

Reply by lakshmi9c, 4.6 years ago

You don't have to do any normalization per se. From the help for stattest:

Library size adjustment is performed by default by using the sum of the log nonzero expression measurements for each sample, up to the 75th percentile of those measurements. This adjustment can be disabled by setting libadjust=FALSE. You can use mod and mod0 to specify alternative library size adjustments
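The adjustment described in that help text can be sketched in base R as below. This is our reading of the description, not ballgown's own code: the natural log, the pseudocount of 1 and the inclusion of values equal to the 75th percentile are assumptions, and the FPKM matrix is made up. The help text suggests the resulting per-sample value enters the analysis through the model matrices mod and mod0.

```r
# Hypothetical FPKM matrix: 5 transcripts x 3 samples
# (s3 is s1 halved; s2 is roughly s1 doubled)
fpkm <- matrix(c(
    0,   2,  0,
    4,   8,  2,
   10,  20,  5,
   30,  60, 15,
  100, 200, 50
), nrow = 5, byrow = TRUE,
dimnames = list(paste0("t", 1:5), paste0("s", 1:3)))

# Sketch of the described adjustment for one sample (column)
lib_adjust_sketch <- function(x) {
  lognz <- log(x[x != 0] + 1)   # log of the nonzero measurements (pseudocount 1 assumed)
  q3 <- quantile(lognz, 0.75)   # 75th percentile of those log values
  sum(lognz[lognz <= q3])       # sum of the values up to the 75th percentile
}

adj <- apply(fpkm, 2, lib_adjust_sketch)   # one adjustment value per sample
```

Passing libadjust=FALSE to stattest() disables the built-in version of this adjustment, for example if you prefer to supply your own covariate via mod and mod0.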

Reply by James W. MacDonald, 4.6 years ago

I think the OP is talking about this part of the paper being referenced:

Filter to remove low-abundance genes. One common issue with RNA-seq data is that genes often have very few or zero counts. A common step is to filter out some of these. Another approach that has been used for gene expression analysis is to apply a variance filter. Here we remove all transcripts with a variance across samples less than one:

library(ballgown)
library(genefilter)  # rowVars() comes from genefilter
bg_chrX_filt = subset(bg_chrX, "rowVars(texpr(bg_chrX)) > 1", genomesubset = TRUE)

To be fair, there is a function in genefilter for removing low-variance genes, so at one point that was something people talked about doing, although I'm not sure it's much of a thing these days.

Reply by James W. MacDonald, 4.6 years ago

Here we remove all transcripts with a variance across samples less than one:

I guess that, by doing this, they are essentially removing genes whose values are virtually constant across all samples. I am not sure that is ideal, unless those values are 0 or other low counts, in which case a filter for low counts (not variance) would suffice.
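A toy base-R example shows how the two filters differ. The expression values are hypothetical, apply(expr, 1, var) is used as a base-R stand-in for the rowVars() call in the protocol (both should compute the per-row sample variance), and the mean-expression cutoff of 1 for the abundance filter is an arbitrary choice for illustration.

```r
# Hypothetical expression matrix: 3 transcripts x 6 samples (3 per condition)
expr <- matrix(c(
  100, 100.5, 99.5, 100, 100.2, 99.8,   # well expressed but stable
    0,   0,    0.1,   0,   0,    0,     # essentially unexpressed
    5,   6,    5,    20,  22,   21      # changes between conditions
), nrow = 3, byrow = TRUE,
dimnames = list(c("stable_high", "near_zero", "changing"), paste0("s", 1:6)))

# Variance filter in the style of the protocol: keep rows with variance > 1
row_var <- apply(expr, 1, var)
keep_var <- row_var > 1

# Abundance filter: keep rows whose mean expression exceeds 1 (arbitrary cutoff)
keep_abund <- rowMeans(expr) > 1
```

The variance filter discards the stable, well-expressed transcript along with the near-zero one, whereas the abundance filter removes only the near-zero transcript.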

I do recall filtering microarray data based on variance years ago, but that was at the level of the probe set (where relevant to the array design) and was used to remove failed probes.

Reply by Kevin Blighe, 4.6 years ago

Yes, I think you're right, but the paper was referenced only in a comment added 10 days after my answer. Variance filtering is incompatible with limma, edgeR or DESeq2, as I've pointed out many times over the years.

Anyway, I can't answer questions about ballgown or the associated protocol paper. I only responded to this question originally because it was tagged with limma and edgeR.

Reply by Gordon Smyth, 4.6 years ago