Question

limma voom and trend

linouhao ▴ 20

@linouhao-15901

United States

Hi, thanks in advance. When should I use voom and when should I use limma-trend? I cannot decide whether the library sizes are quite variable between samples. Can you show me the code to check this?

You once said that differential analysis of FPKM values should use trend=TRUE and robust=TRUE. So should I compute CPM from the FPKM values? If so, how should I prefilter the FPKM values? Is there a recommendation?

I have also heard that limma gives more false positives than edgeR and DESeq2. Is that true?

Here is the detailed code for trend and voom:

If the sequencing depth is reasonably consistent across the RNA samples, then the simplest and most robust approach to differential expression is to use limma-trend. This approach will usually work well if the ratio of the largest library size to the smallest is not more than about 3-fold. In the limma-trend approach, the counts are converted to logCPM values using edgeR's cpm function:

> logCPM <- cpm(dge, log=TRUE, prior.count=3)

The prior count is used here to damp down the variances of logarithms of low counts. The logCPM values can then be used in any standard limma pipeline, using the trend=TRUE argument when running eBayes or treat. For example:

> fit <- lmFit(logCPM, design)
> fit <- eBayes(fit, trend=TRUE)
> topTable(fit, coef=ncol(design))

Or, to give more weight to fold-changes in the gene ranking, one might use:

> fit <- lmFit(logCPM, design)
> fit <- treat(fit, lfc=log2(1.2), trend=TRUE)
> topTreat(fit, coef=ncol(design))

Differential expression: voom

When the library sizes are quite variable between samples, then the voom approach is theoretically more powerful than limma-trend. In this approach, the voom transformation is applied to the normalized and filtered DGEList object:

> v <- voom(dge, design, plot=TRUE)

The voom transformation uses the experiment design matrix, and produces an EList object. It is also possible to give a matrix of counts directly to voom without TMM normalization, by

> v <- voom(counts, design, plot=TRUE)

If the data are very noisy, one can apply the same between-array normalization methods as would be used for microarrays, for example:

> v <- voom(counts, design, plot=TRUE, normalize="quantile")

After this, the usual limma pipelines for differential expression can be applied, for example:

> fit <- lmFit(v, design)
> fit <- eBayes(fit)
> topTable(fit, coef=ncol(design))

Or, to give more weight to fold-changes in the ranking, one could use, say:

> fit <- treat(fit, lfc=log2(1.2))
> topTreat(fit, coef=ncol(design))

ADD COMMENT • link updated 4.6 years ago by Gordon Smyth 52k • written 4.6 years ago by linouhao ▴ 20

"and I hear someone said limma will give more false positive compared to edger and deseq2, is that true?" - As an aside, where did you hear or read this?

Also note that FPKM expression units cannot be used as input to DESeq2, edgeR, or limma - I trust that you are not doing this. With no cross-sample normalisation employed when deriving them, FPKM expression units are not suitable for any type of differential expression analysis. FPKMs (and RPKMs) represent a primitive form of RNA-seq normalisation from when people were generating cDNA libraries for sequencing on just single samples.

ADD REPLY • link 4.6 years ago Kevin Blighe ★ 4.0k

Thanks a lot, you are right. I am not analysing FPKM values with DESeq2. However, many papers published in recent years, even in 2020, still use limma to analyse FPKM values, and they appear in SCI journals (IF >= 5), so it may be suitable to some extent. I also wonder whether FPKM values can be analysed with a Wilcoxon test for differential analysis, and whether the result would be more appropriate than using limma.

My other important question concerns trend and voom: how do I show the library size difference?

ADD REPLY • link 4.6 years ago linouhao ▴ 20

Yes, I have also seen published manuscripts whose data are [in part] based on FPKM expression units. If you are prepared to use these FPKM units and cannot obtain any raw counts, then transform them as log2(FPKM + 0.1) and adopt the limma-trend pipeline: https://bioinformatics-core-shared-training.github.io/cruk-autumn-school-2017/DifferentialExpression/rna-seq-de.nb.html#limma-trend

A previous answer here from Gordon: https://support.bioconductor.org/p/56275/#56299
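
The FPKM route described above could look roughly like the following sketch. It is not taken from either linked page: the FPKM matrix, the group labels and the filtering threshold (FPKM > 1 in at least 3 samples) are made up purely for illustration, and trend=TRUE with robust=TRUE are the eBayes settings mentioned in the original question.

```r
library(limma)

set.seed(1)
# Simulated FPKM matrix: 200 genes x 6 samples (illustrative values only)
fpkm <- matrix(rexp(200 * 6, rate = 0.1), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))

# Hypothetical filter (an assumption, not a recommendation from this thread):
# keep genes with FPKM > 1 in at least 3 samples
keep <- rowSums(fpkm > 1) >= 3

# Log-transform with the small offset suggested above
logFPKM <- log2(fpkm[keep, ] + 0.1)

# limma-trend on the log-scale values
design <- model.matrix(~ group)
fit <- lmFit(logFPKM, design)
fit <- eBayes(fit, trend = TRUE, robust = TRUE)
res <- topTable(fit, coef = ncol(design), number = Inf)
head(res)
```

Because the data here are pure noise, the table is only useful for checking that the pipeline runs; with real data, raw counts remain the preferred input if they can be obtained.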

ADD REPLY • link 4.6 years ago Kevin Blighe ★ 4.0k

thanks a lot

Yes, I have read the link you posted, as you suggested before. I want to know the difference between the Wilcoxon test and limma on FPKM values, and whether the Wilcoxon test is better (for example, for TCGA data).

I also want to know about count data: limma has both voom and trend, unlike DESeq2 and edgeR, which each have just one method. voom and trend are concerned with library size differences between samples, but I cannot tell them apart.

ADD REPLY • link 4.6 years ago linouhao ▴ 20

Kevin asked you where you heard that limma gives more false positives but you have refused to answer him. It is a strange thing to claim, and it may be that you have misunderstood what you heard or read.

ADD REPLY • link 4.6 years ago Gordon Smyth 52k

Gordon Smyth · Answer 1 · 2020-09-01

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

Sorry, I don't understand where your difficulty lies because the text that you have copied from the limma User's Guide seems to answer your question. Indeed your post consists mostly of text copied from the User's Guide.

Are you saying that you don't know what the library sizes are for your data? Or you do know the library sizes but don't know how to decide whether one library size is more than 3 times another?

I also don't understand your references to FPKM. We strongly recommend against using FPKM for DE analyses, as you already seem to know. Even with FPKM, however, limma would still be more powerful than Wilcoxon tests.

limma does not give more false positives than edgeR or DESeq2 and I am not aware of any publication that makes that claim.

ADD COMMENT • link 4.6 years ago Gordon Smyth 52k

Thanks a lot. I have both problems: I don't know what the library sizes are for my data, and I don't know how to decide whether one library size is more than 3 times another, and hence whether to use trend or voom.

Can I use voom directly, no matter what the library size difference is?

Also, for edgeR I know I should use logCPM (logCPM is better than CPM, right?), and for DESeq2 vst or rlog, for downstream analyses other than DE. But for limma, what kind of data should be used? Is it the voom output? voom seems to be much like the cpm function.

ADD REPLY • link 4.6 years ago linouhao ▴ 20

The library sizes are stored in the DGEList object, for example:

summary( dge$samples$lib.size )

You can always use voom if you prefer regardless of the library sizes. In a broad range of cases, voom or limma-trend are both acceptable and similar in performance.

We recommend cpm(dge, log=TRUE) for non-DE data exploration, the same for both limma and edgeR.
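
The 3-fold rule of thumb quoted from the User's Guide can be checked directly from the library sizes. The snippet below is only a sketch on a small made-up count matrix (the object `counts` and its values are illustrative assumptions, not data from this thread); with a real DGEList the same vector is available as dge$samples$lib.size.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, for illustration only)
counts <- matrix(c(10, 200, 35, 5,
                   30, 650, 90, 20,
                   12, 180, 40, 8),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Library size = total number of counts per sample.
# For a DGEList this is the same information as dge$samples$lib.size.
lib.size <- colSums(counts)

# Ratio of the largest to the smallest library size
lib.ratio <- max(lib.size) / min(lib.size)

# Rule of thumb quoted earlier in the thread: limma-trend usually works well
# when this ratio is not more than about 3; otherwise voom may be preferable.
suggested <- if (lib.ratio <= 3) "limma-trend" else "voom"

print(lib.size)
print(round(lib.ratio, 2))
print(suggested)
```

The cut-off is a guideline rather than a hard threshold; as noted above, voom can be used regardless of the library sizes, and in many cases both approaches perform similarly.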

ADD REPLY • link 4.6 years ago Gordon Smyth 52k

How is dge created? Can you give an example? I often use code like this; which variable stands for dge?

## design matrix with one column per group (no intercept)
design <- model.matrix(~0 + factor(group_list))
colnames(design) <- levels(factor(group_list))
rownames(design) <- colnames(exprSet)
contrast.matrix <- makeContrasts(paste0(unique(group_list), collapse = "-"), levels = design)

fit <- lmFit(exprSet, design)
## step 2
fit2 <- contrasts.fit(fit, contrast.matrix)  ## this step is important; you can check its effect yourself
fit2 <- eBayes(fit2)  ## default: no trend!
## alternatively, eBayes() with trend=TRUE
## step 3
tempOutput <- topTable(fit2, n = Inf)

ADD REPLY • link updated 4.6 years ago by Kevin Blighe ★ 4.0k • written 4.6 years ago by linouhao ▴ 20

linouhao, your posts today tell me that you are processing both RNA-seq and microarray data at the same time. My worry is that you are becoming confused with the correct methods to apply to both. If the code above relates to microarray, then you do not need to use cpm() - your data, likely contained in exprs(exprSet), will be log2-transformed and already ready for downstream analyses.

Can you please 'take a breather' and focus on just one thing at a time.

This particular thread began as RNA-seq, and you alluded to an understanding of what is the DGE object, for example:

logCPM <- cpm(dge, log=TRUE, prior.count=3)
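
For readers wondering where `dge` comes from: in the RNA-seq workflow it is an edgeR DGEList built from raw counts. A minimal sketch, assuming a simulated count matrix and a two-group design (both invented here for illustration), might look like this. It also shows where calcNormFactors() sits in that workflow, namely before cpm() or voom().

```r
library(edgeR)  # attaches limma as a dependency

set.seed(1)
# Simulated raw counts: 500 genes x 6 samples (illustrative only)
counts <- matrix(rnbinom(500 * 6, mu = 100, size = 5), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Build the DGEList; library sizes are computed from the column sums
dge <- DGEList(counts = counts, group = group)

# Remove genes with consistently low counts, then recompute library sizes
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]

# TMM scale normalization; the factors are stored in dge$samples$norm.factors
dge <- calcNormFactors(dge)

# Either route can follow from the same object:
logCPM <- cpm(dge, log = TRUE, prior.count = 3)  # input for limma-trend
v <- voom(dge, design, plot = FALSE)             # input for the voom pipeline
```

Both `logCPM` and `v` can then be passed to lmFit() as in the User's Guide excerpts quoted in the question.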

ADD REPLY • link 4.6 years ago Kevin Blighe ★ 4.0k

Or an alternative interpretation is that OP hasn't actually processed any data. It would appear from this thread that OP doesn't know what a library size is, in other words doesn't know what RNA-seq is. OP has posted 33 questions and comments so far but has not mentioned any of their own code in any post. All the code appears to be copied from the User's Guide or from other tutorials. It may be that OP is taking a course and these are all somewhat hypothetical questions.

I've decided I can't help OP any further but feel free to persevere yourself if you feel able to.

ADD REPLY • link 4.6 years ago Gordon Smyth 52k

I am so sorry for giving you that impression.
It is precisely because I used my own data and ran into many questions that I carefully read the guide and the questions other people have posted. You may also find that the guide is sometimes misleading.

Let me give another quirky example from the limma User's Guide, which also appears at https://ucdavis-bioinformatics-training.github.io/2018-June-RNA-Seq-Workshop/thursday/DE.html.

You cannot even tell whether dge <- calcNormFactors(dge), which is usually used in edgeR code, is needed in the limma code. The guide uses the same variable, which is really misleading. I asked people around me who have done RNA-seq for many years, and they are doubtful too.

So I want to say that it may not be entirely my fault; the guide is partly responsible. Here I want to know whether the limma code needs dge <- calcNormFactors(dge).

ADD REPLY • link 4.6 years ago linouhao ▴ 20

You find it "quirky" and "misleading" that all the guides give the same advice with the same notation?

Please see https://www.bioconductor.org/help/support/posting-guide/ for a guide as to how to ask constructive questions on this forum.

Several people, including me, have volunteered their time to try to help you over the past few months. Unfortunately you have not helped us to help you.

ADD REPLY • link 4.6 years ago Gordon Smyth 52k

I am so sorry. The official limma guide, https://www.bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf, is the same here. It uses dge <- calcNormFactors(dge) and v <- voom(dge, design, plot=TRUE), and then says: "The voom transformation uses the experiment design matrix, and produces an EList object. It is also possible to give a matrix of counts directly to voom without TMM normalization, by"

v <- voom(counts, design, plot=TRUE)

You can see that it says "also". But ordinary users like us just need the one most commonly used method. We cannot work out the details ourselves, and the guide does not say.

It is really a great pity that you think I am not helping myself and am wasting so much of your time.

Thanks a lot all the same.

ADD REPLY • link 4.6 years ago linouhao ▴ 20