Question

Regularization of ballgown object on Limma

0

Entering edit mode

komuracom16 • 0

@komuracom16-22770

Last seen 5.3 years ago

Hi,

I'm doing transcription level analysis with my RNA-seq data while seeing nature protocol(https://www.nature.com/articles/nprot.2016.095).

I did alignment, assembly and quantification expressed genes and transcripts with HISAT2 and Stringtie. However, I found my sample sizes was small (n < 4 per group) as mentioned in nature protocol.

Note that Ballgown’s statistical test is a standard linear model based comparison. For small sample sizes (n < 4 per group) it is often better to perform regularization. This can be done using the limma33 package in Bioconductor.

Therefore, I did regularization on Limma, but I'm not sure if this is correct. So, I want to ask everyone to confirm my command below.

#use data in nature protocal
#create ballgown instance
pheno_data = read.csv("chrX_data/geuvadis_phenodata.csv")
bg_chrX = ballgown(dataDir = "ballgown", samplePattern = "ERR", pData=pheno_data)
bg_chrX_filt = subset(bg_chrX,"rowVars(texpr(bg_chrX)) >1",genomesubset=TRUE)

#make design for limma
design <- model.matrix(~0+sex+population, data=pheno_data)
colnames(design) <- gsub("sex", "", colnames(design))

#regularization on limma
v <- voom(texpr(bg_chrX_filt), design, plot=TRUE)
vfit <- lmFit(v, design)
efit <- eBayes(vfit)
plotSA(efit, main="Final model: Mean-variance trend")
contr <- makeContrasts(female - male, levels = colnames(coef(efit)))
tmp <- contrasts.fit(efit, contr)
tmp <- eBayes(tmp)
top.table <- topTable(tmp, sort.by = "P", n = Inf)

#see DEGs
head(top.table, 10)

I referred to those sites; https://github.com/alyssafrazee/ballgown

https://ucdavis-bioinformatics-training.github.io/2018-June-RNA-Seq-Workshop/thursday/DE.html

https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html#normalising-gene-expression-distributions

Thanks!

limma ballgown RNA-seq • 1.2k views

ADD COMMENT • link 5.3 years ago komuracom16 • 0

score 1 · Answer 1 · 2020-01-27

I am the limma author. I am not familiar with ballgown output so I can't advise you on how to analyse it, but there are several aspects of your question and code that fill me with concern:

limma is designed to analyse either gene level expression or exon level expression but not transcript-level
limma is not designed to analyse FPKM values
limma is incompatible with variance filtering.
voom in particular is only designed to analyse counts, not any other type of expression data..

There has been a lot written about all these points, so I won't repeat it all here. The various limma workflows show how the use limma with RNA-seq data.