EdgeR questions about exactTest and plotMDS function

0

Entering edit mode

Amos Kirilovsky ▴ 50

@amos-kirilovsky-5407

Last seen 7.9 years ago

Dear Bioconductor list, I'm try to get differentially expressed genes from two groups of samples using EdgeR. In the first group I have only 1 sample and in the second there are 3 biological replicates. (I will get more samples per group in the future but for now I have to make an analysis pipeline with only that 4 samples) The sample came from a metatranscriptomic experience. For the comparison I selected only the genes that match to a species of interest. I got error messages doing differential expression test and MDS : 1) *Differential expression*: > et <- exactTest(d) Error in `[.data.frame`(mu1, g) : undefined columns selected or (only if the name of my first group is starting with the letter A !) > et <- exactTest(d) Error in 0:s1[g] : argument NA / NaN I found out that my error message is due to the presence of only one sample in my first group when using exactTest() function. I used glmLRT() function and I obtained many significantly differentially expressed genes. Could you confirm that exactTest() don't work with group of only one sample and that I used correctly glmLRT() ? My code is at the end of the message. 2) *MDS* > plotMDS(d) Error in is.finite(x$counts) : default method unavailable for 'list' type (translated from French) For the second problem I made some tests with the first Case study ("SAGE profiles of normal and tumour tissue") in edgeR user Guide. My code was very similar and this time it worked fine. I couldn't find out why. Do you have any idea? I copy the output of my DGEList object after my message. I don't know if this is helpfull but the function plotBCV() seems ok with my data. During my tests with the first Case study I get different results when doing plotMDS(d) and plotMDS(d$counts). The axis length and the sample position were different. Is it expected? why such a difference? I guess it has to do with the gene.selection option. I hope I was clear in my explanations. Many thanks in advance. Amos Kirilovsky -- Dr Amos Kirilovsky CEA Institut de génomique 2 rue Gaston Cremieux CP 5706 91057 Evry cedex - France Tel: (33) 01 60 87 25 35 #################################################################### > *d* An object of class "DGEList" $samples files group lib.size norm.factors s1 s1 E 271842 0.9414223 s2 s2 b 1310877 1.0216270 s3 s3 b 669731 1.0476370 s4 s4 b 1184754 0.9924584 $counts s1 s2 s3 s4 804851 0 45 19 49 587117 0 27 12 30 427810 0 16 6 10 367020 0 8 16 20 980437 0 2 8 14 14236 more rows ... $common.dispersion [1] 0.4296784 $pseudo.counts s1 s2 s3 s4 804851 0 24.4565653 19.756230 30.425129 587117 0 14.6478020 12.480244 18.638575 427810 0 8.7504091 6.242240 6.135612 367020 0 4.1452733 16.616435 12.407833 980437 0 0.8736513 8.306954 8.780709 14236 more rows ... $logCPM [1] 4.803775 4.158376 3.185720 3.723672 2.957607 14236 more elements ... $pseudo.lib.size [1] 729208.6 $prior.n [1] 10 $tagwise.dispersion [1] 0.3771734 0.3789361 0.3849697 0.4188742 0.4511627 14236 more elements #################################################################### #################################################################### *#script* library(edgeR) #read file filinName = "metaTranscriptome" filin = read.table(filinName, sep ="\t", header = TRUE) x <- list() sets <- c("s1","s2","s3","s4") x$samples <- data.frame(files=as.character(sets),stringsAsFactors=FALSE) x$samples$group <- factor(c("A","B","B","B")) # read count matrix x$counts = filin[,2:5] #Attention s'il y a des NA ça va bugger rownames(x$counts) <- filin[,1] colnames(x$counts) <- sets # tot read number by sample x$samples$lib.size <- colSums(x$counts) # DGEList x$samples$norm.factors <- 1 row.names(x$samples) <- colnames(x$counts) d <- new("DGEList",x) #creation de DGElist à partir de x # normalisation factor d <- calcNormFactors(d) # TMM summary(d$counts) ################ Not working ################### plotMDS(d) ################################################ # estimate common dispersion d <- estimateCommonDisp(d, verbose=TRUE) # estimate Tagwise Disp d <- estimateTagwiseDisp(d, trend="none") # plots the tagwise dispersions against log2-CPM # plotBCV(d, cex=0.4) ################ Not working ################### # Differential expression # et <- exactTest(d) # topTags(et) ################################################ # matrix design design <- model.matrix(~0+group, data=d$samples) colnames(design) <- levels(d$samples$group) design # Differential expression fit <- glmFit(d,design) lrt <- glmLRT(fit, contrast=c(1,-1)) topTags(lrt) #################################################################### #################################################################### sessionInfo() R version 2.15.3 (2013-03-01) Platform: x86_64-pc-linux-gnu (64-bit) other attached packages: [1] edgeR_3.0.8 limma_3.14.4 [[alternative HTML version deleted]]

edgeR edgeR • 4.1k views

ADD COMMENT • link updated 11.9 years ago by Gordon Smyth 52k • written 11.9 years ago by Amos Kirilovsky ▴ 50

0

Entering edit mode

Mark Robinson ▴ 880

@mark-robinson-4908

Last seen 6.4 years ago

Hi Amos, > 1) *Differential expression*: >> et <- exactTest(d) > Error in `[.data.frame`(mu1, g) : undefined columns selected > > or (only if the name of my first group is starting with the letter A !) >> et <- exactTest(d) > Error in 0:s1[g] : argument NA / NaN > > I found out that my error message is due to the presence of only one sample > in my first group when using exactTest() function. I used glmLRT() function > and I obtained many significantly differentially expressed genes. > Could you confirm that exactTest() don't work with group of only one sample > and that I used correctly glmLRT() ? My code is at the end of the message. I can't reproduce this error on a toy problem (either on edgeR 3.0.8 or on the latest 3.2.3): library(edgeR) y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4) d <- DGEList(counts=y, group=rep(1:2,c(1,3))) d <- estimateCommonDisp(d, verbose=TRUE) d <- estimateTagwiseDisp(d, trend="none") et <- exactTest(d) I'm afraid I can't tell from your code what the issue is, but it is a bit unusual to construct the 'DGEList' object manually that way you do. Why not use the DGEList() constructor? > During my tests with the first Case study I get different results when > doing plotMDS(d) and plotMDS(d$counts). > The axis length and the sample position were different. > Is it expected? why such a difference? I guess it has to do with the > gene.selection option. This is expected. plotMDS(x) is different when x is a DGEList object from when x is a matrix. See ?plotMDS.DGEList for the former and ?plotMDS for the latter. You probably want a count-based MDS plot, i.e. plotMDS(x), whether that be in the method='logFC' or method='bcv' flavour. Read the docs for more details. Best, Mark ---------- Prof. Dr. Mark Robinson Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://tiny.cc/mrobin On 27.05.2013, at 17:36, Amos Kirilovsky <amos.kirilovsky at="" gmail.com=""> wrote: > Dear Bioconductor list, > > I'm try to get differentially expressed genes from two groups of samples > using EdgeR. > In the first group I have only 1 sample and in the second there are 3 > biological replicates. (I will get more samples per group in the future but > for now I have to make an analysis pipeline with only that 4 samples) > The sample came from a metatranscriptomic experience. > For the comparison I selected only the genes that match to a species of > interest. > > I got error messages doing differential expression test and MDS : > > 1) *Differential expression*: >> et <- exactTest(d) > Error in `[.data.frame`(mu1, g) : undefined columns selected > > or (only if the name of my first group is starting with the letter A !) >> et <- exactTest(d) > Error in 0:s1[g] : argument NA / NaN > > I found out that my error message is due to the presence of only one sample > in my first group when using exactTest() function. I used glmLRT() function > and I obtained many significantly differentially expressed genes. > Could you confirm that exactTest() don't work with group of only one sample > and that I used correctly glmLRT() ? My code is at the end of the message. > > > 2) *MDS* >> plotMDS(d) > Error in is.finite(x$counts) : > default method unavailable for 'list' type (translated from French) > > For the second problem I made some tests with the first Case study ("SAGE > profiles of normal and tumour tissue") in edgeR user Guide. > My code was very similar and this time it worked fine. I couldn't find out > why. Do you have any idea? > I copy the output of my DGEList object after my message. I don't know if > this is helpfull but the function plotBCV() seems ok with my data. > > During my tests with the first Case study I get different results when > doing plotMDS(d) and plotMDS(d$counts). > The axis length and the sample position were different. > Is it expected? why such a difference? I guess it has to do with the > gene.selection option. > > > I hope I was clear in my explanations. > Many thanks in advance. > > Amos Kirilovsky > -- > Dr Amos Kirilovsky > CEA > Institut de g?nomique > 2 rue Gaston Cremieux > CP 5706 91057 Evry cedex - France > Tel: (33) 01 60 87 25 35 > > #################################################################### >> *d* > An object of class "DGEList" > $samples > files group lib.size norm.factors > s1 s1 E 271842 0.9414223 > s2 s2 b 1310877 1.0216270 > s3 s3 b 669731 1.0476370 > s4 s4 b 1184754 0.9924584 > > $counts > s1 s2 s3 s4 > 804851 0 45 19 49 > 587117 0 27 12 30 > 427810 0 16 6 10 > 367020 0 8 16 20 > 980437 0 2 8 14 > 14236 more rows ... > > $common.dispersion > [1] 0.4296784 > > $pseudo.counts > s1 s2 s3 s4 > 804851 0 24.4565653 19.756230 30.425129 > 587117 0 14.6478020 12.480244 18.638575 > 427810 0 8.7504091 6.242240 6.135612 > 367020 0 4.1452733 16.616435 12.407833 > 980437 0 0.8736513 8.306954 8.780709 > 14236 more rows ... > > $logCPM > [1] 4.803775 4.158376 3.185720 3.723672 2.957607 > 14236 more elements ... > > $pseudo.lib.size > [1] 729208.6 > > $prior.n > [1] 10 > > $tagwise.dispersion > [1] 0.3771734 0.3789361 0.3849697 0.4188742 0.4511627 > 14236 more elements > > > #################################################################### > #################################################################### > > *#script* > library(edgeR) > > #read file > filinName = "metaTranscriptome" > filin = read.table(filinName, sep ="\t", header = TRUE) > > x <- list() > sets <- c("s1","s2","s3","s4") > x$samples <- data.frame(files=as.character(sets),stringsAsFactors=FALSE) > x$samples$group <- factor(c("A","B","B","B")) > > # read count matrix > x$counts = filin[,2:5] #Attention s'il y a des NA ?a va bugger > rownames(x$counts) <- filin[,1] > colnames(x$counts) <- sets > > # tot read number by sample > x$samples$lib.size <- colSums(x$counts) > > # DGEList > x$samples$norm.factors <- 1 > row.names(x$samples) <- colnames(x$counts) > d <- new("DGEList",x) #creation de DGElist ? partir de x > > # normalisation factor > d <- calcNormFactors(d) # TMM > > summary(d$counts) > > ################ Not working ################### > plotMDS(d) > ################################################ > > # estimate common dispersion > d <- estimateCommonDisp(d, verbose=TRUE) > > # estimate Tagwise Disp > d <- estimateTagwiseDisp(d, trend="none") > > # plots the tagwise dispersions against log2-CPM > # plotBCV(d, cex=0.4) > > ################ Not working ################### > # Differential expression > # et <- exactTest(d) > # topTags(et) > ################################################ > > # matrix design > design <- model.matrix(~0+group, data=d$samples) > colnames(design) <- levels(d$samples$group) > design > > # Differential expression > fit <- glmFit(d,design) > lrt <- glmLRT(fit, contrast=c(1,-1)) > topTags(lrt) > > #################################################################### > #################################################################### > > sessionInfo() > > R version 2.15.3 (2013-03-01) > Platform: x86_64-pc-linux-gnu (64-bit) > > other attached packages: > [1] edgeR_3.0.8 limma_3.14.4 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 11.9 years ago Mark Robinson ▴ 880

0

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 11 hours ago

WEHI, Melbourne, Australia

Dear Amos,

This is a quick appendix to Mark's comments. The plotMDS() error is occuring because your d$counts is a data.frame instead of a matrix. I will add a check for this in the plotMDS code.

Had you used

d <- DGEList(counts=filin[,2:5])

(see Mark's comments) or

d$counts <- as.matrix(filin[,2:5])

then the error wouldn't have occured.

Also, please upgrade to the current version.

Best wishes
Gordon

ADD COMMENT • link 11.9 years ago • updated 8.0 years ago Gordon Smyth 52k

0

Entering edit mode

Dear Gordon and Mark, Thank you very much for your answers. You were right. All my problems came from the way I construct my DGElist object. I used DGEList() constructor and It worked perfectly. I first constructed a DGEList object manually because looking how to test the first Case study I found a mail from Davis McCarthy ( https://stat.ethz.ch/pipermail/bioc-sig- sequencing/2011-August/002180.html) describing a way to manipulate and create a DGEList object. It worked fine so I used it for my own data. I now see some mistakes I did. Best wishes, Amos -- Dr Amos Kirilovsky CEA Institut de génomique 2 rue Gaston Cremieux CP 5706 91057 Evry cedex - France Tel: (33) 01 60 87 25 35 [[alternative HTML version deleted]]

ADD REPLY • link 11.9 years ago Amos Kirilovsky ▴ 50

Login before adding your answer.