FW: DESeq analysis
0
0
Entering edit mode
@fatemehsadat-seyednasrollah-5367
Last seen 10.3 years ago
________________________________________ From: Fatemehsadat Seyednasrollah Sent: Friday, June 29, 2012 12:00 PM To: narges [guest] Subject: RE: DESeq analysis Hi, First thanks a lot for your answer. Actually I have used a subset of a public data from Bowtie(the Montgomery) and below are the reduced codes of my work both from edgeR and DEseq. I wanted to know have I done something wrong to obtain very different answers ( 85 from DESeq and 407 from edgeR) or it is natural to have this hude difference and it is related to the algorithms? edgeR: > g1 <- read.delim ("count1.txt", row.names = 1) > head(g1) NA06994M NA07051M NA07347M NA07357M NA07000F NA07037F NA07346F ENSG00000000003 0 0 0 0 1 0 0 ENSG00000000005 0 0 0 0 0 0 0 ENSG00000000419 10 24 19 20 19 8 14 ENSG00000000457 17 15 13 18 21 18 21 ENSG00000000460 2 3 5 2 4 6 8 ENSG00000000938 20 4 35 16 10 17 19 NA10847F ENSG00000000003 0 ENSG00000000005 0 ENSG00000000419 6 ENSG00000000457 15 ENSG00000000460 2 ENSG00000000938 9 > group <- factor(rep(c("Male", "Female"), each= 4)) > dge <- DGEList(counts = g1 , group = group ) Calculating library sizes from column totals. > dge <- calcNormFactors(dge) > dge <- estimateCommonDisp(dge) > sqrt (dge$common.dispersion) [1] 0.3858996 > test <- exactTest(dge) > head(test$table) logFC logCPM PValue ENSG00000000003 -2.3441897 -3.042057 1.0000000 ENSG00000000005 0.0000000 -Inf 1.0000000 ENSG00000000419 0.5777309 3.850993 0.2791539 ENSG00000000457 -0.3054489 4.080866 0.5592668 ENSG00000000460 -0.7792622 1.966865 0.3274528 ENSG00000000938 0.3909100 3.997866 0.4269672 > sum (test$table$PValue <0.01) [1] 407 DESeq: > g1 <- read.table("count1.txt", header = TRUE, row.names = 1) > conds <- factor(rep(c("Male", "Female"), each= 4)) > dataPack <- data.frame(row.names = colnames(g1), condition =rep( c("Male", "Female"), each= 4)) > dataPack condition NA06994M Male NA07051M Male NA07347M Male NA07357M Male NA07000F Female NA07037F Female NA07346F Female NA10847F Female > cds <- newCountDataSet(g1, conds) > head(cds) CountDataSet (storageMode: environment) assayData: 1 features, 8 samples element names: counts protocolData: none phenoData sampleNames: NA06994M NA07051M ... NA10847F (8 total) varLabels: sizeFactor condition varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: > head(counts(cds) + ) NA06994M NA07051M NA07347M NA07357M NA07000F NA07037F NA07346F ENSG00000000003 0 0 0 0 1 0 0 ENSG00000000005 0 0 0 0 0 0 0 ENSG00000000419 10 24 19 20 19 8 14 ENSG00000000457 17 15 13 18 21 18 21 ENSG00000000460 2 3 5 2 4 6 8 ENSG00000000938 20 4 35 16 10 17 19 NA10847F ENSG00000000003 0 ENSG00000000005 0 ENSG00000000419 6 ENSG00000000457 15 ENSG00000000460 2 ENSG00000000938 9 > cds <- estimateSizeFactors(cds) > sizeFactors(cds) NA06994M NA07051M NA07347M NA07357M NA07000F NA07037F NA07346F NA10847F 0.8599841 1.1102643 1.0869086 1.1157556 1.1056726 1.0666049 0.9152017 0.9402086 > head(counts(cds, normalized= TRUE)) > cds <- estimateDispersions(cds) > result <- nbinomTest(cds, "Male", "Female") > nrow(subset(result, result$pval <0.01)) [1] 85 Again thank you so much With Best Regards, Narges________________________________________ From: narges [guest] [guest@bioconductor.org] Sent: Tuesday, June 26, 2012 7:17 PM To: bioconductor at r-project.org; Fatemehsadat Seyednasrollah Subject: DESeq analysis Hi all I am doing some RNA seq analysis with DESeq. I have applied the nbinomTest to my dataset which I know have many differentially expressed genes but the first problem is that the result values for "padj"column is almost NA and sometimes 1. and when I want to have a splice from my fata frame the result is not meaningful for me. -- output of sessionInfo(): res <- nbinomTest(cds, "Male", "Female") > head(res) id baseMean baseMeanA baseMeanB foldChange log2FoldChange 1 ENSG00000000003 0.1130534 0.000000 0.2261067 Inf Inf 2 ENSG00000000005 0.0000000 0.000000 0.0000000 NaN NaN 3 ENSG00000000419 14.3767155 17.162610 11.5908205 0.6753530 -0.5662863 4 ENSG00000000457 17.0174761 15.342800 18.6921526 1.2183013 0.2848710 5 ENSG00000000460 3.9414822 2.855099 5.0278659 1.7610131 0.8164056 6 ENSG00000000938 16.0894945 18.350117 13.8288718 0.7536122 -0.4081058 pval padj 1 0.9959638 1 2 NA NA 3 0.3208560 1 4 0.5942512 1 5 0.4840607 1 6 0.5409953 1 > res1 <- res[res$padj<0.1,] > head(res1) id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj NA <na> NA NA NA NA NA NA NA NA.1 <na> NA NA NA NA NA NA NA NA.2 <na> NA NA NA NA NA NA NA NA.3 <na> NA NA NA NA NA NA NA NA.4 <na> NA NA NA NA NA NA NA NA.5 <na> NA NA NA NA NA NA NA my first question is that why although I know there are some differentially expressed genes in the my data, all the padj values are NA or 1 and the second question is this "NA.1" , "NA.2", ..... which are emerged as the first column of object "res1"instead of name of genes Thank you so much Regards -- Sent via the guest posting facility at bioconductor.org.
edgeR DESeq edgeR DESeq • 1.1k views

Login before adding your answer.

Traffic: 701 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6