Entering edit mode
I am using edgeR to analyze RNA-Seq data. This is my script:
library("edgeR")
#############################
#read in metadata & DGE
#############################
composite_samples <- read.csv(file="samples.csv",header=TRUE,sep=",")
counts <- readDGE(composite_samples$CountFiles)$counts
#############################
#Filter & Library Size Re-set
#############################
noint <- rownames(counts) %in% (c("no_feature", "ambiguous",
"too_low_aQual", "not_aligned", "alignment_not_unique"))
cpms <- cpm(counts)
keep <- rowSums(cpms>1)>=3 & !noint
counts <- counts[keep,]
colnames(counts) <- composite_samples$SampleName
d <- DGEList(counts=counts, group=composite_samples$Condition)
d$samples$lib.size <- colSums(d$counts)
#############################
#Normalisation
#############################
d <- calcNormFactors(d)
#############################
#Recording the normalized counts
#############################
all_cpm=cpm(d, normalized.lib.size=TRUE)
all_counts <- cbind(rownames(all_cpm), all_cpm)
colnames(all_counts)[1] <- "Ensembl.Gene.ID"
rownames(all_counts) <- NULL
#############################
#Estimate Dispersion
#############################
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
#############################
#Perform a test
#############################
de_ctl_mo_composite <- exactTest(d, pair=c("NY", "N"))
I believe that the variable "all_counts" shall contain the normalized
counts for each sample in each condition. My understanding is also
that
edgeR adds pseudocounts BEFORE performing the library normalisation.
Thus
it is possible that some values revert to being zero after
normalisation.
But I thought that this would happen rarely. Yet in a recent dataset I
find
an improbably large number of zero values in "all_counts" which made
me
think that my understanding of how pseudocounts and normalisation work
in
edgeR might be incorrect. Can, please, somebody comment on this?
[[alternative HTML version deleted]]