Question

Help understanding edgeR code for doing exactTest

0

Entering edit mode

Julia • 0

@1aedfac8

Last seen 3.5 years ago

United States

I have no experience with doing RNAseq analysis and I am having trouble finding a tutorial on how to figure out what commands to use and why. I have the following code that was used by a former postdoc in my lab, but I don't even know how I'm supposed to set up the sample info text file so that I can actually start doing the analysis. I'm also not sure if this is the complete code or if it's just guidelines. I literally have only been able to make it through importing the data and assigning it to a variable.

My experimental design is a 3 bacteria system vs. 2 WT bacteria and 1 mutant, so I am trying to compare the expression of each gene in each bacterium versus the same bacterium in the mutant condition.

K <- read.csv("/Users/Amanda/Desktop/ExtraRNASeqfiles/K_keccounts.csv", header = T, row.names = 1)
names(K)
#here I used K = read.table("/Users/JN/Desktop/htseq_all.csv", header=T, row.names=1, com='') instead of the code provided above*

K.ln1 <- log(K + 1)

pairs.panels(F.ln1, gap = 0)

sampleInfo <- read.csv("/Users/Amanda/Desktop/ExtraRNASeqfiles/Kkecalldesign.csv", header = T)

# make a dgeList matrix
dgeK <- DGEList(K, group=sampleInfo$condition)
dgeK$samples
sampleInfo$file
# use above to check levels of group identification

# visualize cpm
normCounts <- cpm(dgeK)
write.csv(normCounts, file="/Users/Amanda/Desktop/ExtraRNASeqfiles/NormCountsTHOR_K_kec.csv")
pseudoNormCounts <- log2(normCounts + 1)
boxplot(pseudoNormCounts, col="plum", las=3,
        notch = T,
        main = "cpm per Sample",
        xlab = "Sample",
        ylab = "Log2 cpm")

# filter so at least 10 reads per gene
apply(dgeK$counts, 2, sum)
keep <- rowSums(cpm(dgeK)>10) >= 2
d <- dgeK[keep,]
dim(dgeK)
dim(d)

# normalize TMM
d <- calcNormFactors(d, method="TMM")
d$samples
plotMDS(d)
plotMDS(d, col=as.numeric(sampleInfo$condition))

# estimate distribution
d1 <- estimateDisp(d)
plotBCV(d1)
d1$samples$group

# exactTest 
etKkec <- exactTest(d1, pair = c(1,2))
etKkec

#check total number of regulated genes
del1 <- decideTestsDGE(etKkec, adjust.method = "BH", p.value = 0.05, lfc = 1)
summary(del1)

del2 <- decideTestsDGE(etKkec, adjust.method = "BH", p.value = 0.05)
summary(del2)

topTags(etKkec)

etKkectop <- topTags(etKkec, n = nrow(etKkec$table), adjust.method = "BH", p.value = 0.05)$table
write.csv(etKkectop, file="/Users/Amanda/Desktop/etKkectop.csv")

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] edgeR_3.36.0 limma_3.50.0

loaded via a namespace (and not attached):
[1] BiocManager_1.30.16 compiler_4.1.1     
[3] tools_4.1.1         Rcpp_1.0.7         
[5] grid_4.1.1          locfit_1.5-9.4     
[7] lattice_0.20-45

exacttest RNASeqData edgeR diffGeneAnalysis • 1.9k views

ADD COMMENT • link updated 3.5 years ago by Gordon Smyth 52k • written 3.5 years ago by Julia • 0

score 1 · Answer 1 · 2021-10-28

Use of edgeR is described in the edgeR User's Guide:

https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

You can also type edgeRUsersGuide() at the R prompt to access the User's Guide.

Another extensive edgeR tutorial is available here: https://www.bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html