Hi,
I have a question. We have generated pilot gene expression data with one sample per condition per subject. I am using the edgeR package in R, following the user's guide (section 2.12, "What to do if you have no replicates"). I have a control (baseline) sample followed by 5 different samples. The code below compares only two samples at a time (see "R Code: Two samples at a time", which compares only Control vs Sample_1) and does not make the other comparisons: Control vs Sample_2, Control vs Sample_3, Control vs Sample_4 and Control vs Sample_5. To get those, I subset the data to two samples at a time (both the raw counts and the sample metadata data.frame), then rerun the code below for each comparison and export the differential analysis results. Doing this one comparison at a time is tedious, so I tried a loop that runs and exports a table for every comparison. The problem is that the exported logCPM values are the same for all comparisons (which does not look correct), although the logFC and p-value columns are fine.
Basically, I would like to export a consolidated csv file from the object et$table or topTags for each comparison: Control vs Sample_1, Control vs Sample_2, and so on up to Control vs Sample_5.
Is there a better way to do this? Also, is it necessary to run calcNormFactors before performing exactTest?
print(Counts_Test)
Control Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
Gene1 0 0.0 0.0 0 0 0.0
Gene2 184 140.5 169.0 107 221 120.5
Gene3 60 64.0 45.0 67 44 45.0
Gene4 0 0.0 1.0 0 0 1.0
Gene5 7 4.0 3.0 5 1 1.0
Gene6 0 0.0 0.0 0 0 0.0
Gene7 87 83.0 122.0 99 139 100.0
Gene8 0 0.0 0.0 0 0 0.0
Gene9 0 1.0 0.0 0 0 0.0
Gene10 21 51.5 36.5 63 48 44.5
Gene11 193 199.0 179.0 178 222 202.0
Gene12 29 25.0 20.0 34 24 39.0
Gene13 0 0.0 0.0 0 1 1.0
Gene14 0 0.0 0.0 0 0 0.0
Gene15 3 5.0 1.0 2 5 3.0
Gene16 50 62.0 58.0 60 67 76.0
Gene17 0 0.0 0.0 0 0 0.0
Gene18 325 525.0 494.0 467 612 719.0
Gene19 442 407.0 570.0 283 451 681.0
print(Sample_Grouping)
SampleID ID
Control xx-xx-1551 Control
Sample_1 xx-xx-1548 Sample_1
Sample_2 xx-xx-1549 Sample_2
Sample_3 xx-xx-1550 Sample_3
Sample_4 xx-xx-1552 Sample_4
Sample_5 xx-xx-0093 Sample_5
R Code: Two samples at a time
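library(edgeR)
# Assumed biological coefficient of variation; there are no replicates to estimate it from
bcv <- 0.4
# Counts_Test and Sample_Grouping are subset to Control and Sample_1 beforehand
y <- DGEList(counts=Counts_Test, group=Sample_Grouping$ID)
et <- exactTest(y, dispersion=bcv^2)
View(et$table)
# write.csv() always writes comma-separated output and ignores a sep argument with a warning
write.csv(et$table, file="./table_Control_vs_Sample_1.csv")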
R Code: All samples comparisons at a time
sample_names <- c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
library(edgeR)
bcv <- 0.4
y <- DGEList(counts=Counts_Test, group=Sample_Grouping$ID)
for(cur_name in sample_names){
  et <- exactTest(y, pair=c("Control", cur_name), dispersion=bcv^2)
  if(cur_name=="Sample_1"){
    # In the first iteration, capture the order
    geneOrder <- row.names(et$table)
  }else{
    # In the subsequent iterations, enforce the order
    et$table <- et$table[geneOrder,]
  }
  # Now, you can write
  write.csv(et$table, file=paste0("./table_Control_vs_",cur_name, ".csv"))
}
# The if/else statement will ensure the order is the same for all
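One possible way to get a single consolidated file instead of one file per comparison is to collect each et$table in a list and stack them. The following is only a sketch: it uses a subset of the integer-valued genes from the counts printed above, and the names per_sample, consolidated and the output file name are made up for illustration. Normalization is left out here only to mirror the loop above.

```r
library(edgeR)

# Illustrative subset of the counts from the question (integer-valued genes only)
counts <- matrix(
  c( 60,  64,  45,  67,  44,  45,
     87,  83, 122,  99, 139, 100,
    193, 199, 179, 178, 222, 202,
     29,  25,  20,  34,  24,  39,
     50,  62,  58,  60,  67,  76,
    325, 525, 494, 467, 612, 719,
    442, 407, 570, 283, 451, 681),
  ncol = 6, byrow = TRUE,
  dimnames = list(
    c("Gene3", "Gene7", "Gene11", "Gene12", "Gene16", "Gene18", "Gene19"),
    c("Control", paste0("Sample_", 1:5))
  )
)

# One group per sample, with Control as the first level
group <- factor(colnames(counts), levels = colnames(counts))
y <- DGEList(counts = counts, group = group)

bcv <- 0.4  # assumed BCV, as in the question; not estimated from the data

# One exactTest per non-control sample, each against Control
per_sample <- lapply(setdiff(levels(group), "Control"), function(s) {
  et <- exactTest(y, pair = c("Control", s), dispersion = bcv^2)
  data.frame(
    Gene = rownames(et$table),
    Comparison = paste0("Control_vs_", s),
    et$table,
    row.names = NULL
  )
})

# Stack the per-comparison tables into one long table
consolidated <- do.call(rbind, per_sample)

# Hypothetical output file name
write.csv(consolidated, file = "table_Control_vs_all.csv", row.names = FALSE)
```

In this sketch the logCPM column is again identical across comparisons, as in the loop above. My understanding is that exactTest reports the average log-CPM over all libraries in the DGEList rather than over the two being compared, so this would be expected rather than a bug, but please check that against the edgeR documentation.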
Hi Gordon Smyth, thank you very much for the prompt response. Sure, I will create a DGEList object from a table of counts and samples > filter non-expressed genes > perform calcNormFactors > exactTest > topTags. Yes, this was the consolidated data table I was expecting.
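For reference, the sequence of steps just listed might look roughly like the sketch below for a single comparison. It is a sketch under assumptions: the counts are a subset of the genes printed in the question, filterByExpr is used as the filtering step (the thread does not say which filter was meant), and bcv is the same assumed value as before.

```r
library(edgeR)

# Subset of the counts from the question, including some unexpressed genes
counts <- matrix(
  c(  0,   0,   0,   0,   0,   0,
     60,  64,  45,  67,  44,  45,
      0,   0,   1,   0,   0,   1,
      0,   0,   0,   0,   0,   0,
     87,  83, 122,  99, 139, 100,
    193, 199, 179, 178, 222, 202,
     29,  25,  20,  34,  24,  39,
     50,  62,  58,  60,  67,  76,
    325, 525, 494, 467, 612, 719,
    442, 407, 570, 283, 451, 681),
  ncol = 6, byrow = TRUE,
  dimnames = list(
    paste0("Gene", c(1, 3, 4, 6, 7, 11, 12, 16, 18, 19)),
    c("Control", paste0("Sample_", 1:5))
  )
)
group <- factor(colnames(counts), levels = colnames(counts))

# 1. DGEList from the table of counts and the sample grouping
y <- DGEList(counts = counts, group = group)

# 2. Filter out genes with too few reads (filterByExpr is one option)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# 3. Normalization factors
y <- calcNormFactors(y)

# 4. Exact test with an assumed dispersion (no replicates to estimate it)
bcv <- 0.4
et <- exactTest(y, pair = c("Control", "Sample_1"), dispersion = bcv^2)

# 5. Full table with FDR added, sorted by p-value
tt <- topTags(et, n = Inf)
print(tt$table)
```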
I have two related questions:
I have printed the consolidated table below. The logCPM column remains the same for all comparisons. Is this a known issue for single-sample analysis?
I have currently used bcv <- 0.4 to account for dispersion. Does this value vary from dataset to dataset for human samples? I tried to calculate the dispersion using section "2.10.1 Estimating dispersions". It gives me the message below:
Your questions are answered by the edgeR documentation.
I thought from your question that you already understood that you have no replication and therefore can't estimate the dispersion.
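Because the dispersion cannot be estimated here, the p-values depend directly on whatever BCV is assumed. The edgeR User's Guide section on data without replicates mentions typical values of about 0.4 for human data, 0.1 for genetically identical model organisms and 0.01 for technical replicates. The sketch below only illustrates that sensitivity; it uses a subset of the counts from the question, and the names bcv_values and pvals are made up for the example.

```r
library(edgeR)

# Illustrative subset of the counts from the question (integer-valued genes only)
counts <- matrix(
  c( 60,  64,  45,  67,  44,  45,
     87,  83, 122,  99, 139, 100,
    193, 199, 179, 178, 222, 202,
     29,  25,  20,  34,  24,  39,
     50,  62,  58,  60,  67,  76,
    325, 525, 494, 467, 612, 719,
    442, 407, 570, 283, 451, 681),
  ncol = 6, byrow = TRUE,
  dimnames = list(
    c("Gene3", "Gene7", "Gene11", "Gene12", "Gene16", "Gene18", "Gene19"),
    c("Control", paste0("Sample_", 1:5))
  )
)
group <- factor(colnames(counts), levels = colnames(counts))
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Candidate BCV values; which one is appropriate is a judgment call, not an estimate
bcv_values <- c(technical = 0.01, model_organism = 0.1, human = 0.4)

# Same comparison under each assumed BCV; one column of p-values per value
pvals <- sapply(bcv_values, function(bcv) {
  exactTest(y, pair = c("Control", "Sample_5"), dispersion = bcv^2)$table$PValue
})
rownames(pvals) <- rownames(y)

print(signif(pvals, 3))
```

Comparing the columns shows how much the apparent significance of a gene changes with the assumed BCV, which is why results from unreplicated data are best treated as exploratory.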