We are performing RNA-seq analysis in R using Rsubread and edgeR. We would like to be able to create a design matrix that allows our samples to be paired together.
We have been following the edgeR quasilikelihood pipeline (Link: Chen, Lu and Smith, 2020. We modified this pipeline to use glmLRT as opposed to the glmQLRT and then attempted to include multiple parameters in our design matrix, both our treatment condition as well as the pair identifier.
Our experimental design included one phytoplankton strain that was diluted to extinction for single cell isolation (Luria-Delbruck protocol) and from that, each of six clonal isolates were used to seed an individual paired treatment sample (400 ppm/800 ppm CO2).
In doing our analysis, we want to be able to compare between present (400ppm) and future(800ppm) conditions. However, we also want to be able to account for the within clonal variability that we may see within each of the paired treatment samples. Our y object has a group column (see below) where we labeled each sample with its treatment and pair ID. Our pairs should be Present.45--Future.45, etc. We understand that the difference in name may be causing an issue but are not sure how to troubleshoot this.
Additionally, our code is working but the design matrix is not structured in a way to allow us to make the desired comparisons. Looking at the output of our design matrix code, the only comparisons we can make are between the columns (TxFuture, TxPresent, Pair46, etc) We are unable to make a comparison between the two samples that are found in Pair 46, or Pair 47. We also are unsure of how to incorporate the within clonal variability in the overall Present-Future comparison.
As a final note, we are not posting any sessionInfo() information because we aren't technically getting any type of error message.
load("~/Desktop/database/Syn_analysis/Syn_9311.Rdata")
#load library
library(Rsubread)
library(edgeR)
#call library--this is an organismal database that we curated ourselves
library(org.Ssp.CC9311.eg.db)
#modifying our DGEList object to include info from organismal database
y$genes$symbol <- mapIds(org.Ssp.CC9311.eg.db, rownames(y),
keytype = "GID", column = "GENENAME")
y$genes$product <- mapIds(org.Ssp.CC9311.eg.db, rownames(y),
keytype = "GID", column = "GENEPRODUCT")
#look at the samples included in our y object.
y$samples
Output:
group lib.size norm.factors
SS0029.S66.L008.R1.001.fastq.gz.subread.BAM Future.45 2723395 0.7895361
SS0030.S67.L008.R1.001.fastq.gz.subread.BAM Future.50 1349356 0.9642628
SS0031.S68.L008.R1.001.fastq.gz.subread.BAM Future.46 1931205 0.8817554
SS0032.S69.L008.R1b.001.fastq.gz.subread.BAM Future.47 2371530 0.9668457
SS0033.S70.L008.R1.001.fastq.gz.subread.BAM Future.48 1662338 1.0713176
SS0034.S71.L008.R1.001.fastq.gz.subread.BAM Present.48 2485131 0.9618421
SS0036.S72.L008.R1.001.fastq.gz.subread.BAM Present.49 2397884 1.1038003
SS0037.S73.L008.R1.001.fastq.gz.subread.BAM Present.45 3102922 1.0451120
SS0038.S74.L008.R1.001.fastq.gz.subread.BAM Present.50 1853192 1.1969441
SS0039.S75.L008.R1.001.fastq.gz.subread.BAM Future.49 689415 1.0852899
SS0040.S20.L002.R1.001.fastq.gz.subread.BAM Present.47 1007293 1.1487667
SS035.S19.L002.R1.001.fastq.gz.subread.BAM Present.46 934291 0.868561
#now we are creating objects of both tx and paired sample ID
Tx <- factor(targets$Tx)
Pair <- factor(targets$Pair)
#create our design matrix
design <- model.matrix(~0+Tx+Pair)
rownames(design) <- levels(group)
design
Output:
TxFuture TxPresent Pair46 Pair47 Pair48 Pair49 Pair50
[1,] 1 0 0 0 0 0 0
[2,] 1 0 0 0 0 0 1
[3,] 1 0 1 0 0 0 0
[4,] 1 0 0 1 0 0 0
[5,] 1 0 0 0 1 0 0
[6,] 0 1 0 0 1 0 0
[7,] 0 1 0 0 0 1 0
[8,] 0 1 0 0 0 0 0
[9,] 0 1 0 0 0 0 1
[10,] 1 0 0 0 0 1 0
[11,] 0 1 0 1 0 0 0
[12,] 0 1 1 0 0 0 0
attr(,"assign")
[1] 1 1 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$Tx
[1] "contr.treatment"
attr(,"contrasts")$Pair
[1] "contr.treatment"
Yunshun,
The interactive model will most likely be the way we have to approach the issue. Our entire sample set includes 6 replicates grown at 400 ppm CO2 and 6 replicates grown at 800 ppm CO2. Each of those 6 replicates were started from a single clonal culture isolated from a parent culture. We were under the impression that you could account for within clone variability by comparing the 400 and 800 sample of each clone. A "pairwise" comparison so to speak.
Thank you for the feedback! We appreciate the help.