I'm trying to run DESeq on a dds (created from a gse using information from tximeta) in which technical replicates from the same library (run on two different lanes) have been collapsed, so instead of 20 samples, I now have 10. When I run DESeq on the collapsed dds (ddsColl), I get an error:
`Error in validityMethod(object) : nrow(design) == ncol(object) is not TRUE`
I wonder if it's because the colData that I used for my design matrix to create dds has 20 rows, and my ddsColl has only 10 columns. I originally had 20 samples that were collapsed to 10, and my colData rows represented samples. I'm not sure how to fix this besides rerunning DESeqDataSet with a new colData, but the order of operations seems wrong, in that I would want to run DESeqDataSet before I collapse replicates. My design formula doesn't include 'lane' information, so I don't think that would be an issue in running DESeq. It only includes condition and a term for condition:replicates.nested, to account for biological replicates nested within condition (`~ SPERM + SPERM:LINE.NESTED`). My colData is as follows:
`DataFrame with 10 rows and 8 columns
FASTQ_NAMES names SPERM LINE REP LANE LINE.NESTED runsCollapsed
<factor> <integer> <factor> <factor> <factor> <factor> <factor> <character>
14120X1_170412_D00294_0311_ACAJ3TANXX_6 1 H H08 A 1 1 1,2
14120X2_170412_D00294_0311_ACAJ3TANXX_6 3 H H08 B 2 1 3,4
14120X3_170412_D00294_0311_ACAJ3TANXX_6 5 H H08 C 3 1 5,6
14120X4_170412_D00294_0311_ACAJ3TANXX_6 7 H H20 A 4 2 7,8
14120X5_170412_D00294_0311_ACAJ3TANXX_6 9 H H20 B 5 2 9,10
14120X6_170412_D00294_0311_ACAJ3TANXX_6 11 H H20 C 6 2 11,12
14120X7_170412_D00294_0311_ACAJ3TANXX_6 13 L L08 A 7 1 13,14
14120X8_170412_D00294_0311_ACAJ3TANXX_6 15 L L08 B 8 1 15,16
14120X11_170412_D00294_0311_ACAJ3TANXX_6 17 L L17 B 9 2 17,18
14120X12_170412_D00294_0311_ACAJ3TANXX_6 19 L L17 C 10 2 19,20`
Is there another step I'm missing between collapsing replicates and DESeq? Also, I'd love confirmation that my design formula is appropriate given I'm only interested in genes that are differentially expressed between H and L SPERM, after accounting for biological replicates LINE nested within SPERM and REP nested within LINE.
Thanks Kevin. So if I understand you correctly, you're suggesting that I collapse replicates (if I absolutely have to, which I think is recommended) AFTER running a `vst()` or `rlog()` transformation on the dds. I will try this, but I am still not sure how to make `nrow(design) == ncol(object)` hold when running `DESeq()` on the collapsed dds (`ddsColl`) with the original coldata (20 rows instead of 10). If I remove the even rows from my `coldata` file (to make `coldata2` with 10 rows instead of 20), make a new design matrix based on that (using `model.matrix()`), and try to remake the `dds` with the new design matrix (`DESeqDataSet()`), now the `nrow` of the new `coldata` doesn't match the `ncol` of the `gse`. I need more explicit steps as to how to move forward with `DESeq()` after collapsing the technical replicates.

Hi again, perhaps we need to distinguish between biological and technical replicates.
If you choose to collapse your technical replicates, then you still start with the metadata in its complete form with all samples and replicates included. DESeq2 will, internally, handle the change in the colData. Take a look at the following example:
Create example dataset
Now collapse the replicates:
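As a rough sketch of those two steps (a simulated stand-in, assuming DESeq2's `makeExampleDESeqDataSet()`; the `sample` and `run` columns are made up for illustration and are not from the real data), something like this should show the colData shrinking along with the counts:

```r
library(DESeq2)

set.seed(1)
# Simulated dataset: 12 runs, i.e. 6 samples each sequenced twice (assumption)
dds <- makeExampleDESeqDataSet(m = 12)
dds$sample <- factor(rep(paste0("sample", 1:6), each = 2))
dds$run <- paste0("run", 1:12)

# Sum counts over the runs of each sample; colData is reduced to one row per sample
ddsColl <- collapseReplicates(dds, groupby = dds$sample, run = dds$run)

dim(dds)          # 12 columns before collapsing
dim(ddsColl)      # 6 columns after collapsing
colData(ddsColl)  # includes a runsCollapsed column listing the merged runs
```

Here the design is a formula (`~ condition`, the default of the simulated object), so nothing tied to the original number of columns is stored in the object.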
Then proceed with the `ddsColl` object.

I am proceeding with the `ddsColl` object, and this is where I am getting the error
`Error in validityMethod(object) : nrow(design) == ncol(object) is not TRUE`
If the colData is changed automatically, I'm not sure where this error is coming from then. I run the following:
`ddsColl <- collapseReplicates(dds, dds$LANE, dds$names)
colData(ddsColl)
colnames(ddsColl)`
and get
`DataFrame with 10 rows and 8 columns
FASTQ_NAMES names SPERM LINE REP LANE LINE.NESTED runsCollapsed
<factor> <integer> <factor> <factor> <factor> <factor> <factor> <character>
1 14120X1_170412_D00294_0311_ACAJ3TANXX_6 1 H H08 A 1 1 1,2
2 14120X2_170412_D00294_0311_ACAJ3TANXX_6 3 H H08 B 2 1 3,4
3 14120X3_170412_D00294_0311_ACAJ3TANXX_6 5 H H08 C 3 1 5,6
4 14120X4_170412_D00294_0311_ACAJ3TANXX_6 7 H H20 A 4 2 7,8
5 14120X5_170412_D00294_0311_ACAJ3TANXX_6 9 H H20 B 5 2 9,10
6 14120X6_170412_D00294_0311_ACAJ3TANXX_6 11 H H20 C 6 2 11,12
7 14120X7_170412_D00294_0311_ACAJ3TANXX_6 13 L L08 A 7 1 13,14
8 14120X8_170412_D00294_0311_ACAJ3TANXX_6 15 L L08 B 8 1 15,16
9 14120X11_170412_D00294_0311_ACAJ3TANXX_6 17 L L17 B 9 2 17,18
10 14120X12_170412_D00294_0311_ACAJ3TANXX_6 19 L L17 C 10 2 19,20`
I double check the collapseReplicates:
`matchFirstLevel <- dds$LANE == levels(dds$LANE)[1]
stopifnot(all(rowSums(counts(dds[,matchFirstLevel])) == counts(ddsColl[,1])))`
and it runs fine.
I filter out counts under 10:
`keep <- rowSums(counts(ddsColl)) >= 10
ddsColl <- ddsColl[keep,]
ddsColl`
which gives
`class: DESeqDataSet
dim: 13267 10
metadata(7): tximetaInfo quantInfo ... txdbInfo version
assays(1003): counts abundance ... infRep999 infRep1000
rownames(13267): FBgn0000008 FBgn0000014 ... FBgn0286933 FBgn0286940
rowData names(7): gene_id gene_name ... symbol REFSEQ
colnames(10): 1 2 ... 9 10
colData names(8): FASTQ_NAMES names ... LINE.NESTED runsCollapsed`
Then proceed with DESeq:
ddsColl <- DESeq(ddsColl)
and get
Error in validityMethod(object) : nrow(design) == ncol(object) is not TRUE
Thoughts? I'm stumped. (sorry for the weird formatting, can't get markdown to work properly for me today)
Just checking back to see if I can get some more help with this. Thanks.
I see above that you are specifying a matrix as the `design`. I'd recommend using `~1` as a design for the object before collapsing. Then after you're done collapsing, and you need to make a matrix for the design, you can create it using colData, and provide it to the `full` argument of `DESeq()`.
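A minimal sketch of that workflow, assuming simulated counts from `makeExampleDESeqDataSet()` in place of the real tximeta-derived object. The `lib` and `run` columns and the name `mm` are hypothetical; `SPERM` and `LINE.NESTED` mirror the colData shown earlier in the thread:

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in: 10 libraries, each run on 2 lanes, so 20 columns (assumption)
dds <- makeExampleDESeqDataSet(n = 1000, m = 20)
dds$lib <- factor(rep(1:10, each = 2))
dds$run <- paste0("run", 1:20)
dds$SPERM <- factor(rep(c("H", "L"), times = c(12, 8)))
dds$LINE.NESTED <- factor(rep(c(1, 2, 1, 2), times = c(6, 6, 4, 4)))

# Placeholder design, so no 20-row design matrix is stored in the object
design(dds) <- ~ 1

# Sum the counts of the two lanes of each library
ddsColl <- collapseReplicates(dds, groupby = dds$lib, run = dds$run)

# Build the design matrix from the collapsed colData (10 rows)
mm <- model.matrix(~ SPERM + SPERM:LINE.NESTED,
                   data = as.data.frame(colData(ddsColl)))
stopifnot(nrow(mm) == ncol(ddsColl))

# Supply the matrix through the `full` argument instead of the stored design
ddsColl <- DESeq(ddsColl, full = mm)

# Coefficient names available for results(ddsColl, name = ...)
resultsNames(ddsColl)
```

The point of the `~ 1` placeholder is that a formula does not depend on the number of columns, so the object stays valid through collapsing; the matrix is only built once the colData has its final 10 rows.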