Question

Multi-factor Comparison in DESeq 2 Issue

0

Entering edit mode

locon833 • 0

@locon833-7230

Last seen 10.0 years ago

United States

Hi all,

I'm trying to normalize my Illumina MiSeq data using Phyloseq and DESeq2. I want to look at the interactions between my site location and by date. I attempted to use the following command:

Laurendds = phyloseq_to_deseq2(Lauren_data_merged, design = ~Location + Date + Location:Date)

But am getting this error:

converting counts to integer mode
Error in DESeqDataSet(se, design = design, ignoreRank) :
the model matrix is not full rank, so the model cannot be fit as specified.
one or more variables or interaction terms in the design formula
are linear combinations of the others and must be removed

Doing some research into this issue I have seen that others have used the DESeqDataSetFromMatrix command (as in this post https://support.bioconductor.org/p/63134/), but my metadata (mapping file) and my count table(.biom file) are already merged into my Lauren_data_merged object. I'm not sure where my variables are interacting to cause this error. I'll post an example of my metadata file below:

head(Thesis_mapping)
Sample Data: [6 samples by 11 sample variables]:
X.SampleID BarcodeSequence LinkerPrimerSequence InputFileName MonthSampleTaken Location Salinity WaterTemperature
PE12 NA NA PE12.fasta July M9 26.0 30
PE15 PE15 NA NA PE15.fasta July M9 14.5 30
PE17 PE17 NA NA PE17.fasta July BB 17.0 30
PE18 PE18 NA NA PE18.fasta July M9 17.0 30
PE20 PE20 NA NA PE20.fasta August BB 22.5 30
PE21 PE21 NA NA PE21.fasta August M9 22.5 30
Season Weather Date
Wet Rain 7/10/13
Wet Rain 7/19/13
Wet Rain 7/23/13
Wet Rain 7/23/13
Wet Rain 8/3/13
Wet Rain 8/3/13

deseq2 phyloseq • 2.5k views

ADD COMMENT • link updated 10.2 years ago by Ryan C. Thompson ★ 7.9k • written 10.2 years ago by locon833 • 0

score 0 · Answer 1 · 2015-02-05

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

hi,

Edited: It looks like you don't have replicates for fitting an interaction term here. To simplify, given the head of the colData which you have show, you have 2 locations and 4 dates:

loc date
 M9    1
 M9    2
 BB    3
 M9    3
 BB    4
 M9    4

You only observe once each combination of location and date, but you are trying to fit a model which has a different estimated mean for each combination, so there would be no way to estimate the dispersion.

You should remove the interaction term, and thereby control for base level location and date effects, or generate more replicates.

ADD COMMENT • link 10.2 years ago Michael Love 43k

0

Entering edit mode

I have a total of 82 samples (I only posted the first 6 as an example). I do have replicates as I took samples weekly for an entire year at the same two locations just on different dates, so essentially it is a time series data set but I don't know what my interaction term would be. Are you aware of how I could go about setting this up in DESeq2 so that I can get variance stabilized data for further downstream analysis?

ADD REPLY • link 10.2 years ago locon833 • 0

0

Entering edit mode

for variance stabilization, just use a design of ~1, for normalization which is not influenced by the experimental design

ADD REPLY • link 10.2 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

This might be a very simple question, but since I'm very new to R and coding in general I don't quite understand everything yet, when you say just use a design of ~1 for normalization, do you mean that when I execute this command: Laurendds = phyloseq_to_deseq2(Lauren_data_merged, design = ~Location + Date + Location:Date) I could use just one of my variables: Location or Date like this: Laurendds = phyloseq_to_deseq2(Lauren_data_merged, design = ~Location) and it would variance stabilize my data?

I also was thinking about looking at my samples by month rather than by date, that way i would have multiple replicates of each site from the same month for further downstream analysis.

Thank you for your help in advance!

ADD REPLY • link 10.2 years ago locon833 • 0

0

Entering edit mode

Read over the vignette section 2.1.1 "Blind dispersion estimation" to see if you want to use blind dispersion estimation.

If you use blind=FALSE, then the design will be used by the transformation functions. If you use blind=TRUE, then it doesn't matter what you put in design, because ~1 will be substituted within the transformation functions.

I'd recommend using ~Location + Date, and blind=FALSE.

(The details are described in the vignette, but summarized: this is still almost unsupervised, because the design is only used to estimate the dispersion, and only the global trend of dispersion~mean is passed on to the transformations.)

ADD REPLY • link 10.2 years ago Michael Love 43k

score 0 · Answer 2 · 2015-02-05

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 6 months ago

Icahn School of Medicine at Mount Sinai…

Do you have multiple samples for each combination of date and location? If not, then you can't fit an interaction term, because you will have as many coefficients as samples. You can reduce the degrees of freedom of the time variable using "ns" from the "splines" package. The limma User's Guide has a section on doing this: Section 9.6.2 "Many time points". I'm not familiar with how to use a custom design matrix in DESeq2, but you can certainly analyze your data in this way using either edgeR or limma-voom.

ADD COMMENT • link 10.2 years ago Ryan C. Thompson ★ 7.9k

1

Entering edit mode

You can just put ns() in the design, or in version >= 1.7, provide the matrix to the 'full' argument of DESeq()

ADD REPLY • link 10.2 years ago Michael Love 43k

0

Entering edit mode

Thanks, much for the suggestions. I will try tomorrow.

ADD REPLY • link 10.2 years ago locon833 • 0

0

Entering edit mode

Since time-of-year data is cyclical, is there a way to tell the ns function about that so that it imposes the right boundary conditions to make it periodic?