perform DESeq2 based on paired sample
2
0
Entering edit mode
@xiaofeiwang18266-13498
Last seen 16 months ago
Singapore

Enter the body of text here

I have 3 questions about Paired Test in DESeq2.

First, I'd like to double check the design for the paired test. Here is the link https://support.bioconductor.org/p/58893/ , I got the answer. This design is for paired test. But, it looks like a two-way factorial without interaction for me. Could you confirm me it is similar to paired t-test?

design(dds) <- ~ patient + condition

Where 'patient' and 'condition' are factors which are columns in
colData(dds), and condition has levels: normal, tumor (where normal is
the
base level).

Then results(dds) will build a result table for tumor vs normal,
controlling for the patient effect.

Here is my colData:

colData(dds_matched)
DataFrame with 144 rows and 3 columns
                                                                                    Barcodes sample_type      patient
                                                                                  <character>    <factor>     <factor>
TCGA-A3-3358-01A-01R-1541-07 TCGA-A3-3358-01A-01R..      tumour TCGA_A3_3358
TCGA-A3-3358-11A-01R-1541-07 TCGA-A3-3358-11A-01R..      normal TCGA_A3_3358
TCGA-A3-3387-01A-01R-1541-07 TCGA-A3-3387-01A-01R..      tumour TCGA_A3_3387
TCGA-A3-3387-11A-01R-1541-07 TCGA-A3-3387-11A-01R..      normal TCGA_A3_3387
TCGA-B0-4700-01A-02R-1541-07 TCGA-B0-4700-01A-02R..      tumour TCGA_B0_4700
...                                             ...         ...          ...

Second, I have 72 patients, and each patient has tumour and normal data, so I have 144 samples in total. I used the design above, but it took a long time to run. Also, I got this as below. While, I also run this design WO considering the matching information "design =~ sample_type", it finished very quickly and WO anything reported.

dds_matched <- DESeqDataSetFromMatrix(expressionData.matched.for.DESeq, colData = condition_table, design =~ sample_type+patient)

dds_DE_matched <- DESeq(dds_matched)


estimating size factors
estimating dispersions
gene-wise dispersion estimates
2021-02-05 21:05:03.118 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (1.25 secs).
2021-02-05 21:05:14.738 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (1.09 secs).
2021-02-05 21:05:34.790 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (1.05 secs).
2021-02-05 21:13:01.764 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (2.30 secs).
2021-02-05 21:13:11.213 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (2.44 secs).
2021-02-05 21:13:14.387 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (2.37 secs).
mean-dispersion relationship
final dispersion estimates
2021-02-05 21:26:30.894 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (2.02 secs).
2021-02-05 21:28:05.292 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (1.93 secs).
2021-02-05 21:33:08.983 R[48867:3744773] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (1.89 secs).
fitting model and testing
15 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Third, if I used this design "design =~ sample_type+patient", how should I feed the value for contrast? If it is 2-way factorial design, this means the effect of sample type. But, my experiment is paired data.

res_DE_matched <- results(dds_DE_matched, alpha=0.05, contrast=c("sample_type","tumour","normal"))

Thanks you so much!

Code should be placed in three backticks as shown below


# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )
DESeq2 • 1.4k views
ADD COMMENT
2
Entering edit mode

Oh, and the IMKClient Stall has nothing to do with DESeq2. I'd recommend to consult with your local IT.

ADD REPLY
3
Entering edit mode
@mikelove
Last seen 5 days ago
United States

Yes, that is the design for a paired test (also it is listed in the FAQ in the DESeq2 vignette).

Yes, it is slower to perform such a paired analysis than to ignore the pairing information.

I recommend to use a design of ~patient + condition, and then either just call results() (which will give you a Wald test of the condition effect, with the reference level in the denominator), or you can use the LRT as shown below in the glmGamPoi code, which will test ~patient + condition vs ~patient.

Here on my machine with simulated data, 500 genes takes 62 seconds on my laptop using a single core, and an alternative method shown below takes 30 seconds.

With a large experimental design, I'd recommend performing filtering to ensure that a good number of samples have a minimal count, before running DESeq. If you were to filter to ~10,000 genes, the analysis should take 20 min (or 10 min with glmGamPoi), which isn't too slow. Also this is just running on a laptop with a single core, and isn't making use of multiple cores which is easy in DESeq2 (see vignette). I think multiple cores is not yet implemented for glmGamPoi analysis though.

E.g.:

keep <- rowSums(counts(dds) >= 10) >= X
dds <- dds[keep,]

where you may pick X to be the minimum number of samples that you want to have a minimal count, in order to be interested in that gene.

> dds <- makeExampleDESeqDataSet(n=500, m=140)
> dds$condition <- factor(rep(1:2,each=70))
> dds$patient <- factor(rep(1:70,times=2))
> design(dds) <- ~patient + condition
> system.time({ dds <- DESeq(dds, minRep=Inf) })
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
   user  system elapsed
 55.505   0.741  62.502

Note, fitting the model is 2x faster with glmGamPoi:

> system.time({ dds <- DESeq(dds, minRep=Inf, fitType="glmGamPoi", reduced=~patient, test="LRT") })
using pre-existing size factors
estimating dispersions
found already estimated dispersions, replacing these
gene-wise dispersion estimates
using 'glmGamPoi' as fitType. If used in published research, please cite:
    Ahlmann-Eltze, C., Huber, W. (2020) glmGamPoi: Fitting Gamma-Poisson
    Generalized Linear Models on Single Cell Count Data. bioRxiv.
    https://doi.org/10.1101/2020.08.13.249623
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Fit reduced model
Calculate quasi likelihood ratio
Preprare results
   user  system elapsed
 29.069   0.340  30.326
ADD COMMENT
0
Entering edit mode

Does it matter, if the order is switched? The results of ~patient + condition is same as ~condition + patient?

ADD REPLY
1
Entering edit mode

This is answered in the vignette. It only matters if you provide no arguments to results.

ADD REPLY
0
Entering edit mode
@xiaofeiwang18266-13498
Last seen 16 months ago
Singapore

Went through the details of vignette and plus your patience to answer the duplicated questions, it is clearly for me now. Thanks a lot!

ADD COMMENT

Login before adding your answer.

Traffic: 395 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6