Using DESeq2 for normalization of dataset with no replicates
1
0
Entering edit mode
@hadi-hosseini-23115
Last seen 4.9 years ago
USA/Memphis/University of Tennessee HSC

Hello dare,

I am a beginner in using Bioconductor and DESeq2. I have a dataset of RNAseq, which gets from RSEM tool, for different samples (no replicate). I want to use DESeq2 for the normalization of the gene counts. Here is the RNAseq dataset:

> setwd("./rsem_export_dataset")
> RSEM_output_gene_files <- file.path(getwd(), paste0(samples$sample_ID, ".genes.results"))
> names(RSEM_output_gene_files) <- samples$sample_ID
> txi.rsem <- tximport(RSEM_output_gene_files, type="rsem", txIn=FALSE, txOut=FALSE) 
> txi.rsem$length[txi.rsem$length == 0] <- 1
> dds <- DESeqDataSetFromTximport(txi.rsem, samples, ~sample_ID)

> head(assay(dds),5)[,1:3]

                                       s_00077E7AC5     s_00077E7B8A     s_00077E7B99   
ENSRNOG00000000001_AABR07013255.1            0                1                0                            
ENSRNOG00000000007_Gad1                   5780              633             4914                     
ENSRNOG00000000008_Alx4                      0                1               25                          
ENSRNOG00000000009_Tmco5b                    0                0                0                           
ENSRNOG00000000010_Cbln1                   231              219              160

My samples table has some information about 53 samples, like Age, Sex, Batch_ID, and IOP. IOP is a disease feature, which I want to calculate the correlation of IOP with some limit genes per each sample.

> head (samples,2)

                sample_ID         Sex    Batch     AgeInDays  Class_IOP   Avg_IOP   Class_Age
s_00077E7AC5    s_00077E7AC5      F      B14       151        High        24        A2_Adult
s_00077E7B8A    s_00077E7B8A      F      B12       228        Normal      12        A4_Aged

The goal is to calculate the correlation between AvgIOP and some genes like ENSRNOG00000000007Gad1 for all 53 samples. The question is I have no replicates, and I need to put ~sample_ID as a design formula. However, when I run the rlog function, as below, I have an error:

> rld <- rlog(dds, blind=FALSE)

using 'avgTxLength' from assays(dds), correcting for library size
Error in estimateDispersionsGeneEst(object, quiet = TRUE) : 
  the number of samples and the number of model coefficients are equal,
  i.e., there are no replicates to estimate the dispersion.
  use an alternate design formula

To solve this problem, I used blind=TRUE, but as I understood it is not a correct solution. Could you please guide me. Thank you so much in advance for your time.

Sincerely, Hadi

normalization deseq2 • 1.6k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Just use design=~1 for estimating rlog, to get past this error.

Alternatively, you could specify, just for example design=~Sex + Batch to account for variance in sex and batch effects.

If you want to use age, I'd recommend to scale age first.

samples$age.scaled <- scale(samples$age)

Then you could add age.scaled to the design as well if you want.

ADD COMMENT

Login before adding your answer.

Traffic: 609 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6