Hi everyone,
I've been trying to use DESeq2 to look at expression differences across different gene, with 17 samples across 2 groups.
My count data looks as follows:
> head(phylum_top10)
NGm1 NGm2 NGm3 NGm4 NGm6 NGm7 NGm8 NGm9 NGs1 NGs2 NGs3 NGs4 NGs5 NGs6 NGs7 NGs8 NGs9
gen1 18185 9787 15737 21988 15669 8490 13135 6740 9324 23279 14517 17634 21207 16377 10719 10721 6750
gen2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
gen3 8 0 332 678 0 0 0 0 0 0 0 0 0 0 0 79 0
gen4 0 0 175 203 123 0 0 0 0 0 0 64 0 0 0 0 0
gen5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
gen6 0 0 162 22 616 0 0 0 0 0 0 0 0 0 0 6 0
> colData
Site Species
NGm1 Site1 Gm
NGm2 Site1 Gm
NGm3 Site1 Gm
NGm4 Site2 Gm
NGm6 Site2 Gm
NGm7 Site3 Gm
NGm8 Site3 Gm
NGm9 Site3 Gm
NGs1 Site1 Gs
NGs2 Site1 Gs
NGs3 Site1 Gs
NGs4 Site2 Gs
NGs5 Site2 Gs
NGs6 Site2 Gs
NGs7 Site3 Gs
NGs8 Site3 Gs
NGs9 Site3 Gs
I tried:
dds <- DESeqDataSetFromMatrix(phylum_top10, colData, design = ~Species)
htseq_rawData <- counts(dds, F)
dds <- dds[rowSums(counts(dds)) > 0]
head(htseq_rawData)
NGm1 NGm2 NGm3 NGm4 NGm6 NGm7 NGm8 NGm9 NGs1 NGs2 NGs3 NGs4 NGs5 NGs6 NGs7 NGs8 NGs9
gen1 18185 9787 15737 21988 15669 8490 13135 6740 9324 23279 14517 17634 21207 16377 10719 10721 6750
gen3 8 0 332 678 0 0 0 0 0 0 0 0 0 0 0 79 0
gen4 0 0 175 203 123 0 0 0 0 0 0 64 0 0 0 0 0
gen6 0 0 162 22 616 0 0 0 0 0 0 0 0 0 0 6 0
gen10 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0
the above is normal, but the following normalization is not.
> dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 3 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
Warning message:
In lfproc(x, y, weights = weights, cens = cens, base = base, geth = geth, :
Estimated rdf < 1.0; not estimating variance
> deseq_rawData <- counts(dds, T)
> deseq_rawData <- as.data.frame(deseq_rawData)
> head(deseq_rawData)
NGm1 NGm2 NGm3 NGm4 NGm6 NGm7 NGm8 NGm9 NGs1 NGs2 NGs3
gen1 13174.911005 13174.91 13174.9110 13174.91101 13174.9110 13174.91 13174.91 13174.91 13174.91 13174.91 13174.91
gen3 5.795947 0.00 277.9482 406.24839 0.0000 0.00 0.00 0.00 0.00 0.00 0.00
gen4 0.000000 0.00 146.5088 121.63484 103.4217 0.00 0.00 0.00 0.00 0.00 0.00
gen6 0.000000 0.00 135.6253 13.18210 517.9491 0.00 0.00 0.00 0.00 0.00 0.00
gen10 0.000000 0.00 0.0000 11.38454 0.0000 0.00 0.00 0.00 0.00 0.00 0.00
NGs4 NGs5 NGs6 NGs7 NGs8 NGs9
gen1 13174.91101 13174.91 13174.91 13174.91 13174.91101 13174.91
gen3 0.00000 0.00 0.00 0.00 97.08217 0.00
gen4 47.81639 0.00 0.00 0.00 0.00000 0.00
gen6 0.00000 0.00 0.00 0.00 7.37333 0.00
gen10 0.00000 0.00 0.00 0.00 0.00000 0.00
I don't know why the count of gene 1 in all samples are the same after normalization. I can't figure out what the issue is, and would really appreciate any help! I'd just like to investigate differential expression across different species. Thanks so much.
Thank you so much for your help. This parameter minReplicatesForReplace=Inf doesn't seem to work either. I'll try again with some other software. Best wishes!