Question

DESeq2 filter out my expression gene

0

Entering edit mode

jarod_v6@libero.it ▴ 40

@jarod_v6liberoit-6654

Last seen 6.2 years ago

Italy

Hi!

I have a set of experiments where I try to detect the differential expression genes between the over-expression of a particular fusion genes. One of the gene of the chimera are differential express the other no. This is strange because we have some evidence on the wet lab,

Another think is I found only few differential expression genes. This is strange.

out of 15361 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 21, 0.14%
LFC < 0 (down)   : 3, 0.02%
outliers [1]     : 14, 0.091%
low counts [2]   : 0, 0%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

in some comparison I obtain only this:

out of 15530 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 2, 0.013%
LFC < 0 (down)   : 0, 0%
outliers [1]     : 100, 0.64%
low counts [2]   : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I know one of the gene of the chimera are probably filter out as outliers for this reason:

ENSG ... 1991.4851208834

2.2929902325

0.3341579938

6.8619942512

NA

Even if I use this comand:

dds <- DESeq(dds,minReplicatesForReplace=Inf)

I obtain the same problem.

What Can I do?

R version 3.2.3 (2015-12-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] edgeR_3.12.0               pheatmap_1.0.8             genefilter_1.52.1          ggplot2_2.0.0             
 [5] limma_3.26.7               RColorBrewer_1.1-2         gplots_2.17.0              org.Hs.eg.db_3.2.3        
 [9] RSQLite_1.0.0              DBI_0.3.1                  annotate_1.48.0            XML_3.98-1.3              
[13] AnnotationDbi_1.32.3       DESeq2_1.10.1              RcppArmadillo_0.6.500.4.0  Rcpp_0.12.3               
[17] SummarizedExperiment_1.0.2 Biobase_2.30.0             GenomicRanges_1.22.4       GenomeInfoDb_1.6.3        
[21] IRanges_2.4.6              S4Vectors_0.8.11           BiocGenerics_0.16.1        biomaRt_2.26.1            

loaded via a namespace (and not attached):
 [1] statmod_1.4.24       gtools_3.5.0         locfit_1.5-9.1       splines_3.2.3        lattice_0.20-33     
 [6] colorspace_1.2-6     yaml_2.1.13          survival_2.38-3      foreign_0.8-66       BiocParallel_1.4.3  
[11] lambda.r_1.1.7       plyr_1.8.3           zlibbioc_1.16.0      munsell_0.4.2        gtable_0.1.2        
[16] futile.logger_1.4.1  caTools_1.17.1       labeling_0.3         latticeExtra_0.6-26  geneplotter_1.48.0  
[21] acepack_1.3-3.3      KernSmooth_2.23-15   xtable_1.8-0         scales_0.3.0         gdata_2.17.0        
[26] Hmisc_3.17-1         XVector_0.10.0       gridExtra_2.0.0      digest_0.6.9         grid_3.2.3          
[31] tools_3.2.3          bitops_1.0-6         RCurl_1.95-4.7       Formula_1.2-1        cluster_2.0.3       
[36] futile.options_1.0.0 rpart_4.1-10         nnet_7.3-12

deseq2 • 3.6k views

ADD COMMENT • link updated 9.2 years ago by Michael Love 43k • written 9.2 years ago by jarod_v6@libero.it ▴ 40

score 1 · Answer 1 · 2016-02-09

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 3 days ago

United States

Can you show an MA plot as well? I would suggest trying betaPrior=FALSE here as the prior assumption may not hold.

ADD COMMENT • link 9.2 years ago Michael Love 43k

1

Entering edit mode

Everything looks fine in your MA plot. Most genes in your experiment have small fold change. As your sample size increases you could detect more differences as statistically significant, but with 3 vs 6 you only detect the very strongly induced genes, of which there are few.

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

Here you have plotMA. (http://imgur.com/a/d2KwN)

I use also collectRNaseq metrics of picards tools. Do not seem to have strange situation.

What do you suggest?

ADD REPLY • link 9.2 years ago jarod_v6@libero.it ▴ 40

1

Entering edit mode

It helps to widen the ylim to see what's going on. You should be fine with betaPrior=FALSE and cooksCutoff=FALSE.

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

using this condition I obtain the genes as differential expression but I have however only 21 up and 3 down.

thanks so much!

ADD REPLY • link 9.2 years ago jarod_v6@libero.it ▴ 40

score 0 · Answer 2 · 2016-02-08

First thing is to see if you can convince yourself that the data tells you that your gene of interest is differentially expressed.

If you use the plotCounts function from DESeq2 to plot the expression of your gene of interest, does the data support your intuition?

Have you looked PCA or MDS plots of your data to how strong your treatment effect is? From the results you show, it seems like the effect is rather minimal.

How many replicates do you have? When effect sizes are minimal, higher replication can give you more power to detect differential expression.

You say you have "some evidence in the wet lab". Presumably you've done some qPCR on a panel of genes to look at the effect of over-expression of your fusion gene on a handful of transcripts? Does your RNA-seq show similar changes in expression as the evidence you are referring to across the same genes?

score 0 · Answer 3 · 2016-02-08

0

Entering edit mode

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 9 weeks ago

EMBL European Molecular Biology Laborat…

I suggest doing quality assessment / control to see whether all of the libraries are good. Garbage in → garbage out.

The DESeq2 vignette is one place to find suggestions for quality related plots, but also consider statistics such as number of reads, number of uniquely aligned reads, etc.

ADD COMMENT • link 9.2 years ago Wolfgang Huber ★ 13k

score 0 · Answer 4 · 2016-02-08

0

Entering edit mode

jarod_v6@libero.it ▴ 40

@jarod_v6liberoit-6654

Last seen 6.2 years ago

Italy

Thanks so much for the suggestions:

So I have 3 replicated for condition and 6 controls. In the ctrl I have no expression of my fusion gene or over-expression of GFP. The chimera are composed by both gene. One is differentialy expressed the other are not reported. Programs for the detection of fusion like FusionCatcher are able to detect the fusion

For the evidence of wet lab I have qpcr and western blot of the expression of this chimera and this genes. Here you have a plot counts show the value of that gene are not considered http://imgur.com/tOYJx9Y :

Here you can found a pca and show what do you suggest but I have 3 replicates. http://imgur.com/wl9rGVK

All the samples have more that 27 ML of unique reads aligned. I did one set of experiments using 40ML of reads but no increase power was detected.

Any suggestion?

ADD COMMENT • link 9.2 years ago jarod_v6@libero.it ▴ 40

0

Entering edit mode

I'm sorry, but I don't understand: when you say this:

For the evidence of wet lab I have qpcr and western blot of the expression of this chimera and this genes. Here you have a plot counts show the value of that gene are not considered

What do you mean by "the value of that gene are not considered"?

ADD REPLY • link 9.2 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

The filtering of p-values occurs in results(), so if you want to turn off filtering here (if you think it is not working on this dataset) you should use:

results(dds, cooksCutoff=FALSE)

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

I have another question because not in all samples work. I mean some samples I have only 2 differential expression gene and are my genes I have use in my experiments.

In other case this number of replicates works and Now I want to understand when is important to use cooksCutoff=False and betaPrior=FALSE?

If I use only coding sequencing genes can help to have more differential genes?

ADD REPLY • link 9.2 years ago jarod_v6@libero.it ▴ 40

0

Entering edit mode

Detecting differences depends on the size of the difference, the amount of variation with group, and sample size. 3 is basically the minimum recommended sample size for a condition so it's expected that your only detect a few of the largest differences. Other experiments may have bigger differences, less variation, or more samples, and so find more differences. Here I recommended turning off beta prior because you have two very very large fold changes with almost all other genes having small fold changes. The beta prior doesn't work well in this one situation. I recommended turning off filtering based on a Cooks distance cutoff because it seemed to be too aggressive in this case. I can't give a general suggestion when to turn it off though, just to say you should inspect the genes which are filtered.

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

thanks!!!

And for the use of only coding genes?

ADD REPLY • link 9.2 years ago jarod_v6@libero.it ▴ 40

0

Entering edit mode

I don't really have an opinion or experience on using this information.

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

Sorry ,I mean I have this results on the results file . I don't have any pvalue or padjust

ENSG ... 1991.4851208834

2.2929902325

0.3341579938

6.8619942512

NA

This I obtained before introduce modification suggested by Michael. Using this Work

ADD REPLY • link 9.2 years ago jarod_v6@libero.it ▴ 40

0

Entering edit mode

Things are getting a bit confused and hard to follow. Can you show me exactly the DESeq() and results() lines that produced an NA pvalue? If you use the arguments i suggested, there should not be filtering.

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

........................

ADD REPLY • link 9.2 years ago Michael Love 43k

0

Entering edit mode

When I use what you suggest I have the value and not NA. Sorry for the confusion I refer on the starting of the discussion when I use dds<-DESeq(dds) . Now it is ok.

ADD REPLY • link 9.2 years ago jarod_v6@libero.it ▴ 40