Question

DESEQ2 log2FC is not following Normalized read count

0

Entering edit mode

sandeep • 0

@035d4074

Last seen 3.1 years ago

India

Enter the body of text here

Code should be placed in three backticks as shown below


Hi 
I am engaged in the analysis of bacterial infected macrophages cell-line RNA-seq data. I did mapping by using STAR aligner over human genome and used DESeq2_1.24.0. My Data example is given below.

SampleName  condition   Replicates  SampleNumber
THPI-T1 UnTr    Rep1    1
THPI-T2 UnTr    Rep2    1
THPI-T3 UnTr    Rep3    1
THPI_RGA-T1 Tr  Rep1    2
THPI_RGA-T2 Tr  Rep2    2
THPI_RGA-T3 Tr  Rep3    2

Geneid  THPI_RGA-T1 THPI_RGA-T2 THPI_RGA-T3 THPI-T1 THPI-T2 THPI-T3
ENSG00000284662 36  61  48  75  62  51
ENSG00000186827 4   11  1   10  16  19
ENSG00000186891 7   10  6   8   26  18
ENSG00000160072 36  26  32  53  92  77
ENSG00000041988 24  19  33  28  78  59


I am getting result where log2FC is not matching with respective gene raw and normalized read counts

baseMean    **log2FoldChange**  lfcSE   stat    pvalue  padj    Geneid  THPI_RGA-T1 THPI_RGA-T2 THPI_RGA-T3 THPI-T1 THPI-T2 THPI-T3
17.62892166 **2.02400919**  0.629276733 3.21640557  0.001298072 0.017763668 ENSG00000162441 **5.230429087** **3.435717614** **12.44675869** 36.62135601 15.62968585 32.40958272
64.80556587 **2.704573019** 0.40069459  6.749711843 1.48139E-11 2.83812E-09 ENSG00000131686 **9.153250902** **21.75954489** **20.74459782** 76.81552723 139.3646989 120.9957755


Here by table, it is clear, Up-regulated genes in treatment have lower normalized count than untreated samples. Even raw counts following same

Geneid **THPI_RGA-T1 THPI_RGA-T2 THPI_RGA-T3** THPI-T1 THPI-T2 THPI-T3
ENSG00000162441 **4 3 9** 41 24 45

I wish to know interpretation/mistake

Thank you in advance

print(sessionInfo())
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rJava_0.9-12                dplyr_1.0.0                 reshape2_1.4.4             
 [4] xlsx_0.6.2                  readxl_1.3.1                DESeq2_1.24.0              
 [7] SummarizedExperiment_1.14.1 DelayedArray_0.10.0         BiocParallel_1.18.1        
[10] matrixStats_0.56.0          Biobase_2.44.0              GenomicRanges_1.36.1       
[13] GenomeInfoDb_1.20.0         IRanges_2.18.3              S4Vectors_0.22.1           
[16] BiocGenerics_0.30.0         RColorBrewer_1.1-2          ggplot2_3.3.2              
[19] gplots_3.0.4                Glimma_1.12.0               edgeR_3.26.8               
[22] limma_3.40.6               

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           bit64_4.0.5            tools_3.6.1            backports_1.2.1       
 [5] R6_2.4.1               rpart_4.1-15           KernSmooth_2.23-15     Hmisc_4.4-0           
 [9] DBI_1.1.0              colorspace_2.0-0       nnet_7.3-12            withr_2.2.0           
[13] tidyselect_1.1.0       gridExtra_2.3          bit_4.0.4              compiler_3.6.1        
[17] htmlTable_2.0.1        caTools_1.18.2         scales_1.1.1           checkmate_2.0.0       
[21] genefilter_1.66.0      stringr_1.4.0          digest_0.6.25          foreign_0.8-71        
[25] XVector_0.24.0         base64enc_0.1-3        jpeg_0.1-8.1           pkgconfig_2.0.3       
[29] htmltools_0.5.0        htmlwidgets_1.5.1      rlang_0.4.7            rstudioapi_0.11       
[33] RSQLite_2.2.0          generics_0.0.2         jsonlite_1.7.0         gtools_3.8.2          
[37] acepack_1.4.1          RCurl_1.98-1.2         magrittr_1.5           GenomeInfoDbData_1.2.1
[41] Formula_1.2-3          Matrix_1.2-17          Rcpp_1.0.5             munsell_0.5.0         
[45] lifecycle_0.2.0        stringi_1.4.6          zlibbioc_1.30.0        plyr_1.8.6            
[49] grid_3.6.1             blob_1.2.1             gdata_2.18.0           crayon_1.4.1          
[53] lattice_0.20-38        splines_3.6.1          annotate_1.62.0        xlsxjars_0.6.1        
[57] locfit_1.5-9.4         knitr_1.29             pillar_1.4.6           geneplotter_1.62.0    
[61] XML_3.99-0.3           glue_1.4.1             latticeExtra_0.6-29    data.table_1.12.8     
[65] png_0.1-7              vctrs_0.3.1            cellranger_1.1.0       gtable_0.3.0          
[69] purrr_0.3.4            xfun_0.17              xtable_1.8-4           survival_3.1-8        
[73] tibble_3.0.3           AnnotationDbi_1.46.1   memoise_1.1.0          cluster_2.1.0         
[77] ellipsis_0.3.1

DESeq2 • 1.1k views

ADD COMMENT • link updated 3.1 years ago by Michael Love 42k • written 3.1 years ago by sandeep • 0

0

Entering edit mode

You should show code. Unless you explicitely set a contrast or releveled the levels of your colData then THPI_RGA would be the reference (denominator) because alphabetically (that is how order is set by default) THPI_RGA comes before THPI-T and then the results would be perfectly fine in terms of fold change direction according to these counts. See here, the first level is the reference level:

> factor(c("THPI-T", "THPI_RGA"))
[1] THPI-T   THPI_RGA
Levels: THPI_RGA THPI-T

Lets see your code.

ADD REPLY • link 3.1 years ago ATpoint ★ 4.4k

0

Entering edit mode

Here is the code

Exp <- as.data.frame(read_xlsx("RC_FC.xlsx", sheet = 1, range= NULL, col_names = TRUE)); rownames(Exp)<-Exp$Geneid

countdata <- na.omit(Exp[,(2:7)]); head(countdata)

sampleinfo<- as.data.frame(read_xlsx("RC_FC.xlsx", sheet = 3, range= NULL, col_names = TRUE))

dds <- DESeqDataSetFromMatrix(countData=countdata, colData=sampleinfo, design=~condition ); dds

keep <-rowSums(counts(dds))>=20

dds <- dds[keep, ]

DEdds <- DESeq(dds)

res<- results( DEdds, contrast = c( "condition","THPI_RGA","THPI" ) ); summary(res)

sig <- res[ which(res$padj < 0.05), ]; summary(sig)

UP<- as.data.frame(subset(sig, log2FoldChange>0)); head(UP)

UP$Geneid<-rownames(UP);rownames(UP) <- c()

Down<- as.data.frame(subset(sig, log2FoldChange <0 ));head(Down)

Down$Geneid<-rownames(Down);rownames(Down) <- c()

to merge read count

NorCount<-as.data.frame(counts(DEdds, normalized=TRUE))

NorCount$Geneid<-rownames(NorCount);rownames(NorCount) <- c()

DEG_UP<- left_join(UP, NorCount, by ='Geneid')

DEG_Down<- left_join(Down, NorCount, by ='Geneid')

ADD REPLY • link 3.1 years ago sandeep • 0

score 0 · Answer 1 · 2021-08-23

0

Entering edit mode

Michael Love 42k

@mikelove

Last seen 1 day ago

United States

There's not enough information to answer your question.

The results table (or any selection of rows of the table) when printed to the console gives the numerator and denominator of the LFC but you've edited that out so I can't help you.

ADD COMMENT • link 3.1 years ago Michael Love 42k

0

Entering edit mode

Hi Michael !

do you mean this .

log2 fold change (MLE): condition THPI_RGA vs THPI

Wald test p-value: condition THPI RGA vs THPI

DataFrame with 50189 rows and 6 columns

                    baseMean     log2FoldChange             lfcSE               stat

                   <numeric>          <numeric>         <numeric>          <numeric>

ENSG00000284662 54.1629718008399 -0.246839573124536 0.35110171112022 -0.703042922624825

ENSG00000186827 8.84987306750273 0.881275279675999 0.805056175145754 1.09467551070762

ENSG00000186891 11.1668529066813 0.501420253662518 0.680120421976238 0.737252165146772

Even i tired to flip the comparison

ADD REPLY • link 3.1 years ago sandeep • 0

0

Entering edit mode

The LFC is more related to the normalized counts not raw counts.

ADD REPLY • link 3.1 years ago Michael Love 42k

0

Entering edit mode

Yes, DESeq uses raw read, normalize and calculate LCF. I just given raw read as additional information. I never had this problem before and above given script work well with other dataset. I am not able to figure out what and where is wrong. Why normalized read counts of samples are not matching with LCF?

ADD REPLY • link 3.1 years ago sandeep • 0

0

Entering edit mode

I see no normalized counts.

Try looking at the gene with plotCounts.

ADD REPLY • link 3.1 years ago Michael Love 42k