Enter the body of text here
Code should be placed in three backticks as shown below
Hi
I am engaged in the analysis of bacterial infected macrophages cell-line RNA-seq data. I did mapping by using STAR aligner over human genome and used DESeq2_1.24.0. My Data example is given below.
SampleName condition Replicates SampleNumber
THPI-T1 UnTr Rep1 1
THPI-T2 UnTr Rep2 1
THPI-T3 UnTr Rep3 1
THPI_RGA-T1 Tr Rep1 2
THPI_RGA-T2 Tr Rep2 2
THPI_RGA-T3 Tr Rep3 2
Geneid THPI_RGA-T1 THPI_RGA-T2 THPI_RGA-T3 THPI-T1 THPI-T2 THPI-T3
ENSG00000284662 36 61 48 75 62 51
ENSG00000186827 4 11 1 10 16 19
ENSG00000186891 7 10 6 8 26 18
ENSG00000160072 36 26 32 53 92 77
ENSG00000041988 24 19 33 28 78 59
I am getting result where log2FC is not matching with respective gene raw and normalized read counts
baseMean **log2FoldChange** lfcSE stat pvalue padj Geneid THPI_RGA-T1 THPI_RGA-T2 THPI_RGA-T3 THPI-T1 THPI-T2 THPI-T3
17.62892166 **2.02400919** 0.629276733 3.21640557 0.001298072 0.017763668 ENSG00000162441 **5.230429087** **3.435717614** **12.44675869** 36.62135601 15.62968585 32.40958272
64.80556587 **2.704573019** 0.40069459 6.749711843 1.48139E-11 2.83812E-09 ENSG00000131686 **9.153250902** **21.75954489** **20.74459782** 76.81552723 139.3646989 120.9957755
Here by table, it is clear, Up-regulated genes in treatment have lower normalized count than untreated samples. Even raw counts following same
Geneid **THPI_RGA-T1 THPI_RGA-T2 THPI_RGA-T3** THPI-T1 THPI-T2 THPI-T3
ENSG00000162441 **4 3 9** 41 24 45
I wish to know interpretation/mistake
Thank you in advance
print(sessionInfo())
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] rJava_0.9-12 dplyr_1.0.0 reshape2_1.4.4
[4] xlsx_0.6.2 readxl_1.3.1 DESeq2_1.24.0
[7] SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1
[10] matrixStats_0.56.0 Biobase_2.44.0 GenomicRanges_1.36.1
[13] GenomeInfoDb_1.20.0 IRanges_2.18.3 S4Vectors_0.22.1
[16] BiocGenerics_0.30.0 RColorBrewer_1.1-2 ggplot2_3.3.2
[19] gplots_3.0.4 Glimma_1.12.0 edgeR_3.26.8
[22] limma_3.40.6
loaded via a namespace (and not attached):
[1] bitops_1.0-7 bit64_4.0.5 tools_3.6.1 backports_1.2.1
[5] R6_2.4.1 rpart_4.1-15 KernSmooth_2.23-15 Hmisc_4.4-0
[9] DBI_1.1.0 colorspace_2.0-0 nnet_7.3-12 withr_2.2.0
[13] tidyselect_1.1.0 gridExtra_2.3 bit_4.0.4 compiler_3.6.1
[17] htmlTable_2.0.1 caTools_1.18.2 scales_1.1.1 checkmate_2.0.0
[21] genefilter_1.66.0 stringr_1.4.0 digest_0.6.25 foreign_0.8-71
[25] XVector_0.24.0 base64enc_0.1-3 jpeg_0.1-8.1 pkgconfig_2.0.3
[29] htmltools_0.5.0 htmlwidgets_1.5.1 rlang_0.4.7 rstudioapi_0.11
[33] RSQLite_2.2.0 generics_0.0.2 jsonlite_1.7.0 gtools_3.8.2
[37] acepack_1.4.1 RCurl_1.98-1.2 magrittr_1.5 GenomeInfoDbData_1.2.1
[41] Formula_1.2-3 Matrix_1.2-17 Rcpp_1.0.5 munsell_0.5.0
[45] lifecycle_0.2.0 stringi_1.4.6 zlibbioc_1.30.0 plyr_1.8.6
[49] grid_3.6.1 blob_1.2.1 gdata_2.18.0 crayon_1.4.1
[53] lattice_0.20-38 splines_3.6.1 annotate_1.62.0 xlsxjars_0.6.1
[57] locfit_1.5-9.4 knitr_1.29 pillar_1.4.6 geneplotter_1.62.0
[61] XML_3.99-0.3 glue_1.4.1 latticeExtra_0.6-29 data.table_1.12.8
[65] png_0.1-7 vctrs_0.3.1 cellranger_1.1.0 gtable_0.3.0
[69] purrr_0.3.4 xfun_0.17 xtable_1.8-4 survival_3.1-8
[73] tibble_3.0.3 AnnotationDbi_1.46.1 memoise_1.1.0 cluster_2.1.0
[77] ellipsis_0.3.1
You should show code. Unless you explicitely set a contrast or releveled the levels of your colData then
THPI_RGA
would be the reference (denominator) because alphabetically (that is how order is set by default)THPI_RGA
comes beforeTHPI-T
and then the results would be perfectly fine in terms of fold change direction according to these counts. See here, the first level is the reference level:Lets see your code.
Here is the code
Exp <- as.data.frame(read_xlsx("RC_FC.xlsx", sheet = 1, range= NULL, col_names = TRUE)); rownames(Exp)<-Exp$Geneid
countdata <- na.omit(Exp[,(2:7)]); head(countdata)
sampleinfo<- as.data.frame(read_xlsx("RC_FC.xlsx", sheet = 3, range= NULL, col_names = TRUE))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=sampleinfo, design=~condition ); dds
keep <-rowSums(counts(dds))>=20
dds <- dds[keep, ]
DEdds <- DESeq(dds)
res<- results( DEdds, contrast = c( "condition","THPI_RGA","THPI" ) ); summary(res)
sig <- res[ which(res$padj < 0.05), ]; summary(sig)
UP<- as.data.frame(subset(sig, log2FoldChange>0)); head(UP)
UP$Geneid<-rownames(UP);rownames(UP) <- c()
Down<- as.data.frame(subset(sig, log2FoldChange <0 ));head(Down)
Down$Geneid<-rownames(Down);rownames(Down) <- c()
to merge read count
NorCount<-as.data.frame(counts(DEdds, normalized=TRUE))
NorCount$Geneid<-rownames(NorCount);rownames(NorCount) <- c()
DEG_UP<- left_join(UP, NorCount, by ='Geneid')
DEG_Down<- left_join(Down, NorCount, by ='Geneid')