Inverted Log2FC values after DEseq2 analysis
1
0
Entering edit mode
Iwan • 0
@505d46ae
Last seen 23 months ago
Sweden

Hello. I am running DEseq2 analysis on published RNAseq data consisting of 4 control samples and 4 cKO samples. However my log2FC values are inverted compared to the original publication. I think the issue is with how I define the control group (#set factor level) as the plotted data is always appears the same irrespective of what group I select (Control or cKO). I am struggling to find a solution here as I am new to DEseq2 so I am reaching out for some help. I attach the volcano plot images for comparison and my script. Thanks in advance.

Published volcano Script volcano

#load libraries
library(dplyr)
library(tidyverse)
library(GEOquery)

#prepare data
reads <-read.csv(file = "GSE172457.csv")
colnames(reads) [1] <- "gene"
colnames(reads) [2] <- "GSM5257109"
colnames(reads) [3] <- "GSM5257110"
colnames(reads) [4] <- "GSM5257111"
colnames(reads) [5] <- "GSM5257112"
colnames(reads) [6] <- "GSM5257113"
colnames(reads) [7] <- "GSM5257114"
colnames(reads) [8] <- "GSM5257115"
colnames(reads) [9] <- "GSM5257116"
reads <- reads %>% separate(gene, into = c("ENSEMBL", "gene","gene_biotype"), convert = TRUE)
reads <- reads[, -c(2,3)]
rownames(reads) <- reads$ENSEMBL
reads <- reads[, -c(1)]

#get metadata
gse<- getGEO(GEO = 'GSE172457', GSEMatrix = TRUE)
metadata <- pData(phenoData(gse[[1]]))

#select column subsets from metadata and rename columns
metadata.subset <- metadata %>%
  select(1) %>%
  rename(genotype = title) %>%
  mutate(genotype = gsub("replicate 1", "", genotype)) %>%
  mutate(genotype = gsub("replicate 2", "", genotype)) %>%
  mutate(genotype = gsub("replicate 3", "", genotype)) %>%
  mutate(genotype = gsub("replicate 4", "", genotype))

#check sample information
all(colnames(reads) %in% rownames(metadata.subset))
all(colnames(reads) == rownames(metadata.subset))

#construct a DESeqDataSet
library(DESeq2)
dataset <- DESeqDataSetFromMatrix(countData = reads,
                                  colData = metadata.subset,
                                  design = ~ genotype)

#filtering data
#only keep rows that have at least 20 reads
dataset.filter <- rowSums(counts(dataset)) >= 20
dataset <- dataset[dataset.filter,]

#set factor level
metadata.subset$genotype <- as.factor(metadata.subset$genotype)
metadata.subset$genotype <- relevel(metadata.subset$genotype, ref = "Control ")

#run DEseq with p-value of 0.0001
dataset <- DESeq(dataset)
results <- results(dataset, alpha = 0.0001)
summary(results)

#load libraries for volcano plotting
library(org.Mm.eg.db)
library(EnhancedVolcano)

#convert results to a dataframe and add gene names
results.df <- as.data.frame(results)
results.df$symbol <- mapIds(org.Mm.eg.db, keys = rownames(results.df), keytype = "ENSEMBL", column = "SYMBOL")

#volcano plot
EnhancedVolcano(results.df, x = "log2FoldChange", y = "padj", lab = results.df$symbol,
                pCutoff = 1e-4, FCcutoff = 1)


> sessionInfo( )
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EnhancedVolcano_1.16.0      ggrepel_0.9.2               org.Mm.eg.db_3.16.0         AnnotationDbi_1.60.0        DESeq2_1.38.2              
 [6] SummarizedExperiment_1.28.0 MatrixGenerics_1.10.0       matrixStats_0.63.0          GenomicRanges_1.50.2        GenomeInfoDb_1.34.4        
[11] IRanges_2.32.0              S4Vectors_0.36.1            GEOquery_2.66.0             Biobase_2.58.0              BiocGenerics_0.44.0        
[16] forcats_0.5.2               stringr_1.5.0               purrr_1.0.0                 readr_2.1.3                 tidyr_1.2.1                
[21] tibble_3.1.8                ggplot2_3.4.0               tidyverse_1.3.2             dplyr_1.0.10                biomaRt_2.54.0             

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           fs_1.5.2               lubridate_1.9.0        bit64_4.0.5            RColorBrewer_1.1-3     filelock_1.0.2        
 [7] progress_1.2.2         httr_1.4.4             tools_4.2.2            backports_1.4.1        utf8_1.2.2             R6_2.5.1              
[13] DBI_1.1.3              colorspace_2.0-3       withr_2.5.0            tidyselect_1.2.0       prettyunits_1.1.1      bit_4.0.5             
[19] curl_4.3.3             compiler_4.2.2         cli_3.5.0              rvest_1.0.3            xml2_1.3.3             DelayedArray_0.24.0   
[25] labeling_0.4.2         scales_1.2.1           rappdirs_0.3.3         digest_0.6.31          R.utils_2.12.2         XVector_0.38.0        
[31] pkgconfig_2.0.3        dbplyr_2.2.1           fastmap_1.1.0          limma_3.54.0           rlang_1.0.6            readxl_1.4.1          
[37] rstudioapi_0.14        RSQLite_2.2.20         farver_2.1.1           generics_0.1.3         jsonlite_1.8.4         BiocParallel_1.32.5   
[43] R.oo_1.25.0            googlesheets4_1.0.1    RCurl_1.98-1.9         magrittr_2.0.3         GenomeInfoDbData_1.2.9 Matrix_1.5-3          
[49] Rcpp_1.0.9             munsell_0.5.0          fansi_1.0.3            R.methodsS3_1.8.2      lifecycle_1.0.3        stringi_1.7.8         
[55] zlibbioc_1.44.0        BiocFileCache_2.6.0    grid_4.2.2             blob_1.2.3             parallel_4.2.2         crayon_1.5.2          
[61] lattice_0.20-45        Biostrings_2.66.0      haven_2.5.1            annotate_1.76.0        hms_1.1.2              KEGGREST_1.38.0       
[67] locfit_1.5-9.6         pillar_1.8.1           geneplotter_1.76.0     codetools_0.2-18       reprex_2.0.2           XML_3.99-0.13         
[73] glue_1.6.2             data.table_1.14.6      modelr_0.1.10          png_0.1-8              vctrs_0.5.1            tzdb_0.3.0            
[79] cellranger_1.1.0       gtable_0.3.1           assertthat_0.2.1       cachem_1.0.6           xtable_1.8-4           broom_1.0.2           
[85] googledrive_2.0.0      gargle_1.2.1           memoise_2.0.1          timechange_0.1.1       ellipsis_0.3.2        
>
DESeq2 Log2FC RNASeqData • 2.0k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 16 hours ago
United States

See vignette, note on factor levels.

ADD COMMENT
0
Entering edit mode

Thanks for the reply, Michael. Doesn't the following line in my script define the factor level?

metadata.subset$genotype <- relevel(metadata.subset$genotype, ref = "Control ")

ADD REPLY
1
Entering edit mode

No because you are changing the factor levels of a column of a data.frame that was already used to create the dds.

R does not connect these objects. When you make the dds the information you provide as colData is copied into the dds. Modifying the original data.frame doesn't propagate changes to the the dds.

You can modify the information in the dds with:

colData(dds)$variable <- ...
# or equivalently, the easier/shorter form:
dds$variable <- ...
ADD REPLY
0
Entering edit mode

Thanks again for the reply. I understand now. I set the factor levels prior to creating the dataset and it worked. Cheers!

ADD REPLY

Login before adding your answer.

Traffic: 484 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6