DESeq2 Heatmap
2
0
Entering edit mode
pm_25 • 0
@4b4c4e07
Last seen 23 hours ago
United States

Hello all,

I'm new to RNASeq and bioinformatics. I'm trying to create a map of the top 500 differentially expressed genes across treatments, where I have one vehicle/control condition compared to six treatments. First, I calculated the mean expression for replicates to collapse them into a single column per condition. Next, I scaled the data and generated the heatmap.

However, I've been asked why the first column (vehicle/control) isn't entirely zero on the log fold change (logFC) expression scale. I was told that this might indicate an issue in my code, as all treatments should be normalized relative to the vehicle/control condition. Where the first column should then look all white (or at 0).

I seem to be lacking understanding in this area. Any input would be appreciated,

dds <- DESeqDataSetFromTximport(txi, colData = sampleTable,~Condition-1)

dds <- DESeq(dds) 

keep <- rowSums(counts(dds)) >= 1
dds <- dds[keep,]
rld <- rlog(dds, blind = FALSE)

data.frame(assay(rld))

rld.df <- data.frame(assay(rld))

rld.collapsed  <- t(apply(rld.df, 1, function(row) c(mean(row[1:3]), mean(row[4:6]), mean(row[7:9]),mean(row[10:12]),mean(row[13:15]),mean(row[16:18]),mean(row[19:21]),mean(row[22:24]) )))

colnames(rld.df)


colnames(rld.collapsed)  <- c("Veh_1d", "Treatment_1_1d", "Treatment_2_1d", "Treatment_3_1d",
                         "Treatment_4_1d", "Treatment_5_1d", "Treatment_6_1d", "Treatment_7_1d")


top_500_sig_genes <- rownames(res_ord_padj)[1:500]
data.frame(top_500_sig_genes)


selected_conditions <- c("Veh_1d", "Treatment_1_1d", "Treatment_2_1d", "Treatment_3_1d",
                         "Treatment_4_1d", "Treatment_5_1d", "Treatment_6_1d")

cellLine <- list(c())

rld_heatmap <- rld_collapsed_obj[, colData(rld_collapsed_obj)$Condition %in% selected_conditions]

mat <- assay(rld_heatmap)[top_500_sig_genes, ]


mat_scaled <- t(apply(mat, 1, scale))


colnames(mat_scaled) <- colnames(assay(rld_heatmap))


cols <- colorRamp2(c(-3, 0, 3), c("blue", "white", "red"))


ha <- HeatmapAnnotation(
  Treatment = as.factor(colData(rld_heatmap)$Condition),
  col = list(Treatment = c(
    "Veh_1d" = "#440154ff",
    "Treatment_1_1d" = "#440154ff",
    "Treatment_2_1d" = "#440154ff",
    "Treatment_3_1d" = "#440154ff",
    "Treatment_4_1d" = "#21908dff",
    "Treatment_5_1d" = "#21908dff",
    "Treatment_6_1d" = "#21908dff"
  )),
  show_legend = TRUE
)


ht <- Heatmap(
  mat_scaled,
  name = "Expression",
  col = cols,
  top_annotation = ha,
  show_row_names = FALSE,
  show_column_names = TRUE,
  column_names_rot = 45,
  cluster_rows = TRUE,
  cluster_columns = FALSE,
)


ppi=500
png(paste0(figures, "heatmap_DEG_top500_", ".png"), width=7.5*ppi, height=6*ppi, res=ppi)
draw(ht, row_title = "Genes", column_title = "Day 1 Vehicle vs. Treatments")
dev.off()

enter image description here

heatmaps • 112 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 15 minutes ago
United States

Two things. First, scale is already vectorized, so you don't have to use apply. You can instead just do t(scale(t(mat))) . Second, the default for scale is to convert data to z-scores, so the individual values represent the distance from the average gene expression in units of standard deviation. If you wanted a heatmap that represented deviations from some baseline, you should compute logCPM values and then subtract out the baseline (you could use sweep, which is what scale uses under the hood). I have generated heatmaps with a baseline like that in the past, but they usually look sort of dumb with one entirely white column. I find z-scores like you already have to be more informative.

0
Entering edit mode
swbarnes2 ★ 1.4k
@swbarnes2-14086
Last seen 2 hours ago
San Diego

However, I've been asked why the first column (vehicle/control) isn't entirely zero on the log fold change (logFC) expression scale.

You are not graphing Log fold change as compared to Wt; you are looking at expression (a little normalized with rlog) as compared to the mean across all samples. If you were looking at log fold changes, it wouldn't make sense to have the wt there, and you wouldn't have to average all the replicates, you'd already have just one value for each sample group for each gene.

ADD COMMENT

Login before adding your answer.

Traffic: 706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6