How do I fix an odd voom plot in a combined dataset?
1
1
Entering edit mode
1sunmic2 • 0
@4b974839
Last seen 5 days ago
Canada

Hi Everyone, I'm having a bit of trouble with my voom normalization as the mean-varience plot looks extremely off. As reference, here is an image: my voom plot which as you can see, looks like a fish

For context, my dataset is a merged dataset, here is my code for my dataset:

    pan_gene_reads$gene <- pan_gene_reads$Name
STARcounts$gene <- STARcounts$Ensembl_ID
Target_gene_exp_count$gene <- Target_gene_exp_count$sample

pan_gene_reads$gene <- gsub("\\.\\d+$", "", pan_gene_reads$gene)  # Remove version numbers
STARcounts$gene <- gsub("\\.\\d+$", "", STARcounts$gene)
Target_gene_exp_count$gene <- gsub("\\.\\d+$", "", Target_gene_exp_count$gene)

combined_data_counts <- merge(pan_gene_reads, STARcounts, by = "gene", all = FALSE)
combined_data_counts <- merge(combined_data_counts, Target_gene_exp_count, by = "gene", all = FALSE)

gene_names_counts <- combined_data_counts[,1:3]
combined_data_counts$gene
combined_data_counts <- combined_data_counts[, -c(1:3)]
combined_data_counts <- combined_data_counts[, -363]
combined_data_counts <- combined_data_counts[, -546]

And here is the code for my voom normalization:

    dge1 <- DGEList(counts = combined_data_counts)

 keep <- rowSums(cpm(dge1) > 1) >= 2
 d1 <- dge1[keep, , keep.lib.sizes = FALSE]
dim(d1)

dge1 <- calcNormFactors(dge1)

dge1$samples$norm.factors


design1 <- model.matrix(~1, data = dge1$samples)


voom_data1 <- voom(d1, design = design1, plot = TRUE)

Is this plot or code bad? If so, how can I fix it? I've tried removing batch effects, which doesn't work since voom doesn't accept negative values, and further filtering, which doesn't change the plot.

limma Normalization voom • 527 views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia

voom is a DE analysis method rather than a normalization method. voom expects the design matrix to be the same complete design matrix that you will use for the DE analysis, including all your experimental factors as well as any important covariates and batch effects. It is not correct to simply replace the design matrix with an intercept column.

Batch effects are handled by including the batch variables in the design matrix, not by changing the counts.

Having said all that, it is very difficult to handle merged datasets. Have you tried doing a limma-voom analysis on the individual datasets before merging?

ADD COMMENT
1
Entering edit mode

Thanks for responding! I did change the code slightly:

dge1 <- DGEList(counts = combined_data_counts)


keep <- rowSums(cpm(dge1) > 3) >= 5
 d1 <- dge1[keep, , keep.lib.sizes = FALSE]
dim(d1)

dge1 <- calcNormFactors(dge1)

dge1$samples$norm.factors

group <- factor(c(rep("healthy", 362), rep("cancer", 917)))
batch <- factor(c(rep("healthy", 362), rep("CancerTCGA", 183), rep("CancerTARGET", 734)))

design1 <- model.matrix(~0 + group)

voom_data1 <- voom(d1, design = design1, plot = TRUE)

And ended up with a slightly better plot: enter image description here

When I include batch variables in the design matrix, A) It doesn't change the plot B) I receive a warning saying partial NA coefficients for 52764 probes Any ideas how to fix this? Any help would be appreciated, thanks!

ADD REPLY
1
Entering edit mode

You will get NA coefficients from ~Group+Batch because the Group factor is completely confounded with the batch factor.

ADD REPLY
1
Entering edit mode

So I changed it to ~0 + group (with the same plot), but I still don't know if it's good or what to fix if not.

ADD REPLY
0
Entering edit mode

It looks more as you might expect, although it doesn't appear that you are filtering many of the low expressing genes out. Normally you should expect some truncation of the genes with a logCPM < 1 or so.

ADD REPLY

Login before adding your answer.

Traffic: 688 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6