Question

Use cpm() in edgeR and voom() in limma to find genes that have low expression in individual tumors

1

Entering edit mode

Frocha ▴ 20

@frocha-12039

Last seen 6.9 years ago

I have RNA-Seq count data for two groups of samples (normal vs tumor). For a specific gene, I wish to find the tumor samples that have low gene expression (compared to the samples in the normal group). The "low" expression means the (possibly transformed) log(cpm) has a z-score < -2, where the z-score is calculated using mean and standard deviation of the normal samples.

In order to do this, should I first filter the genes with low counts (as I usually do for differentially expressed gene analysis), using the results of cpm() function in edgeR package? If so, how? After the gene filtering and TMM normalization, should I use voom() in limma package to transform log(cpm) to values that are normally distributed?

gene filtering edger limma • 5.5k views

ADD COMMENT • link updated 7.2 years ago by Gordon Smyth 52k • written 7.2 years ago by Frocha ▴ 20

0

Entering edit mode

I'm not quite clear which analysis you are wanting to do with voom. Have you already computed the z-scores yourself, or are you asking how to use voom to compute the z-scores?

ADD REPLY • link 7.2 years ago Gordon Smyth 52k

0

Entering edit mode

Thanks, Gordon.

I wish to identify the tumor samples with low gene expression (compared to normal samples), and then try to see whether the corresponding patients have survival rates that are different from the survival rates of other patients.

My question then is: how to best compute the z-scores?

ADD REPLY • link 7.2 years ago Frocha ▴ 20

0

Entering edit mode

Are you after one z-score for each gene, or a z-score for each individual tumor for each gene?

ADD REPLY • link 7.2 years ago Gordon Smyth 52k

0

Entering edit mode

Sorry for confusing. I wish to get the z-score for each individual tumor for each gene.

ADD REPLY • link 7.2 years ago Frocha ▴ 20

Gordon Smyth · Answer 1 · 2018-02-06

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 6 hours ago

WEHI, Melbourne, Australia

You should always filter out genes that have consistently very low counts, and the guidelines for voom are the same as for edgeR. You could for example use:

keep <- filterByExpr(dge, design)
dge <- dge[keep,,keep.lib.size=FALSE]
dge <- calcNormFactors(dge)

There is no change to this even if you wish to find genes that have very low expression in one group, although you could try reducing the filter thresholds a little if you want to live dangerously.

You want to find genes that are down-regulated in individual tumors relative to normal. Let's assume that the factor "Tumor" takes values "Normal", "Tumor1", "Tumor2" etc. All the normal samples should have the same name but each distinct tumor should have a different name.

Tumor <- relevel(Tumor, ref="Normal")
design <- model.matrix(~Tumor)

Now we can just do a regular voom analysis:

v <- voom(dge,design)
fit <- lmFit(v,design)
fit <- eBayes(fit, robust=TRUE)

The usual limma tests will tell you which genes are down in which tumors. For example:

topTable(fit, coef="Tumor:Tumor1")

will show you which genes are down-regulated in Tumor1 relative to normal.

You don't need to compute z-scores explicitly. A z-score < -2 corresponds to a p-value of 0.0455:

> 2*pnorm(-2)
[1] 0.04550026

To find genes with zscore < -2 in individual tumors, you can simply use:

Low <- (fit$p.value[,-1] < 2*pnorm(-2)) & (fit$coef[,-1] < 0)

This will give you a matrix of genes by tumors, which an entry of TRUE if z-score < -2 and FALSE if z-score > -2.

Note the use of "[,-1]" in the previous code line, which simply gets rid of the intercept term.

ADD COMMENT • link 7.2 years ago Gordon Smyth 52k

0

Entering edit mode

Thanks!

One question: should we put the design<-model.matrix(~Tumor) line at the beginning?

In fact, the goal is not to do differentially expressed gene analysis. The goal is to calculate z-scores for the tumor sample so that we can divide the tumor samples into two groups: those with z-score <-2 and those with z-score > -2.

ADD REPLY • link updated 7.2 years ago by Gordon Smyth 52k • written 7.2 years ago by Frocha ▴ 20

0

Entering edit mode

OK, I understand what you want now and have revised my answer accordingly.

ADD REPLY • link 7.2 years ago Gordon Smyth 52k

0

Entering edit mode

Thank you very much!

ADD REPLY • link 7.2 years ago Frocha ▴ 20

0

Entering edit mode

I am sorry that I have a new question: there is no p.value information in fit after I run the last line. Is it because that each tumor sample is treated as a single group? If so, how to identify the tumor samples with z-score < -2?

ADD REPLY • link 7.2 years ago Frocha ▴ 20

0

Entering edit mode

eBayes always produces p-values, so what you say cannot be correct.

You may have made a mistake earlier in the code, but you provide no information so I am not in a position to debug it for you. The code that I provided does work and it does produce p-values.

The fact that every tumor is a single group is not a problem. You do however need more than one Normal sample.

ADD REPLY • link 7.2 years ago Gordon Smyth 52k

0

Entering edit mode

You are right. The problem is that the "statmod" package was not installed on my computer, thus the line

fit <- eBayes(fit, robust=TRUE)

was not executed (I did not notice that). After I installed the "statmod" package, the problem is solved.

Thank you very much!

ADD REPLY • link 7.2 years ago Frocha ▴ 20