Fitted values from EdgeR
2
1
Entering edit mode
R ▴ 40
@r-5604
Last seen 3.7 years ago
Germany

Hi all, 

I wonder of the values in lrt$fitted.values are cpm normalized values?

des <- model.matrix(~KRAS)
fit <- glmFit(y2, des)
lrt <- edgeR::glmLRT(fit)

 

edger r • 3.9k views
ADD COMMENT
0
Entering edit mode

Thanks! So what are the fitted values then?

ADD REPLY
0
Entering edit mode

Dear Gordon. What is the best way to get out cpm values when I use lib.size from my libraries and norm.factors from my spike-ins. Please see pipeline below: 

s=spike-ins

counts=miRNAs

 

 

s <- read.delim("exprs.txt", sep="\t", check.names=FALSE)

df.mdp <- read.delim("miRNAs_expressed_all_samples_ALL.csv", sep="\t", check.names=FALSE, stringsAsFactors=FALSE)
counts <- as.matrix(df.mdp[,grep("\\d\\d\\d", colnames(df.mdp))])

sids <- as.character(read.table("config_braf.csv", stringsAsFactors=FALSE)[,3])
colnames(counts) <- sids
colnames(counts.N) <- sids
s <- s[,sids]

group <- sids
group[grep("\\dR", sids)] <- "Normal"
group[grep("\\dG", sids)] <- "Tumor"

 


group <- factor(group)
patient <- factor(sapply(str_split(sids, 'R|G'), function(x) x[[1]]))


d <- edgeR::DGEList(counts = round(counts), group=group)
d.s <- edgeR::DGEList(counts = s)


d$samples$lib.size <- colSums(d$counts) # recalculate size factors

## add spike-ins
d$counts <- rbind(d$counts, d.s$counts)

## use lib sizes from mirna and norm factors from spike-ins

method <- "TMM" # c("TMM","RLE","upperquartile","none")
method <- "RLE"

y <- calcNormFactors(d, method=method)
y.s <- calcNormFactors(d.s, method=method)
plot(y$samples$lib.size, y.s$samples$lib.size)
y$samples$lib.size <- y.s$samples$lib.size
y2 <- estimateDisp(y, trend.method="loess")


des <- model.matrix(~patient + group)
fit <- glmFit(y2, des)
lrt <- edgeR::glmLRT(fit)
tab <- topTags(lrt, n=Inf)@.Data[[1]]

 

ADD REPLY
0
Entering edit mode

You've also posted this CPM normalization issue as two separate questions, at print matrix from EdgeR that are normalized by cpm and by spike-in and Detect global differences in miRNA expression between tumor and normal using spike-ins. Please try to avoid redundancy, as we'll end up answering the same thing multiple times.

ADD REPLY
2
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

Let's look.

> example(glmLRT)
<snip>
glmLRT> d <- DGEList(y)
<snip>
glmLRT> # Likelihood ratio tests for trend
glmLRT> results <- glmLRT(fit, coef=2)

<snip>

> head(results$fitted.values)
             x0        x1        x2
gene.1 9.618113 29.449462 69.696630
gene.2 4.787029  5.444920  4.787005
gene.3 8.071594  7.879561  5.945545
gene.4 5.807543  6.605920  5.807928
gene.5 6.194186  5.686449  4.035013
gene.6 3.528931  5.335160  6.234458


> head(d$counts)
       x0 x1 x2
gene.1  6 44 55
gene.2  5  5  5
gene.3  7 10  5
gene.4  8  2  8
gene.5  5  8  3
gene.6  5  2  8

Does that answer your question?

ADD COMMENT
1
Entering edit mode

To clarify James' point; the fitted values are on the same scale as the raw counts themselves. This demonstrates that they are not CPM values. If you want to convert them to CPMs, you could do something like this:

cpm(results$fitted.values, lib.size=exp(getOffset(d)))

The getOffset call allows us to handle non-unity normalization factors.

ADD REPLY
0
Entering edit mode

Dear Aaron,

Considering your reply, which of the following options do you think would be the correct way to obtain normalized counts?

fit<-glmFit(y,design)

lrt<-glmLRT(fit, coef=14:2)

norm_counts <- cpm(lrt$fitted.values, lib.size = exp(getOffset(y)))

norm_counts2 <- cpm(lrt$fitted.values, normalized.lib.sizes = TRUE)

norm_counts3<- cpm(y, lib.size = exp(getOffset(y)))

norm_counts4 <- cpm(y, normalized.lib.sizes = TRUE)

 

Thank you very much! Is it ok to use then this normalized counts to proceed with clustering or is it better to use the log2 of the counts (using log = TRUE in the cpm function)?

Best,

Andrew

 

ADD REPLY
1
Entering edit mode

Using cpm is the recommended way to obtain normalized expression values for other (i.e., non-DB) analyses such as clustering and dimensionality reduction. Running it as you have done with norm_counts4 is the simplest approach, so you might as well go with that. In fact, normalized.lib.sizes=TRUE is the default anyway, so you can even save yourself some typing by leaving that argument out.

And yes, it is usually better to perform a log-transformation, see my comments here:

A: Robust transformation of raw RNA-seq counts for exploratory data analysis and hi

Note that we generally don't use the term "normalized counts" in edgeR, as this implies that you can treat the values as counts (and use them in GLM fitting, etc.). This may not be appropriate as the scaling will distort the mean-variance relationship when the library sizes are very different, e.g., a "normalized count" of 1 that was derived from an original count of 100 is much less variable than the same value derived from an original count of 1.

ADD REPLY
0
Entering edit mode

Dear Aaron,

Thank you very much for your quick reply! I will keep on going with this in mind.

Kind regards.

Andrew

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen just now
WEHI, Melbourne, Australia

The help page for glmLRT explains the meaning of all the output components.

For more mathematical detail, here's the paper that describes the methodology:

 http://nar.oxfordjournals.org/content/40/10/4288

The quantities that are called fitted.values in the output are denoted by the Greek letter mu in the article. Generally speaking, these quantities are for internal edgeR use and you do not need to manipulate them yourself in a practical analysis.

ADD COMMENT

Login before adding your answer.

Traffic: 509 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6