Hi, I am using edgeR and I want to print the reads normalized with the TMM method, but I have not found the command. Is there a command in edgeR that could help me?
Thank you
Riccardo
You can't normalize reads, because that doesn't really make any sense. You can, however, adjust read counts to obtain normalized expression values. I suggest you have a look at ?calcNormFactors and ?cpm. I won't regurgitate the documentation here; suffice to say you run calcNormFactors to get a DGEList, and then run cpm on that DGEList to get a matrix of normalized expression values.
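For readers who want to see those two calls in context, here is a minimal sketch. The count matrix is simulated purely for illustration (the gene and sample names and the negative binomial parameters are assumptions, not anything from the original question); only DGEList, calcNormFactors and cpm are actual edgeR functions.

```r
library(edgeR)

# Simulated counts for illustration only: 200 genes x 4 samples.
set.seed(1)
counts <- matrix(
  rnbinom(200 * 4, mu = 50, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:4))
)

y <- DGEList(counts = counts)  # wrap the raw counts in a DGEList
y <- calcNormFactors(y)        # compute normalization factors (TMM is the default method)
norm_cpm <- cpm(y)             # matrix of TMM-adjusted counts per million

print(head(norm_cpm))
```

The result is an ordinary numeric matrix with one row per gene and one column per sample, so it can be printed or written out with the usual R tools (for example write.table).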
Thank you, I meant normalizing the expression values. If I multiply the cpm values by a million, would I get the TMM values?
I don't think it's clear what you are asking for. Let's assume that y is your DGEList with your count data, which you have already called calcNormFactors on.
Are you after the TMM normalization factors? These are stored in your y$samples$norm.factors column.
Do you just want a gene expression matrix from your data, normalized by a "simple" per-million factor? Call
cpm(y, normalized.lib.sizes=FALSE)
But you probably don't want that.
If you're after gene expression normalized by sequencing depth (adjusted by TMM factors), just call
cpm(y)
as Aaron has already suggested.
If you aren't after any of these three things, can you please explain in more detail what you want?
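To make the difference between these three things concrete, the sketch below pulls out each one and then rebuilds the two CPM matrices by hand. The data are simulated and the variable names (nf, cpm_raw, cpm_tmm, eff_lib_size and so on) are made up for this example. It assumes a plain DGEList with no offsets, in which case cpm(y) should amount to dividing each column of counts by lib.size * norm.factors and scaling to one million.

```r
library(edgeR)

# Simulated counts for illustration only: 200 genes x 4 samples.
set.seed(1)
counts <- matrix(
  rnbinom(200 * 4, mu = 50, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:4))
)
y <- calcNormFactors(DGEList(counts = counts))

# 1. TMM normalization factors, one per sample.
nf <- y$samples$norm.factors

# 2. "Simple" CPM: counts scaled by the raw library size only.
cpm_raw <- cpm(y, normalized.lib.sizes = FALSE)

# 3. TMM-adjusted CPM: counts scaled by the effective library size.
cpm_tmm <- cpm(y)

# Rebuild both matrices by hand to show what the scaling does.
lib_size     <- y$samples$lib.size
eff_lib_size <- lib_size * nf
manual_raw   <- t(t(counts) / lib_size) * 1e6
manual_tmm   <- t(t(counts) / eff_lib_size) * 1e6

print(all.equal(unname(cpm_raw), unname(manual_raw)))
print(all.equal(unname(cpm_tmm), unname(manual_tmm)))
```

Since the per-million scaling is already part of cpm, multiplying its output by a further million does not produce anything meaningful; the TMM adjustment enters only through the effective library size.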
I would like to compare the normalized counts from DESeq2 with those from edgeR. To do this, do I have to use calcNormFactors and then cpm?
This would be tricky, as the values returned by cpm are on a per-million scale, while - if I remember correctly - the values from DESeq2 are something on the scale of the original counts. This makes it difficult to compare the normalized values directly between methods. To me, such a comparison doesn't seem to have any purpose. If you just want to compare normalization strategies, you can simply compare the size factors from DESeq2 with the effective library sizes (lib.size*norm.factors) from edgeR. If you want to compare the effect of the normalization strategies, then you should have a downstream analysis in mind (e.g., PCA, clustering). For most of these downstream analyses, you're comparing between samples in the same data set, so the scale of the normalized expression from each method shouldn't matter.
Ok, thank you. If I want to do a clustering analysis, is cpm(y, normalized.lib.sizes=TRUE) correct?
Gordon and co. typically suggest that you pass in a value between 3-5 for prior.count in your call to cpm, as well.
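As a rough sketch of what that could look like for clustering: note that prior.count in cpm only takes effect when log=TRUE, so the two arguments go together. The data below are simulated, prior.count = 3 is just one value from the suggested range, and the choice of Euclidean distance with hclust is an assumption made for illustration rather than a recommendation from this thread.

```r
library(edgeR)

# Simulated counts for illustration only: 200 genes x 4 samples.
set.seed(1)
counts <- matrix(
  rnbinom(200 * 4, mu = 50, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:4))
)
y <- calcNormFactors(DGEList(counts = counts))

# log2-CPM using TMM-adjusted library sizes; the prior count avoids log(0)
# and damps the variability of low-count genes.
logcpm <- cpm(y, log = TRUE, prior.count = 3)

# Cluster samples (columns) on Euclidean distances between log-CPM profiles.
d  <- dist(t(logcpm))
hc <- hclust(d)
print(hc)
```

Calling plot(hc) afterwards draws the dendrogram of samples.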
Depending on what clustering you're doing, you'll probably want to set log=TRUE as well. This stabilizes the variance between genes of different abundance. Otherwise, high-abundance genes with correspondingly large variances would dominate the calculations, e.g., for Euclidean distances. This probably wouldn't be helpful, as you'd end up clustering on the measurement error/random variability of constitutive housekeeping genes rather than on interesting biological differences in genes that are expressed at lower abundances.
Hi Aaron, may I ask a related question here? If I just do:
And then conduct downstream analysis. Does this procedure make sense? After downstream analysis, can I say that I applied TMM method for normalisation?
If the above procedure does not make sense, how about the following procedure:
Thank you very much!
Best to start a new post for a new question.