Question

Handle zero effective gene length when tximport RSEM results

1

Entering edit mode

hanc ▴ 10

@caohan7659-20165

Last seen 20 months ago

Hong Kong

Hi,

I am using tximport to combine the gene-level quantification and further normalize it using edgeR's TMM normalization. The code I used is from https://bioconductor.org/packages/devel/bioc/vignettes/tximport/inst/doc/tximport.html#edgeR

cts <- txi$counts
normMat <- txi$length

# Obtaining per-observation scaling factors for length, adjusted to avoid
# changing the magnitude of the counts.
normMat <- normMat/exp(rowMeans(log(normMat)))
normCts <- cts/normMat

# RSEM produce 0 effective length

# Computing effective library sizes from scaled counts, to account for
# composition biases between samples.
eff.lib <- calcNormFactors(normCts) * colSums(normCts)

# Combining effective library sizes with the length factors, and calculating
# offsets for a log-link GLM.
normMat <- sweep(normMat, 2, eff.lib, "*")
normMat <- log(normMat)

# Creating a DGEList object for use in edgeR.
y <- DGEList(cts)
y <- scaleOffset(y, normMat)

y <- y[keep_rows, ]
cpms <- edgeR::cpm(y, offset = y$offset, log = FALSE)

However, as RSEM output zero effective gene length if it < 1 and result in error of calcNormFactors, shall I re-set those values to 1 in the gene length matrix before calcNormFactors?

cts <- txi$counts
normMat <- txi$length
normMat[normMat==0] <- 1

tximport edgeR • 1.8k views

ADD COMMENT • link updated 20 months ago by Michael Love 43k • written 20 months ago by hanc ▴ 10

score 1 · Answer 1 · 2023-08-29

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

You could import the RSEM data at transcript level instead of gene level, so use tx2gene.

I believe RSEM transcript-level quant -> tximport shouldn't produce effective gene length of zero.

ADD COMMENT • link 20 months ago Michael Love 43k

score 0 · Answer 2 · 2023-08-21

0

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 3 days ago

San Diego

I think you want to remove genes with an effective length of zero.

I typicaly do this after importing

txi.rsem$abundance <-
  txi.rsem$abundance[apply(txi.rsem$length,
                                1,
                                function(row) all(row !=0 )),]

txi.rsem$counts <-
  txi.rsem$counts[apply(txi.rsem$length,
                             1,
                             function(row) all(row !=0 )),]

txi.rsem$length <-
  txi.rsem$length[apply(txi.rsem$length,
                             1,
                             function(row) all(row !=0 )),]

ADD COMMENT • link 20 months ago swbarnes2 ★ 1.4k

0

Entering edit mode

Thanks a lot for your reply. Could you explain more for removing those genes at the beginning? My understanding is that those genes are shorter than fragments and thereby only 1 possible sequencing starting point. So, I reset the effective length to 1 to estimate library size. I will then remove those genes in downstream analysis.