Question

Counts from recount3 for DESeq2 analysis

1

Entering edit mode

william.benit ▴ 10

@7a8c2d00

Last seen 2.9 years ago

France

Hi,

As the topic title suggests, I want to use RSE objects from recount3 for differential expression analysis with DESeq2. This is an ongoing project that I took over. Looking at what has already been done, there are two points I wonder about :

1 - I found that the recount3::transform_counts() function (described here) was sometimes applied on the recount3 RSE objects before creating the DESeqDataSets. In the DESeq2 vignette, it is indicated :

The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input

However, transform_counts does take into account the size of the library. Counts scaled with this function should not be used with DESEq2, right?

2 - Sometimes transform_counts was not applied and raw recount3 RSE object was given as input to DESeq2::DESeqDataSet(). I believe this is also an error because the counts directly provided by recount3 are raw base-pair coverage counts (as stated here) and not raw read counts requested by DESeq2 :

As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values. The value in the i-th row and the j-th column of the matrix tells how many reads can be assigned to gene i in sample j.

In view of points 1- and 2-, I think I have to retrieve the read counts of RSE recount3 objects with the recount3::compute_read_counts() function (described here) then use these objects to create the DEseqDataSets without any prior normalization. Am I right?

Thank you for your consideration,

William

recount3 DESeq2 • 3.4k views

ADD COMMENT • link updated 2.6 years ago by Leonardo Collado Torres ★ 1.1k • written 3.0 years ago by william.benit ▴ 10

score 2 · Answer 1 · 2022-04-17

2

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

A few years back I took a look at the rse objects from recount and I wrote this function to go from basepair coverage to what I think would be good input to DESeq2:

https://github.com/biodatascience/compbio_src/blob/master/bioc/my_scale_counts.R

ADD COMMENT • link 3.0 years ago Michael Love 43k

2

Entering edit mode

Hi,

This question is related to recount counts in the example experiment from Mike a few years ago (which he alluded to). That led to the introduction in recount (from the _recount2 project) of a function to compute the read counts, for which we have an equivalent in recount3. Which @william.benit found at http://research.libd.org/recount3/reference/compute_read_counts.html. The exact code behind it is: https://github.com/LieberInstitute/recount3/blob/6eb14b844062ebdf45fe5a356577e3ea0483c97e/R/transform_counts.R#L351-L362.

It does the same as https://github.com/biodatascience/compbio_src/blob/master/bioc/my_scale_counts.R although it's tailored for recount3 objects that have an assayName() called raw_counts. We introduced that assayName in recount3 to highlight how the counts provided are raw base-pair coverage counts, not read counts.

Best, Leo

ADD REPLY • link 3.0 years ago Leonardo Collado Torres ★ 1.1k

1

Entering edit mode

@william.benit

If recount3 has an assay raw_counts, which deals with single-end / paired-end experiments and converts from the coverage to the count values, this is the preferred solution. Just use this assay.

ADD REPLY • link 3.0 years ago Michael Love 43k

0

Entering edit mode

Understood ! Thank you both for your help, I can continue this project now ! :)

ADD REPLY • link 3.0 years ago william.benit ▴ 10

0

Entering edit mode

Thank you Michael !!! I changed your function a bit based on my data:

my_scale_counts <- function(rse_gene, round=TRUE) {
  cts <- SummarizedExperiment::assay(rse_gene)
  # mapped_read_count includes the count for both reads of a pair
  # here, mapped_read_counts = recount_qc.star.all_mapped_reads_both
  paired <- 2
  x <- (colData(rse_gene)$recount_qc.star.all_mapped_reads_both / paired) / colSums(cts)
  assay(rse_gene) <- t(t(assay(rse_gene)) * x)
  if (round) {
    assay(rse_gene) <- round(assay(rse_gene))
  }
  rse_gene
}

I'm not sure I understood some objects, please, correct me if I'm wrong.

cts must be the count matrix of the recount rse object.
paired is originally a vector indicating which samples have been paired-end sequenced. I don't have a paired-end column in these rse. They originate from the GTEx and TCGA projects: I'm sure that those of the GTEx are paired-end and I read in an article that this is also the case for those of the TCGA. Also, in each colData of my rse, the "recount_qc.star.all_mapped_reads" column is equal to the "recount_qc.star.all_mapped_reads_both" column. That's why I put paired at 2.

And so, what kind of values do we have in the rse_gene assay now?

Thank you again for you help,

William

ADD REPLY • link 3.0 years ago william.benit ▴ 10

1

Entering edit mode

These seem to be reasonable changes.

The values in the original assay are coverage values which is roughly read/fragment counts * basepairs. We divide out the sum and multiply so the columns now add to the number of mapped fragments. These values should now correspond to fragment counts. They are _not_ corrected for sequencing depth.

ADD REPLY • link 3.0 years ago Michael Love 43k

1

Entering edit mode

@william.benit: I would encourage you to use http://research.libd.org/recount3/reference/compute_read_counts.html instead of introducing a new function.

ADD REPLY • link 3.0 years ago Leonardo Collado Torres ★ 1.1k

0

Entering edit mode

Understood ! Thank you both for your help, I can continue this project now ! :)

ADD REPLY • link 3.0 years ago william.benit ▴ 10

0

Entering edit mode

HI Michael Love ,

I get an error, whenever I want to run DESeqdataset and got this error: " Error in DESeqDataSet(rse_gene_SRP143467, design = ~tissue) : 'counts' assay is not the first in assays list, re-arrange so that 'counts' is first"

How can I rearrange the columns so I can run dds<- DESeqDataSet(rse_gene_SRP143467, design = ~ tissue)

I got my counts using compute read counts:

assay(rse_gene_SRP143467) <- compute_read_counts( rse_gene_SRP143467, round = TRUE, avg_mapped_read_length = "recount_qc.star.average_mapped_length" ) And have to this in assay names:

assayNames(rse_gene_SRP143467) [1] "raw_counts" "counts"

Thank you!

ADD REPLY • link 2.6 years ago lizeth.peralesv • 0

2

Entering edit mode

You can re-arrange assays like so:

assayNames(x)
assays(x) <- assays(x)[ idx ]

where idx is an indexing vector. E.g. it could be idx <- c("counts", ...)

ADD REPLY • link 2.6 years ago Michael Love 43k

0

Entering edit mode

Yup! Or you could even drop the raw_counts if you don't need them anymore with assays(x)$raw_counts <- NULL.

ADD REPLY • link 2.6 years ago Leonardo Collado Torres ★ 1.1k