Hi,
As the topic title suggests, I want to use RSE objects from recount3 for differential expression analysis with DESeq2. This is an ongoing project that I took over. Looking at what has already been done, there are two points I wonder about :
1 - I found that the recount3::transform_counts() function (described here) was sometimes applied on the recount3 RSE objects before creating the DESeqDataSets. In the DESeq2 vignette, it is indicated :
The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input
However, transform_counts does take into account the size of the library. Counts scaled with this function should not be used with DESEq2, right?
2 - Sometimes transform_counts was not applied and raw recount3 RSE object was given as input to DESeq2::DESeqDataSet(). I believe this is also an error because the counts directly provided by recount3 are raw base-pair coverage counts (as stated here) and not raw read counts requested by DESeq2 :
As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values. The value in the i-th row and the j-th column of the matrix tells how many reads can be assigned to gene i in sample j.
In view of points 1- and 2-, I think I have to retrieve the read counts of RSE recount3 objects with the recount3::compute_read_counts() function (described here) then use these objects to create the DEseqDataSets without any prior normalization. Am I right?
Thank you for your consideration,
William
Hi,
This question is related to recount counts in the example experiment from Mike a few years ago (which he alluded to). That led to the introduction in
recount
(from the _recount2 project) of a function to compute the read counts, for which we have an equivalent inrecount3
. Which @william.benit found at http://research.libd.org/recount3/reference/compute_read_counts.html. The exact code behind it is: https://github.com/LieberInstitute/recount3/blob/6eb14b844062ebdf45fe5a356577e3ea0483c97e/R/transform_counts.R#L351-L362.It does the same as https://github.com/biodatascience/compbio_src/blob/master/bioc/my_scale_counts.R although it's tailored for
recount3
objects that have anassayName()
calledraw_counts
. We introduced thatassayName
inrecount3
to highlight how the counts provided are raw base-pair coverage counts, not read counts.Best, Leo
@william.benit
If recount3 has an assay
raw_counts
, which deals with single-end / paired-end experiments and converts from the coverage to the count values, this is the preferred solution. Just use this assay.Understood ! Thank you both for your help, I can continue this project now ! :)
Thank you Michael !!! I changed your function a bit based on my data:
I'm not sure I understood some objects, please, correct me if I'm wrong.
And so, what kind of values do we have in the rse_gene assay now?
Thank you again for you help,
William
These seem to be reasonable changes.
The values in the original assay are coverage values which is roughly read/fragment counts * basepairs. We divide out the sum and multiply so the columns now add to the number of mapped fragments. These values should now correspond to fragment counts. They are _not_ corrected for sequencing depth.
@william.benit: I would encourage you to use http://research.libd.org/recount3/reference/compute_read_counts.html instead of introducing a new function.
Understood ! Thank you both for your help, I can continue this project now ! :)
HI Michael Love ,
I get an error, whenever I want to run DESeqdataset and got this error: " Error in DESeqDataSet(rse_gene_SRP143467, design = ~tissue) : 'counts' assay is not the first in assays list, re-arrange so that 'counts' is first"
How can I rearrange the columns so I can run dds<- DESeqDataSet(rse_gene_SRP143467, design = ~ tissue)
I got my counts using compute read counts:
assay(rse_gene_SRP143467) <- compute_read_counts( rse_gene_SRP143467, round = TRUE, avg_mapped_read_length = "recount_qc.star.average_mapped_length" ) And have to this in assay names:
assayNames(rse_gene_SRP143467) [1] "raw_counts" "counts"
Thank you!
You can re-arrange assays like so:
where
idx
is an indexing vector. E.g. it could beidx <- c("counts", ...)
Yup! Or you could even drop the
raw_counts
if you don't need them anymore withassays(x)$raw_counts <- NULL
.