I want to calculate something like an intron retention ratio per sample using the data provided in recount: get the junction quantifications for a certain region and divide that value by the coverage (non-split reads or total reads) of the same region. My plan is to get the junction counts with snapcount (I have done that for other projects and it works very well). However, I do not know how to get the coverage: I read that recount2 has base-pair information but recount3 does not, and it does not seem possible to query a region and get its coverage across full projects the way snapcount does for junctions.
I thought that perhaps I should iteratively download every project within the large collections I want to query (TCGA/GTEx), use recount::read_counts() on each one, subset for the regions I am interested in, and finally combine the projects. I would then merge my region counts with the junction counts and calculate the intron ratios at the sample level.
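To make that last step concrete, here is a minimal sketch of the merge and ratio I have in mind, written in plain Python only to pin down the arithmetic. The sample IDs and numbers are made up, and `junction_to_coverage_ratio` is a hypothetical helper, not part of recount or snapcount. It assumes both tables are already keyed by the same sample IDs.

```python
# Hypothetical per-sample tables; in practice these would come from
# snapcount (junction counts) and recount (coverage of the same region).
junction_counts = {"sampleA": 30, "sampleB": 0, "sampleC": 12}
region_coverage = {"sampleA": 120.0, "sampleB": 80.0, "sampleD": 50.0}


def junction_to_coverage_ratio(junctions, coverage):
    """Return {sample_id: junction count / coverage} for shared samples.

    Samples with zero coverage get None, because the ratio is undefined.
    """
    ratios = {}
    # Keep only samples present in both tables (an inner join on sample ID).
    for sample_id in sorted(junctions.keys() & coverage.keys()):
        cov = coverage[sample_id]
        ratios[sample_id] = junctions[sample_id] / cov if cov > 0 else None
    return ratios


ratios = junction_to_coverage_ratio(junction_counts, region_coverage)
for sample_id, ratio in ratios.items():
    print(sample_id, ratio)
```

Samples that appear in only one of the two tables are dropped here, which is exactly where incompatible sample IDs between the two sources would silently lose data.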
Is there an easier way to get both the coverage and the junction counts for this calculation?
Also, if I use recount2 for the counts, can I use "tcgav2" and "gtexv2" for snapcount? I think the sample IDs would not be compatible, but I am not sure.