Hi there,
When performing a meta-analysis, is it sufficient to start with the published read count files, or is it better to perform your own read alignment with all the samples you plan to use?
This is a general bioinformatics question, rather than a question about Bioconductor software. It is best asked on a more general forum like Biostars.
Having said that, here's my two cents. It is much easier to use the published count tables, and I do it all the time in my workflows. However, it's not completely stress-free if the studies have used different genome builds (mm9 vs mm10), annotations (Entrez or Ensembl), or different versions of the same annotation scheme. The main difficulty is trying to map features from one study to another, and deciding what to do with features that don't have 1:1 mappings, i.e., multiple or no genes in one study mapping to one gene in another. And what happens if the authors didn't include annotation for genes of interest (e.g., lncRNAs)? Well, that's too bad.
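To make the 1:1 mapping problem concrete, here is a minimal Python sketch (not from any Bioconductor package) that sorts the features of one study's count table by how they map to another study's features. It assumes you already have an ID-conversion table as a list of `(feature_a, feature_b)` pairs, for example Ensembl-to-Entrez pairs exported from an annotation resource. The function name and the example IDs are hypothetical.

```python
from collections import defaultdict

def classify_mappings(pairs, features_a):
    """Label each feature in study A by how it maps onto study B.

    pairs: iterable of (feature_a, feature_b) tuples from an ID-conversion table.
    features_a: all feature IDs present in study A's count table.
    Returns a dict: feature_a -> "one-to-one" | "one-to-many" | "many-to-one" | "unmapped".
    """
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)

    result = {}
    for a in features_a:
        targets = a_to_b.get(a, set())  # .get avoids inserting empty entries
        if not targets:
            result[a] = "unmapped"
        elif len(targets) > 1:
            # one A feature hits several B features (takes priority over many-to-one)
            result[a] = "one-to-many"
        else:
            (b,) = targets
            # the single B target may also be claimed by other A features
            result[a] = "one-to-one" if len(b_to_a[b]) == 1 else "many-to-one"
    return result

# Hypothetical IDs: a2 splits into two genes, a3/a4 collapse into one, a5 is missing.
pairs = [("a1", "b1"), ("a2", "b2"), ("a2", "b3"), ("a3", "b4"), ("a4", "b4")]
print(classify_mappings(pairs, ["a1", "a2", "a3", "a4", "a5"]))
```

Only the "one-to-one" features can be carried across studies without a judgement call; what to do with the rest (drop them, sum them, pick one) is exactly the decision described above.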
You can avoid these issues by remapping and recounting everything with a single genome/annotation. Of course, this is a lot of logistical effort - you have to download all the FASTQ files, get some HPC time to align the reads, figure out which FASTQ files are technical replicates and which are actual biological replicates, etc. Whether this is worth it in the general case is difficult to answer, but at the end you'll have some nice count matrices with the same feature set that are easy to compare.
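Once the technical replicates are identified, a common convention (assumed here, not stated above) is to sum their counts into one column per biological sample. A minimal Python sketch follows; the run-to-sample table is a hypothetical input you would build from the study metadata, and the run and gene IDs are made up.

```python
from collections import defaultdict

def collapse_technical_replicates(run_counts, run_to_sample):
    """Sum per-run gene counts into per-sample counts.

    run_counts: dict of run ID -> dict of gene ID -> read count.
    run_to_sample: dict of run ID -> biological sample ID
                   (runs sharing a sample are treated as technical replicates).
    """
    sample_counts = defaultdict(lambda: defaultdict(int))
    for run, counts in run_counts.items():
        sample = run_to_sample[run]
        for gene, n in counts.items():
            sample_counts[sample][gene] += n
    # convert nested defaultdicts back to plain dicts
    return {s: dict(genes) for s, genes in sample_counts.items()}

# Hypothetical example: r1 and r2 are two sequencing runs of the same library.
run_counts = {"r1": {"g1": 5, "g2": 1}, "r2": {"g1": 3}, "r3": {"g1": 2, "g2": 4}}
run_to_sample = {"r1": "s1", "r2": "s1", "r3": "s2"}
print(collapse_technical_replicates(run_counts, run_to_sample))
```

If you are working in R, DESeq2's `collapseReplicates` and edgeR's `sumTechReps` do the same job on a count matrix.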
Remapping also avoids differences due to the choice of aligner, which can have an effect on the counts. In practice, you will have batch effects anyway (due to biological and experimental differences) so variability in aligner behaviour is probably the least of your concerns.