Hi everyone,
I'm new to RNA-seq analysis and I have a general question about how to treat genes in DESeq2. Initially I estimated transcript counts with Salmon and then imported them (via tximport) into DESeq2. Transcripts were converted to genes using the EnsDb.Hsapiens.v86 database, gene counts were estimated and normalized, and those genes became the units (rows) of my DESeq2 analyses. But many of those genes belong to subfamilies of the same gene family, or to motifs of the same gene, etc. Hence my question: when, if ever, would it be better to group genes by family, motif, or any other higher genomic hierarchy? I can see how grouping genes and increasing the counts per unit may be statistically beneficial (e.g., less variance, fewer low-count genes), but is it biologically correct? Any thoughts, guidance, links to previous comments, etc. would be highly appreciated.
Then, assuming you do want to combine genes from the same family or motif and analyze them in DESeq2, how would you do that?
Best regards
Marcos
If you followed a proper pipeline, the software already knows that there are repetitive elements in genes and transcripts, and it handles counting them in the most appropriate way.
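If you still want to try collapsing genes into families, the mechanical part is just summing the raw (un-normalized) counts of the member genes per sample before the matrix goes into DESeq2, since DESeq2 expects raw counts as input. Here is a minimal sketch of that sum in Python, only to illustrate the idea. The gene names, family label, and the `collapse_counts` helper are all hypothetical, and the gene-to-family mapping is something you would have to supply yourself. Note that summing across genes also discards the per-gene length information that tximport provides, so treat this as a rough sketch rather than a recommended workflow.

```python
def collapse_counts(gene_counts, gene_to_family):
    """Sum raw per-sample counts of genes that map to the same family.

    gene_counts:    dict of gene ID -> list of raw counts (one per sample)
    gene_to_family: dict of gene ID -> family label (hypothetical mapping);
                    genes missing from the mapping are kept as their own row
    """
    family_counts = {}
    for gene, counts in gene_counts.items():
        family = gene_to_family.get(gene, gene)
        if family not in family_counts:
            family_counts[family] = [0] * len(counts)
        # Element-wise sum across samples
        family_counts[family] = [
            total + c for total, c in zip(family_counts[family], counts)
        ]
    return family_counts


# Hypothetical toy data: three genes, three samples
gene_counts = {
    "geneA1": [10, 0, 25],
    "geneA2": [5, 3, 5],
    "geneB": [7, 8, 9],
}
gene_to_family = {"geneA1": "famA", "geneA2": "famA"}

family_counts = collapse_counts(gene_counts, gene_to_family)
for family, counts in family_counts.items():
    print(family, counts)
```

In R, a grouped sum of the same kind can be done on the count matrix with base `rowsum()`, and the collapsed matrix would then be used as the count input.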