From the post "remove X and Y chromosome genes in RNA-seq data using DESeq2 pipeline", I learned that, depending on context, it is perfectly valid to remove X and Y chromosomal genes from RNA-seq data before doing differential expression analysis. However, I only have access to the count matrix, not the BAM files, of the RNA-seq data I am analysing. Is it advisable to remove the X and Y chromosome genes at the level of the count matrix? That is, I would drop all rows corresponding to X and Y chromosomal genes from the .csv count matrix after reading it into R, before creating the DESeq2 object.
Thank you in advance for your kind response.
I'm less comfortable with removing a majority, or roughly 50%, of the genes, but as long as the MA plots look reasonable and the result doesn't violate known biology (e.g. all remaining features coming out as DE), I can see this being statistically valid.
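For the count-matrix route described in the question, here is a minimal sketch in base R. It assumes you have a gene-to-chromosome table from somewhere (a GTF file, biomaRt, etc.). The `annot` data frame, its column names, the toy counts, and the `GENE_*` identifiers are all made up for illustration; only `XIST` (X-linked) and `RPS4Y1` (Y-linked) are real genes. The DESeq2 step is only run if the package is installed.

```r
# Toy count matrix standing in for the .csv, e.g. read with
# as.matrix(read.csv("counts.csv", row.names = 1))  # hypothetical file name
counts <- matrix(
  c( 10L, 12L, 30L, 28L,
      5L,  7L,  6L,  4L,
    100L, 90L,  2L,  1L,
      0L,  0L, 55L, 60L,
     20L, 22L, 19L, 25L,
      8L,  9L, 11L, 10L),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("GENE_A", "GENE_B", "XIST", "RPS4Y1", "GENE_C", "GENE_D"),
    c("S1", "S2", "S3", "S4")
  )
)

# Hypothetical annotation: one row per gene with its chromosome.
annot <- data.frame(
  gene_id    = c("GENE_A", "GENE_B", "XIST", "RPS4Y1", "GENE_C", "GENE_D"),
  chromosome = c("1", "2", "X", "Y", "7", "X"),
  stringsAsFactors = FALSE
)

# Look up the chromosome of every row in the count matrix.
chrom <- annot$chromosome[match(rownames(counts), annot$gene_id)]

# Keep rows that are not on X or Y. Both naming styles are covered.
# Genes missing from the annotation (chrom is NA) are kept, since
# NA %in% ... evaluates to FALSE.
keep <- !(chrom %in% c("X", "Y", "chrX", "chrY"))
filtered <- counts[keep, , drop = FALSE]

# Sample table with a made-up two-level condition.
coldata <- data.frame(
  condition = factor(c("ctrl", "ctrl", "treated", "treated")),
  row.names = colnames(filtered)
)

# Build the DESeq2 object from the filtered matrix, if DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = filtered,
    colData   = coldata,
    design    = ~ condition
  )
}
```

It is worth checking `sum(!keep)` against the total number of rows before going on, so you know what fraction of genes was dropped, and then looking at the MA plot from the fitted object as mentioned above.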