Hi, I have an RNA-seq dataset that is a mix of related symbiont algae species whose proportions change with the treatment. I would appreciate any advice on a method I can use to normalize these RNA-seq libraries based on the fraction of reads per sample. I already know the proportion of each species in each sample, based on the reads mapped to each species' genome. For example, would it be possible to analyze the blue species in genotype O2, even though only 12% of the transcripts in the treatment (32) belong to this species, while it's 100% in the control? These small libraries look like outliers in a PCA, and I'm not sure whether it makes sense to run the analysis with such big differences between the libraries.
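
To make the question concrete, this is roughly the kind of normalization I have in mind, as a minimal Python sketch. It scales each gene's count by the number of reads assigned to that one species (total reads times the species fraction) rather than by the whole library. The function name `species_cpm`, the total read depth, and the gene counts are all made up for illustration; only the 12% and 100% fractions come from my data. It is not an established method, just an illustration of the idea.

```python
# Hypothetical sketch: counts per million (CPM) computed against the reads
# assigned to a single species instead of the full mixed library.
# All counts and read depths below are invented for illustration.

def species_cpm(gene_counts, total_reads, species_fraction):
    """Return CPM values relative to one species' share of the library."""
    # Effective library size = reads that mapped to this species' genome.
    species_library_size = total_reads * species_fraction
    if species_library_size <= 0:
        raise ValueError("no reads assigned to this species in this sample")
    return [count / species_library_size * 1e6 for count in gene_counts]

# Control: the species makes up 100% of the library.
control = species_cpm([500, 1500], total_reads=10_000_000, species_fraction=1.00)

# Treatment: the species makes up only 12% of the library.
treatment = species_cpm([60, 180], total_reads=10_000_000, species_fraction=0.12)

print(control)
print(treatment)
```

With these invented numbers the two samples come out at about the same per-species CPM, even though the raw counts differ almost tenfold. What I don't know is whether a scaling like this is statistically sound when the effective library is so small.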
Thanks,
Catalina