Hi all,
I've also posted this on Biostars, as I'm not sure it is appropriate here. Let me know if this question needs removing.
About the data: I have 5 tissues, over 100 samples, and 2 variables of interest: RFI (High, Low) and Trial (1, 2). The trial variable is essentially a surrogate for genotype, as the main difference between the trials is the genotype of the animals. All samples were collected and then processed in the lab by the same person. I don't know the sex of each animal (although it can be inferred from the data with a bit of work). I have no other batch information.
My question: I don't want to apply sva to model a hidden batch until I am confident there actually is one. The problem is that I need guidance on what evidence of a hidden batch looks like. I have read that hidden batches should be evident from exploratory data analysis (EDA), so I'm including plenty of EDA plots to help me follow the replies.
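One rough way to ask whether any shared structure remains after accounting for the known variables (RFI, Trial) is a permutation test on the residuals. This is similar in spirit to the default `"be"` method of `num.sv()` in the sva package, but the Python sketch below is a simplified stand-in, not sva's implementation. It assumes a log-scale samples × genes matrix `expr` (the transpose of the genes × samples layout used in Bioconductor) and a design matrix with an intercept plus the known variables; all function names and the synthetic data are hypothetical. A count above zero indicates structure beyond RFI and Trial, not that the structure is technical batch rather than biology.

```python
import numpy as np


def residualize(expr, design):
    """Regress every gene (column of expr) on the design and return the residuals."""
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ beta


def variance_fractions(resid):
    """Fraction of residual variance carried by each singular component."""
    s = np.linalg.svd(resid, compute_uv=False)
    return s**2 / np.sum(s**2)


def permutation_factor_count(expr, design, n_perm=50, alpha=0.05, seed=0):
    """Count leading residual components that explain more variance than chance.

    expr:   samples x genes array (log-scale expression; hypothetical input)
    design: samples x p array (intercept + known variables such as RFI and Trial)
    """
    rng = np.random.default_rng(seed)
    resid = residualize(expr, design)
    observed = variance_fractions(resid)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Shuffle each gene's residuals independently across samples: this keeps
        # per-gene variance but destroys structure shared between genes.
        permuted = np.column_stack([rng.permutation(col) for col in resid.T])
        null = variance_fractions(residualize(permuted, design))
        exceed += null >= observed
    pvals = (exceed + 1) / (n_perm + 1)
    # Count components from the top until the first non-significant one.
    n_sig = 0
    for p in pvals:
        if p > alpha:
            break
        n_sig += 1
    return n_sig, pvals


if __name__ == "__main__":
    # Synthetic example: 30 samples, 200 genes, one known variable (trial)
    # plus one unmodelled factor that shifts half of the genes.
    rng = np.random.default_rng(1)
    trial = np.repeat([0, 1], 15)
    hidden = rng.choice([-1.0, 1.0], size=30)
    expr = rng.normal(size=(30, 200))
    expr[:, :100] += 3 * hidden[:, None]
    design = np.column_stack([np.ones(30), trial])
    n_sig, pvals = permutation_factor_count(expr, design)
    print(n_sig, pvals[:3])
```

With `n_perm=50` the smallest attainable p-value is 1/51, so `alpha` cannot usefully be set much lower without more permutations.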
Thank you all in advance, Kenneth
Exploratory Data Analysis results
PCA separates the intestinal tissues from liver and muscle very well (with 1 outlier, which has since been removed), but there is no clear separation between Ileum and Jejunum, even when the intestinal tissues are plotted without liver and muscle.
Within individual tissues, the PCAs show some clustering by the variables of interest, but I don't see any extra groups, or groups of samples sitting far away on their own (which I think would be evidence of a hidden batch effect).
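A numeric complement to eyeballing the per-tissue PCA plots is to ask how much of each top PC's variance is explained by each known variable (a one-way ANOVA R²). A PC that carries a large share of the variance but is poorly explained by RFI, Trial, or (once inferred) sex is a candidate for unmodelled structure, whether technical or biological. The Python sketch assumes a samples × genes matrix of already-normalised, log-scale values; the function names and synthetic data are hypothetical, and the "hidden" label in the demo is known only because the data are simulated.

```python
import numpy as np


def pca_scores(expr, n_pcs=5):
    """PC scores and variance fractions for a samples x genes matrix."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    return u[:, :n_pcs] * s[:n_pcs], var_frac[:n_pcs]


def r2_by_group(values, labels):
    """One-way ANOVA R^2: share of variance in `values` explained by group means."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    total = np.sum((values - values.mean()) ** 2)
    within = sum(
        np.sum((values[labels == g] - values[labels == g].mean()) ** 2)
        for g in np.unique(labels)
    )
    return float(1.0 - within / total)


def pc_association_table(expr, covariates, n_pcs=5):
    """For each top PC: its variance fraction and R^2 against each known label."""
    scores, var_frac = pca_scores(expr, n_pcs)
    table = {}
    for k in range(scores.shape[1]):
        row = {"var_explained": float(var_frac[k])}
        for name, labels in covariates.items():
            row[name] = r2_by_group(scores[:, k], labels)
        table[f"PC{k + 1}"] = row
    return table


if __name__ == "__main__":
    # Synthetic data: 20 samples x 50 genes with a hidden two-group shift
    # that is balanced across the (hypothetical) trial labels.
    rng = np.random.default_rng(1)
    expr = rng.normal(0, 0.1, size=(20, 50))
    hidden = np.array(["x"] * 10 + ["y"] * 10)
    trial = np.array(["1", "2"] * 10)
    expr[hidden == "x", :10] += 5
    covs = {"trial": trial, "hidden": hidden}
    for pc, row in pc_association_table(expr, covs, 3).items():
        print(pc, {k: round(v, 2) for k, v in row.items()})
```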
The heatmaps, however, are where I need some guidance. Duodenal tissue clusters weakly by trial, but Ileum, Jejunum, and Muscle show strong clusters that are not attributable to the variables of interest. Can I consider this evidence of a hidden batch in those tissues, or could it simply be biological signal that is stronger than the variables of interest? Should I use sva on these tissues or not?
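One way to make the heatmap question concrete is to cut the sample dendrogram into k clusters (for example with `cutree()` in R or `scipy.cluster.hierarchy.fcluster` in Python) and measure how well those clusters agree with each known label using the adjusted Rand index (ARI): values near 1 mean the clusters reproduce the label, values near 0 mean agreement no better than chance. Strong clusters with ARI near 0 against RFI and Trial are the unexplained structure in question; comparing them with other candidates, such as sex inferred from the data, may show whether they are biological. The numpy sketch computes the standard ARI from a contingency table (the same quantity as `sklearn.metrics.adjusted_rand_score`); the cluster and label vectors are hypothetical.

```python
import numpy as np


def contingency(labels_a, labels_b):
    """Cross-tabulate two labelings of the same samples (rows: a, columns: b)."""
    a_vals, a_idx = np.unique(np.asarray(labels_a), return_inverse=True)
    b_vals, b_idx = np.unique(np.asarray(labels_b), return_inverse=True)
    table = np.zeros((len(a_vals), len(b_vals)), dtype=int)
    np.add.at(table, (a_idx.ravel(), b_idx.ravel()), 1)
    return table


def pairs(x):
    """Elementwise 'n choose 2': number of unordered sample pairs."""
    x = np.asarray(x, dtype=float)
    return x * (x - 1) / 2


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same samples."""
    table = contingency(labels_a, labels_b)
    n = table.sum()
    index = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return float((index - expected) / (max_index - expected))


if __name__ == "__main__":
    # Hypothetical: clusters cut from one tissue's heatmap vs. known labels.
    clusters = ["A", "A", "A", "B", "B", "B", "B", "A"]
    labels = {
        "trial": ["1", "2", "1", "2", "1", "2", "1", "2"],
        "rfi": ["High", "High", "Low", "Low", "High", "Low", "High", "Low"],
        "sex": ["F", "F", "F", "M", "M", "M", "M", "F"],  # e.g. inferred from data
    }
    for name, lab in labels.items():
        print(name, round(adjusted_rand_index(clusters, lab), 2))
        print(contingency(clusters, lab))
```

The ARI alone cannot separate batch from biology; it only shows which known label, if any, the heatmap clusters track.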