Hello Bioconductor community,
I'm using the GAGE package to analyze single-cell RNA-seq data, and I ran into a situation I can't quite figure out. I looked for an explanation online, but without much success.
I prepared an expression matrix from individual cells and annotated the column names so that I can subset on these cells (samples) in the GAGE analysis. I have multiple time points and genotypes, and up to 14 clusters in my dataset. The column names look like this: day9_wt_Act_CD8_1/2/3.... day12_ko_neutrophils_1.
I tried running GAGE analysis by using two different approaches:
1) Feeding the whole expression dataset into the function and selecting the appropriate column indices for the reference and sample groups (e.g. ref = d9_wt_act_cd8 vs samp = d9_ko_act_cd8). In this case, there are numerous columns which aren't used in the comparison. I ran the comparison "
2) Subsetting the matrix to only the samples that I'm interested in comparing. In this case, I select only the reference samples, the remaining columns of the subset are used as the sample group, and there are no samples (columns) which aren't included in the comparison.
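
To make the two setups concrete, here is a minimal sketch on toy data, not my real matrix. The matrix, the gene sets, the grep patterns and the choice of compare = "unpaired" are all assumptions made for illustration; only the ref and samp arguments of gage() correspond to what is described above. The sketch also checks that both setups end up pointing at the same cells.

```r
# Toy stand-in for the real genes x cells matrix (all names are hypothetical).
set.seed(1)
cols <- c(paste0("day9_wt_Act_CD8_", 1:3),
          paste0("day9_ko_Act_CD8_", 1:3),
          paste0("day12_ko_neutrophils_", 1:3))
expr <- matrix(rnorm(200 * length(cols)), nrow = 200,
               dimnames = list(paste0("gene", 1:200), cols))

# Made-up gene sets, only so that the calls below have something to test.
gsets <- list(setA = paste0("gene", 1:20), setB = paste0("gene", 21:60))

# Approach 1: keep the whole matrix and give explicit indices for both groups.
ref_idx  <- grep("^day9_wt_Act_CD8_", colnames(expr))
samp_idx <- grep("^day9_ko_Act_CD8_", colnames(expr))

# Approach 2: subset first, then give only ref; the remaining columns
# of the subset act as the sample group (samp = NULL).
expr_sub    <- expr[, c(ref_idx, samp_idx)]
ref_idx_sub <- grep("^day9_wt_Act_CD8_", colnames(expr_sub))

# Sanity check: both approaches should refer to the same sample cells.
stopifnot(identical(colnames(expr)[samp_idx],
                    colnames(expr_sub)[-ref_idx_sub]))

# Run GAGE only if the package is installed.
# compare = "unpaired" is an assumption, not necessarily the setting I used.
if (requireNamespace("gage", quietly = TRUE)) {
  res_full <- gage::gage(expr, gsets = gsets,
                         ref = ref_idx, samp = samp_idx, compare = "unpaired")
  res_sub  <- gage::gage(expr_sub, gsets = gsets,
                         ref = ref_idx_sub, samp = NULL, compare = "unpaired")
}
```

If the stopifnot() check fails on the real data, the two runs are not comparing the same cells, which would be one possible source of the discrepancy.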
Between these two approaches, I got quite different gene sets and statistics. In the first approach (whole dataset as input), comparing the two subsets of data resulted in 8 gene sets being significantly enriched (q < 0.1). The second approach (trimmed expression matrix as input, comparing the same two subsets) resulted in only 1 significantly enriched gene set. I'm not sure which one to believe. Your insights are appreciated.
Thanks!