Question

GAGE analysis with the whole or a subset of matrix

atakanekiz ▴ 30

@atakanekiz-15874

Last seen 11 months ago

Turkey

Hello Bioconductor community,

I'm using the GAGE package to analyze single-cell RNA-seq data, and I ran into a situation I can't quite figure out. I looked for an explanation online without much success.

I prepared an expression matrix from individual cells and annotated the column names so that I can subset on these cells (samples) in the GAGE analysis. I have multiple time points and genotypes and up to 14 clusters in my dataset. The names look like this: day9_wt_Act_CD8_1/2/3.... day12_ko_neutrophils_1.

I tried running GAGE analysis by using two different approaches:

1) Feeding the whole expression dataset into the function and selecting the appropriate column indices for the reference and sample groups (e.g. ref = d9_wt_act_cd8 vs samp = d9_ko_act_cd8). In this case, there are numerous columns that aren't used in the comparison. I ran the comparison "

2) Subsetting the matrix to only the samples that I'm interested in comparing. In this case, once I select the reference samples, the rest of the matrix is used as the comparison group, so there are no samples (columns) that aren't included in the comparison.
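
The two approaches above can be sketched in base R as follows. This is only an illustration on a toy matrix: the column names, the random values, and the variable names (`exprs_full`, `exprs_sub`, `ref_full`, `samp_full`, `ref_sub`, `samp_sub`) are assumptions for the example, not the original data or code.

```r
# Toy column names following the naming scheme described above (made up).
cols <- c(paste0("day9_wt_Act_CD8_", 1:3),
          paste0("day9_ko_Act_CD8_", 1:4),
          paste0("day12_ko_neutrophils_", 1:2))

# Toy expression matrix: 5 genes x 9 cells, random values.
set.seed(1)
exprs_full <- matrix(rnorm(5 * length(cols)), nrow = 5,
                     dimnames = list(paste0("gene", 1:5), cols))

# Approach 1: keep the full matrix and give explicit indices for BOTH groups.
ref_full  <- grep("^day9_wt_Act_CD8_", colnames(exprs_full))
samp_full <- grep("^day9_ko_Act_CD8_", colnames(exprs_full))

# Approach 2: subset to the two groups first, then index within the subset.
exprs_sub <- exprs_full[, c(ref_full, samp_full), drop = FALSE]
ref_sub   <- grep("^day9_wt_Act_CD8_", colnames(exprs_sub))
# Every column that is not a reference column is a sample column.
samp_sub  <- setdiff(seq_len(ncol(exprs_sub)), ref_sub)
```

With the gage package loaded and a gene-set list available (called `gsets` here, a hypothetical name), the two calls would be along the lines of `gage(exprs_full, gsets = gsets, ref = ref_full, samp = samp_full, compare = "unpaired")` and `gage(exprs_sub, gsets = gsets, ref = ref_sub, samp = samp_sub, compare = "unpaired")`. The `compare = "unpaired"` setting is an assumption made because the toy groups differ in size.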

Between these two approaches, I got quite different gene sets and statistics. With the first approach (whole dataset as input), comparing the two subsets gave 8 significantly enriched gene sets (q<0.1). The second approach (trimmed expression matrix as input, comparing the same two subsets) gave only 1 significantly enriched gene set. I'm not sure which one to believe. Your insights are appreciated.

Thanks!

gage gage package gene expression matrix gene set analysis

moved my comment down as an answer

score 1 · Accepted Answer · 2018-07-17

atakanekiz ▴ 30

I actually figured out the problem. There was a mistake in my code that produced erroneous output. I can now confirm that regardless of the input matrix size, as long as the column annotations (i.e. `ref` vs `samp`) are done properly, GAGE gives the same output. Hopefully this helps somebody someday.
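
One way to catch this kind of annotation mistake early is to validate the `ref` and `samp` indices against the column names before running the analysis. The helper below is a hypothetical sketch in base R, not part of gage; the function name `check_groups`, its arguments, and the toy column names are all assumptions made for illustration.

```r
# Hypothetical helper (not part of gage): stop if the ref/samp indices do not
# point at the intended columns; otherwise return the selected sample names.
check_groups <- function(exprs, ref, samp, ref_pattern, samp_pattern) {
  cn <- colnames(exprs)
  stopifnot(
    !is.null(cn),
    length(ref) > 0, length(samp) > 0,
    all(ref %in% seq_along(cn)), all(samp %in% seq_along(cn)),
    length(intersect(ref, samp)) == 0,   # the two groups must not overlap
    all(grepl(ref_pattern, cn[ref])),    # ref columns carry the ref label
    all(grepl(samp_pattern, cn[samp]))   # samp columns carry the samp label
  )
  list(ref = cn[ref], samp = cn[samp])
}

# Toy matrix (made-up names); an unrelated cluster sits between the groups.
cols <- c(paste0("d9_wt_act_cd8_", 1:3),
          paste0("d12_ko_neutrophils_", 1:2),
          paste0("d9_ko_act_cd8_", 1:3))
exprs_full <- matrix(0, nrow = 2, ncol = length(cols),
                     dimnames = list(c("g1", "g2"), cols))
exprs_sub <- exprs_full[, c(1:3, 6:8), drop = FALSE]

# The indices differ between the two matrices, but the selected samples match.
full <- check_groups(exprs_full, ref = 1:3, samp = 6:8,
                     "^d9_wt_act_cd8_", "^d9_ko_act_cd8_")
sub  <- check_groups(exprs_sub, ref = 1:3, samp = 4:6,
                     "^d9_wt_act_cd8_", "^d9_ko_act_cd8_")
same_groups <- identical(full, sub)
```

If `same_groups` is `TRUE`, both inputs describe the same comparison. Reusing the subset's indices on the full matrix (for example `samp = 4:6`) makes the helper stop with an error, because those columns include neutrophil cells.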
