Hello,
I've been trying to run GSVA and have come across the warning message below. None of my genes actually has constant expression values across all samples. The warning appears to be triggered by genes with partial NA values (missing data for a subset of samples), even though the row variance of the non-missing values is > 0. The NA values are a result of combining multiple microarray and RNA-seq datasets. Any gene with one or more NA values is removed from the analysis, and some of these genes belong to the gene sets I am testing, so I would like to avoid this if possible.
Is there any way to perform GSVA with these partially NA genes included in the analysis? Or are there any alternative suggestions for how best to perform this analysis?
Thanks in advance for the help!
> gsva_res <- gsva(expr = rna_mat, gset.idx.list = gene_set_list)
Estimating GSVA scores for 10 gene sets.
Estimating ECDFs with Gaussian kernels
|===============================================================================================| 100%
Warning messages:
1: In .filterFeatures(expr, method) :
11204 genes with constant expression values throuhgout the samples.
2: In .filterFeatures(expr, method) :
Since argument method!="ssgsea", genes with constant expression values are discarded.
> # how many genes have at least one NA value
> table(rowSums(is.na(rna_mat)) > 0)
FALSE TRUE
12066 11204
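The count of genes with at least one NA (11204) matches the count in the warning exactly, which is what you would expect if the filter computes a per-gene standard deviation without removing NAs. A minimal sketch of that check on a toy matrix follows. The matrix and all variable names are made up for illustration, and the flagging rule is an assumption about how the filter behaves, not a copy of the GSVA source.

```r
# Toy matrix (hypothetical data): rows are genes, columns are samples.
toy_mat <- matrix(
  c(1.0, 2.0, 3.0, 4.0,   # geneA: complete and varying
    5.0,  NA, 7.0, 8.0,   # geneB: partially NA, but varying
    2.0, 2.0, 2.0, 2.0),  # geneC: truly constant
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

has_na     <- rowSums(is.na(toy_mat)) > 0
sd_default <- apply(toy_mat, 1, sd)                # NA whenever the row contains an NA
sd_na_rm   <- apply(toy_mat, 1, sd, na.rm = TRUE)  # ignores the missing samples

# Assumption: a gene is flagged as "constant" when its default sd is 0 or NA.
flagged <- is.na(sd_default) | sd_default == 0

data.frame(has_na, sd_default, sd_na_rm, flagged)
```

Running the same comparison on rna_mat (has_na against flagged) would show whether every flagged gene is a partially NA gene rather than a truly constant one.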
I was a little worried about discarding so many genes, although it sounds like this can't be helped. The results are still quite useful, so I think I will not go down the imputation route. Cheers!
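For anyone in the same situation, one way to judge how much the discarded genes matter is to count how many genes of each gene set remain once the NA-containing rows are dropped. This is only a sketch on toy inputs: toy_mat and toy_sets are hypothetical stand-ins for rna_mat and gene_set_list, and it assumes the gene sets are character vectors of row names.

```r
# Hypothetical toy inputs standing in for rna_mat and gene_set_list.
toy_mat <- matrix(
  c( 1,  2, 3,
     4, NA, 6,
     7,  8, 9,
    NA,  1, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"), c("s1", "s2", "s3"))
)
toy_sets <- list(setA = c("g1", "g2", "g3"), setB = c("g2", "g4"))

# Genes with no missing values, i.e. the ones that survive the filtering.
complete_genes <- rownames(toy_mat)[rowSums(is.na(toy_mat)) == 0]

# Per gene set: total size, number of surviving genes, and the fraction kept.
coverage <- data.frame(
  n_total = lengths(toy_sets),
  n_kept  = vapply(toy_sets, function(g) sum(g %in% complete_genes), integer(1))
)
coverage$frac_kept <- coverage$n_kept / coverage$n_total
coverage
```

Gene sets that keep only a small fraction of their genes are the ones whose scores deserve the most caution.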