Getting Top Scoring Pair Scores greater than 1
bishwa.slwl ▴ 10
@bishwaslwl-11764
Last seen 8.0 years ago

Hello,

I'm using switchBox to find top scoring pairs (TSPs) of genes whose expression predicts a diagnosis of prostate cancer in African American and European American males, based on this paper and its associated dataset:
Paper: https://www.ncbi.nlm.nih.gov/pubmed/18245496

Dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6956

I'm using the GEOquery package to load the data in R. After separating the ExpressionSet into African Americans and European Americans, I tried running switchBox (as well as ktsp) on the African American expression set. The TSP scores that I got from switchBox for the five pairs were all greater than 1 (around 1.000003 each), while with the ktsp package I get a TSP value of exactly 1 for all pairs. Since TSP scores can't be greater than 1, I'm confused as to what I'm doing wrong here. Any help is appreciated. Thanks.

Here is the code that I use to calculate it. 

source("https://bioconductor.org/biocLite.R")
library(GEOquery)
library(switchBox)

datasets <- getGEO("GSE6956", GSEMatrix=TRUE)
gene_data = datasets[[1]]

AA_eset <- gene_data[, gene_data[["characteristics_ch1"]]=="race: African American"]
AA_label <-  as.numeric(AA_eset[["source_name_ch1"]]) - 1

classifier_AA = SWAP.KTSP.Train(exprs(AA_eset),phenoGroup = factor(AA_label))

Here are the results that I got:

TSPs
     [,1]          [,2]         
[1,] "220725_x_at" "208316_s_at"
[2,] "219024_at"   "213977_s_at"
[3,] "210479_s_at" "214755_at"  
[4,] "207516_at"   "215212_at"  
[5,] "37170_at"    "211815_s_at"

$score
[1] 1.000003 1.000003 1.000003 1.000002 1.000002

$labels
[1] "0" "1"

ktsp switchbox • 1.1k views
marchion • 0
@marchion-7968
Last seen 8.1 years ago
United States/Baltimore/Johns Hopkins U…

Hello, 

The point is that the scores you get from switchBox are not plain TSP scores. Each one is (TSP score) + (secondary score)/C, where C is a large constant. The secondary score is there to break ties between pairs that have the same TSP score, which is why a pair with a TSP score of 1 is reported as slightly above 1. This is based on this paper:

http://m.bioinformatics.oxfordjournals.org/content/21/20/3905.full.pdf

The secondary score in the paper is called delta_ij.
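
To make the arithmetic concrete, here is a minimal sketch in R of how a combined score of this form can end up just above 1. The secondary scores and the constant C below are made-up values chosen for illustration; they are assumptions, not the values or the formula switchBox uses internally.

```r
# Toy illustration only: all numbers here are hypothetical.
primary   <- c(1, 1, 1)        # three pairs that each separate the classes perfectly
secondary <- c(30, 29.6, 20)   # hypothetical tie-breaking (secondary) scores
C         <- 1e7               # hypothetical large constant

# Combined score: primary score plus a tiny secondary contribution.
combined <- primary + secondary / C

print(combined)                # printed with R's default 7 significant digits
print(combined, digits = 12)   # more digits show that the values differ

# The ranking is driven by the secondary score when primary scores tie.
order(combined, decreasing = TRUE)
```

Because secondary/C is tiny, it never changes the ordering of pairs whose primary scores differ; it only orders pairs that would otherwise be tied.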

Let me know if this explains the issue.

Luigi and Bahman

bishwa.slwl ▴ 10
@bishwaslwl-11764
Last seen 8.0 years ago

Hello Luigi and Bahman,
Thank you very much for your response. We found the paper insightful. Having read it, we understand that the rank score helps to break ties between different gene pairs, but that doesn't seem to be happening in our results: three of the scores are still equal to 1.000003, while the remaining two are 1.000002. Is it possible that the scores are rounded to 6 decimal places when printed, so that they only appear identical even though they differ numerically, or can they really have the same scores?

Here are the scores for the gene-pairs once again.

$score
[1] 1.000003 1.000003 1.000003 1.000002 1.000002
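
One way to check this is to print the scores with more digits, since R shows 7 significant digits by default (`getOption("digits")`). The sketch below uses made-up stand-in values, because the unrounded scores are not shown in this thread; with the real object one would presumably call `print(classifier_AA$score, digits = 15)` instead.

```r
# Stand-in values (hypothetical): they print like the scores above at default precision.
score <- c(1.0000031, 1.0000029, 1.0000027, 1.0000024, 1.0000021)

getOption("digits")          # default number of significant digits used by print()
print(score)                 # default precision: several values look identical
print(score, digits = 15)    # higher precision: the differences become visible

# Programmatic checks that do not depend on how the values are printed.
anyDuplicated(score) > 0     # TRUE only if two scores are exactly equal
diff(score)                  # pairwise gaps between consecutive scores
```

If `anyDuplicated()` reports no exact duplicates on the real scores, the apparent ties are only a display effect.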

Thanks,
Bishwa, Susan and Vijay
