Dear SPIA package maintainer,
My name is Franklin Johnson. I think of myself as a data scientist, and I have no professional affiliation at this moment with regard to this post.
I downloaded the 134 Malus domestica ('mdm') pathways using the KEGGREST package and successfully made the spia data using the makeSPIAdata() function. Then, I called the spia() function, and after it completed it only showed 10 pathways that were 'Done.'
> length(dir(mydir)) #Directory containing 134 Malus kgml/xml pathway files.
[1] 134
My spia input vectors 'de' and 'all' contained 243 unique probesets, and I took the mean logFC for each probeset. This dataset was a subset of the topTable limma result of adj.P.Value < 0.001.
res<-spia(de=DE_malus, all=entrez_only, organism="mdm", nB = 2000, pathids = NULL, data.dir="./", combine = 'fisher', plots = TRUE)
Done pathway 1 : RNA transport..
Done pathway 2 : RNA degradation..
Done pathway 3 : MAPK signaling pathway - plant..
Done pathway 4 : Plant hormone signal transduct..
Done pathway 5 : Sulfur relay system..
Done pathway 6 : SNARE interactions in vesicula..
Done pathway 7 : Autophagy..
Done pathway 8 : Protein processing in endoplas..
Done pathway 9 : Plant-pathogen interaction..
Done pathway 10 : Circadian rhythm - plant..>
res[ , -12]
Name ID pSize NDE pNDE tA pPERT pG pGFdr pGFWER Status
1 MAPK signaling pathway - plant 04016 4 4 1 24.29737166 0.160 0.4532130 1 1 Activated
2 Plant hormone signal transduction 04075 14 14 1 11.93279398 0.292 0.6514524 1 1 Activated
3 Circadian rhythm - plant 04712 3 3 1 9.17881852 0.440 0.8012314 1 1 Activated
4 Plant-pathogen interaction 04626 2 2 1 0.02234003 0.987 0.9999151 1 1 Activated
5 Protein processing in endoplasmic reticulum 04141 1 1 1 0.00000000 NA 1.0000000 1 1 Inhibited
While these results are exciting, I have four questions to make sure this output is correct:
1) What is the printed 'Done' pathway list, and why does it print only 10 pathways rather than all 134 pathways?
2) For res[5, ] (Protein processing in endoplasmic reticulum), why is this pathway in the list although pPERT = 'NA.' So, how come the other 'Done' pathways (e.g. RNA transport, RNA degradation, Autophagy....) do not show up in the res output? Is this Protein processing in endoplasmic reticulum pathway considered significant if tA = 0 and pG = 1?
3) I didn't expect pSize and NDE to be equal to each other, and so all the pNDEs are equal to 1...Why?
4) The fold change values for the 243 probe dataset range from +3.25 to -4.38. Since I'm starting with gene probesets with an adj.P.Value < 0.001, I lowered the nB value to say 100, but the results are identical as if I used nB =2000. Why?
I'm trying to understand this SPIA package in a bit more detail than explained in the vignette and ref manual. Hope to hear from you soon. Thanks, Franklin
Hi Adi,
Thanks for the reply. I made the adjustments accordingly. However, I'm still getting an 'NA' for the pPERT value for Protein processing in the endoplasmic reticulum, which is now ranked as the top pathway. Please see below:
> head(cp_mean) #df of unique DEs with Q.Value < 0.05
ENTREZ logFC Q.Value
1 13630213 1.8062437 0.02197336
2 29071792 1.0768734 0.03514726
3 103400014 -0.9435437 0.01468803
4 103400027 -0.7126018 0.03260799
5 103400049 -1.0686249 0.01136842
6 103400377 0.6382776 0.02896627
> dim(cp_mean)
[1] 1628 3
> length(DE_CP) #vector of unique DEs with Q.Value < 0.01 [1] 233
> length(Entrez_CP) #vector of all Entrez IDs with Q.Value < 0.05 from 'cp_mean'
[1] 1628
> res[,-12]
Name ID pSize NDE pNDE tA pPERT pG pGFdr pGFWER Status
1 Protein processing in ...ER 04141 11 4 0.05938292 0.000000 NA 0.05938292 0.2375317 0.2375317 Inhibited
2 Plant-pathogen interaction 04626 15 3 0.36566850 -1.026967 0.721 0.61512645 0.9532347 1.0000000 Inhibited
3 Plant hormone signal transduction 04075 34 4 0.73974950 -1.475442 0.682 0.84967883 0.9532347 1.0000000 Inhibited
4 RNA degradation 03018 8 1 0.71019445 0.000000 1.000 0.95323469 0.9532347 1.0000000 Inhibited
I adjusted the name to "Protein processing in ... ER" just now so that it fit this screen in one line.
What's going on with this pathway's pPERT value??
Franklin