I'm having trouble adapting the pRoloc vignette to my own data set; specifically section 2.4 "pRoloc's organelle markers".
The vignette tells you to see `?addMarkers` in R in order to add these to a new `MSnSet` class object.
First, you need to load the available marker set for your species using the `pRolocmarkers` function.
# Load the Homo sapiens marker set (the markers are named by UniProt accession)
> hsap <- pRolocmarkers("hsap")
# Next, load my custom data as an `MSnSet` S4 object using the readMSnSet2() function
> f <- "https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data"
> e <- readMSnSet2(f, ecol = 2:20)
> sampleNames(e) <- 1:19
> e$group <- c(rep("Treatment", 17), rep("Control", 2))
> e$rep <- as.character(c(1:17, 1:2))
# Try to add the markers to the MSnSet object
> try(addMarkers(e, hsap))
Error in addMarkers(e, hsap) :
  No markers found. Are you sure that the feature names match?
  Feature names: 1, 2, 3...
  Markers names: P08865, P0CW22, P15880...
This doesn't seem to be the issue, as these markers are in my data:
> grep("P08865", hap1$UNIPROT)
[1] 6370
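The error message and the `grep()` result are consistent with each other: the accession is present in a data column (`UNIPROT`), but the marker lookup is done against the feature names, which here are the row indices `1, 2, 3...`. The base R sketch below illustrates that matching step on a toy table. The object names (`fdat`, `markers`) and the organelle labels are hypothetical, and this only mimics the idea of the lookup; it is not pRoloc's own implementation of `addMarkers`.

```r
# Toy stand-in for the feature data: row names are indices (as with the
# readMSnSet2() default) and the UniProt accessions live in a column.
fdat <- data.frame(UNIPROT = c("Q9P253", "P08865", "A0A024R9R3"),
                   stringsAsFactors = FALSE)
rownames(fdat) <- seq_len(nrow(fdat))

# Hypothetical named marker vector: names are accessions, values are organelles.
markers <- c(P08865 = "organelle_A", Q9P253 = "organelle_B")

# Matching against the row (feature) names finds nothing ...
sum(rownames(fdat) %in% names(markers))   # 0

# ... whereas matching against the accession column finds two markers.
sum(fdat$UNIPROT %in% names(markers))     # 2

# Once the accessions are used as feature names the lookup works;
# features without a marker are labelled "unknown".
rownames(fdat) <- fdat$UNIPROT
fdat$markers <- ifelse(rownames(fdat) %in% names(markers),
                       markers[rownames(fdat)], "unknown")
```

In other words, the markers are only found once the feature names themselves are the accessions, not when the accessions sit in a separate column.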
I used clusterProfiler to map my HGNC gene symbols to UniProt identifiers.
# Working back from the data I provided, just to show the original genes I had: there were only 6538 unique genes
# This is the code I used to get UniProt identifiers with clusterProfiler:
This assigned multiple UniProt identifiers to each gene symbol.
Take VPS18, for example: it is assigned multiple UniProt identifiers (A0A024R9R3 and Q9P253) because this gene makes different isoforms/protein products.
So if every gene is repeated, it makes sense that I can't use the gene symbols as feature names. But I'm wondering why you have set the feature names from 1 to 12345 rather than using the UniProt identifiers, since these are all unique?
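Before using an identifier column as feature names, it helps to confirm that it really is unique, since a one-to-many mapping (as with VPS18 above) repeats the gene symbol across rows. A minimal base R sketch, assuming a mapping table with hypothetical columns `SYMBOL` and `UNIPROT`; only the VPS18 accessions come from the post, and the `GENE_X`/`X00001` row is a made-up placeholder.

```r
# Hypothetical one-to-many mapping table of the kind a
# symbol-to-UniProt conversion can return.
map <- data.frame(
  SYMBOL  = c("VPS18", "VPS18", "GENE_X"),
  UNIPROT = c("A0A024R9R3", "Q9P253", "X00001"),
  stringsAsFactors = FALSE
)

# Gene symbols are repeated, so they cannot serve as unique feature names
# (anyDuplicated() returns the index of the first duplicate, 0 if none).
anyDuplicated(map$SYMBOL) > 0     # TRUE

# The accessions are unique here, so they can be used as row/feature names.
anyDuplicated(map$UNIPROT) == 0   # TRUE
rownames(map) <- map$UNIPROT

# Number of accessions mapped to each symbol.
table(map$SYMBOL)
```

If the real accession column passes the same uniqueness check, it is a candidate for the feature names of the `MSnSet`.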
I'm also trying to make sense of the `plot2D` output I get with my data: http://tinypic.com/r/2ce3vqa/9
Is this PCA telling me that the `pRolocmarkers` are not good organelle markers for my data, or that the sub-cellular niche isn't resolved in my data, or that my data is noisy, or maybe something else?

Re the PCA plot: the funny shape of the points is because you have integers. You should start by removing proteins (rows) that contain only 0s (and possibly those that have few and low values), then try to rescale between 0 and 1 (use `normalise(e, method = "sum")`). But even with this pre-processing, I think you won't get a plot like those from typical experimental data. It's difficult to say much more without knowing more about what these values represent.
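The two pre-processing steps suggested above can be illustrated on a plain matrix: drop rows that are all zero, then divide each row by its sum so that every row sums to 1 and all values fall between 0 and 1. This is a base R sketch on a made-up count matrix `m`, meant only to show the arithmetic of the row-wise "sum" rescaling. On the real `MSnSet`, the filter could be written as `e[rowSums(exprs(e)) > 0, ]` before calling `normalise()`.

```r
# Made-up integer matrix: 4 proteins (rows) x 3 samples (columns).
m <- matrix(c(0, 0, 0,
              5, 3, 2,
              0, 1, 0,
              2, 2, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("prot", 1:4), paste0("s", 1:3)))

# 1. Remove proteins whose values are all zero.
m <- m[rowSums(m) > 0, , drop = FALSE]

# 2. Rescale each remaining row so that it sums to 1.
#    Dividing a matrix by a vector of length nrow(m) recycles the vector
#    down the columns, so each element is divided by its own row's sum.
m_norm <- m / rowSums(m)

round(m_norm, 2)
```

Here `prot1` is dropped, and, for example, `prot4` (2, 2, 4) becomes (0.25, 0.25, 0.5). The extra filter for proteins with "few and low values" would need a threshold chosen for the data at hand.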
You can use whatever you want as feature names. The default is to use indices, but you can also set them with `readMSnSet2(..., fnames = "UNIPROT")`; see `?readMSnSet2` for details about `fnames`. You can also set the feature names later with