Question

Trouble using addMarkers from pRoloc on my own dataset

0

Entering edit mode

moldach ▴ 20

@moldach-8829

Last seen 4.8 years ago

Canada/Montreal/Douglas Mental Health I…

I'm having trouble adapting the pRoloc vignette to my own data set; specifically section 2.4 "pRoloc's organelle markers".

The vignette tell you to see ?addMarkers in R in order to add these to a new MSnSet class object.

First you need to specify the parameters for the available marker set obtained using pRolocmarkers function.

# Set parameters for Homo sapiens with Gene symbol identifier

>hsap <- pRolocmarkers("hsap")

# Next load my custom data a `MSnSet` S4 object using the readMSnSet2 function.

>df <- f <- "https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data"
>e <- readMSnSet2(df, ecol = 2:20)
>sampleNames(e)<- 1:19
>e$group <- c(rep("Treatment", 17),rep("Control",2))
>e$rep <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","1","2")

# Try to add the markers to the MSnSet object

>try(addMarkers(e,hsap))
Error in addMarkers(e, hsap) :
  No markers found. Are you sure that the feature names match?
  Feature names: 1, 2, 3...
  Markers names: P08865, P0CW22, P15880...

This doesn't seem to be the issue as these markers are in my data:

> grep("P08865", hap1$UNIPROT)
[1] 6370

proloc proteomics msnset • 1.9k views

ADD COMMENT • link 7.2 years ago moldach ▴ 20

score 0 · Answer 1 · 2018-01-29

0

Entering edit mode

Laurent Gatto 1.6k

@laurent-gatto-5645

Last seen 5 weeks ago

Belgium

You need to tell addMarkers how to match the proteins in the MSnSet and in the marker vector.

> ## named marker vector
> hsap <- pRolocmarkers("hsap")
> head(hsap)
        P08865         P0CW22         P15880         P22090         P23396
"40S Ribosome" "40S Ribosome" "40S Ribosome" "40S Ribosome" "40S Ribosome"
        P25398
"40S Ribosome"
> head(names(hsap)) ## names, used for matching proteins
[1] "P08865" "P0CW22" "P15880" "P22090" "P23396" "P25398"

> ## your data
> f <- "https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data"
> e <- readMSnSet2(f, ecol = 2:20, sep = "\t")
> head(featureNames(e)) ## typically used for matching markers
[1] "1" "2" "3" "4" "5" "6"

One would typically set the feature names to uniprot identifiers. But that won't be possible in your case because they aren't unique (and feature names must be unique).

> ## What you want are UNIPROT identifiers
> fvarLabels(e)
[1] "Gene_symbol" "UNIPROT"    
> ## BUT they aren't unique
> sum(duplicated(fData(e)$UNIPROT))
[1] 12

You can also specify a feature variable (by setting fcol) to be used to do the matching:

> e <- addMarkers(e, hsap, fcol = "UNIPROT")
Markers in data: 552 out of 12353
organelleMarkers
         40S Ribosome          60S Ribosome    actin cytoskeleton
                   30                    44                    35
              Cytosol Endoplasmic Reticulum       Golgi Apparatus
                   47                    63                    23
             Lysosome          Mitochondria               Nucleus
                    8                   109                   102
           PEROXISOME            plasma mem            Proteasome
                   17                    44                    30
              unknown
                11801

ADD COMMENT • link 7.2 years ago Laurent Gatto 1.6k

0

Entering edit mode

I used clusterProfiler to give Uniprot identifiers to my HGNC gene symbols.

# Working back from the data I provided just to show the original genes I had - there were only 6538 unique genes

f <- read.csv("https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data", sep="\t")
>genelist <- f$Gene_symbol
>genelist <- unique(genelist) # Get only the unique gene symbols
>nrow(genelist)
[1] 6538

# This is the code I used to get Uniprot identifiers with clusterProfiler:

>library(clusterProfiler)
>id <- as.character(genelist)
>id <- bitr(id, fromType = "SYMBOL", toType = "UNIPROT", OrgDb="org.Hs.eg.db")
> nrow(id)
[1] 12345

This assigned multiple Uniprot identifies to each gene symbol.

> head(id)
  SYMBOL    UNIPROT
1  VPS18 A0A024R9R3
2  VPS18     Q9P253
3   CNN2     B4DDF4
4   CNN2     Q99439
5   CNN2     B4DUT8
6 RPS27A     B2RDW1

Looking at VPS18 for example, it's assigned to multiple Uniprot identifiers because this gene makes different isoforms/protein products: A0A024R9R3 and Q9P253.

So if every gene is repeated it makes sense that I can't use these as feature names but I'm wondering why you have set the feature names from 1 to 12345 rather than using Uniprot identifiers because these are all unique?

ADD REPLY • link 7.2 years ago moldach ▴ 20

0

Entering edit mode

I'm also trying to make sense of the plot2D I get with my data.

>f <- "https://gist.githubusercontent.com/moldach/446852fcfa1adbb3be2ac754dc616421/raw/42f31e88f38afb7243d61dc46e2321e3ebfdae18/pRoloc-data"
>e <- readMSnSet2(f, ecol = 2:20, sep = "\t")
>hsap <- pRolocmarkers("hsap")
>e <- addMarkers(e, hsap, fcol = "UNIPROT")
>plot2D(e, fcol = "markers")
>addLegend(e, fcol = "markers", cex = .7,
          where = "bottomright", ncol = 2)

http://tinypic.com/r/2ce3vqa/9

Is this PCA telling me that the pRolocmarkers are not good organelle markers for my data, or that the sub-cellular niche isn't resolved in my data, or that my data is noisy, or maybe something else?

ADD REPLY • link 7.2 years ago moldach ▴ 20

0

Entering edit mode

Re PCA plot, the funny share of the points is because you have integers. You should start by removing proteins that have only 0 rows (and possibly those that have few and low values), then try to rescale between 0 and 1 (use normalise(e, method = "sum")).

But even with this pre-processing, I think you won't get a plot like those from typical experimental data. It's difficult to say much more without knowing more about what these values represent.

ADD REPLY • link 7.2 years ago Laurent Gatto 1.6k

0

Entering edit mode

You can use whatever you want as feature names. The default is to use indices, but you can also set then with readMSnSet2(..., fnames = "UNIPROT") - see ?readMSnSet2 for details about fnames.

You can also set the feature names later with

featureNames(e) <- fData(e)$UNIPROT

ADD REPLY • link 7.2 years ago Laurent Gatto 1.6k