I've analysed shotgun proteomics data using MSGFplus, MSnbase and MSnID, and I now have a final combined MSnExp. When running MSGFplus, I used UniProt data, which brings in accession numbers etc. Separately, I've used UniProt to download EntrezID, symbol and gene names, and I would like to connect the two. Is there a logical way to do this, or do I need to change the data? Any help would be appreciated.
It would be useful to have some additional details on what you have done. I assume you have raw data that you read into R using readMSData. I also suspect you have identification data resulting from running MSGF+ (in this case through the Bioconductor package MSGFplus). Have you run addIdentificationData?
Additional matching from UniProt accessions to EntrezID, ... (that you downloaded from the UniProt webpage) might need to be done manually, using dplyr::left_join and fData()<-, for example.
If you provide more details, and the code you ran, I might be able to help a bit more.
Sorry for the vagueness of my question. This is extremely new to me. I can confirm that I did download some annotations from the UniProt website, but they are not as informative as I would like.
I'm unsure of how to add the code I ran, so I have just added it as text here. I've been trawling through the net trying to figure this out, but I'm completely stuck. This is the code I have used, but do I need to provide you with a minimal dataset? Any help would be very much appreciated.
Thank you for the code snippet. This looks reasonable to me. If I follow, you would like to add additional metadata from UniProt. If so, you'll need to have a column in that data that matches the database accession numbers you used to combine the features. You can use dplyr::left_join to join the feature metadata (fData) and your additional data, which I assume is in a data.frame called uniprot below:
library("dplyr")
fd <- left_join(fData(si), uniprot)
## update the feature data
fData(si) <- fd
You'll have to adapt the left_join(fData(si), uniprot) call so that it joins on the column shared by the two tables. For example, if uniprot also has a DatabaseAccess column, you would use
fd <- left_join(fData(si), uniprot, by = "DatabaseAccess")
or even simply
fd <- left_join(fData(si), uniprot)
if that's the only column they share. See ?left_join for details. Hope this helps.
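A minimal, self-contained sketch of the join described above, using plain data.frames as stand-ins for fData(si) and the UniProt download. The accessions, the pepSeq, EntrezID and Symbol columns, and the feature names F1 to F3 are all made up for illustration; only DatabaseAccess comes from the discussion above. Two things are worth checking in practice: that each accession occurs only once in the annotation table (otherwise left_join duplicates feature rows), and that the row names, which left_join is not guaranteed to keep, are restored before assigning the result back with fData(si) <- joined.

```r
library("dplyr")

## Toy stand-in for fData(si): one row per feature, row names are feature names.
## All values here are hypothetical.
fd <- data.frame(
    DatabaseAccess = c("ACC1", "ACC2", "ACC1"),
    pepSeq = c("AAK", "LLR", "GGK"),
    row.names = c("F1", "F2", "F3"),
    stringsAsFactors = FALSE
)

## Toy stand-in for the annotation table downloaded from UniProt
## (hypothetical column names and values).
uniprot <- data.frame(
    DatabaseAccess = c("ACC1", "ACC2"),
    EntrezID = c("1111", "2222"),
    Symbol = c("GENEA", "GENEB"),
    stringsAsFactors = FALSE
)

## Each accession should appear once in the annotation table,
## otherwise the join would add extra rows for the matching features.
stopifnot(!anyDuplicated(uniprot$DatabaseAccess))

## Join on the shared accession column; rows follow the order of fd.
joined <- left_join(fd, uniprot, by = "DatabaseAccess")

## Restore the feature names as row names before assigning the table back.
stopifnot(nrow(joined) == nrow(fd))
rownames(joined) <- rownames(fd)
```

Features whose accession has no match in uniprot are kept and get NA in the added columns, which is usually what you want for feature metadata.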
Thank you so much!!! This worked beautifully and gave me all the information that is required for some downstream analysis. I've never come across anything that alludes to this. It's so simple and yet brilliant.