Hello.
I would like to ask how I can convert "Majority protein IDs" to "Gene names", like so:
Majority.protein.IDs
A0AV96-2;B7Z8Z7;A0AV96;D6R9D6;D6RBS9
A0AVT1;A0AVT1-2
A1L0T0;M0R026
A1XBS5-5;E5RHK0;F8W7P5;E5RGD0;H0YC32;A1XBS5-2;A1XBS5-4;A1XBS5-3;A1XBS5
Q99798;A2A274
A2A2M0;Q9NQG5
Q9Y312;A2A2Q9
A2A2V2;P42696;Q5TCT4;P42696-2
A2A2Z9;Q9H560
A6PW58;A2A5X0;Q99755-2;Q99755-4;Q99755-3;A6PW57;Q99755
O15533-2;O15533;D3YTI9;A2AB90;O15533-3;C9JA35;O15533-4
A2IDC6;Q4TT38;Q13084
Q13887-2;A2TJX0;Q13887
A3KFJ0;O14965;Q5QPD4;A3KFJ1;Q5QPD2
A3KMH1-3;A3KMH1;A3KMH1-2
Gene names
RBM47
UBA6
ILVBL
FAM92A1
ACO2
RPRD1B
AAR2
RBM34
ANKRD18B;ANKRD19P
PIP5K1A
TAPBP
MRPL28
KLF5
AURKA
VWA8
My specific question is how I can convert each entry in the list below to its appropriate gene name:
tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI
tr|A0A6D2WI03|A0A6D2WI03_ECOLI;sp|P0ADT8|YGIM_ECOLI
tr|A0A4S5AVI8|A0A4S5AVI8_ECOLI
tr|A0A4S5AXW9|A0A4S5AXW9_ECOLI;sp|P06715|GSHR_ECOLI
tr|A0A4S5AR26|A0A4S5AR26_ECOLI;sp|P25746|HFLD_ECOLI
tr|A0A4S5AX73|A0A4S5AX73_ECOLI;sp|P0ADA3|NLPD_ECOLI
tr|A0A4S5B017|A0A4S5B017_ECOLI
tr|A0A4S5B5Y8|A0A4S5B5Y8_ECOLI;sp|P0AEN8|FUCM_ECOLI
tr|A0A6D2XCX9|A0A6D2XCX9_ECOLI;sp|P0A8C4|YGFB_ECOLI
tr|A0A6D2XI58|A0A6D2XI58_ECOLI;sp|P0ABI8|CYOB_ECOLI
tr|A0A6D2X748|A0A6D2X748_ECOLI;sp|P0A972|CSPE_ECOLI
tr|A0A6D2W544|A0A6D2W544_ECOLI;sp|P30130|FIMD_ECOLI
tr|A0A6D2W7D6|A0A6D2W7D6_ECOLI;sp|P0AFP4|YBBO_ECOLI
tr|A0A4S4P6R5|A0A4S4P6R5_ECOLI
sp|P0AGE6|CHRR_ECOLI;tr|A0A6D2WS16|A0A6D2WS16_ECOLI
tr|A0A6D2WPV4|A0A6D2WPV4_ECOLI;sp|P0A8J4|YBED_ECOLI
tr|A0A4S5APJ5|A0A4S5APJ5_ECOLI;sp|P27306|STHA_ECOLI
Thank you, James. I am not sure which option is the right one to choose under "Select Option" on the link you proposed. Could you suggest the appropriate options to use on the site, if possible?
You have UniProtKB IDs and so far as I can tell you want Gene Names, so that's what I would use.
If, for example, I paste this header on the site, I get an error. In particular, suppose I have 1000 rows: how can I switch them all to gene names using the site you proposed?
This thing:
tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI
isn't an ID! It's a set of identifiers separated by vertical bars and a semicolon, with the intent that the reader will understand that. The first thing there is
tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI
which indicates that it's a TrEMBL ID (tr), the ID being A0A4V3YUP9, and the name(?) being A0A4V3YUP9_ECOLI, which is just the ID concatenated with the species. So if you were to go to uniprot.org and search on that ID, you would find the corresponding entry.
The semicolon separates the first annotation from the second, which is
sp|P67660|YHAJ_ECOLI
which indicates that it's a SwissProt (sp) ID, the ID being P67660, and the name being YHAJ_ECOLI. You can look that one up on uniprot.org in the same way.
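To make that layout concrete, here is a minimal R sketch that takes one header from the question apart. It assumes every annotation follows the database|accession|entry-name pattern described above; the column names in the result are made up for illustration.

```r
# One header copied from the question.
header <- "tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI"

# Split on the semicolon first, then split each annotation on the vertical bar.
# fixed = TRUE treats "|" as a literal character rather than a regex operator.
annotations <- strsplit(header, ";", fixed = TRUE)[[1]]
fields <- strsplit(annotations, "|", fixed = TRUE)

# Collect the three pieces of each annotation (column names are hypothetical).
parsed <- data.frame(
  database   = sapply(fields, `[`, 1),
  accession  = sapply(fields, `[`, 2),
  entry_name = sapply(fields, `[`, 3),
  stringsAsFactors = FALSE
)
print(parsed)
```

The accession column is the part the UniProt query box expects; the database and entry_name columns are only kept here to show where each piece comes from.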
So what I said you could do was to get an ID for each row (either the TrEMBL or SwissProt) and then paste it into the query box on the UniProt site I pointed you to. I made the assumption that you would understand that what I really meant was that you would have to extract the relevant ID from each row and use that, rather than just copy/pasting the whole thing, which as you have noted doesn't work.
How you would get the relevant ID is up to you. I would probably use some combination of strsplit and sapply, but I'm old school like that. I'll leave it up to you to figure out how to do that, which is the best way to learn anyway.
I see. Thank you very much for the feedback. I noticed that in R I can use str_split, which, as you suggested, is very useful. Much appreciated.
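For anyone landing here later, one possible version of the strsplit and sapply approach is sketched below. It is only a starting point: the helper name pick_accession is hypothetical, and preferring the SwissProt (sp) accession over the TrEMBL (tr) one when both are present is an assumption, since either kind of ID can be used.

```r
# A few rows copied from the question.
headers <- c(
  "tr|A0A4V3YUP9|A0A4V3YUP9_ECOLI;sp|P67660|YHAJ_ECOLI",
  "tr|A0A4S5AVI8|A0A4S5AVI8_ECOLI",
  "sp|P0AGE6|CHRR_ECOLI;tr|A0A6D2WS16|A0A6D2WS16_ECOLI"
)

# Hypothetical helper: return one accession per row.
# Assumption: use the SwissProt (sp) accession if there is one,
# otherwise fall back to the first accession in the row.
pick_accession <- function(header) {
  annotations <- strsplit(header, ";", fixed = TRUE)[[1]]
  fields <- strsplit(annotations, "|", fixed = TRUE)
  database <- sapply(fields, `[`, 1)
  accession <- sapply(fields, `[`, 2)
  reviewed <- accession[database == "sp"]
  if (length(reviewed) > 0) reviewed[1] else accession[1]
}

# Apply the helper to every row; unname() drops the names sapply attaches.
accessions <- unname(sapply(headers, pick_accession))
print(accessions)
```

Calling writeLines(accessions) prints one accession per line, which can then be pasted into the query box on the UniProt site and mapped to gene names.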