Program to calculate the amino acid composition of protein sequences in R, using a FASTA file with multiple sequences
@pujapatel5400-8292
India

>HMPREF9352_0001 rod shape-determining protein MreC [Streptococcus gallolyticus subsp. gallolyticus TX20005]
MSLAFLFRNSGVVSAISSPIRSVVARVDSVVSAPFRFLDSANEEIRDLFNTYSENKELKQ
KVAELEDQSELIDSLKEENEELNSEIGASSSITSQFSATGKVIVRSPVSWYDSLTVKLGK
KNNITKKMLALSGGGLIGTVSDVDSTTSSITLLSNGSDFNIPIKITTSSAEVYGLLESYD
SDKKCFVITNLNSSVDIEEGDSVVTSGLDGDTVANISVGTVSSVKNSSESLERVVYVTST
ADFSDISYVTIVGD
>HMPREF9352_0002 rod shape-determining protein MreD [Streptococcus gallolyticus subsp. gallolyticus TX20005]
MIKVKFYKNKYFLLLLLFLLMLIDGQLSFLASSIFSYHLKVSSHLLLLAVLYFYHDKNKY
FMFISSLVLGGIFDIYYLNRIGLVIFLLPILVIFTSKISKNFFVSNFQTLIFYIIVLFLF
EIVGELGAILLGMTTMSMTYFIAYCFAPTLIYNILMYLIFQKVFKKVFLES
>HMPREF9352_0003 CHAP domain protein [Streptococcus gallolyticus subsp. gallolyticus TX20005]
MKKRILSAVLVSGVTLGTAAATVNADDYDTQIAAQDAVISNLTSEQAAAQSQVDALQEQV
TSLQSQQDELEAQNAQLEAESQKLSEEIQALSSKIVARNESLKKQARSAQKTNTATSYIN
TILNSKSISDAINRVAAVREVVSANEKMLEQQEADKAAIEQKQAENQEAINTVAANKATI
EQNQAALATQQAELEAAQLNLSAQLATAEDEKASLVAQKEAAEQAAAEAAAAQAAAEAQA
QAEAEAQAASVAQAQESVENGTATVDTTTDTSSQDSTTASTDTAAATEDTSSTQQAATVT
PTATTTTSSSSSSSSASSSSSSSSSASTSSTASTSTSSSSSSSSSSSSVNTYPVGQCTWG
VKSLASWVGNNWGNANQWIASAQAAGHSVGTTPQVGAVAVWPYDGGGYGHVAYVTAVQSS
TSIQVMEANYAGNSSIGNYRGWFDPTSSTWGGGTVYYIYQ
>HMPREF9352_0004 ribose-phosphate diphosphokinase [Streptococcus gallolyticus subsp. gallolyticus TX20005]
MSYSDLKLFALSSNKELAEKVASAMGIELGKSTVRQFSDGEIQVNIEESIRGHHVFILQS
TSSPVNDNLMEILIMVDALKRASAEKISVVIPYYGYARQDRKARSREPITSKLVANMLEV
AGVDRLLTVDLHAAQIQGFFDIPVDHLMGAPLIADYFDRHGLVGDDVVVVSPDHGGVTRA
RKLAQFLQTPIAIIDKRRSVTKMNTSEVMNIIGNVKGKKCILIDDMIDTAGTICHAADAL
AEAGATAVYASCTHPVLSGPALENIEKSAIQKLVVLDTIYLSEERLIDKIEQISIAELIA
EAITRIHEKRPLSPLFEMGTAK
>HMPREF9352_0005 putative aromatic-amino-acid transaminase [Streptococcus gallolyticus subsp. gallolyticus TX20005]
MSLTNRFNKNLDKIEVSLIRQFDQSISDVPGIMKLTLGEPDFTTPDHVKEAAKAAIDANQ
SHYTGMAGLPALRQAAADFVKSKYNLSYNPDNEILVTIGATEALSATLTAILEPGDTVLL
PAPAYPGYEPIANLVGAEIVEIDTTANDFVLTPEMLEKAILEQGDKLKAVLLNYPTNPTG
VTYSREQIKALADVLKKYDIFVISDEVYSELTYNDEPHVSIAEYLPEQTILINGLSKSHA
MTGWRIGLIFAPAIFTAQLIKSHQYLVTAAATMAQFAAIEALSAGKDDALPMKVEYIKRR
DYIIDKMSALGFKIIKPDGAFYIFAKIPAGYEQDSFKFCQDFAREKAVAFIPGVAFGKYG
EGYLRLSYAASMETITTAMERLKEFMEEHAN
>HMPREF9352_0006 DNA repair protein RecO [Streptococcus gallolyticus subsp. gallolyticus TX20005]
MQTKETYGLVLYNRNYREDDKLVKIFTETNGKHMFFVKHAGKSRFNSVIQPLTVAKFILK
INDTGLSFIEDYKEVDSFKEINADLFKLSYASYVTALADAAVPDGVADPQLFAFVNKTLS
LMEEGLDYEILTNIFEIQLLERFGVSLNFHECAFCHRVGLPFDFSHKYSGLLCPEHYGKD
DYRSHLDPNVLYLVDRFQAIHFDELKTISVKPEMKRKLRLFIDDIYDNYVGLRLKSKKFI
DDLGTWGNIMK

 

@james-w-macdonald-5106
United States

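Suppose your FASTA file is called tmp.fa. Read it with readAAStringSet() from Biostrings, then count the residues in each sequence with alphabetFrequency():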

> library(Biostrings)

> z <- readAAStringSet("tmp.fa")
> z
  A AAStringSet instance of length 6
    width seq                                               names               
[1]   254 MSLAFLFRNSGVVSAISSPIRSV...ERVVYVTSTADFSDISYVTIVGD HMPREF9352_0001 r...
[2]   171 MIKVKFYKNKYFLLLLLFLLMLI...TLIYNILMYLIFQKVFKKVFLES HMPREF9352_0002 r...
[3]   460 MKKRILSAVLVSGVTLGTAAATV...NYRGWFDPTSSTWGGGTVYYIYQ HMPREF9352_0003 C...
[4]   322 MSYSDLKLFALSSNKELAEKVAS...AEAITRIHEKRPLSPLFEMGTAK HMPREF9352_0004 r...
[5]   391 MSLTNRFNKNLDKIEVSLIRQFD...AASMETITTAMERLKEFMEEHAN HMPREF9352_0005 p...
[6]   251 MQTKETYGLVLYNRNYREDDKLV...YVGLRLKSKKFIDDLGTWGNIMK HMPREF9352_0006 D...
> alphabetFrequency(z)
      A  R  N  D C  Q  E  G H  I  L  K  M  F  P  S  T W  Y  V U O B J Z X * - +
[1,] 12  7 14 18 1  3 18 15 0 18 21 14  2  9  4 45 18 1  6 28 0 0 0 0 0 0 0 0 0
[2,]  5  1  6  3 1  3  3  7 3 20 33 12  7 22  2 13  6 0 13 11 0 0 0 0 0 0 0 0 0
[3,] 87  6 23 16 1 45 31 20 2 16 20 15  3  1  5 73 47 7 11 31 0 0 0 0 0 0 0 0 0
[4,] 33 16  9 21 3 10 21 18 9 32 30 19 10  8 11 24 15 0  7 26 0 0 0 0 0 0 0 0 0
[5,] 49 10 14 23 1 12 28 21 6 32 36 26 11 18 20 20 25 1 19 19 0 0 0 0 0 0 0 0 0
[6,] 12 10 12 21 3  5 15 13 8 15 29 22  5 20  7 13 11 1 13 16 0 0 0 0 0 0 0 0 0
     . other
[1,] 0     0
[2,] 0     0
[3,] 0     0
[4,] 0     0
[5,] 0     0
[6,] 0     0
>
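The table above gives raw residue counts. For proportions, alphabetFrequency(z, as.prob = TRUE) can be used, although its denominator is each sequence's full width, including any non-standard letters. If Bioconductor is not available, here is a minimal base-R sketch, assuming the file uses one-letter codes and that only the 20 standard amino acids should be counted. aa_composition() is a hypothetical helper name, not a function from Biostrings or any other package.

```r
# Minimal base-R sketch: amino acid composition per FASTA record.
# aa_composition() is a hypothetical helper, not part of Biostrings or base R.
# Assumes one-letter codes; only the 20 standard amino acids are counted.
aa_composition <- function(path, percent = TRUE) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]                      # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                        # record index for every line
  ids <- sub("^>(\\S+).*", "\\1", lines[is_header])  # first word of each header

  # Join the wrapped sequence lines of each record into one string
  # (a record with no sequence lines becomes "").
  seqs <- vapply(
    split(lines[!is_header], factor(record[!is_header], levels = seq_along(ids))),
    paste, character(1), collapse = ""
  )

  aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
  # Count each standard residue; letters outside `aa` (e.g. X, U, *, -)
  # give NA from match(), and tabulate() ignores NA values.
  counts <- t(vapply(seqs, function(s) {
    tabulate(match(strsplit(toupper(s), "")[[1]], aa), nbins = length(aa))
  }, integer(length(aa))))
  dimnames(counts) <- list(ids, aa)

  # Percent of standard residues per sequence (rows), or raw counts.
  if (percent) 100 * counts / rowSums(counts) else counts
}
```

For the file above, round(aa_composition("tmp.fa"), 2) should give one row per sequence, named by the first word of its header (e.g. HMPREF9352_0001). aa_composition("tmp.fa", percent = FALSE) returns the counts instead, with columns in alphabetical order rather than the Biostrings column order. Either matrix can be saved with write.csv().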

 

Does that help?
