plotting proteases digestion points in a protein
1
0
Entering edit mode
Assa Yeroslaviz ★ 1.5k
@assa-yeroslaviz-1597
Last seen 11 days ago
Germany

Hi all,

I'm working now on a method to try and plot a protein sequence after digested with one (or more) restriction enzyme(s).

the goal is to use a specific cleavage points and cut the protein sequence into snippets. After filtering the too long and too shots snippets, I would like to plot the remained peptides and get a table of the coverage.

Below I have attached a part of code for one pattern (trypsin cleaves the protein after a K) as well as the image i would like to get. 

Before I reinvent the wheel, I was wondering if there is already some other method of doing it and if so, please let me know. 

Thanks

Assa

 

protein1 <- readAAStringSet(file="Protein1.fasta")
protein1 <- toString(protein1)
hits <- matchPattern("K", protein1)
subseqs <- IRanges(PartitioningByEnd(end(hits)))
subseqs.filt <- subseqs[width(subseqs) >=6 & width(subseqs) <= 30 ]
plotRanges(subseqs.filt)

AAcovered <- sum(as.data.frame(subseqs.filt)$width)
AApeptides<-  length(as.data.frame(subseqs.filt)$width)
write.table(x = as.data.frame(subseqs.filt), file = "Protein1.txt", sep="\t", quote = F )
paste0("This gives a coverage ratio of ", format(AAcovered * 100 / hits@subject@length, trim = FALSE, digits=4 ), "%" )

 

 

biostrings iranges sequence alignment protein • 1.6k views
ADD COMMENT
1
Entering edit mode
@michael-lawrence-3846
Last seen 3.1 years ago
United States

You should look into the Gviz package for plotting, instead of using the simple function from the IRanges vignette.

For tabulating the coverage, it's not exactly clear what you mean, but your current approach seems basically fine. I'm curious though as to why you used the width() accessor in the first part of the code, and then switched to as.data.frame()$width later, and then resorted to direct slot access. Or why you converted the AAStringSet to a character vector before matching, which would just turn it back into an XStringSet.

ADD COMMENT
0
Entering edit mode

thanks for the Gviz recommendation. I'll look into it. 

For your comments, I'll try to answer the second remark first.

I'm converting the AAStringSet into String, as i can't seem to be able to use matchPattern on a StringSet. Is there a better way of doing it? As for the first remark, because i am converting it to a string, i need to get the length of it back again. this is though only saved deep down in the hit object, which is why i am calling it like that.

I have though another problem. This is the end of my protein sequence : ...LQGRRRQPRGKKKVERN

because I am using the PartitioningByEnd overlap I miss the last four amino acids in my IRanges object.

> hits
  Views on a 1834-letter BString subject
subject: MKLSVNEAQLGFYLGSLSHLSACPGIDPRSSED...DTHCITFTTAATRREKLQGRRRQPRGKKKVERN
views:
     start  end width
 [1]     2    2     1 [K]
 [2]    39   39     1 [K]
 [3]    52   52     1 [K]
 [4]    63   63     1 [K]
 [5]    64   64     1 [K]
 ...   ...  ...   ... ...
[86]  1769 1769     1 [K]
[87]  1817 1817     1 [K]
[88]  1828 1828     1 [K]
[89]  1829 1829     1 [K]
[90]  1830 1830     1 [K]

> subseqs
IRanges object with 90 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         1         2         2
   [2]         3        39        37
   [3]        40        52        13
   [4]        53        63        11
   [5]        64        64         1
   ...       ...       ...       ...
  [86]      1764      1769         6
  [87]      1770      1817        48
  [88]      1818      1828        11
  [89]      1829      1829         1
  [90]      1830      1830         1

Is there a way to paste the last piece of the protein (VERN) back to my Ranges object?

so that it has 91 row and the last is

  [91]      1831      1834         4

thanks

Assa

 

ADD REPLY
1
Entering edit mode

The AAStringSet consists of individual AAString objects. You can pull one out with protein1[[1]] and call matchPattern() on that. The width() accessor will get you the length of the string; no need to dig down deep into the object. I think the last issue is now answered by another thread.

ADD REPLY
0
Entering edit mode

thanks for that. I think this covers that question. As for the second part, I still have some other problem, when the pattern in question should be the beginning and not the end pf the snippet looked for (A: convert a sequence to Ranges object).

ADD REPLY

Login before adding your answer.

Traffic: 663 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6