Ulrike Goebel
Dear list,
I have a question regarding the behavior of function processReads in
package nucleR.
Assume a RangedData object 'tmp':
>head(tmp,n=4)
RangedData with 6 rows and 1 value column across 1 space
space ranges | strand
<factor> <iranges> | <character>
1 Chr1 [ 4, 58] | -
2 Chr1 [ 7, 61] | -
3 Chr1 [ 9, 63] | -
4 Chr1 [10, 55] | +
of single-end read coordinates, then
>rd_150_40 <- processReads(tmp,type="single",fragmentLen=150,trim=40)
yields
>head(rd_150_40[1],n=4)
space ranges |
<factor> <iranges> |
1 Chr1 [-51, -12] |
2 Chr1 [-48, -9] |
3 Chr1 [-46, -7] |
4 Chr1 [ 65, 104] |
The processed coordinates of the (+) read conform to the protocol
"extend the read in 5'->3' direction to a length of 150 bp, then
extract the window from position 56 to 95 of the extended read". I
understand that this is the expected behavior (trim to the 40 bp middle
window of the read after extension). Apparently, what the function does
is shift the start of a read by 55 bp (to the left for a (-) read, to
the right for a (+) read) and then extract the 40 bp window starting at
the start coordinate of the shifted read:
> head(start(tmp)-start(rd_150_40))
[1] 55 55 55 -55 55 55
> unique(start(tmp)-start(rd_150_40))
[1] 55 -55
For (+) reads, this selects the 40 bp middle window as described above.
For (-) reads, however, I think the 40 bp window *ending at the end of
the shifted read* should be extracted instead. Otherwise, the window is
displaced towards the 3' end of the (oriented) extended read by an
amount that depends on the original read length (the length before
extension), rather than lying 55 bp from position 1 (which is the
*last* coordinate of a (-) read).
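To make the expected coordinates concrete, here is a minimal base-R sketch of the strand-aware window I have in mind. The helper name middle_window is hypothetical and is not part of nucleR; it only encodes my reading of "extend to fragmentLen in 5'->3' direction, then keep the middle trim bp", and it assumes that (fragmentLen - trim) is even. The read coordinates and the observed windows are copied from the output above.

```r
# Hypothetical helper (NOT a nucleR function): the 'trim'-bp window centred in
# a read that has been extended to 'fragmentLen' bp in its 5'->3' direction.
middle_window <- function(start, end, strand, fragmentLen = 150, trim = 40) {
  offset <- (fragmentLen - trim) %/% 2   # 55 bp on each side for 150/40
  plus <- strand == "+"
  # (+) reads: the 5' end is 'start', so the window begins 'offset' bp to its right.
  # (-) reads: the 5' end is 'end', so the window ends 'offset' bp to its left.
  win_start <- ifelse(plus, start + offset, end - offset - trim + 1)
  data.frame(start = win_start, end = win_start + trim - 1)
}

# The four reads shown in head(tmp, n=4)
reads <- data.frame(start  = c(4, 7, 9, 10),
                    end    = c(58, 61, 63, 55),
                    strand = c("-", "-", "-", "+"),
                    stringsAsFactors = FALSE)

expected <- middle_window(reads$start, reads$end, reads$strand)

# Windows reported by processReads in head(rd_150_40[1], n=4)
observed <- data.frame(start = c(-51, -48, -46, 65),
                       end   = c(-12, -9, -7, 104))

# Displacement of the observed window relative to the strand-aware one
shift <- expected$start - observed$start
print(cbind(reads, expected_start = expected$start,
            observed_start = observed$start, shift = shift))
```

Under these assumptions the (+) read agrees with processReads ([65, 104], shift 0), while the three (-) reads would be expected at [-36, 3], [-33, 6] and [-31, 8], i.e. 15 bp to the right of the reported windows. That 15 bp is the original read width (55) minus trim (40), which is the read-length dependence described above.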
I was just wondering whether this behaviour of processReads is
intended. Sorry if I missed something obvious!
With best regards
Ulrike Goebel
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] nucleR_1.6.0 ShortRead_1.16.4 latticeExtra_0.6-26
[4] RColorBrewer_1.0-5 Rsamtools_1.10.2 lattice_0.20-10
[7] Biostrings_2.26.3 GenomicRanges_1.10.7 Biobase_2.18.0
[10] IRanges_1.16.6 BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 grid_2.15.2 hwriter_1.3 stats4_2.15.2 tools_2.15.2
[6] zlibbioc_1.4.0