Importing a BED file with floating point scores?
@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

I'm trying to import some of my BED files, and rtracklayer is choking because it expects an integer where my files have floating point values:

> x <- import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '16.25851'
> traceback()
8: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
       nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
       fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
       multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
       flush = flush, encoding = encoding, skipNul = skipNul)
7: read.table(con, colClasses = bedClasses, as.is = TRUE, na.strings = ".",
       comment.char = "")
6: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE,
       na.strings = ".", comment.char = ""))
5: .local(con, format, text, ...)
4: import(FileForFormat(con), ...)
3: import(FileForFormat(con), ...)
2: import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
1: import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
>

Here's what the first few lines of that file look like:

$ head "data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"
chr1    27510   30426   H3K4me3_peak_5  800     *       16.25851        80.02037        -1      1703
chr1    540149  541537  H3K4me3_peak_22 145     *       4.97981 14.53762        -1      426
chr1    713117  714018  H3K4me3_peak_28 1277    *       25.02645        127.71224       -1      770
chr1    714191  716308  H3K4me3_peak_29 602     *       14.99855        60.24956        -1      427
chr1    760783  762907  H3K4me3_peak_41 711     *       14.27445        71.18283        -1      1761
chr1    763044  765173  H3K4me3_peak_42 198     *       4.9907  19.84909        -1      987
chr1    776489  778099  H3K4me3_peak_51 358     *       9.35653 35.85113        -1      981
chr1    778950  780404  H3K4me3_peak_53 233     *       7.24799 23.38787        -1      1133
chr1    892835  894617  H3K4me3_peak_72 410     *       9.88536 41.03654        -1      1206
chr1    894804  897069  H3K4me3_peak_73 224     *       8.19048 22.41068        -1      101

Is it possible that rtracklayer could be modified to accept floating point values for the relevant columns?

rtracklayer bed • 5.9k views
@michael-lawrence-3846
United States

This looks like a narrowPeaks file, not a conventional BED file. For those, you need to use the extraCols argument. See ?import.bed.


I tried the extraCols argument, but it doesn't seem to work. The error is still there, although it now occurs at a different field of the first row:

extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric",
                          qValue = "numeric", peak = "integer")
import.narrowPeak <- function(..., ) {
    import(..., format="BED", extraCols=extraCols_narrowPeak)
}

> x <- import.narrowPeak("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  (from #2) :
  scan() expected 'an integer', got '80.02037'

Where can I download that file?


I figured out the problem. My function signature above has an extra comma, which somehow resulted in this non sequitur error. After removing the comma, the above function works as expected.


Hi Ryan, I've run into the same problem. Could you please tell me where the extra comma is and what the corrected code should look like?

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got '3.27363'

Thank you.


Ryan has not been seen for some time, and this thread is almost 5 years old. You may consider creating a new question, and also providing a minimal reproducible example for others.


The third line is

import.narrowPeak <- function(..., ) {

but should be

import.narrowPeak <- function(...) {

Yes, these are derived from MACS2 narrowPeak output files. Good to know that rtracklayer has support for them.

@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

Actually, looking at the BED format specification, only the first 6 columns of these files correspond to BED columns; the rest are specific to the application. Unfortunately, they are still labeled as BED files. In BED, columns 7 and 8 are supposed to be integer genomic positions ("thick start" and "thick end", respectively), but this file uses them for something else.
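
To illustrate the column-type mismatch described above, here is a minimal base R sketch that does not use rtracklayer. The two rows are copied from the file shown in the question, assuming the fields are tab-separated. The vectors bed_classes and narrowpeak_classes are hypothetical, written only for this example: they are not rtracklayer's internal bedClasses, and the narrowPeak column names simply follow the extraCols vector used earlier in the thread.

```r
# Two rows copied from the file in the question (assumed tab-separated).
lines <- c(
  "chr1\t27510\t30426\tH3K4me3_peak_5\t800\t*\t16.25851\t80.02037\t-1\t1703",
  "chr1\t540149\t541537\tH3K4me3_peak_22\t145\t*\t4.97981\t14.53762\t-1\t426"
)

# Hypothetical classes for a plain BED reader: columns 7 and 8
# (thickStart, thickEnd) are declared as integers.
bed_classes <- c("character", "integer", "integer", "character", "numeric",
                 "character", "integer", "integer", "character", "integer")

# Reading with the BED classes fails at column 7; keep the error message.
bed_attempt <- tryCatch(
  read.table(text = lines, sep = "\t", colClasses = bed_classes, as.is = TRUE),
  error = function(e) conditionMessage(e)
)

# Hypothetical classes for narrowPeak: columns 7 to 9 are numeric,
# column 10 (peak summit offset) is an integer.
narrowpeak_classes <- c("character", "integer", "integer", "character", "numeric",
                        "character", "numeric", "numeric", "numeric", "integer")

peaks <- read.table(
  text = lines, sep = "\t", colClasses = narrowpeak_classes, as.is = TRUE,
  col.names = c("chrom", "start", "end", "name", "score", "strand",
                "signalValue", "pValue", "qValue", "peak")
)
```

With the BED-style classes, bed_attempt holds the scan() error message rather than a data frame. With the narrowPeak-style classes, the same rows parse cleanly. This is the same idea as passing extraCols to import(): the first six columns keep their BED meaning and the remaining ones get their own types.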
