import.bed fails on bed file with no scores
jmr • 0
@jmr-10063

This file: UCSD.H1.H2AK5ac.SAK201.bed.gz

looks like this:

chr1    9942    10141    SOLEXA2_1:1:101:4024:16163    -
chr1    9988    10187    SOLEXA2_1:1:10:12241:10803    -
chr1    9992    10191    SOLEXA2_1:1:93:18918:18953    -
chr1    9997    10196    SOLEXA2_1:1:30:11903:16499    -

It doesn't have a scores column.  When I try to load it with

import.bed("UCSD.H1.H2AK5ac.SAK201.bed.gz")

I get:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a real', got '-'

Is there a way to instruct import.bed to deal with missing scores?  And would the same options work with files that have scores?  The problem is that other files from the same source (e.g. UCSD.H1_BMP4_Derived_Mesendoderm_Cultured_Cells.H2AK5ac.AK126.bed.gz) do have scores, and I'd like to process them with the same instruction.  I just need the Ranges info.  I expected the format to be the same for every file on that site.

I'm quite new to R and to Bioconductor, so forgive my ignorance.  (I did try reading the help documents and searching the web.)

João Rodrigues

 

Edited: Fixed link to first file.

rtracklayer import bed files • 2.6k views
@michael-lawrence-3846

This file strays pretty far from the standard by skipping a column, but I think you can at least get the range information by passing extraCols=c(strand="factor") to the import function. Effectively that is saying that the valid BED part stops at the name column, and that there is a strand column tacked onto the end. I'm not sure if the strand column will become the strand component on the GRanges, but it might.
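
A minimal sketch of that suggestion, assuming rtracklayer is installed. To keep it self-contained it writes the four sample lines from the question to a temporary file (`bed_path` is a name made up for this example); with the real data you would pass the path to the downloaded .bed.gz file instead. The call to `granges()` only drops any metadata columns so that the range information is left.

```r
# Sketch only: assumes the rtracklayer package is installed.
library(rtracklayer)

# Recreate the four sample lines from the question in a temporary file.
# With the real data, use the path to the downloaded file instead.
bed_lines <- c(
  "chr1\t9942\t10141\tSOLEXA2_1:1:101:4024:16163\t-",
  "chr1\t9988\t10187\tSOLEXA2_1:1:10:12241:10803\t-",
  "chr1\t9992\t10191\tSOLEXA2_1:1:93:18918:18953\t-",
  "chr1\t9997\t10196\tSOLEXA2_1:1:30:11903:16499\t-"
)
bed_path <- tempfile(fileext = ".bed")
writeLines(bed_lines, bed_path)

# Declare the trailing strand column as an extra column, so the standard
# BED fields are taken to end at the name column.
gr <- import.bed(bed_path, extraCols = c(strand = "factor"))

# Keep only the range information (drops metadata columns such as name).
ranges_only <- granges(gr)
ranges_only
```

Because BED start coordinates are 0-based and GRanges are 1-based, the imported start values are one higher than the numbers in the file.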


Thanks Michael, that really works.  It reads both files with no scores, as well as files with scores!  I really don't understand this function, but this seems to solve my problem.

It does, however, produce a warning when reading either of the files I mentioned:

Warning message:
In `[<-.factor`(`*tmp*`, is.na(strand), value = "*") :
  invalid factor level, NA generated

It seems that this is caused by the files not having any "*" in the strand column, but the output seems fine to me.  Probably a bug?
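
The warning can be reproduced in base R without any BED file. This is only a sketch of the mechanism suggested by the warning text, not the actual rtracklayer code: assigning "*" to a factor that has no "*" level warns even when there are no NA entries to replace, which would explain why the output still looks fine.

```r
# A strand factor built from data that only contains "-" and "+"
strand <- factor(c("-", "-", "+"))
levels(strand)  # "*" is not among the levels

# Same shape as the call shown in the warning. Nothing is NA, so nothing
# is replaced, but R still warns because "*" is not an existing level.
strand[is.na(strand)] <- "*"

# Declaring every level up front avoids the warning.
strand_ok <- factor(c("-", "-", "+"), levels = c("+", "-", "*"))
strand_ok[is.na(strand_ok)] <- "*"
```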


Yeah, it could be smarter. But I think this is already fixed in devel, which will be released soon.
