Question

import.bed fails on bed file with no scores

jmr • 0

@jmr-10063

Last seen 8.9 years ago

This file: UCSD.H1.H2AK5ac.SAK201.bed.gz

looks like this:

chr1   9942   10141   SOLEXA2_1:1:101:4024:16163   -
chr1   9988   10187   SOLEXA2_1:1:10:12241:10803   -
chr1   9992   10191   SOLEXA2_1:1:93:18918:18953   -
chr1   9997   10196   SOLEXA2_1:1:30:11903:16499   -

It doesn't have a score column. When I try to load it with

import.bed("UCSD.H1.H2AK5ac.SAK201.bed.gz")

I get:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a real', got '-'

Is there a way to instruct import.bed to deal with missing scores? And would the same options work with files that have scores? The problem is that other files from the same source (e.g. UCSD.H1_BMP4_Derived_Mesendoderm_Cultured_Cells.H2AK5ac.AK126.bed.gz) do have scores, and I'd like to process all of them with the same call. I only need the ranges information. I expected the format to be the same for every file on that site.

I'm quite new to R and to Bioconductor, so forgive my ignorance. (I did try reading the help documents and searching the web.)

João Rodrigues

Edited: Fixed link to first file.

rtracklayer import bed files • 2.8k views

score 2 · Accepted Answer · 2016-04-07

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.3 years ago

United States

This file strays pretty far from the standard by skipping the score column, but I think you can at least get the range information by passing extraCols=c(strand="factor") to the import function. Effectively, that says that the valid BED part stops at the name column and that there is a strand column tacked onto the end. I'm not sure whether the strand column will become the strand component of the GRanges, but it might.
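A minimal sketch of that suggestion, assuming rtracklayer is installed from Bioconductor. The temporary file and the names bed_lines and bed_path are made up for illustration; the rows are copied from the question and written tab-separated, which is an assumption about the real file's delimiter.

```r
# Sketch only: assumes the Bioconductor package rtracklayer is installed.
library(rtracklayer)

# Toy copy of the rows shown in the question: five tab-separated columns
# (chrom, start, end, name, strand) and no score column.
bed_lines <- c(
  "chr1\t9942\t10141\tSOLEXA2_1:1:101:4024:16163\t-",
  "chr1\t9988\t10187\tSOLEXA2_1:1:10:12241:10803\t-",
  "chr1\t9992\t10191\tSOLEXA2_1:1:93:18918:18953\t-",
  "chr1\t9997\t10196\tSOLEXA2_1:1:30:11903:16499\t-"
)
bed_path <- tempfile(fileext = ".bed")  # hypothetical stand-in for the real file
writeLines(bed_lines, bed_path)

# Declare the last column as an extra column called "strand", so the
# standard BED columns are taken to end at the name column.
gr <- import.bed(bed_path, extraCols = c(strand = "factor"))

# Inspect the imported ranges.
print(gr)
```

For the real data, the same call would presumably take the path to the .bed.gz file in place of bed_path.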

Thanks Michael, that really works. It reads both files with no scores, as well as files with scores! I really don't understand this function, but this seems to solve my problem.

It does, however, produce a warning when reading either of the files I mentioned:

Warning message:
In `[<-.factor`(`*tmp*`, is.na(strand), value = "*") :
  invalid factor level, NA generated

It seems that this is caused by the files not having any "*" in the strand column, but the output seems fine to me. Probably a bug?

jmr • 8.9 years ago

Yeah, it could be smarter. But I think this is already fixed in devel, which will be released soon.

Michael Lawrence • 8.9 years ago