Hi, rtracklayerers,
import.gff3 with asRangedData=TRUE accepts a period in the strand
column (it shows up as NA in the imported RangedData); however,
calling it with asRangedData=FALSE errors:
> gff.str<-"2L\tFlyBase\tgene\t7529\t9484\t0\t.\t0\tID=FBgn0031208;Name=CG11023"
> import.gff3(textConnection(gff.str),asRangedData=TRUE)
RangedData with 1 row and 7 value columns across 1 space
     space       ranges |     type   source    phase   strand          ID        Name     score
  <factor>    <iranges> | <factor> <factor> <factor> <factor> <character> <character> <numeric>
1       2L [7529, 9484] |     gene  FlyBase        0       NA FBgn0031208     CG11023         0
> import.gff3(textConnection(gff.str),asRangedData=FALSE)
Error in strand(runValue(strand)) : strand values must be in '+' '-' '*'
The GFF3 spec allows '.' (and '?') to appear as value of strand:
Column 7: "strand"
The strand of the feature. + for positive strand (relative to the
landmark), - for minus strand, and . for features that are not
stranded. In addition, ? can be used for features whose strandedness
is relevant, but unknown.
Arguably, import.gff{,2,3} should provide some control over how
'.' and '?' in the strand column are interpreted, so that the result
can comport with strand and GRanges.
I propose the following as an intended backwards compatible fix.
New argument to import.gff{,2,3}
strandMap: controls the mapping of out-of-band strand values (FALSE,
TRUE, a string, or a list), understood as follows
FALSE: the default - do not map out-of-band values
TRUE: map all out-of-band values to '*'
any length-1 character vector: map out-of-band values to it
(presumably it will be one of '*', '-', '+')
a list: look up how to map each out-of-band value in the list by
name.
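To make the proposed semantics concrete, here is a minimal base-R sketch of how the four strandMap cases might be applied to a strand column. The helper name mapStrand is hypothetical and is not part of rtracklayer; how it would be wired into import.gff{,2,3} is left open.

```r
# Hypothetical helper (not an rtracklayer function): apply the proposed
# 'strandMap' semantics to a character vector of GFF strand values.
mapStrand <- function(x, strandMap = FALSE) {
  valid <- c("+", "-", "*")
  x <- as.character(x)
  # Out-of-band entries are anything other than '+', '-', '*' (e.g. '.' or '?')
  oob <- !is.na(x) & !(x %in% valid)
  if (identical(strandMap, FALSE) || !any(oob)) {
    return(x)                      # default: leave values untouched
  }
  if (identical(strandMap, TRUE)) {
    x[oob] <- "*"                  # map every out-of-band value to '*'
  } else if (is.character(strandMap) && length(strandMap) == 1L) {
    x[oob] <- strandMap            # map every out-of-band value to the string
  } else if (is.list(strandMap)) {
    # Look up each out-of-band value by name; unlisted values are left as-is
    hit <- oob & (x %in% names(strandMap))
    x[hit] <- unname(vapply(strandMap[x[hit]], as.character, character(1)))
  } else {
    stop("'strandMap' must be FALSE, TRUE, a single string, or a named list")
  }
  x
}

# Example calls covering the four cases
mapStrand(c("+", ".", "?"), FALSE)
mapStrand(c("+", ".", "?"), TRUE)
mapStrand(c("+", ".", "?"), "+")
mapStrand(c("+", ".", "?"), list("." = "*", "?" = "+"))
```

With the list form, a value that has no entry in the list is left unchanged in this sketch; whether that should instead be an error is a design choice the proposal does not settle.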
If it is agreed that this is the best resolution, and the rtracklayer
gods wish it, I will take this as my first opportunity to contribute
and will follow up accordingly.
Else?
Cheers,
Malcolm