kent.riemondy
I'm working with Nanopore direct RNA-seq data which generates FASTQs with U bases. Is there a Bioconductor package that supports reading these files? I've tried using methods from ShortRead, but get errors due to the U bases. Thanks in advance.
```r
suppressPackageStartupMessages(library(ShortRead))

fq <- tempfile()

# FASTQ record containing a U base: both readers fail
u_fq_txt <- paste(c("@readid",
                    "UCGA",
                    "+",
                    "]]]]"),
                  collapse = "\n")
writeLines(u_fq_txt, fq)

strm <- FastqStreamer(fq)
yield(strm)
#> Error in x$yield(...): _DNAencode(): invalid DNAString input character: 'U' (byte value 85)

readFastq(fq)
#> Error: Input/Output
#> file(s):
#> /var/folders/r9/g3c47jrj40gc14d8qsqx7src0000gn/T//RtmpwZ8EOi/file4dee4320c214
#> message: invalid character '

# Same record with T instead of U: both readers succeed
t_fq_txt <- paste(c("@readid",
                    "TCGA",
                    "+",
                    "]]]]"),
                  collapse = "\n")
writeLines(t_fq_txt, fq)

strm <- FastqStreamer(fq)
yield(strm)
#> class: ShortReadQ
#> length: 1 reads; width: 4 cycles

readFastq(fq)
#> class: ShortReadQ
#> length: 1 reads; width: 4 cycles

unlink(fq)
```
The original use case was to concatenate multiple FASTQs into a single FASTQ, while also converting the U's to T's for compatibility with downstream tools. The streaming functionality of `FastqStreamer` seemed like a good approach to keep the memory usage low while converting each FASTQ. I could use Unix tools (e.g. `cat` and `awk`), but was curious to see how to do it using Bioconductor tools.

Your response helped point me in the right direction. I used a `BStringSet` initially to allow for U's to be converted to T's, and subsequently coerced it to a `DNAStringSet`. I could then generate a `ShortReadQ` object and write records to disk with `writeFastq`. Probably not the most efficient, but it worked for my initial use case.

Here's my approach, which allows for limiting the number of lines read at a time, in case it is useful to anyone.
Created on 2022-04-20 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)
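
For readers who want to try the steps described above (read a limited number of lines, hold the sequences in a `BStringSet`, convert U to T, coerce to a `DNAStringSet`, build a `ShortReadQ`, and append with `writeFastq`), here is a minimal sketch. It is an illustration rather than the exact code from this post. The helper name `convert_u_to_t` and its arguments are hypothetical, and it assumes strict four-line FASTQ records, uncompressed input, and an output file that does not exist yet.

```r
suppressPackageStartupMessages(library(ShortRead))

# Hypothetical helper (not part of ShortRead): concatenate several FASTQ files
# into one, converting U to T, reading a bounded number of records at a time.
# Assumes every record is exactly four lines (no wrapped sequence lines).
convert_u_to_t <- function(in_files, out_file, reads_per_chunk = 1000L) {
  # writeFastq() is used in append mode below, so start from a clean slate
  stopifnot(!file.exists(out_file))

  for (f in in_files) {
    con <- file(f, open = "r")
    repeat {
      # Read at most `reads_per_chunk` records (four lines each)
      lines <- readLines(con, n = 4L * reads_per_chunk)
      if (length(lines) == 0L) break

      idx <- seq(1L, length(lines), by = 4L)    # header line of each record
      ids <- BStringSet(sub("^@", "", lines[idx]))
      seqs <- BStringSet(lines[idx + 1L])       # BStringSet tolerates U
      quals <- FastqQuality(lines[idx + 3L])

      # Convert U to T, then coerce to DNA
      dna <- DNAStringSet(chartr("U", "T", seqs))

      chunk <- ShortReadQ(sread = dna, quality = quals, id = ids)
      writeFastq(chunk, out_file, mode = "a", compress = FALSE)
    }
    close(con)
  }
  invisible(out_file)
}
```

A call such as `convert_u_to_t(c("a.fastq", "b.fastq"), "combined.fastq", reads_per_chunk = 5000L)` (file names are placeholders) keeps at most 5000 records in memory at once. Reading with `readLines()` rather than `FastqStreamer()` is a deliberate workaround, since the streamer rejects U bases before any conversion can happen.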