writeFastq to gzip compressed files
1
0
Entering edit mode
Marcus Davy ▴ 390
@marcus-davy-5153
Last seen 6.7 years ago
Hi, as fastq files are often compressed now, I would like to be able to write out to compressed 'fq.gz' files, rather than raw text. I have searched the BioC threads and found a similar topic at; http://article.gmane.org/gmane.science.biology.informatics.conductor/4 5306/match=writeFastq A poor example of what I would like to be able to achieve follows; example(readFastq) print(rfq) fout <- "test.fq" writeFastq(rfq, fout) system(paste("gzip", fout)) but this does not compress the data as it is streaming out to file. I would like to be able to do something like below, so this is a potential feature request; fout <- "test.fq.gz" writeFastq(rfq, fout, compress=TRUE) Is there an obvious way to compress using writeFastq that I am missing? cheers, Marcus sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] C attached base packages: [1] parallel splines stats graphics grDevices utils datasets [8] methods base other attached packages: [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 [7] GenomicRanges_1.12.2 IRanges_1.18.0 BiocGenerics_0.6.0 [10] BiocInstaller_1.10.3 limma_3.16.7 Hmisc_3.12-2 [13] Formula_1.1-1 survival_2.37-4 [[alternative HTML version deleted]]
• 2.2k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
Ideally, it would be: writeFastq(rfq, gzfile(fout)) But I don't think that works. In theory, it would be feasible to support this now that R has exposed its connection interface... Michael On Wed, Sep 25, 2013 at 7:39 AM, Marcus Davy <mdavy86@gmail.com> wrote: > Hi, > as fastq files are often compressed now, I would like to be able to write > out to compressed 'fq.gz' files, rather than raw text. > > I have searched the BioC threads and found a similar topic at; > > > http://article.gmane.org/gmane.science.biology.informatics.conductor /45306/match=writeFastq > > A poor example of what I would like to be able to achieve follows; > > example(readFastq) > print(rfq) > fout <- "test.fq" > writeFastq(rfq, fout) > system(paste("gzip", fout)) > > but this does not compress the data as it is streaming out to file. I would > like to be able to do something like below, so this is a potential feature > request; > > fout <- "test.fq.gz" > writeFastq(rfq, fout, compress=TRUE) > > Is there an obvious way to compress using writeFastq that I am missing? > > > cheers, > > Marcus > > > sessionInfo() > R version 3.0.0 (2013-04-03) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] C > > attached base packages: > [1] parallel splines stats graphics grDevices utils datasets > [8] methods base > > other attached packages: > [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 > [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 > [7] GenomicRanges_1.12.2 IRanges_1.18.0 BiocGenerics_0.6.0 > [10] BiocInstaller_1.10.3 limma_3.16.7 Hmisc_3.12-2 > [13] Formula_1.1-1 survival_2.37-4 > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
On 09/25/2013 09:09 AM, Michael Lawrence wrote: > Ideally, it would be: > writeFastq(rfq, gzfile(fout)) > > But I don't think that works. In theory, it would be feasible to support > this now that R has exposed its connection interface... That'll be an interesting learning experience, and I'll come up with a proper implementation. Here's a work-around, With writeFastq_con(rfq, stdout()) yy = writeFastq_con(rfq, gzfile(tempfile())) readFastq(yy) Martin .write1fastq <- function(x, con, ..., full=FALSE) { i <- paste0("@", as.character(id(x)), "\n") fill <- if (full) paste0("\n+", as.character(id(x)), "\n") else "\n+\n" s <- as.character(sread(x)) q <- as.character(quality(quality(x))) cat(paste(i, s, fill, q, sep=""), file=con, sep="\n") length(i) } writeFastq_con <- function(object, file, ..., chunkSize=1000000L) ## object: ShortReadQ instance ## file: connection, e.g., stdout(), gzfile(<...>, "wb") ## ...: additional arguments, in particular ## full=TRUE: logical(1), TRUE: replicate id tag on '+' line ## chunkSize: size of each chunk processed for output, smaller ## values use less memory but are slower { if (!isOpen(file)) { open(file, "wb") on.exit(close(file)) } len <- length(object) start <- 1L tot <- 0L while (start < len) { end <- min(start + chunkSize - 1L, len) tot <- tot + .write1fastq(object[seq(start, end)], file, ...) start <- end + 1L } if (tot != len) warning("output length does not equal input length") summary(file)$description } > > Michael > > > On Wed, Sep 25, 2013 at 7:39 AM, Marcus Davy <mdavy86 at="" gmail.com=""> wrote: > >> Hi, >> as fastq files are often compressed now, I would like to be able to write >> out to compressed 'fq.gz' files, rather than raw text. >> >> I have searched the BioC threads and found a similar topic at; >> >> >> http://article.gmane.org/gmane.science.biology.informatics.conducto r/45306/match=writeFastq >> >> A poor example of what I would like to be able to achieve follows; >> >> example(readFastq) >> print(rfq) >> fout <- "test.fq" >> writeFastq(rfq, fout) >> system(paste("gzip", fout)) >> >> but this does not compress the data as it is streaming out to file. I would >> like to be able to do something like below, so this is a potential feature >> request; >> >> fout <- "test.fq.gz" >> writeFastq(rfq, fout, compress=TRUE) >> >> Is there an obvious way to compress using writeFastq that I am missing? >> >> >> cheers, >> >> Marcus >> >> >> sessionInfo() >> R version 3.0.0 (2013-04-03) >> Platform: x86_64-apple-darwin10.8.0 (64-bit) >> >> locale: >> [1] C >> >> attached base packages: >> [1] parallel splines stats graphics grDevices utils datasets >> [8] methods base >> >> other attached packages: >> [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 >> [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 >> [7] GenomicRanges_1.12.2 IRanges_1.18.0 BiocGenerics_0.6.0 >> [10] BiocInstaller_1.10.3 limma_3.16.7 Hmisc_3.12-2 >> [13] Formula_1.1-1 survival_2.37-4 >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
Thanks for the responses, Martin I will give your work-around a try. I did also try out the following but in both cases they failed, and only produced a zero byte sized file (equivalent to bash 'touch' utility on the command line) using OS X. fout <- gzfile("test.fq.gz", "wb") print(fout) writeFastq(rfq, fout) fout2 <- gzcon(file("test.fq.gz", "wb")) print(fout2) writeFastq(rfq, fout) cheers, Marcus On Thu, Sep 26, 2013 at 6:01 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 09/25/2013 09:09 AM, Michael Lawrence wrote: > >> Ideally, it would be: >> writeFastq(rfq, gzfile(fout)) >> >> But I don't think that works. In theory, it would be feasible to support >> this now that R has exposed its connection interface... >> > > That'll be an interesting learning experience, and I'll come up with a > proper implementation. Here's a work-around, With > > writeFastq_con(rfq, stdout()) > yy = writeFastq_con(rfq, gzfile(tempfile())) > readFastq(yy) > > Martin > > .write1fastq <- > function(x, con, ..., full=FALSE) > { > i <- paste0("@", as.character(id(x)), "\n") > fill <- if (full) > paste0("\n+", as.character(id(x)), "\n") > else "\n+\n" > s <- as.character(sread(x)) > q <- as.character(quality(quality(**x))) > cat(paste(i, s, fill, q, sep=""), file=con, sep="\n") > length(i) > } > > writeFastq_con <- > function(object, file, ..., chunkSize=1000000L) > ## object: ShortReadQ instance > ## file: connection, e.g., stdout(), gzfile(<...>, "wb") > ## ...: additional arguments, in particular > ## full=TRUE: logical(1), TRUE: replicate id tag on '+' line > ## chunkSize: size of each chunk processed for output, smaller > ## values use less memory but are slower > > { > if (!isOpen(file)) { > open(file, "wb") > on.exit(close(file)) > } > > len <- length(object) > start <- 1L > tot <- 0L > while (start < len) { > end <- min(start + chunkSize - 1L, len) > tot <- tot + .write1fastq(object[seq(start, end)], file, ...) > start <- end + 1L > } > if (tot != len) > warning("output length does not equal input length") > > summary(file)$description > > } > > >> Michael >> >> >> On Wed, Sep 25, 2013 at 7:39 AM, Marcus Davy <mdavy86@gmail.com> wrote: >> >> Hi, >>> as fastq files are often compressed now, I would like to be able to write >>> out to compressed 'fq.gz' files, rather than raw text. >>> >>> I have searched the BioC threads and found a similar topic at; >>> >>> >>> http://article.gmane.org/**gmane.science.biology.** >>> informatics.conductor/45306/**match=writeFastq<http: article.gman="" e.org="" gmane.science.biology.informatics.conductor="" 45306="" match="writeFas" tq=""> >>> >>> A poor example of what I would like to be able to achieve follows; >>> >>> example(readFastq) >>> print(rfq) >>> fout <- "test.fq" >>> writeFastq(rfq, fout) >>> system(paste("gzip", fout)) >>> >>> but this does not compress the data as it is streaming out to file. I >>> would >>> like to be able to do something like below, so this is a potential >>> feature >>> request; >>> >>> fout <- "test.fq.gz" >>> writeFastq(rfq, fout, compress=TRUE) >>> >>> Is there an obvious way to compress using writeFastq that I am missing? >>> >>> >>> cheers, >>> >>> Marcus >>> >>> >>> sessionInfo() >>> R version 3.0.0 (2013-04-03) >>> Platform: x86_64-apple-darwin10.8.0 (64-bit) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] parallel splines stats graphics grDevices utils datasets >>> [8] methods base >>> >>> other attached packages: >>> [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 >>> [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 >>> [7] GenomicRanges_1.12.2 IRanges_1.18.0 BiocGenerics_0.6.0 >>> [10] BiocInstaller_1.10.3 limma_3.16.7 Hmisc_3.12-2 >>> [13] Formula_1.1-1 survival_2.37-4 >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: >>> http://news.gmane.org/gmane.**science.biology.informatics.**conduc tor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>> >>> >> [[alternative HTML version deleted]] >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> >> > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
I believe gzfile only applies when reading. With gzcon, you may have to flush or close the connection (not sure which one - I think your fout2), or your whole session, to get the file written. On Sep 25, 2013, at 4:07 PM, Marcus Davy wrote: > Thanks for the responses, Martin I will give your work-around a try. > > > I did also try out the following but in both cases they failed, and only > produced a zero byte sized file (equivalent to bash 'touch' utility on the > command line) using OS X. > > fout <- gzfile("test.fq.gz", "wb") > print(fout) > writeFastq(rfq, fout) > > fout2 <- gzcon(file("test.fq.gz", "wb")) > print(fout2) > writeFastq(rfq, fout) > > > cheers, > > > Marcus > > > On Thu, Sep 26, 2013 at 6:01 AM, Martin Morgan <mtmorgan at="" fhcrc.org=""> wrote: > >> On 09/25/2013 09:09 AM, Michael Lawrence wrote: >> >>> Ideally, it would be: >>> writeFastq(rfq, gzfile(fout)) >>> >>> But I don't think that works. In theory, it would be feasible to support >>> this now that R has exposed its connection interface... >>> >> >> That'll be an interesting learning experience, and I'll come up with a >> proper implementation. Here's a work-around, With >> >> writeFastq_con(rfq, stdout()) >> yy = writeFastq_con(rfq, gzfile(tempfile())) >> readFastq(yy) >> >> Martin >> >> .write1fastq <- >> function(x, con, ..., full=FALSE) >> { >> i <- paste0("@", as.character(id(x)), "\n") >> fill <- if (full) >> paste0("\n+", as.character(id(x)), "\n") >> else "\n+\n" >> s <- as.character(sread(x)) >> q <- as.character(quality(quality(**x))) >> cat(paste(i, s, fill, q, sep=""), file=con, sep="\n") >> length(i) >> } >> >> writeFastq_con <- >> function(object, file, ..., chunkSize=1000000L) >> ## object: ShortReadQ instance >> ## file: connection, e.g., stdout(), gzfile(<...>, "wb") >> ## ...: additional arguments, in particular >> ## full=TRUE: logical(1), TRUE: replicate id tag on '+' line >> ## chunkSize: size of each chunk processed for output, smaller >> ## values use less memory but are slower >> >> { >> if (!isOpen(file)) { >> open(file, "wb") >> on.exit(close(file)) >> } >> >> len <- length(object) >> start <- 1L >> tot <- 0L >> while (start < len) { >> end <- min(start + chunkSize - 1L, len) >> tot <- tot + .write1fastq(object[seq(start, end)], file, ...) >> start <- end + 1L >> } >> if (tot != len) >> warning("output length does not equal input length") >> >> summary(file)$description >> >> } >> >> >>> Michael >>> >>> >>> On Wed, Sep 25, 2013 at 7:39 AM, Marcus Davy <mdavy86 at="" gmail.com=""> wrote: >>> >>> Hi, >>>> as fastq files are often compressed now, I would like to be able to write >>>> out to compressed 'fq.gz' files, rather than raw text. >>>> >>>> I have searched the BioC threads and found a similar topic at; >>>> >>>> >>>> http://article.gmane.org/**gmane.science.biology.** >>>> informatics.conductor/45306/**match=writeFastq<http: article.gma="" ne.org="" gmane.science.biology.informatics.conductor="" 45306="" match="writeFa" stq=""> >>>> >>>> A poor example of what I would like to be able to achieve follows; >>>> >>>> example(readFastq) >>>> print(rfq) >>>> fout <- "test.fq" >>>> writeFastq(rfq, fout) >>>> system(paste("gzip", fout)) >>>> >>>> but this does not compress the data as it is streaming out to file. I >>>> would >>>> like to be able to do something like below, so this is a potential >>>> feature >>>> request; >>>> >>>> fout <- "test.fq.gz" >>>> writeFastq(rfq, fout, compress=TRUE) >>>> >>>> Is there an obvious way to compress using writeFastq that I am missing? >>>> >>>> >>>> cheers, >>>> >>>> Marcus >>>> >>>> >>>> sessionInfo() >>>> R version 3.0.0 (2013-04-03) >>>> Platform: x86_64-apple-darwin10.8.0 (64-bit) >>>> >>>> locale: >>>> [1] C >>>> >>>> attached base packages: >>>> [1] parallel splines stats graphics grDevices utils datasets >>>> [8] methods base >>>> >>>> other attached packages: >>>> [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 >>>> [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 >>>> [7] GenomicRanges_1.12.2 IRanges_1.18.0 BiocGenerics_0.6.0 >>>> [10] BiocInstaller_1.10.3 limma_3.16.7 Hmisc_3.12-2 >>>> [13] Formula_1.1-1 survival_2.37-4 >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >>> >> >> -- >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
On 09/25/2013 01:23 PM, Harris A. Jaffee wrote: > I believe gzfile only applies when reading. > > With gzcon, you may have to flush or close the connection (not sure which > one - I think your fout2), or your whole session, to get the file written. writeFastq() won't currently work for anything other than a character string pointing to a file. The code included earlier in this email thread writeFastq_con() works with any connection that supports writing. Both gzfile() and gzcon() connections can be written to; usually we'd use gzfile() to write to a file. I don't know of a real use case for gzcon. In general connection output is buffered, so if the connection is opened outside the function, then it needs to be flush()'ed or close()'ed. The code and examples I provided earlier open and then close the connection inside the function, and output is of course written to the file. Martin > > > On Sep 25, 2013, at 4:07 PM, Marcus Davy wrote: > >> Thanks for the responses, Martin I will give your work-around a try. >> >> >> I did also try out the following but in both cases they failed, and only >> produced a zero byte sized file (equivalent to bash 'touch' utility on the >> command line) using OS X. >> >> fout <- gzfile("test.fq.gz", "wb") >> print(fout) >> writeFastq(rfq, fout) >> >> fout2 <- gzcon(file("test.fq.gz", "wb")) >> print(fout2) >> writeFastq(rfq, fout) >> >> >> cheers, >> >> >> Marcus >> >> >> On Thu, Sep 26, 2013 at 6:01 AM, Martin Morgan <mtmorgan at="" fhcrc.org=""> wrote: >> >>> On 09/25/2013 09:09 AM, Michael Lawrence wrote: >>> >>>> Ideally, it would be: >>>> writeFastq(rfq, gzfile(fout)) >>>> >>>> But I don't think that works. In theory, it would be feasible to support >>>> this now that R has exposed its connection interface... >>>> >>> >>> That'll be an interesting learning experience, and I'll come up with a >>> proper implementation. Here's a work-around, With >>> >>> writeFastq_con(rfq, stdout()) >>> yy = writeFastq_con(rfq, gzfile(tempfile())) >>> readFastq(yy) >>> >>> Martin >>> >>> .write1fastq <- >>> function(x, con, ..., full=FALSE) >>> { >>> i <- paste0("@", as.character(id(x)), "\n") >>> fill <- if (full) >>> paste0("\n+", as.character(id(x)), "\n") >>> else "\n+\n" >>> s <- as.character(sread(x)) >>> q <- as.character(quality(quality(**x))) >>> cat(paste(i, s, fill, q, sep=""), file=con, sep="\n") >>> length(i) >>> } >>> >>> writeFastq_con <- >>> function(object, file, ..., chunkSize=1000000L) >>> ## object: ShortReadQ instance >>> ## file: connection, e.g., stdout(), gzfile(<...>, "wb") >>> ## ...: additional arguments, in particular >>> ## full=TRUE: logical(1), TRUE: replicate id tag on '+' line >>> ## chunkSize: size of each chunk processed for output, smaller >>> ## values use less memory but are slower >>> >>> { >>> if (!isOpen(file)) { >>> open(file, "wb") >>> on.exit(close(file)) >>> } >>> >>> len <- length(object) >>> start <- 1L >>> tot <- 0L >>> while (start < len) { >>> end <- min(start + chunkSize - 1L, len) >>> tot <- tot + .write1fastq(object[seq(start, end)], file, ...) >>> start <- end + 1L >>> } >>> if (tot != len) >>> warning("output length does not equal input length") >>> >>> summary(file)$description >>> >>> } >>> >>> >>>> Michael >>>> >>>> >>>> On Wed, Sep 25, 2013 at 7:39 AM, Marcus Davy <mdavy86 at="" gmail.com=""> wrote: >>>> >>>> Hi, >>>>> as fastq files are often compressed now, I would like to be able to write >>>>> out to compressed 'fq.gz' files, rather than raw text. >>>>> >>>>> I have searched the BioC threads and found a similar topic at; >>>>> >>>>> >>>>> http://article.gmane.org/**gmane.science.biology.** >>>>> informatics.conductor/45306/**match=writeFastq<http: article.gm="" ane.org="" gmane.science.biology.informatics.conductor="" 45306="" match="writeF" astq=""> >>>>> >>>>> A poor example of what I would like to be able to achieve follows; >>>>> >>>>> example(readFastq) >>>>> print(rfq) >>>>> fout <- "test.fq" >>>>> writeFastq(rfq, fout) >>>>> system(paste("gzip", fout)) >>>>> >>>>> but this does not compress the data as it is streaming out to file. I >>>>> would >>>>> like to be able to do something like below, so this is a potential >>>>> feature >>>>> request; >>>>> >>>>> fout <- "test.fq.gz" >>>>> writeFastq(rfq, fout, compress=TRUE) >>>>> >>>>> Is there an obvious way to compress using writeFastq that I am missing? >>>>> >>>>> >>>>> cheers, >>>>> >>>>> Marcus >>>>> >>>>> >>>>> sessionInfo() >>>>> R version 3.0.0 (2013-04-03) >>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit) >>>>> >>>>> locale: >>>>> [1] C >>>>> >>>>> attached base packages: >>>>> [1] parallel splines stats graphics grDevices utils datasets >>>>> [8] methods base >>>>> >>>>> other attached packages: >>>>> [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 >>>>> [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 >>>>> [7] GenomicRanges_1.12.2 IRanges_1.18.0 BiocGenerics_0.6.0 >>>>> [10] BiocInstaller_1.10.3 limma_3.16.7 Hmisc_3.12-2 >>>>> [13] Formula_1.1-1 survival_2.37-4 >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________**_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org >>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: sta="" t.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.**science.biology.informatics.**cond uctor<http: news.gmane.org="" gmane.science.biology.informatics.conducto="" r=""> >>>>> >>>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: http://news.gmane.org/gmane.** >>>> science.biology.informatics.**conductor<http: news.gmane.org="" gma="" ne.science.biology.informatics.conductor=""> >>>> >>>> >>> >>> -- >>> Computational Biology / Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N. >>> PO Box 19024 Seattle, WA 98109 >>> >>> Location: Arnold Building M1 B861 >>> Phone: (206) 667-2793 >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD REPLY
0
Entering edit mode
For the record, gzcon() is used by rtracklayer for a couple things, including handling gzip-compressed data from a URL, i.e., gzcon(url(x)). On Wed, Sep 25, 2013 at 2:18 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 09/25/2013 01:23 PM, Harris A. Jaffee wrote: > >> I believe gzfile only applies when reading. >> >> With gzcon, you may have to flush or close the connection (not sure which >> one - I think your fout2), or your whole session, to get the file written. >> > > writeFastq() won't currently work for anything other than a character > string pointing to a file. > > The code included earlier in this email thread writeFastq_con() works with > any connection that supports writing. > > Both gzfile() and gzcon() connections can be written to; usually we'd use > gzfile() to write to a file. I don't know of a real use case for gzcon. > > In general connection output is buffered, so if the connection is opened > outside the function, then it needs to be flush()'ed or close()'ed. The > code and examples I provided earlier open and then close the connection > inside the function, and output is of course written to the file. > > Martin > > > > >> >> On Sep 25, 2013, at 4:07 PM, Marcus Davy wrote: >> >> Thanks for the responses, Martin I will give your work-around a try. >>> >>> >>> I did also try out the following but in both cases they failed, and only >>> produced a zero byte sized file (equivalent to bash 'touch' utility on >>> the >>> command line) using OS X. >>> >>> fout <- gzfile("test.fq.gz", "wb") >>> print(fout) >>> writeFastq(rfq, fout) >>> >>> fout2 <- gzcon(file("test.fq.gz", "wb")) >>> print(fout2) >>> writeFastq(rfq, fout) >>> >>> >>> cheers, >>> >>> >>> Marcus >>> >>> >>> On Thu, Sep 26, 2013 at 6:01 AM, Martin Morgan <mtmorgan@fhcrc.org> >>> wrote: >>> >>> On 09/25/2013 09:09 AM, Michael Lawrence wrote: >>>> >>>> Ideally, it would be: >>>>> writeFastq(rfq, gzfile(fout)) >>>>> >>>>> But I don't think that works. In theory, it would be feasible to >>>>> support >>>>> this now that R has exposed its connection interface... >>>>> >>>>> >>>> That'll be an interesting learning experience, and I'll come up with a >>>> proper implementation. Here's a work-around, With >>>> >>>> writeFastq_con(rfq, stdout()) >>>> yy = writeFastq_con(rfq, gzfile(tempfile())) >>>> readFastq(yy) >>>> >>>> Martin >>>> >>>> .write1fastq <- >>>> function(x, con, ..., full=FALSE) >>>> { >>>> i <- paste0("@", as.character(id(x)), "\n") >>>> fill <- if (full) >>>> paste0("\n+", as.character(id(x)), "\n") >>>> else "\n+\n" >>>> s <- as.character(sread(x)) >>>> q <- as.character(quality(quality(****x))) >>>> cat(paste(i, s, fill, q, sep=""), file=con, sep="\n") >>>> length(i) >>>> } >>>> >>>> writeFastq_con <- >>>> function(object, file, ..., chunkSize=1000000L) >>>> ## object: ShortReadQ instance >>>> ## file: connection, e.g., stdout(), gzfile(<...>, "wb") >>>> ## ...: additional arguments, in particular >>>> ## full=TRUE: logical(1), TRUE: replicate id tag on '+' line >>>> ## chunkSize: size of each chunk processed for output, smaller >>>> ## values use less memory but are slower >>>> >>>> { >>>> if (!isOpen(file)) { >>>> open(file, "wb") >>>> on.exit(close(file)) >>>> } >>>> >>>> len <- length(object) >>>> start <- 1L >>>> tot <- 0L >>>> while (start < len) { >>>> end <- min(start + chunkSize - 1L, len) >>>> tot <- tot + .write1fastq(object[seq(start, end)], file, ...) >>>> start <- end + 1L >>>> } >>>> if (tot != len) >>>> warning("output length does not equal input length") >>>> >>>> summary(file)$description >>>> >>>> } >>>> >>>> >>>> Michael >>>>> >>>>> >>>>> On Wed, Sep 25, 2013 at 7:39 AM, Marcus Davy <mdavy86@gmail.com> >>>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>>> as fastq files are often compressed now, I would like to be able to >>>>>> write >>>>>> out to compressed 'fq.gz' files, rather than raw text. >>>>>> >>>>>> I have searched the BioC threads and found a similar topic at; >>>>>> >>>>>> >>>>>> http://article.gmane.org/****gmane.science.biology.**<http: ar="" ticle.gmane.org="" **gmane.science.biology.**=""> >>>>>> informatics.conductor/45306/****match=writeFastq<http: **="">>>>>> article.gmane.org/gmane.**science.biology.informatics.** >>>>>> conductor/45306/match=**writeFastq<http: article.gmane.org="" gma="" ne.science.biology.informatics.conductor="" 45306="" match="writeFastq"> >>>>>> > >>>>>> >>>>>> A poor example of what I would like to be able to achieve follows; >>>>>> >>>>>> example(readFastq) >>>>>> print(rfq) >>>>>> fout <- "test.fq" >>>>>> writeFastq(rfq, fout) >>>>>> system(paste("gzip", fout)) >>>>>> >>>>>> but this does not compress the data as it is streaming out to file. I >>>>>> would >>>>>> like to be able to do something like below, so this is a potential >>>>>> feature >>>>>> request; >>>>>> >>>>>> fout <- "test.fq.gz" >>>>>> writeFastq(rfq, fout, compress=TRUE) >>>>>> >>>>>> Is there an obvious way to compress using writeFastq that I am >>>>>> missing? >>>>>> >>>>>> >>>>>> cheers, >>>>>> >>>>>> Marcus >>>>>> >>>>>> >>>>>> sessionInfo() >>>>>> R version 3.0.0 (2013-04-03) >>>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit) >>>>>> >>>>>> locale: >>>>>> [1] C >>>>>> >>>>>> attached base packages: >>>>>> [1] parallel splines stats graphics grDevices utils >>>>>> datasets >>>>>> [8] methods base >>>>>> >>>>>> other attached packages: >>>>>> [1] ShortRead_1.18.0 latticeExtra_0.6-24 RColorBrewer_1.0-5 >>>>>> [4] Rsamtools_1.12.3 lattice_0.20-15 Biostrings_2.28.0 >>>>>> [7] GenomicRanges_1.12.2 IRanges_1.18.0 BiocGenerics_0.6.0 >>>>>> [10] BiocInstaller_1.10.3 limma_3.16.7 Hmisc_3.12-2 >>>>>> [13] Formula_1.1-1 survival_2.37-4 >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> ______________________________****_________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor@r-project.org >>>>>> https://stat.ethz.ch/mailman/****listinfo/bioconductor<https: stat.ethz.ch="" mailman="" **listinfo="" bioconductor=""> >>>>>> <https: **="" stat.ethz.ch="" mailman="" **listinfo="" bioconductor<https:="" stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>>> > >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.****science.biology.informatics.**** >>>>>> conductor<http: news.gmane.org="" gmane.**science.biology.informa="" tics.**conductor=""> >>>>>> <http: news.gmane.**org="" gmane.science.biology.**="">>>>>> informatics.conductor<http: news.gmane.org="" gmane.science.biolo="" gy.informatics.conductor=""> >>>>>> > >>>>>> >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________****_________________ >>>>> Bioconductor mailing list >>>>> Bioconductor@r-project.org >>>>> https://stat.ethz.ch/mailman/****listinfo/bioconductor<https: s="" tat.ethz.ch="" mailman="" **listinfo="" bioconductor=""> >>>>> <https: **="" stat.ethz.ch="" mailman="" **listinfo="" bioconductor<https:="" stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>>> > >>>>> Search the archives: http://news.gmane.org/gmane.** >>>>> science.biology.informatics.****conductor<http: news.gmane.**="">>>>> org/gmane.science.biology.**informatics.conductor<http: news.gm="" ane.org="" gmane.science.biology.informatics.conductor=""> >>>>> > >>>>> >>>>> >>>>> >>>> -- >>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N. >>>> PO Box 19024 Seattle, WA 98109 >>>> >>>> Location: Arnold Building M1 B861 >>>> Phone: (206) 667-2793 >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: http://news.gmane.org/gmane.** >>> science.biology.informatics.**conductor<http: news.gmane.org="" gman="" e.science.biology.informatics.conductor=""> >>> >> >> > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 978 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6