sorting FASTQ file by ID
Ramzi TEMANNI ▴ 160
@ramzi-temanni-3819
Last seen 10.2 years ago
Hi everyone,

I have paired-end data in FASTQ format where the forward and reverse files contain different numbers of reads and are not ordered by read ID. I wrote the following code to mate the reads, but it seems that srsort does not sort by ID. Could anyone tell me which function to use, and whether there is any way to tune the code, as the FASTQ files to process are around 6 GB? I'm working on a 16-core, 16 GB server.

    overlapingreads <- function(m1.filename, m2.filename) {
      fastq.m1 <- readFastq(m1.filename)  # read forward fq file
      fastq.m2 <- readFastq(m2.filename)  # read reverse fq file

      # Example ID: HWI-EA332_0007_FC622U7:6:1:2761:1100#0/2
      # extract tile and coordinates as key for matching forward and reverse reads
      id1 = subseq(id(fastq.m1), 26, nchar(id(fastq.m1)) - 4)
      id2 = subseq(id(fastq.m2), 26, nchar(id(fastq.m2)) - 4)

      cid = sort(intersect(id1, id2))

      tmp1 = srsort(fastq.m1[id1 %in% cid])
      tmp2 = srsort(fastq.m2[id2 %in% cid])

      writeFastq(tmp1, paste("sorted_", m1.filename, sep=""))
      writeFastq(tmp2, paste("sorted_", m2.filename, sep=""))
    }

Thanks in advance for your help and comments.

Regards,
Ramzi

    > sessionInfo()
    R version 2.12.1 (2010-12-16)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8       LC_COLLATE=en_US.utf8
     [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
     [7] LC_PAPER=en_US.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] ShortRead_1.8.2     Rsamtools_1.2.2     lattice_0.19-17
    [4] Biostrings_2.18.2   GenomicRanges_1.2.2 IRanges_1.8.8

    loaded via a namespace (and not attached):
    [1] Biobase_2.10.0 grid_2.12.1 hwriter_1.3 tools_2.12.1
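One possible reading of the problem above: srsort orders reads by their sequence rather than by their ID, so the two filtered objects need not end up in matching order. A minimal base-R sketch of pairing by ID with match() follows. The helper name pair_index and the toy IDs are hypothetical (they are not part of ShortRead), and the sketch assumes that mates differ only in a trailing "/1" or "/2".

```r
# Hypothetical helper (not part of ShortRead): given the read IDs of the
# forward and reverse files, return the indices that put the shared reads
# in the same order in both files.
pair_index <- function(ids1, ids2) {
  # Assumption: mates differ only in a trailing "/1" or "/2"; strip it
  # so that both mates of a pair share the same key.
  key1 <- sub("/[12]$", "", ids1)
  key2 <- sub("/[12]$", "", ids2)
  # Keys present in both files, sorted.
  common <- sort(intersect(key1, key2))
  # match() returns the position of each common key in each file.
  list(i1 = match(common, key1), i2 = match(common, key2))
}

# Toy IDs: r1 is missing from the reverse file, r4 from the forward file.
ids1 <- c("r3#0/1", "r1#0/1", "r2#0/1")
ids2 <- c("r2#0/2", "r4#0/2", "r3#0/2")

idx <- pair_index(ids1, ids2)
print(ids1[idx$i1])
print(ids2[idx$i2])
```

With ShortRead objects, the same indices could presumably be applied as fastq.m1[idx$i1] and fastq.m2[idx$i2], with the IDs taken from as.character(id(fastq.m1)) and as.character(id(fastq.m2)). Note that this still loads both files fully into memory, which may be tight for files of around 6 GB on a 16 GB machine.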