Hi, I'm working with a 4C-seq experiment. One of the issues with this technique is that you have to remove reads that originate adjacent to the sequence of interest. I have the sequence corresponding to these reads, but so far I have only managed to find the reads that contain it using vmatchPattern. How can I delete these reads and create a new FASTQ file without them?
Thank you very much for your help!
To get some example data, I entered the following into my R session:
This creates a variable that points to a small FASTQ file. I read it in and extracted the short-read sequences, just to take a look:
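For readers who want the same step outside R, here is a minimal plain-Python sketch of reading a FASTQ file and pulling out the read sequences. It is not the ShortRead code used in this answer. It assumes the common four-lines-per-record layout with no wrapped sequence lines, and the names `FastqRecord`, `parse_fastq` and `read_fastq` are hypothetical helpers introduced only for illustration.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    """One FASTQ entry: '@' header, sequence, '+' separator, quality string."""
    header: str
    seq: str
    plus: str
    qual: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse FASTQ records from any iterable of text lines.

    Assumes 4 lines per record (no wrapped sequence lines).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip stray blank lines, e.g. at the end of the file
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record {header!r}") from None
        # Basic sanity checks on the record structure
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record {header!r}")
        yield FastqRecord(header, seq, plus, qual)


def read_fastq(path: str) -> Iterator[FastqRecord]:
    """Stream records from a plain or gzip-compressed (.gz) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fastq(handle)
```

Usage, with a hypothetical file name: `sequences = [rec.seq for rec in read_fastq("reads.fastq.gz")]`. Keeping the parser separate from file opening means it can also be fed a plain list of lines for quick checks.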
Suppose I wanted to get rid of sequences containing "TTACC". I could use the grepl() function, negated with `!`, to find the reads that do not contain this pattern, and then subset the original reads to get those that I want to keep:
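The same subsetting idea, `x[!grepl(pattern, x)]`, as a minimal Python sketch on a plain list of sequences. It assumes the pattern is a literal string ("TTACC" contains no regular-expression metacharacters, so a substring test behaves the same as grepl here). The helper name `drop_containing` and the four example reads are hypothetical.

```python
from typing import List


def drop_containing(seqs: List[str], pattern: str) -> List[str]:
    """Keep only sequences that do NOT contain `pattern` as an exact substring.

    Mirrors the R idiom x[!grepl(pattern, x)] for a literal pattern.
    """
    return [s for s in seqs if pattern not in s]


# Hypothetical reads, only to show the behaviour
reads = ["ACGTTACCA", "GGGGCCCC", "TTACCTTACC", "ATATATAT"]
kept = drop_containing(reads, "TTACC")
print(len(kept), kept)  # 2 ['GGGGCCCC', 'ATATATAT']
```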
So it looks like there are 244 reads that satisfy my criterion. Now I write a function that does this, and test it.
Verify that it works:
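A minimal Python sketch of wrapping the filter in a reusable function and checking it on a few records. Records are plain `(header, sequence, separator, quality)` tuples here. `make_read_filter` is a hypothetical name, and the `both_strands` option is an extra not present in the R answer: it also drops reads containing the reverse complement of the pattern, which may matter if the unwanted fragment can be read in either orientation.

```python
from typing import Callable, Iterable, List, Tuple

# A FASTQ record as (header, sequence, separator, quality); illustrative only.
Record = Tuple[str, str, str, str]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_read_filter(pattern: str, both_strands: bool = False) -> Callable[[Iterable[Record]], List[Record]]:
    """Return a function that drops records whose sequence contains `pattern`.

    Matching is exact and case-insensitive; with both_strands=True the
    reverse complement of `pattern` is also treated as a hit.
    """
    targets = {pattern.upper()}
    if both_strands:
        targets.add(reverse_complement(pattern.upper()))

    def keep(records: Iterable[Record]) -> List[Record]:
        return [r for r in records if not any(t in r[1].upper() for t in targets)]

    return keep


# Verify on hypothetical records
reads = [
    ("@r1", "ACGTTACCA", "+", "IIIIIIIII"),  # contains TTACC
    ("@r2", "GGGGCCCC", "+", "IIIIIIII"),    # contains neither
    ("@r3", "AGGTAAGT", "+", "IIIIIIII"),    # contains GGTAA, the reverse complement of TTACC
]
print([r[0] for r in make_read_filter("TTACC")(reads)])                     # ['@r2', '@r3']
print([r[0] for r in make_read_filter("TTACC", both_strands=True)(reads)])  # ['@r2']
```

Returning a closure keeps the pattern fixed while the same filter is applied to many batches of reads.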
Then use it with the filterFastq() function to create a new FASTQ file containing only the filtered reads:
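For comparison, a minimal plain-Python sketch of the "read, filter, write a new FASTQ file" step. It is not filterFastq() and does not reproduce its arguments. It assumes 4-line records without wrapped sequences, chooses gzip compression from a `.gz` suffix, and `filter_fastq_file` is a hypothetical name. It streams one record at a time, so memory use does not grow with file size.

```python
import gzip
from typing import Callable, Iterator, TextIO, Tuple

# A FASTQ record as (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def _open(path: str, mode: str) -> TextIO:
    """Open a plain or gzip-compressed text file, chosen by the .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def _records(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes no wrapped sequence lines)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if not qual:
            raise ValueError(f"truncated FASTQ record {header.strip()!r}")
        yield (header.rstrip("\r\n"), seq.rstrip("\r\n"),
               plus.rstrip("\r\n"), qual.rstrip("\r\n"))


def filter_fastq_file(src: str, dst: str, keep: Callable[[Record], bool]) -> Tuple[int, int]:
    """Copy records from `src` to `dst`, writing only those where keep(record) is True.

    Returns (records_read, records_written).
    """
    n_in = n_out = 0
    with _open(src, "r") as fin, _open(dst, "w") as fout:
        for rec in _records(fin):
            n_in += 1
            if keep(rec):
                fout.write("\n".join(rec) + "\n")
                n_out += 1
    return n_in, n_out
```

Usage, with hypothetical file names: `filter_fastq_file("reads.fastq.gz", "filtered.fastq.gz", lambda r: "TTACC" not in r[1])`. The returned counts give a first check on how many reads were removed.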
Verify that the output file contains the correct number of sequences:
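A minimal Python sketch of that final check, assuming the output uses exactly four lines per record (no wrapped sequences and no blank lines), so the record count is the line count divided by four. `count_fastq_records` is a hypothetical helper.

```python
import gzip


def count_fastq_records(path: str) -> int:
    """Count records in a plain or .gz FASTQ file with exactly 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Comparing this count with the number of reads kept in memory (244 in the R example above) confirms that nothing was lost or duplicated while writing.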
Thank you very much! This was super useful for me :D