hannalberman
@user-24786
I am trying to use the adapter_filter() function in FastqCleaner to remove primer sequences, but I can still find the primer sequences in the output after running adapter_filter(). I am using sequences from this ENA entry (Project: PRJNA377530), and the following uses just the forward and reverse reads from the first sample, files SRR5314314_1.fastq.gz and SRR5314314_2.fastq.gz, as an example.
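library(ShortRead)     # readFastq(), sread()
library(FastqCleaner)  # adapter_filter()
# Forward and reverse reads of the first sample
fwdRead <- readFastq("~/SRR5314314_1.fastq.gz")
revRead <- readFastq("~/SRR5314314_2.fastq.gz")
# Primer sequences (contain IUPAC ambiguity codes, hence fixed = FALSE below)
FWD <- "ACCTGCGGARGGATCA"
REV <- "GAGATCCRTTGYTRAAAGTT"
fwdFilt <- adapter_filter(fwdRead, Lpattern = FWD, anchored = TRUE, fixed = FALSE)
revFilt <- adapter_filter(revRead, Lpattern = REV, anchored = TRUE, fixed = FALSE)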
There is no error message, but the primer sequences do not get filtered from the reads.
Can you share some of the reads that you expect to be filtered (but that are not filtered)? Also, are you aware of the other parameters that may be of importance, such as:
The function appears to be working on the 3' primers (Rpattern), so I've just posted the Lpattern case here. The primer sequences are in the forward orientation, so I do not need rc.L=TRUE (I don't need to match the reverse complement). There is no need to include the "first" argument if I'm only searching for one primer pattern: the function tests the length of each pattern, and if the length of the Rpattern is 0 it will just run the Lpattern. I tried those parameters anyway, i.e.
adapter_filter(fwdRead, Lpattern = FWD, anchored=TRUE, fixed = FALSE, first="L")
or with "R" as the value of first. The 5' primers should be at the read termini, but I have tried both anchored=TRUE and anchored=FALSE, and neither has worked for me. I also tried increasing the error rates, but that should not be an issue anyway, since I can see how many times I should be able to find the primers in each sample with vmatchPattern(), setting the maximum mismatches to 0. I normally would not try to do this in R, but I'm going through the dada2 ITS tutorial with a class and trying to avoid compatibility issues for people with Windows.
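For anyone who wants to reproduce the zero-mismatch count without extra packages, here is a minimal base-R sketch. It is not the Biostrings or FastqCleaner code path: iupac_to_regex() is a hypothetical helper written for this post, and the reads vector is toy data made up for illustration. It only shows how a primer containing ambiguity codes (such as the R in FWD) can be matched exactly at the 5' end.

```r
# Hypothetical helper (not part of any package): expand IUPAC ambiguity
# codes in a primer into an equivalent regular expression.
iupac_to_regex <- function(primer) {
  codes <- c(A = "A", C = "C", G = "G", T = "T",
             R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
             K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
             H = "[ACT]", V = "[ACG]", N = "[ACGT]")
  paste(codes[strsplit(primer, "")[[1]]], collapse = "")
}

FWD <- "ACCTGCGGARGGATCA"
fwd_regex <- iupac_to_regex(FWD)

# Toy reads, for illustration only
reads <- c("ACCTGCGGAGGGATCATTACCGAGTTTACAACTCC",  # primer at 5' end (R = G)
           "ACCTGCGGAAGGATCATTACC",                # primer at 5' end (R = A)
           "TTACCGAGTTTACAACTCC")                  # no primer

# Anchored match: the primer must start at the first base of the read
has_primer_5p <- grepl(paste0("^", fwd_regex), reads)
n_hits <- sum(has_primer_5p)
```

If a filter is doing its job, running the same count on its output should give 0 anchored hits.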
It seems the function is trimming a number of bases equal to the primer length from the right end of each read instead of from the left. Example of a sequence before and after running the function:
sread(fwdRead)[1]
DNAStringSet object of length 6:
    width seq
[1]   187 ACCTGCGGAGGGATCATTACCGAGTTTACAACTCCCAAACCCCTGTGAACATACCTTATGTTGCC...CTGTTTTTAGTTGAACTTCTGAGTATAAAAAACAAATAAATCAAAACTTTCAACAATGGATCTC
Example after:
sread(fwdFilt)[1]
DNAStringSet object of length 1:
    width seq
[1]   171 ACCTGCGGAGGGATCATTACCGAGTTTACAACTCCCAAACCCCTGTGAACATACCTTATGTTGCC...GCAGGAACCCTAAACTCTGTTTTTAGTTGAACTTCTGAGTATAAAAAACAAATAAATCAAAACT
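The observation can be checked against the two printed sequences with a few lines of base R. This is only a sketch: the widths and the visible head and tail fragments are copied from the output above (the elided middle is left out), the variable names are made up for this post, and the primer regex is the hand-expanded form of FWD with R written as [AG].

```r
FWD <- "ACCTGCGGARGGATCA"

# Widths and visible fragments copied from the printed output
width_before <- 187
width_after  <- 171
before_head <- "ACCTGCGGAGGGATCATTACCGAGTTTACAACTCCCAAACCCCTGTGAACATACCTTATGTTGCC"
after_head  <- "ACCTGCGGAGGGATCATTACCGAGTTTACAACTCCCAAACCCCTGTGAACATACCTTATGTTGCC"
before_tail <- "CTGTTTTTAGTTGAACTTCTGAGTATAAAAAACAAATAAATCAAAACTTTCAACAATGGATCTC"
after_tail  <- "GCAGGAACCCTAAACTCTGTTTTTAGTTGAACTTCTGAGTATAAAAAACAAATAAATCAAAACT"

# Number of bases removed, compared with the primer length
n_removed <- width_before - width_after
removed_equals_primer_length <- n_removed == nchar(FWD)

# The 5' end is unchanged and still starts with the primer
head_unchanged  <- identical(before_head, after_head)
primer_still_5p <- grepl("^ACCTGCGGA[AG]GGATCA", after_head)

# Clipping n_removed bases off the original 3' end reproduces the new 3' end
expected_tail <- substr(before_tail, 1, nchar(before_tail) - n_removed)
clipped_from_3p <- endsWith(after_tail, expected_tail)
```

All four logical values come out TRUE for this read, which is consistent with the bases being removed from the 3' end while the 5' primer stays in place.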