I tried using the processAmplicons function from edgeR on a fastq file in which the hairpin sequence is at the start of each read and the barcode is towards the end. While 100% of the reads match a barcode, I'm getting 0% for the hairpins, even though the hairpin sequences are present in the fastq file. Because the updated version of this function now handles both fixed and variable read structures, I can no longer specify the hairpin and barcode start positions myself.
Any help with why the hairpin sequences aren't being recognized would be appreciated.
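One way to narrow this down is to check where the known hairpin sequences actually sit within the reads. The following is only a minimal base-R sketch, not part of edgeR: locate_hairpins is a hypothetical helper name, and the commented usage assumes the hairpin file is tab-delimited with a Sequences column.

```r
# Hypothetical helper (not an edgeR function): for each read, return the
# 1-based start position of the earliest exact match to any known hairpin,
# or NA if no hairpin occurs in that read.
locate_hairpins <- function(reads, hairpins) {
  vapply(reads, function(read) {
    # regexpr() returns -1 when the fixed string is absent from the read
    pos <- vapply(hairpins,
                  function(h) as.integer(regexpr(h, read, fixed = TRUE)[1]),
                  integer(1))
    pos <- pos[pos > 0]
    if (length(pos) == 0) NA_integer_ else min(pos)
  }, integer(1), USE.NAMES = FALSE)
}

# Toy data for illustration only (not taken from the real files)
toy_reads    <- c("ACGTACGTTTGGCCAATT", "AAAAGGGG", "CCCCCCCC")
toy_hairpins <- c("TTGGCC", "AAAA")
toy_positions <- locate_hairpins(toy_reads, toy_hairpins)
print(toy_positions)

# Possible use on the real data (assumes a tab-delimited hairpin file
# with a "Sequences" column; only the first 1000 reads are inspected):
# fq    <- readLines("SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq", n = 4000)
# reads <- fq[seq(2, length(fq), by = 4)]
# hp    <- read.delim("GSE139118_Pool1revcompshRNA.txt", stringsAsFactors = FALSE)
# table(locate_hairpins(reads, hp$Sequences), useNA = "ifany")
```

If nearly all reads report the same start position, the hairpins are at a fixed offset and the tabulated position shows where in the read the function would need to look for them.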
code:
library(edgeR)
processAmplicons(readfile="SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq",
                 barcodefile="GSE139118_Pool1revcompbarcodes.txt",
                 hairpinfile="GSE139118_Pool1revcompshRNA.txt",
                 verbose=TRUE)
output:
-- Number of Barcodes : 20
-- Number of Hairpins : 1911
Processing reads in SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq.
-- Processing 10 million reads
-- Processing 20 million reads
-- Processing 30 million reads
-- Processing 40 million reads
Number of reads in file SRR10312928_GSM4131258_DMSO.D8.R1_Mus_musculus.fastq : 31232748
The input run parameters are:
-- Barcode in forward read: length 4
-- Hairpin in forward read: length 22
-- Mismatch in barcode/hairpin sequences not allowed.
Total number of read is 31232748
There are 31232748 reads (100.0000 percent) with barcode matches
There are 1 reads (0.0000 percent) with hairpin matches
There are 1 reads (0.0000 percent) with both barcode and hairpin matches
Warning message:
In edgeR::DGEList(counts = hairpinReadsSummary, genes = hairpins) :
library size of zero detected
Thanks for updating the function with this argument. I have tried calling the processAmplicons function again with this version of edgeR as suggested above and I'm getting the following error:
Having looked at the code, I would guess there's an issue with the error function (line 233) after using tryCatch on line 154.
My fault. I introduced a code error when I committed Oliver Voogd's changes to the public package. Now fixed in edgeR 3.38.3 and 3.39.5.