Question

How to count barcode sequences in a fastq file

yura.grabovska ▴ 30

@yuragrabovska-9835

United Kingdom

Hi,

I have a number of single-end FASTQ files which contain sequencing from a barcoding experiment. I have a large list of barcodes (~120,000) and I want to count the number of exact barcode matches in the FASTQ files.

I have been looking into the ShortRead package, but I'm not sure it's the right tool, as I can't figure out how to use it for this.

Can anyone suggest a way to get counts for exact matches in R?

fastq barcoding • 4.1k views

ADD COMMENT • link updated 3.7 years ago by Kevin Blighe ★ 4.0k • written 3.7 years ago by yura.grabovska ▴ 30

What do you mean by "barcoding experiment"? Sequencing reads that each contain a barcode, as in a CRISPRi screen? If so, it probably comes down to making a FASTA file of all barcode sequences and aligning the reads end-to-end with something like bowtie2, setting the penalty parameters to a very high value (e.g. 10000) so that only perfect end-to-end matches align and everything else goes unmapped. Directly in R, the Rsubread package can probably do this, but running bowtie2 is basically a one-liner in bash. You could then use something like featureCounts to count reads per barcode.
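
The first step described above, building a FASTA file from the barcode list, could be sketched as follows. This is only a minimal sketch under assumptions: barcodes.txt is a hypothetical plain-text file with one barcode per line, barcodes_to_fasta is an illustrative helper name, and each FASTA record is simply named after its own sequence.

```shell
# Hypothetical helper: turn a one-barcode-per-line list into FASTA.
# Each record is named after its own sequence (an assumption made here
# so that per-reference counts can be read back as per-barcode counts).
barcodes_to_fasta() {
  # $1: path to the barcode list; blank lines are skipped
  awk '$1 != "" { printf ">%s\n%s\n", $1, $1 }' "$1"
}
```

It might be used as barcodes_to_fasta barcodes.txt > barcodes.fa, after which the FASTA file can be indexed with bowtie2-build before aligning.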

ADD REPLY • link 3.7 years ago ATpoint ★ 4.8k

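Hi, you can do this in bash with a one-liner.

Assuming the barcode is GTGAAA, the first command counts matches in the first 4000 lines only; remove the head step to count over the entire file. The pattern 0:GTGAAA matches the barcode as it appears at the end of an Illumina-style read header, not inside the read sequence.

Count the first 4000 lines (1000 reads):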

gunzip -c test_R1.fastq.gz | head -4000 | grep 0:GTGAAA | wc -l;

Count the entire file (takes at least a few minutes for a typical RNA-seq file):

gunzip -c test_R1.fastq.gz | grep 0:GTGAAA | wc -l
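
For a long barcode list, running one grep per barcode means one pass over the file per barcode. A single-pass alternative can be sketched with awk. This is a sketch under stated assumptions, not a tested pipeline: count_barcodes is an illustrative function name, the barcode list is a hypothetical text file with one barcode per line, all barcodes have the same length, and each barcode is expected at the very start of the read sequence (the second line of each four-line FASTQ record) rather than in the header.

```shell
# Hypothetical helper: count exact barcode matches in one pass.
# Assumes equal-length barcodes located at the start of each read.
count_barcodes() {
  barcodes="$1"   # text file, one barcode per line
  fastq_gz="$2"   # gzipped FASTQ with four lines per record
  gunzip -c "$fastq_gz" | awk '
    # First input (the barcode list): register every barcode with count 0
    NR == FNR {
      if ($1 != "") { count[$1] = 0; if (!len) len = length($1) }
      next
    }
    # Second input (FASTQ on stdin): line 2 of each record is the sequence
    FNR % 4 == 2 {
      key = substr($0, 1, len)
      if (key in count) count[key]++
    }
    # Print barcode<TAB>count for every barcode, including zero counts
    END { for (b in count) printf "%s\t%d\n", b, count[b] }
  ' "$barcodes" -
}
```

It might be called as count_barcodes barcodes.txt test_R1.fastq.gz > counts.tsv. The output order is not defined, so pipe it through sort if a stable order matters.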

A

ADD REPLY • link 3.7 years ago Ahdee ▴ 60

You may be able to do it with umi_tools extract: https://umi-tools.readthedocs.io/en/latest/reference/extract.html

This question is more suited to a general forum like Biostars or Bioinformatics Stack Exchange.

ADD REPLY • link 3.7 years ago Kevin Blighe ★ 4.0k