Dear All,
I have some RNAseq BAM files left from previous labmates. Is there a quick way to tell are these BAM files paired end or not ? Many Thanks!!
Stephen
Dear All,
I have some RNAseq BAM files left from previous labmates. Is there a quick way to tell are these BAM files paired end or not ? Many Thanks!!
Stephen
Hi Stephen,
Rsamtools (devel version) has a new function testPairedEndBam which returns TRUE if the BAM file contains paired end reads, FALSE otherwise.
library(Rsamtools) fl <- system.file("extdata", "ex1.bam", package="Rsamtools") testPairedEndBam(fl) [1] TRUE packageVersion("Rsamtools") [1] ‘1.19.23’
- Sonali
Load Rsamtools
library(Rsamtools)
Point to a file, and mark it as a 'BamFile'
## your file here! fl = system.file(package="Rsamtools", "extdata", "ex1.bam")
And summarize
quickBamFlagSummary(fl)
My original longer answer was to query the file for sequence ranges, and choose a not too large chromosome
bfl = BamFile(fl) seq = as(seqinfo(bfl), "GRanges")[1]
Input the reads on that chromosome, specifying that you'd like the 'flag' field in addition to the standard coordinates
libary(GenomicAlignments) aln <- readGAlignments(bfl, param=ScanBamParam(which=seq[1], what="flag"))
Tally the different flags, looking for lots of properly paired reads
colSums(bamFlagAsBitMatrix(mcols(aln)$flag))
Depends on the platform. For Illumina, it's usually a set of files like:
sample_1.1.fastq sample_1.2.fastq
And if you look at the data inside you will see:
head -n 4 CTATCGCT.AGGATAGG_8.1.fastq @C5NM5ACXX:8:1101:10000:16673/1 GTCACAGGTCTTGCATAGGTAAACTACTTGGAGGTCAGGGGCTACGTGGT + CCCFFFFFFFDHGIJIJIJIFIHJJJIIIIJFIDGHIIHGIIHJIJGIIC head -n 4 CTATCGCT.AGGATAGG_8.2.fastq @C5NM5ACXX:8:1101:10000:16673/2 CGGCTTTATTCTTCTATAGTAAGTCTCCCCTCTTTACTGGGGAGGGGGGG + CCCFFFFFHGHHGJIJJJJJHIIJJIJIGJJIJIJIJJIIIIHIIIIJDD
Where the @ lines are identical but for the last part.
Hi Stephen,
library(Rsamtools) quickBamFlagSummary(bamfile, main.groups.only=TRUE) group | nb of | nb of | mean / max of | records | unique | records per records | in group | QNAMEs | unique QNAME All records........................ A | 3307 | 1699 | 1.95 / 2 o template has single segment.... S | 0 | 0 | NA / NA o template has multiple segments. M | 3307 | 1699 | 1.95 / 2 - first segment.............. F | 1654 | 1654 | 1 / 1 - last segment............... L | 1653 | 1653 | 1 / 1 - other segment.............. O | 0 | 0 | NA / NA Note that (S, M) is a partitioning of A, and (F, L, O) is a partitioning of M. Indentation reflects this.
Look at the nb of records in groups S and M. 0 and 3307 here so it's paired-end.
Cheers,
H.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.