how can I tell if a RNAseq BAM file is paired end or single end ?
4
2
Entering edit mode
stephen66 ▴ 50
@stephen66-7177
Last seen 3.6 years ago
United States

Dear All,

I have some RNAseq BAM files left from previous labmates. Is there a quick way to tell are these BAM files paired end or not ? Many Thanks!!

Stephen

NGS BAm • 7.5k views
ADD COMMENT
4
Entering edit mode
Sonali Arora ▴ 390
@sonali-arora-6563
Last seen 8.7 years ago
United States

Hi Stephen,

Rsamtools (devel version) has a new function testPairedEndBam which returns TRUE if the BAM file contains paired end reads, FALSE otherwise. 

library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools")

testPairedEndBam(fl)
[1] TRUE

packageVersion("Rsamtools")

[1] ‘1.19.23’

- Sonali

 

ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States

Load Rsamtools

library(Rsamtools)

Point to a file, and mark it as a 'BamFile'

## your file here!
fl = system.file(package="Rsamtools", "extdata", "ex1.bam")

And summarize

quickBamFlagSummary(fl)

My original longer answer was to query the file for sequence ranges, and choose a not too large chromosome

bfl = BamFile(fl)
seq = as(seqinfo(bfl), "GRanges")[1]

Input the reads on that chromosome, specifying that you'd like the 'flag' field in addition to the standard coordinates

libary(GenomicAlignments)
aln <- readGAlignments(bfl, param=ScanBamParam(which=seq[1], what="flag"))

Tally the different flags, looking for lots of properly paired reads

colSums(bamFlagAsBitMatrix(mcols(aln)$flag))
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 3 hours ago
United States

Depends on the platform. For Illumina, it's usually a set of files like:

sample_1.1.fastq
sample_1.2.fastq

And if you look at the data inside you will see:

head -n 4 CTATCGCT.AGGATAGG_8.1.fastq
@C5NM5ACXX:8:1101:10000:16673/1
GTCACAGGTCTTGCATAGGTAAACTACTTGGAGGTCAGGGGCTACGTGGT
+
CCCFFFFFFFDHGIJIJIJIFIHJJJIIIIJFIDGHIIHGIIHJIJGIIC
head -n 4 CTATCGCT.AGGATAGG_8.2.fastq
@C5NM5ACXX:8:1101:10000:16673/2
CGGCTTTATTCTTCTATAGTAAGTCTCCCCTCTTTACTGGGGAGGGGGGG
+
CCCFFFFFHGHHGJIJJJJJHIIJJIJIGJJIJIJIJJIIIIHIIIIJDD

Where the @ lines are identical but for the last part.

 

ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 2 hours ago
Seattle, WA, United States

Hi Stephen,

library(Rsamtools)
quickBamFlagSummary(bamfile, main.groups.only=TRUE)
                                group |    nb of |    nb of | mean / max
                                   of |  records |   unique | records per
                              records | in group |   QNAMEs | unique QNAME
All records........................ A |     3307 |     1699 | 1.95 / 2
  o template has single segment.... S |        0 |        0 |   NA / NA
  o template has multiple segments. M |     3307 |     1699 | 1.95 / 2
      - first segment.............. F |     1654 |     1654 |    1 / 1
      - last segment............... L |     1653 |     1653 |    1 / 1
      - other segment.............. O |        0 |        0 |   NA / NA

Note that (S, M) is a partitioning of A, and (F, L, O) is a partitioning of M.
Indentation reflects this.

Look at the nb of records in groups S and M. 0 and 3307 here so it's paired-end.

Cheers,

H.

ADD COMMENT

Login before adding your answer.

Traffic: 645 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6