Question

How to interpret the fusion.txt file output from subjunc (Rsubread package)

0

Entering edit mode

misic • 0

@misic-8351

Last seen 9.6 years ago

United States

I am collaborating with a lab which is trying to find unusual virus transcripts using RNAseq (Illumina NextSeq, 150 bp paired-end sequencing). Specifically, we are looking for sequences which map to more than 1 part of the reference virus genome. I am using the Rsubreads package and have used subjunc to identify if these sequences exist. Using the following commands, I got a "fusion.txt" file output, however, I am unsure how to interpret the two location columns and how to determine where the alignments start and stop. Are "junctions" and "fusions" determined independently? Also, how do I identify which sequences specifically were called fusions, so that I may examine the sequences myself?

Thank you in advance for your time and assistance.

Commands:

buildindex(basename="virus", reference="virus.fas")

subjunc("virus",readfile1="R1.fastq",readfile2="R2.fastq",input_format="FASTQ",output_format="SAM",nsubreads=14,TH1=1,TH2=1,maxMismatches=3,nthreads=8,indels=5,phredOffset=33,tieBreakQS=FALSE,tieBreakHamming=TRUE,unique=TRUE,nBestLocations=1,minFragLength=50,maxFragLength=250,PE_orientation="fr",nTrim5=0,nTrim3=0,readGroupID=NULL,readGroup=NULL,color2base=FALSE,DNAseq=FALSE,reportAllJunctions=TRUE)

Fusion.txt file output:

#Chr        Location        Chr        Location        SameStrand        nSupport
Contig_1        11053        Contig_1        11177        Yes        7
Contig_1        1433        Contig_1        1482        Yes        6
Contig_1        7858        Contig_1        8127        Yes        4
Contig_1        10505        Contig_1        10945        Yes        2
Contig_1        1459        Contig_1        1711        Yes        1
Contig_1        13261        Contig_1        13389        Yes        6
Contig_1        13502        Contig_1        13550        No        11
Contig_1        11904        Contig_1        12065        Yes        29
Contig_1        14692        Contig_1        14951        Yes        1
Contig_1        14585        Contig_1        14742        Yes        6
Contig_1        12156        Contig_1        12393        Yes        1
Contig_1        4344        Contig_1        4432        Yes        2
Contig_1        8142        Contig_1        8263        Yes        8
Contig_1        264        Contig_1        313        Yes        2
Contig_1        3886        Contig_1        3976        Yes        1
Contig_1        14054        Contig_1        14101        Yes        8
Contig_1        8003        Contig_1        8182        Yes        5
Contig_1        5948        Contig_1        6204        Yes        2

rsubread subjunc rnaseq fusion genes • 1.5k views

ADD COMMENT • link updated 9.6 years ago by Wei Shi ★ 3.6k • written 9.6 years ago by misic • 0

score 0 · Answer 1 · 2015-07-08

The 'fusion.txt' file contains breakpoints arising from structural variant events (breakpoints arising from exon-exon splicing events can be found in the 'junction.bed' file). If a read was found to contain a breakpoint, the subjunc program will put the breakpoint into the 'fusion.txt' file if:

(1) one part of the read sequence maps to one chromosome and the other part maps to a different chromosome (the two parts are separated by the breakpoint), or
(2) both parts map to the same chromosome but to different strands, or
(3) both parts map to the same strand on the same chromosome but their relative position swapped (ie. the right-side part map to the left of the left-side part).

If none of these conditions was satisfied, the breakpoint will be put into the 'junction.bed' file, indicating that this breakpoint arose from an exon-exon splicing event.

Most of the fusions you listed fell in condition (3), indicating that quite a few tandem duplication events occurring in your sample.

The reads giving rise to the breakpoint data in 'fusion.txt' file have additional fields, including CC(Chr), CP(Position),CG(CIGAR) and CT(strand) in the mapping output. However, exon-spanning reads that span an intron of >500kb long will also contain these additional fields. Therefore you will need to make use of both the SAM/BAM file and the 'fusion.txt' file to locate the reads that contain the breakpoints included in your 'fusion.txt' file.