I am collaborating with a lab which is trying to find unusual virus transcripts using RNAseq (Illumina NextSeq, 150 bp paired-end sequencing). Specifically, we are looking for sequences which map to more than 1 part of the reference virus genome. I am using the Rsubreads package and have used subjunc to identify if these sequences exist. Using the following commands, I got a "fusion.txt" file output, however, I am unsure how to interpret the two location columns and how to determine where the alignments start and stop. Are "junctions" and "fusions" determined independently? Also, how do I identify which sequences specifically were called fusions, so that I may examine the sequences myself?
Thank you in advance for your time and assistance.
Commands:
buildindex(basename="virus", reference="virus.fas")
subjunc("virus",readfile1="R1.fastq",readfile2="R2.fastq",input_format="FASTQ",output_format="SAM",nsubreads=14,TH1=1,TH2=1,maxMismatches=3,nthreads=8,indels=5,phredOffset=33,tieBreakQS=FALSE,tieBreakHamming=TRUE,unique=TRUE,nBestLocations=1,minFragLength=50,maxFragLength=250,PE_orientation="fr",nTrim5=0,nTrim3=0,readGroupID=NULL,readGroup=NULL,color2base=FALSE,DNAseq=FALSE,reportAllJunctions=TRUE)
Fusion.txt file output:
#Chr Location Chr Location SameStrand nSupport
Contig_1 11053 Contig_1 11177 Yes 7
Contig_1 1433 Contig_1 1482 Yes 6
Contig_1 7858 Contig_1 8127 Yes 4
Contig_1 10505 Contig_1 10945 Yes 2
Contig_1 1459 Contig_1 1711 Yes 1
Contig_1 13261 Contig_1 13389 Yes 6
Contig_1 13502 Contig_1 13550 No 11
Contig_1 11904 Contig_1 12065 Yes 29
Contig_1 14692 Contig_1 14951 Yes 1
Contig_1 14585 Contig_1 14742 Yes 6
Contig_1 12156 Contig_1 12393 Yes 1
Contig_1 4344 Contig_1 4432 Yes 2
Contig_1 8142 Contig_1 8263 Yes 8
Contig_1 264 Contig_1 313 Yes 2
Contig_1 3886 Contig_1 3976 Yes 1
Contig_1 14054 Contig_1 14101 Yes 8
Contig_1 8003 Contig_1 8182 Yes 5
Contig_1 5948 Contig_1 6204 Yes 2