Error in dbCreator function at PGA package
1
0
Entering edit mode
gabrielw • 0
@gabrielw-15924
Last seen 4.6 years ago

Hello all I just tried to use with my data PGA package and i had an error in dbCreator:

library(BSgenome.Hsapiens.UCSC.hg19)
dbfile <- dbCreator(gtfFile=gtffile,vcfFile=vcffile,bedFile=bedfile,annotation_path=annotation_path,outfile_name=outfile_name,genome=Hsapiens,outdir=outfile_path)
Error in y[, z] : subscript out of bounds

this is the vcf header:

##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
#CHROM  POS ID  REF ALT QUAL
chr1    10043516    .   A   G   0.01
chr1    10044378    .   A   G   0.01

the gtf header:

iarc@iarc-H8QG6:/New_data/pancreatic/new_data$ head  -20 new_panc/merged_asm/merged.gtf 
1   Cufflinks   exon    852260  857889  .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.6.1"; class_code "u"; tss_id "TSS1";
1   Cufflinks   exon    857991  858025  .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.6.1"; class_code "u"; tss_id "TSS1";
1   Cufflinks   exon    878110  878438  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
1   Cufflinks   exon    878633  878757  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "2"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";

bedfile header:

1   12697   13221   JUNC_0  0   +   12697   13221   255,0,0 2   50,50   0,575
1   14829   14970   JUNC_1  1   -   14829   14970   255,0,0 2   50,50   0,192
1   14829   15021   JUNC_2  0   -   14829   15021   255,0,0 2   50,50   0,243
1   15012   25233   JUNC_3  0   +   15012   25233   255,0,0 2   50,50   0,10272
1   15038   15796   JUNC_4  3   -   15038   15796   255,0,0 2   50,50   0,809

annotation prepared with PrepareAnnotationRefseq2

any thoughts?

PGA Proteogenomics • 810 views
ADD COMMENT
0
Entering edit mode
Kevin Blighe ★ 4.0k
@kevin
Last seen 6 weeks ago
Republic of Ireland

I re-formatted your post by highlighting the code chunks and pressing the 101 010 button... to improve its aesthetics.

The likely cause is that your contigs ('chromosomes') are labeled differently: in your VCF, they have a 'chr' prefix; in your GTF and BED, they have no prefix.

I suppose that you have different choices to make about which contig style you wish to use. For example, adding a 'chr' prefix to your GTF and BED is as simple as:

awk '{print "chr"$0}' merged.gtf
*et cetera*

On the other hand, removing the 'chr' prefix from the VCF is as simple as:

sed 's/^chr//' merged.vcf

That sed command will automatically ignore the VCF header because we match the start of each line via the caret (^).

Kevin

ADD COMMENT
0
Entering edit mode

Thank you for the patience to format it! and it passed straight from me this incompatibility of chr within files!! lol Thanks I fixed merged gtf and the bed file with chr but still I am getting the same error!

iarc@iarc-H8QG6:/New_data/pancreatic/new_data$ head merged.gtf
chr1    Cufflinks   exon    852260  857889  .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.6.1"; class_code "u"; tss_id "TSS1";
chr1    Cufflinks   exon    857991  858025  .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.6.1"; class_code "u"; tss_id "TSS1";
chr1    Cufflinks   exon    878110  878438  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1    Cufflinks   exon    878633  878757  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "2"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1    Cufflinks   exon    879078  879188  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "3"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1    Cufflinks   exon    879288  880180  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "4"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1    Cufflinks   exon    880422  880526  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "5"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1    Cufflinks   exon    880898  881033  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "6"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1    Cufflinks   exon    881527  881666  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "7"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
chr1    Cufflinks   exon    881782  881925  .   +   .   gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "8"; oId "CUFF.36.1"; class_code "u"; tss_id "TSS2";
iarc@iarc-H8QG6:/New_data/pancreatic/new_data$ head junctions.bed 
chr1    12697   13221   JUNC_0  0   +   12697   13221   255,0,0 2   50,50   0,575
ADD REPLY
0
Entering edit mode

No problem. Okay, the next problem is that your VCF is corrupt and is not valid. A VCF should have the following mandatory headers:

1. #CHROM
2. POS
3. ID
4. REF
5. ALT
6. QUAL
7. FILTER
8. INFO

You need to go back a few steps and determine how this corrupt VCF was created

ADD REPLY

Login before adding your answer.

Traffic: 971 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6