Error when predicting features with SGSeq
5
1
Entering edit mode
Helen Zhou ▴ 150
@helen-zhou-2654
Last seen 9.4 years ago
United States

I'm trying to use the analyzeFeatures function in SGSeq following the vignette, but I get this error:

> library(SGSeq)

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)

> sgfc <- analyzeFeaturesbam.info, which = gr)

Predict features...

Error in mergeTxFeatures(list_features, min_n_sample = min_n_sample) : 

  ... must be one or more TxFeatures or a single

            list of TxFeatures

 

The bam.info object contains information about my BAM files, after running getBamInfo():

> bam.info

  sample_name                                                     file_bam paired_end read_length frag_length  lib_size

1     sample1    /home/projects/gene_expression/Tmp/A549.cytosol.polyA.bam       TRUE         101         262 126299548

2     sample2    /home/projects/gene_expression/Tmp/A549.nucleus.polyA.bam       TRUE         101         293 149162537

6     sample3 /home/projects/gene_expression/Tmp/GM12878.cytosol.polyA.bam       TRUE          76         276  62103999

7     sample4 /home/projects/gene_expression/Tmp/GM12878.nucleus.polyA.bam       TRUE          76         239  60730206

 

These files have been merged from two separate bam files, processed with RSEM and TopHat. It includes transcripts from UCSC (hg19), but also some from other sources.

When I did into the functions a bit, the problem seems to occur with predictTxFeaturesPerSample. If I run that directly on a single sample, I get:

> sample_info <- bam.info

>  i <- 1

>  SGSeq:::predictTxFeaturesPerSample(file_bam = sample_info$file_bam[i], 

+         paired_end = sample_info$paired_end[i], read_length = sample_info$read_length[i], 

+         frag_length = sample_info$frag_length[i], lib_size = sample_info$lib_size[i], which=gr, alpha = 2, psi = 0, beta = 0.2, 

+     gamma = 0.2)

Error in (function (classes, fdef, mtable)  : 

  unable to find an inherited method for function ‘mcols’ for signature ‘"character"’

 

Does SGSeq simply not work for samples that have been mapped using annotation that only partly overlaps with the USCS data?

~~Helen

 

 

SGSeq • 2.1k views
ADD COMMENT
0
Entering edit mode
@leonard-goldstein-5925
Last seen 9.4 years ago
United States

Hi Helen,

Thanks for your question. Can I ask what version of SGSeq you are using? The output of sessionInfo() would be helpful.

BAM files used as input for SGSeq should be RNA-seq data mapped to a reference genome with a splice-aware aligner that includes the custom tag XS in the BAM output (e.g. GSNAP, STAR, TopHat). If you have multiple BAM  files that are intended to be analyzed together, they should all include mappings to the same reference genome. It sounds like your BAM files might contain alignments against a transcriptome rather than a genome?

Leonard

ADD COMMENT
0
Entering edit mode
Helen Zhou ▴ 150
@helen-zhou-2654
Last seen 9.4 years ago
United States

My apologize Leonard. Session info is shown below, along with the error.

The bam files do contain multiple alignments. I asked our bioinformatics facility to follow the approach outlined in this G&D article, since we are also looking at neuronal stem cells. http://genesdev.cshlp.org/content/27/9/1032.long It is my understanding (but I am not sure) that the reads were mapped both to the genome and the transcriptome (human).

 

> sgfc <- analyzeFeaturesbam.info, which = NULL)

Predict features...

Error in mergeTxFeatures(list_features, min_n_sample = min_n_sample) : 

  ... must be one or more TxFeatures or a single

            list of TxFeatures

In addition: Warning message:

In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule,  :

  20 function calls resulted in an error

> sessionInfo()

R version 3.2.1 (2015-06-18)

Platform: x86_64-unknown-linux-gnu (64-bit)

Running under: Ubuntu 14.04.2 LTS

 

locale:

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              

 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    

 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   

 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 

 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

 

attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets 

[8] methods   base     

 

other attached packages:

 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2

 [2] GenomicFeatures_1.20.1                 

 [3] AnnotationDbi_1.30.1                   

 [4] Biobase_2.28.0                         

 [5] SGSeq_1.2.2                            

 [6] GenomicRanges_1.20.5                   

 [7] GenomeInfoDb_1.4.1                     

 [8] IRanges_2.2.5                          

 [9] S4Vectors_0.6.2                        

[10] BiocGenerics_0.14.0                    

 

loaded via a namespace (and not attached):

 [1] igraph_1.0.1            XVector_0.8.0           magrittr_1.5           

 [4] zlibbioc_1.14.0         GenomicAlignments_1.4.1 BiocParallel_1.2.9     

 [7] tools_3.2.1             DBI_0.3.1               lambda.r_1.1.7         

[10] futile.logger_1.4.1     rtracklayer_1.28.6      futile.options_1.0.0   

[13] bitops_1.0-6            RCurl_1.95-4.7          biomaRt_2.24.0         

[16] RSQLite_1.0.0           Biostrings_2.36.1       Rsamtools_1.20.4       

[19] XML_3.98-1.3           

ADD COMMENT
0
Entering edit mode
@leonard-goldstein-5925
Last seen 9.4 years ago
United States

Hi Helen,

Thanks for providing the sessionInfo output. 

It sounds like these are non-standard BAM files (merged from different sources). Can you try running SGSeq on the unmodified BAM files that were generated by TopHat? 

Leonard

 

ADD COMMENT
0
Entering edit mode
Helen Zhou ▴ 150
@helen-zhou-2654
Last seen 9.4 years ago
United States

Leonard,

unfortunately we have to remove temporary files due to size constraints on our folders. I will try running SGSeq next time I have some of the intermediate result file from our bioinformatics core.

~~Helen

ADD COMMENT
0
Entering edit mode

Peharps you could try extracting the reads in the BAM file you have and then realign to a reference genome. Googling for "realign reads bam file" will turn up several hits around that theme that you might be able to try.

ADD REPLY
0
Entering edit mode
@leonard-goldstein-5925
Last seen 9.4 years ago
United States

Hi Helen,

If it is an option to make available the current BAM files, I can have a look at the data to understand what is causing the problem. 

Otherwise I suggest running SGSeq on the unmodified TopHat output when you get the chance and if you are still encountering problems, please do report them on the support site. 

Many thanks,

Leonard

ADD COMMENT

Login before adding your answer.

Traffic: 563 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6