Issues with SGSeq: analyzeVariants method returns an error...
3
0
Entering edit mode
@sylvain-foisy-5539
Last seen 5.2 years ago
Canada

Hi,

Continuing my exploration of SGSeq ;-) Ok, I got my analyzeFeatures() data using an annotation file and I am trying to move on to analyzeVariants(). I get the following errors if I try it:

> sgfc.var<-analyzeVariants(sgfc,maxnvariant=NA,cores=8)

Find segments...

Find variants...

Error: subscript contains NAs or out-of-bounds indices

In addition: There were 11 warnings (use warnings() to see them)

> warnings()

Warning messages:

1: In mclapply(geneIDs, findVariantsPerGene, g = g, maxnvariant = maxnvariant,  ... :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

2: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

3: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

4: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

5: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

6: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

7: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

8: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

9: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

10: In mclapply(x, "[[", j, mc.cores = cores) :

  scheduled cores 4 encountered errors in user code, all values of the jobs will be affected

11: In (S4Vectors:::coercerToClass(element.type))(v, ...) :

  NAs introduced by coercion

Any inputs appreciated ;-)

Sylvain

sgseq analyzevariants • 2.5k views
ADD COMMENT
0
Entering edit mode
@leonard-goldstein-6845
Last seen 5 months ago
Australia

Hi Sylvain,

Thanks for your question. Can you include the sessionInfo() please? 

Leonard

ADD COMMENT
0
Entering edit mode

Hi Leonard,

Here goes:

> sessionInfo()

R version 3.3.1 (2016-06-21)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Ubuntu 14.04.4 LTS

 

locale:

[1] C

 

attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets 

[8] methods   base     

 

other attached packages:

 [1] SGSeq_1.6.11               SummarizedExperiment_1.2.3

 [3] Biobase_2.32.0             Rsamtools_1.24.0          

 [5] Biostrings_2.40.2          XVector_0.12.1            

 [7] GenomicRanges_1.24.3       GenomeInfoDb_1.8.7        

 [9] IRanges_2.6.1              S4Vectors_0.10.3          

[11] BiocGenerics_0.18.0       

 

loaded via a namespace (and not attached):

 [1] igraph_1.0.1            AnnotationDbi_1.34.4    magrittr_1.5           

 [4] zlibbioc_1.18.0         GenomicAlignments_1.8.4 BiocParallel_1.6.6     

 [7] tools_3.3.1             DBI_0.5-1               rtracklayer_1.32.2     

[10] bitops_1.0-6            RCurl_1.95-4.8          biomaRt_2.28.0         

[13] RUnit_0.4.31            RSQLite_1.0.0           GenomicFeatures_1.24.5 

[16] XML_3.98-1.4           

ADD REPLY
0
Entering edit mode
@leonard-goldstein-6845
Last seen 5 months ago
Australia

Hi Sylvain,

Thanks for reporting. It looks like the error is related to a recent bug fix in findSGVariants(). I submitted a fix – please try again with the latest version available with biocLite().

Also note that when running findSGVariants() with maxnvariant = NA (instead of the default 20), the output can include splice events with many variants (common events such as cassette exons only have two variants). 

Leonard

ADD COMMENT
0
Entering edit mode

Hi Sylvain, 

I noticed you only changed the maxnvariant argument in the example you provided  – I edited my previous answer for clarity. 

Thanks, 
Leonard

ADD REPLY
0
Entering edit mode

Hi,

Actually, NA was used only in my first tries; I played with the numbers a bit since then. Thanks for the inputs!

S

ADD REPLY
0
Entering edit mode

Hi Leonard,

Can do ;-) I'll work on this ASAP and let you know if it breaks/fails.

Best regards

S

ADD REPLY
0
Entering edit mode

Hi Leonard,

Sorry for the long delays: I reach and I admin and my analysis time has been short this fall... Ok, I installed the latest SGSeq and I got a new type of errors :-( Trying to do my analysis pipeline, it failed while doing analyzeFeatures() with annotation file with this after computing for a while:

Error in `colnames<-`(`*tmp*`, value = c("MONOCYTES_169377", "MONOCYTES_201678",  : 

  length of 'dimnames' [2] not equal to array extent

Don't know it it is specific to v. 1.8.0 since it went a-ok with an older SGSeq (1.4.x running under R 3.0.2).

Best regards

S

ADD REPLY
0
Entering edit mode

Hi Sylvain,

It's difficult to diagnose the problem based on this info, but it looks similar to one of your earlier posts from September. If you are working with a larger data set or running a genome-wide analysis, I'd try to run the individual analysis steps separately instead of using the analyzeFeatures() wrapper function. Please see my response to the earlier C: Issues with SGSeq: plotFeatures method returns NULL. When running steps individually, it may be easier to see what causes the problem and also easier to rerun parts of the analysis if the error is caused by memory issues or similar.

Leonard

ADD REPLY
0
Entering edit mode

Hi Leonard,

Cool, can do. I'll keep you posted but it might take a while.. Yo are right that these datasets are both genome-wide and multi-samples; I have n=8 to n=12 for each... BTE, and possible on a different tangent from the original question: let's say that I have done my analysis per cell type with discrete objects, what might be the better/faster way of doing pairwise comparison for downstream analysis? I always feel like a total twit asking these kind of questions on this board...

Best 

S

ADD REPLY
0
Entering edit mode

Hi Sylvain, 

OK let me know how it goes. Regarding the second question, I think you are asking how to combine multiple SGFeatureCounts objects? As long as the objects have matching rows and assays, and the colData slots have matching columns, you can simply merge them using function cbind(). However, this only works if rows are matching i.e. you must have used the same gene annotation (or the same predicted gene models) for obtaining counts. 

Best wishes

Leonard

ADD REPLY
0
Entering edit mode

Hi Leonard,

I kind of knew that it needed matching row and assays; my problem is that I have matching assays but a quick look at the rows cashes much differences... I'll have to find a way of putting missing rows in reciprocating sets with 0 values for missing info.

Thanks again for your help

S

ADD REPLY
0
Entering edit mode

Hi Sylvain,

You can only cbind SGFeatureCounts objects with identical SGFeatures (i.e. they must be based on the same gene models). If you try to merge non-compatible objects, most likely the gene models will not be valid and you will have zeros in cases where there are actually reads in your data. 

If you want to analyze multiple data sets together, you should use the same gene annotation for all data sets. If you work with predictions, you can either run the prediction across the combined set of samples from all data sets, or you can merge predictions from different data sets to arrive at one set of gene models and then obtain counts for these. In the latter case you would have to merge the predicted TxFeatures using function mergeTxFeatures() before converting to SGFeatures (see section "Advanced usage" in the vignette)

Leonard

ADD REPLY
0
Entering edit mode

Hi,

Another issue that I am getting: when running plotFeatures() for a specific gene, I have this:

> plotFeatures(sgfc.denovo,geneName="TMEM56-RWDD3",color_novel="red",include="both")

Error in strsplit(co, split = ":", fixed = TRUE) : non-character argument

Any idea on cause? I want to create a script to create the figures for the genes in a given list and this error kind of breaks it...

Best regards

S

ADD REPLY
0
Entering edit mode

Hi Sylvain,

Can you send me the SGFeatures object that causes the error? You can save it like this

sgf <- rowRanges(sgfc.denovo)
save(sgf, file = "test.RData")

and send the RData file by email – then I can have a look. 

Thanks,
Leonard

ADD REPLY
0
Entering edit mode
@leonard-goldstein-6845
Last seen 5 months ago
Australia

Hi Sylvain,

Thanks for the test data, it looks like you encountered an unusual case. The plotting function tries to plot the subset of your data, which corresponds to the gene of interest (in your case TMEM56-RWDD3). However, in your predicted gene models, the only feature that is shared with TMEM56-RWDD3 is a single splice site, and the plotting function couldn't handle this case. I submitted a fix. The function now returns NULL without making a plot (the same behavior as if you provide a gene name that cannot be found in the data). Thanks for reporting the problem. 

Leonard

ADD COMMENT

Login before adding your answer.

Traffic: 687 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6