SomaticSignatures package: motifMatrix function trouble
Hello everyone, 

I am trying to use the SomaticSignatures package, which has a very complete vignette, but I am having trouble with the motifMatrix function.

I built a VRanges object (vr) from my reference genome and a VCF file. Then I get a "VRanges object with 6 ranges and 70 metadata columns", and from there I don't understand how to use the motifMatrix function, which arguments to choose to build the "M matrix" described in the vignette, or how to go further with the evaluation of the signatures.

Here is my session (I'm sorry for the length): 

> head(Caf62_motifs)
VRanges object with 6 ranges and 70 metadata columns:
      seqnames                 ranges strand         ref              alt
         <Rle>              <IRanges>  <Rle> <character> <characterOrRle>
  [1]     chr5 [149495253, 149495253]      *           T                C
  [2]     chr5 [149495287, 149495287]      *           G                C
  [3]     chr5 [149495395, 149495395]      *           T                C
  [4]     chr5 [149500397, 149500397]      *           T                C
  [5]     chr5 [149505131, 149505131]      *           A                C
  [6]     chr5 [149509270, 149509270]      *           A                G
          totalDepth       refDepth       altDepth   sampleNames
      <integerOrRle> <integerOrRle> <integerOrRle> <factorOrRle>
  [1]          10522           <NA>           <NA>          none
  [2]          10548           <NA>           <NA>          none
  [3]           2957           <NA>           <NA>          none
  [4]            220           <NA>           <NA>          none
  [5]           3874           <NA>           <NA>          none
  [6]          48870           <NA>           <NA>          none
  seqinfo: 25 sequences from GenomeA genome
  hardFilters: NULL

> Caf62_mm = motifMatrix(Caf62_motifs, normalize = TRUE)
> head(round(Caf62_mm, 4))
Does anyone have an idea how I can go further with this?

Thank you so much,



Haiying.Kong
I think it is because Caf62_motifs does not have any information for sample names.

So when motifMatrix tries to group the variants by sampleNames, it cannot.


Julian Gehring
In order to construct a matrix with the counts of mutational motifs, you need the variant calls annotated with their mutational context (the 'context' column) and a grouping variable that is also present in your VRanges object. The grouping variable defines the columns of the mutational matrix M, so it needs to be a categorical variable with at least two unique values (otherwise, we won't really get a matrix). The 'sampleNames' in your VRanges object seem to have only one unique entry (none), but you can choose any variable that you deem interesting and meaningful for your analysis. If your data has, e.g., a 'phenotype' column, you can group by this phenotype with

motifMatrix(vr, group = "phenotype")

In the vignette, for example, we use the tumour type as the grouping variable.
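To illustrate why a single-valued grouping variable gives a one-column result, here is a minimal base-R sketch of the idea behind a motif count matrix. It is not the SomaticSignatures implementation: the data frame `calls`, its columns `motif` and `phenotype`, and the helper `build_motif_matrix()` are hypothetical stand-ins for the context annotation and a grouping column of a VRanges object, and the column scaling is only assumed to resemble what `normalize = TRUE` does.

```r
# Hypothetical example data: one row per somatic variant.
# 'motif' loosely mimics a mutational context label (alteration + flanking bases),
# 'phenotype' stands in for any categorical grouping column.
calls <- data.frame(
  motif     = c("CA A.A", "CA A.A", "CG T.G", "TC A.T", "CG T.G", "TC A.T"),
  phenotype = c("tumourA", "tumourA", "tumourA", "tumourB", "tumourB", "tumourB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count motifs per group.
# Rows = distinct motifs, columns = distinct values of the grouping variable.
build_motif_matrix <- function(motif, group, normalize = TRUE) {
  m <- unclass(table(motif, group))
  if (normalize) {
    # Scale each column to relative frequencies (each column sums to 1).
    m <- sweep(m, 2, colSums(m), "/")
  }
  m
}

# Grouping by a variable with two values gives a 3 x 2 matrix.
m_pheno <- build_motif_matrix(calls$motif, calls$phenotype)
print(round(m_pheno, 4))

# Grouping by a variable with a single value (like sampleNames == "none")
# collapses all variants into one column.
m_none <- build_motif_matrix(calls$motif, rep("none", nrow(calls)))
print(dim(m_none))
```

With only one column there is nothing to decompose across samples or groups, which is why a grouping variable with at least two distinct values is needed before going on to signature estimation.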

Sorry for the late response, and thank you so much! I am going to try to build a better-annotated matrix.



