Help creating VRange object
3
0
Entering edit mode
@maisapinheiro12-20747
Last seen 5.5 years ago

Hello,

I am trying to run SomaticSignatures package but I can't get the input file (a VRange object) right.

In the examples given by their vignette is this:

>sca_vr = VRanges(
  seqnames = seqnames(sca_data),
  ranges = ranges(sca_data),
  ref = sca_data$Reference_Allele,
  alt = sca_data$Tumor_Seq_Allele2,
  sampleNames = sca_data$Patient_ID,
  seqinfo = seqinfo(sca_data),
  study = sca_data$study)

But when I try to use exactly the same code with my data, that I pulled from a vcf example, provided but the VariantAnnotation package, I am unable to obtain the same VRange object:

> test_range=VRanges(
+   seqnames = seqnames(vcfRange),
+   ranges = ranges(vcfRange),
+   ref = vcfRange$REF,
+   alt = vcfRange$ALT)
Error in as.vector(x, mode = "character") : 
  no method for coercing this S4 class to a vector

This is how I obtained the vcf example:

>library(VariantAnnotation) #load the package
>fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation") #folder
>vcf <- readVcf(fl, "hg19") #input vcf
>vcfRange=rowRanges(vcf)

Can anyone help me to figure this out? Thanks

vcf somaticSignatures Vrange • 1.4k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 16 hours ago
United States

From ?VRanges-class

Details:

     VRanges extends GRanges to store the following components. Except
     where noted, the components are considered columns in the dataset,
     i.e., their lengths match the number of variants. Many columns can
     be stored as either an atomic vector or an Rle.

     'ref' ('character'), the reference allele. The range
          (start/end/width) should always correspond to this sequence.

     'alt' ('character/Rle'), the alternative allele (NA allowed). By
          convention there is only a single alt allele per element
          (row) of the VRanges. Many methods, like 'match', make this
          assumption.

And

> vcfRange$ALT
DNAStringSetList of length 10376
[[1]] G
[[2]] T
[[3]] A
[[4]] T
[[5]] T
[[6]] A
[[7]] C
[[8]] A
[[9]] A
[[10]] C
...
<10366 more elements>

This is a DNAStringSetList because there is more than one alternate allele for at least one location, and you can only have one alternate allele per location.

ADD COMMENT
0
Entering edit mode

Yeas... I had noticed that. There is a list instead of a vector in the ALT allele, but I didn't know why exactly. I'll try to extract this as a vector using only the first one. Thanks

ADD REPLY
0
Entering edit mode

Thank you so much. It worked with unlist(vcfRange$ALT)

ADD REPLY
0
Entering edit mode

Thank you so much. It worked with unlist(vcfRange$ALT)

ADD REPLY
0
Entering edit mode

Yes, that works in this instance, where the DNAStringSetList for the ALT allele has only one allele per position. But if that isn't true, then it doesn't work

> z <- vcfRange$ALT
> z[1] <- DNAStringSetList(c("G", "T"))
> vcfRange$ALT <- z
> test_range=VRanges(
   seqnames = seqnames(vcfRange),
   ranges = ranges(vcfRange),
   ref = vcfRange$REF,
   alt = unlist(vcfRange$ALT))
Error in validObject(.Object) : 
  invalid class "VRanges" object: 'ref' must not match 'alt'
In addition: Warning messages:
1: In ans[] <- x :
  number of items to replace is not a multiple of replacement length
2: In ans[] <- x :
  number of items to replace is not a multiple of replacement length

A (probably naive) way to get around that is to just use the first element in any list item that is longer than 1. There are most assuredly better ways of doing this, (and probably Martin Morgan or Michael Lawrence will be along shortly to show how), but a quick solution is

> ind <- lengths(vcfRange$ALT)
> ind2 <- rep(seq_len(length(ind)), ind)
> vcfRange$ALT <- unlist(vcfRange$ALT)[!duplicated(ind2)]
> test_range=VRanges(
   seqnames = seqnames(vcfRange),
   ranges = ranges(vcfRange),
   ref = vcfRange$REF,
   alt = vcfRange$ALT)
> test_range
VRanges object with 10376 ranges and 0 metadata columns:
              seqnames    ranges strand         ref              alt
                 <Rle> <IRanges>  <Rle> <character> <characterOrRle>
    rs7410291       22  50300078      *           A                G
  rs147922003       22  50300086      *           C                T
  rs114143073       22  50300101      *           G                A
  rs141778433       22  50300113      *           C                T
  rs182170314       22  50300166      *           C                T
          ...      ...       ...    ...         ...              ...
  rs187302552       22  50999536      *           A                G
    rs9628178       22  50999538      *           A                G
    rs5770892       22  50999681      *           A                G
  rs144055359       22  50999830      *           G                A
  rs114526001       22  50999964      *           G                C
                  totalDepth       refDepth       altDepth   sampleNames
              <integerOrRle> <integerOrRle> <integerOrRle> <factorOrRle>
    rs7410291           <NA>           <NA>           <NA>          <NA>
  rs147922003           <NA>           <NA>           <NA>          <NA>
  rs114143073           <NA>           <NA>           <NA>          <NA>
  rs141778433           <NA>           <NA>           <NA>          <NA>
  rs182170314           <NA>           <NA>           <NA>          <NA>
          ...            ...            ...            ...           ...
  rs187302552           <NA>           <NA>           <NA>          <NA>
    rs9628178           <NA>           <NA>           <NA>          <NA>
    rs5770892           <NA>           <NA>           <NA>          <NA>
  rs144055359           <NA>           <NA>           <NA>          <NA>
  rs114526001           <NA>           <NA>           <NA>          <NA>
              softFilterMatrix
                      <matrix>
    rs7410291                 
  rs147922003                 
  rs114143073                 
  rs141778433                 
  rs182170314                 
          ...              ...
  rs187302552                 
    rs9628178                 
    rs5770892                 
  rs144055359                 
  rs114526001                 
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
  hardFilters: NULL
ADD REPLY
0
Entering edit mode
@maisapinheiro12-20747
Last seen 5.5 years ago

Thank you so much! It worked with

unlist(vcfRange$ALT)

ADD COMMENT
0
Entering edit mode
Julian Gehring ★ 1.3k
@julian-gehring-5818
Last seen 5.5 years ago

The simplest way for getting a VRanges object from a VCF file would be to use the readVcfAsVRanges function instead of the readVcf function in your example. The two functions provide the same functionalities and parameters, but the readVcfAsVRanges takes care of a lot of the conversions steps that you currently try to do manually.

ADD COMMENT

Login before adding your answer.

Traffic: 758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6