SKAT-O method using GENESIS, question about range iterator using SeqVarWindowIterator
pjuge • 0
@bcf8f511

I am investigating rare exonic variants in WES data from cases and controls using the SKAT-O method with GENESIS. I want to use a sliding-window approach, defining a window size and a window shift with the following code:


# make the window iterator object
iterator <- SeqVarWindowIterator(seqData, windowSize=10000, windowShift=500, verbose=FALSE)

I ran the analysis on chromosome 11 as a test. All the windows are exactly 10,000 bp long, but the step between consecutive windows is not 500 bp and varies from one window to the next. Here are the first 10 windows from the results:

 #   chr   start    end      windows shift
 1   11    188001   198000     3001
 2   11    201001   211000    -8499
 3   11    202501   212500    -7999
 4   11    204501   214500    -4999
 5   11    209501   219500    -8499
 6   11    211001   221000    -8499
 7   11    212501   222500    -7999
 8   11    214501   224500    13001
 9   11    237501   247500    23001
10   11    270501   280500    -9499

If I set a window shift of 500, why are the shifts so different? Some regions seem not to be covered (perhaps there are no exons in those regions?), and some seem to be covered multiple times.

Thank you for your help!

GENESIS SeqVarTools • 660 views
@james-w-macdonald-5106

I believe it's explained in the help page.

Details:

     Iterator classes allow for iterating filters over blocks of
     variants, ranges, or sliding windows.

     For 'SeqVarBlockIterator', each call to 'iterateFilter' will
     select the next unit of 'variantBlock' variants.

     For 'SeqVarRangeIterator', each call to 'iterateFilter' will
     select the next range in 'variantRanges'.

     'SeqVarWindowIterator' is an extension of 'SeqVarRangeIterator'
     where the ranges are determined automatically by sliding a window
     of size 'windowSize' base pairs by steps of 'windowShift' across
     the genome. Only windows containing unique sets of variants are kept.

And the example shows that the windows are in fact 1000 bases long, with 500-base overlaps:

> gds <- seqOpen(seqExampleFileName("gds"))
> seqData <- SeqVarData(gds)
> iterator <- SeqVarWindowIterator(seqData, windowSize = 1000, windowShift = 500)
# of selected variants: 2
> variantRanges(iterator)
GRanges object with 1155 ranges and 0 metadata columns:
         seqnames            ranges strand
            <Rle>         <IRanges>  <Rle>
     [1]        1   1104501-1105500      *
     [2]        1   1109501-1110500      *
     [3]        1   3537001-3538000      *
     [4]        1   3538001-3539000      *
     [5]        1   3541001-3542000      *
     ...      ...               ...    ...
  [1151]       22 43670001-43671000      *
  [1152]       22 43690001-43691000      *
  [1153]       22 43690501-43691500      *
  [1154]       22 43691001-43692000      *
  [1155]       22 48958001-48959000      *
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths
> table(width(variantRanges(iterator)))

1000
1155
> table(width(pintersect(variantRanges(iterator)[-length(variantRanges(iterator))], variantRanges(iterator)[-1])))

  0 500
838 316

Each range is exactly 1000 bp long, and the overlaps are either 500 or zero, as one might expect.
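To make the "unique sets of variants" rule more concrete, here is a toy sketch in base R. It does not call SeqVarTools and is not the package's implementation: the variant positions are made up, and keeping the first window of each duplicated set (and dropping empty windows) is my assumption about how such a filter could behave.

```r
# Toy illustration only: hypothetical variant positions on one chromosome.
positions    <- c(1200, 1350, 4100, 4700, 9000)
window_size  <- 1000
window_shift <- 500

# Candidate windows on a regular grid, one every 'window_shift' bp.
starts <- seq(1, max(positions), by = window_shift)
ends   <- starts + window_size - 1

# For each candidate window, record which variants fall inside it.
variant_sets <- lapply(seq_along(starts), function(i) {
  which(positions >= starts[i] & positions <= ends[i])
})

# Turn each set into a string key; an empty window gives the key "".
keys <- vapply(variant_sets, paste, character(1), collapse = ",")

# Assumption: drop empty windows and any window repeating an earlier set.
keep <- keys != "" & !duplicated(keys)

kept <- data.frame(start = starts[keep], end = ends[keep])
kept$start_to_start <- c(diff(kept$start), NA)
print(kept)
```

Every kept window is still 1000 bp wide, but the distance between consecutive kept windows is no longer a constant 500 bp. It is a multiple of 500 that depends on where the variants happen to sit.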


The key line from the help page is "Only windows containing unique sets of variants are kept." In your case, the shifts between the windows in the final set differ because the variants in your GDS file are not uniformly distributed across the chromosome.
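As a quick check, the ten windows quoted in the question can be examined in base R. The coordinates below are copied from that table. The "windows shift" column there appears to be the start of the next window minus the end of the current one, so the sketch also computes the start-to-start distance, which is what `windowShift` would control.

```r
# Start and end coordinates of the ten chromosome 11 windows from the question.
starts <- c(188001, 201001, 202501, 204501, 209501,
            211001, 212501, 214501, 237501, 270501)
ends   <- c(198000, 211000, 212500, 214500, 219500,
            221000, 222500, 224500, 247500, 280500)

# Window sizes: all should equal windowSize.
widths <- ends - starts + 1

# Start-to-start distance between consecutive kept windows.
start_shift <- diff(starts)

# Next start minus current end (the quantity tabulated in the question).
gap <- starts[-1] - ends[-length(ends)]

print(unique(widths))
print(start_shift)
print(start_shift %% 500)   # remainder after dividing by the requested shift
print(gap)
```

The start-to-start distances are all whole multiples of 500, which is consistent with windows being laid out on a 500 bp grid and then thinned out by the unique-variant-set rule.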
