GenomicRanges: Find first, second and third nearest genes
2
0
Entering edit mode
ssabri ▴ 20
@ssabri-9464
Last seen 4.6 years ago

For each row of signal, I'd like to find what the nearest n genes are and their corresponding distances.

>signal
GRanges object with 295644 ranges and 1 metadata column:
           seqnames            ranges strand |      E123
              <Rle>         <IRanges>  <Rle> | <numeric>
       [1]    chr19             1-200      * |         0
       [2]    chr19           201-400      * |         0
       [3]    chr19           401-600      * |         0
       [4]    chr19           601-800      * |         0
       [5]    chr19          801-1000      * |         0
       ...      ...               ...    ... .       ...
  [295640]    chr19 59127801-59128000      * |         0
  [295641]    chr19 59128001-59128200      * |         0
  [295642]    chr19 59128201-59128400      * |         0
  [295643]    chr19 59128401-59128600      * |         0
  [295644]    chr19 59128601-59128800      * |         0
  -------
  seqinfo: 23 sequences from an unspecified genome
>genes
GRanges object with 18436 ranges and 5 metadata columns:
          seqnames    ranges strand |         gene_id      gene_type   gene_name                                                                        gene_description               Expr
             <Rle> <IRanges>  <Rle> |     <character>    <character> <character>                                                                             <character>          <numeric>
    OR4F5     chr1     69091      + | ENSG00000186092 protein_coding       OR4F5      olfactory_receptor,_family_4,_subfamily_F,_member_5_[Source:HGNC_Symbol;Acc:14825]                  0
   SAMD11     chr1    860260      + | ENSG00000187634 protein_coding      SAMD11                 sterile_alpha_motif_domain_containing_11_[Source:HGNC_Symbol;Acc:28706]  0.420768029336243
   KLHL17     chr1    895967      + | ENSG00000187961 protein_coding      KLHL17                               kelch-like_17_(Drosophila)_[Source:HGNC_Symbol;Acc:24023]   3.02720360347554
  PLEKHN1     chr1    901877      + | ENSG00000187583 protein_coding     PLEKHN1 pleckstrin_homology_domain_containing,_family_N_member_1_[Source:HGNC_Symbol;Acc:25284]  0.841240518732754
    ISG15     chr1    948803      + | ENSG00000187608 protein_coding       ISG15                             ISG15_ubiquitin-like_modifier_[Source:HGNC_Symbol;Acc:4053]   1.16774328010894
      ...      ...       ...    ... .             ...            ...         ...                                                                                     ...                ...
  MTCP1NB     chrX 154299637      - | ENSG00000182712 protein_coding     MTCP1NB                   mature_T-cell_proliferation_1_neighbor_[Source:HGNC_Symbol;Acc:35428]     4.872993207549
    MTCP1     chrX 154376212      - | ENSG00000214827 protein_coding       MTCP1                             mature_T-cell_proliferation_1_[Source:HGNC_Symbol;Acc:7423]   2.00656014504748
   RAB39B     chrX 154493874      - | ENSG00000155961 protein_coding      RAB39B                       RAB39B,_member_RAS_oncogene_family_[Source:HGNC_Symbol;Acc:16499] 0.0983656544592598
    CLIC2     chrX 154563966      - | ENSG00000155962 protein_coding       CLIC2                          chloride_intracellular_channel_2_[Source:HGNC_Symbol;Acc:2063]   1.46184283788407
    TMLHE     chrX 154899605      - | ENSG00000185973 protein_coding       TMLHE                     trimethyllysine_hydroxylase,_epsilon_[Source:HGNC_Symbol;Acc:18308]   1.34060784208875
  -------
  seqinfo: 23 sequences from hg19 genome

I've explored distanceToNearest(signal, genes, select = "all") which returns:

Hits object with 295913 hits and 1 metadata column:
           queryHits subjectHits |  distance
           <integer>   <integer> | <integer>
       [1]         1       15215 |    107265
       [2]         2       15215 |    107065
       [3]         3       15215 |    106865
       [4]         4       15215 |    106665
       [5]         5       15215 |    106465
       ...       ...         ... .       ...
  [295909]    295640       16535 |     42858
  [295910]    295641       16535 |     43058
  [295911]    295642       16535 |     43258
  [295912]    295643       16535 |     43458
  [295913]    295644       16535 |     43658
  -------
  queryLength: 295644 / subjectLength: 18436

But I'd like to expand this to several genes, not just the closest.

Any help would be much appreciated!

R GenomicRanges Granges • 2.5k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

See https://support.bioconductor.org/p/127895/#128144

ADD COMMENT
0
Entering edit mode

Great -- I'll have a look and see if I can get this to work. I will report back!

ADD REPLY
0
Entering edit mode

Hmm any ideas on this error?

>GenomicRanges:::findKNN(signal, genes, k = 2) # get the 2 nearest genes for every location in signal

Error in pc(extractList(start(starts), pwindows) - end(query), end(query) -  : 
  all the objects to combine must have the same length

ADD REPLY
0
Entering edit mode

If I swap genes and signal, it returns:

> GenomicRanges:::findKNN(genes, signal, k = 2)
IntegerList of length 18436
[[1]] integer(0)
[[2]] integer(0)
[[3]] integer(0)
[[4]] integer(0)
[[5]] integer(0)
[[6]] integer(0)
[[7]] integer(0)
[[8]] integer(0)
[[9]] integer(0)
[[10]] integer(0)
...
<18426 more elements>
ADD REPLY
0
Entering edit mode

The function seems to break at this line within GenomicRanges:::findKNN():

dist <- pc(extractList(start(starts), pwindows) - end(query), end(query) - extractList(end(ends), fwindows)) # ERRORS OUT

any help would be much appreciated!

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 1 minute ago
WEHI, Melbourne, Australia

csaw::detailRanges may also be of interest.

ADD COMMENT

Login before adding your answer.

Traffic: 960 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6