How to map Chr and bp columns to the corresponding RSID column?
@49f16e03
Hello dear all!
I have a summary statistics file with Chr and bp columns. However, to run the LDSC script I need the same file with an RSID column, so I need a biomaRt script that can take the Chr and bp columns and return the corresponding RSID column. My team and I are struggling with biomaRt, though. Does anyone here know how to do this? Please contact me :)
All the best,
Iago Junger
biomaRt
@james-w-macdonald-5106
I would tend to use one of the SNPlocs packages for this, rather than biomaRt. As a completely contrived example,
> library(SNPlocs.Hsapiens.dbSNP144.GRCh37)
## fake GRanges - you need to use your Chr and bp columns to do this!
## also note that the chromosomes have no prepended 'chr'.
> fakeo <- GRanges(rep("1", 500), IRanges(sample(1:1e5, 500), width = 1))
## EDIT
> z <- snpsByOverlaps(SNPlocs.Hsapiens.dbSNP144.GRCh37, fakeo)
> z
UnstitchedGPos object with 6 positions and 2 metadata columns:
      seqnames       pos strand |   RefSNP_id alleles_as_ambig
         <Rle> <integer>  <Rle> | <character>      <character>
  [1]        1     14728      * | rs547701710                M
  [2]        1     15150      * |  rs11803681                Y
  [3]        1     17538      * | rs200046632                M
  [4]        1     63643      * | rs202004563                R
  [5]        1     66737      * | rs560785016                K
  [6]        1     69869      * | rs548049170                W
  -------
  seqinfo: 25 sequences (1 circular) from GRCh37.p13 genome
## and now you can get the RSIDs from the GPos object.
> fo <- findOverlaps(fakeo, z)
> fo
Hits object with 6 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         3           1
  [2]        95           2
  [3]       120           6
  [4]       229           5
  [5]       370           3
  [6]       465           4
  -------
  queryLength: 500 / subjectLength: 6
> mcols(fakeo)$rsid <- NA
> mcols(fakeo)$rsid[queryHits(fo)] <- mcols(z)$RefSNP_id[subjectHits(fo)]
> fakeo
GRanges object with 500 ranges and 1 metadata column:
        seqnames    ranges strand |        rsid
           <Rle> <IRanges>  <Rle> | <character>
    [1]        1     94944      * |        <NA>
    [2]        1     97983      * |        <NA>
    [3]        1     14728      * | rs547701710
    [4]        1     56186      * |        <NA>
    [5]        1     53476      * |        <NA>
    ...      ...       ...    ... .         ...
  [496]        1     91756      * |        <NA>
  [497]        1     70297      * |        <NA>
  [498]        1     27187      * |        <NA>
  [499]        1     66576      * |        <NA>
  [500]        1     81208      * |        <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
Given that I just faked up some positions there isn't much overlap. But you get the general idea, I hope.
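For a real summary statistics table, the same steps might look roughly like the sketch below. The data.frame `sumstats` and its values are hypothetical stand-ins, and the column names `Chr` and `bp` are assumed from the question. It also assumes the positions are on GRCh37, since that is the build this SNPlocs package uses; coordinates from another build would need a matching SNPlocs package.

```r
library(GenomicRanges)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)

## 'sumstats' is a hypothetical data.frame standing in for your summary
## statistics; the column names Chr and bp are assumed from the question.
sumstats <- data.frame(Chr = c("1", "1", "1"),
                       bp  = c(14728L, 15150L, 20000L),
                       P   = c(0.01, 0.20, 0.50))

## Build width-1 ranges from the two columns. Strip any 'chr' prefix so the
## chromosome names match the SNPlocs naming ("1", "2", ...).
gr <- GRanges(sub("^chr", "", sumstats$Chr),
              IRanges(sumstats$bp, width = 1))

## Look up SNPs at those positions and map them back to the table rows.
snps <- snpsByOverlaps(SNPlocs.Hsapiens.dbSNP144.GRCh37, gr)
hits <- findOverlaps(gr, snps)

## Rows without a matching SNP keep NA in the new rsid column.
sumstats$rsid <- NA_character_
sumstats$rsid[queryHits(hits)] <- mcols(snps)$RefSNP_id[subjectHits(hits)]
```

If a position overlaps more than one SNP, the assignment above keeps only the last hit for that row, so you may want to inspect `hits` for duplicated query indices before relying on the result.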
What have you tried? Have you tried looking up a few coordinates through the BioMart web interface?