Entering edit mode
SimonNoël
▴
450
@simonnoel-3455
Last seen 10.4 years ago
Thank you. Sorry for the delay. As I say, on my computer, thing
run really
slowy... I have no error but I have some warning and no result...
library("IRanges")
library("GenomicRanges")
library("GenomicFeatures")
#?? changer si une version plus r??cente de la librairie est
t??l??charg??e.
library(SNPlocs.Hsapiens.dbSNP.20101109)
> getSNPcount()
ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8
ch9 c
h10
1849438 1936836 1613418 1613633 1453710 1446827 1335745 1243129
995075
1158707
ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18
ch19 c
h20
1147722 1105364 815729 740129 657719 757926 641905 645646
520666
586708
ch21 ch22 chX chY chMT
338254 331060 529608 67438 624
> ch22snps <- getSNPlocs("ch22")
> ch22snps[1:5, ]
RefSNP_id alleles_as_ambig loc
1 56342815 K 16050353
2 7288968 S 16050994
3 6518357 M 16051107
4 7292503 R 16051209
5 6518368 Y 16051241
>
>
> makeGRangesFromRefSNPids <- function(myids)
+ {
+ ans_seqnames <- character(length(myids))
+ ans_seqnames[] <- "unknown"
+ ans_locs <- integer(length(myids))
+ for (seqname in names(getSNPcount())) {
+ locs <- getSNPlocs(seqname)
+ ids <- paste("rs", locs$RefSNP_id, sep="")
+ myrows <- match(myids, ids)
+ ans_seqnames[!is.na(myrows)] <- seqname
+ ans_locs[!is.na(myrows)] <- locs$loc[myrows]
+ }
+ GRanges(seqnames=ans_seqnames,
+ IRanges(start=ans_locs, width=1),
+ RefSNP_id=myids)
+ }
>
>
> myids <- c("rs7547453", "rs2840542", "rs1999527",
"rs4648545",
"rs10915459", "rs16838750", "rs12128230", "rs999999999")
> mysnps <- makeGRangesFromRefSNPids(myids)
Warning message:
In ans_locs[!is.na(myrows)] <- locs$loc[myrows] :
number of items to replace is not a multiple of replacement
length
> mysnps # a GRanges object with 1 SNP per row
GRangeswith 8 ranges and 1 elementMetadata value
seqnames ranges strand | RefSNP_id
<rle> <iranges> <rle> | <character>
[1] ch1 [2195117, 2195117] * | rs7547453
[2] ch1 [2291680, 2291680] * | rs2840542
[3] ch1 [3256108, 3256108] * | rs1999527
[4] ch1 [3577321, 3577321] * | rs4648545
[5] ch1 [4230463, 4230463] * | rs10915459
[6] ch1 [4404344, 4404344] * | rs16838750
[7] ch1 [4501911, 4501911] * | rs12128230
[8] unknown [ 0, 0] * | rs999999999
seqlengths
ch1 unknown
NA NA
> txdb <- makeTranscriptDbFromUCSC(genome="hg19",
tablename="refGene")
Downloadthe refGene table ... OK
Downloadthe refLink table ... OK
Extractthe 'transcripts' data frame ... OK
Extractthe 'splicings' data frame ... OK
Downloadand preprocess the 'chrominfo' data frame ... OK
Preparethe 'metadata' data frame ... OK
Makethe TranscriptDb object ... OK
Il y a eu 50 avis ou plus (utilisez warnings() pour voir les 50
premiers)
> txdb
TranscriptDbobject:
| Db type: TranscriptDb
| Data source: UCSC
| Genome: hg19
| UCSC Table: refGene
| Type of Gene ID: Entrez Gene ID
| Full dataset: yes
| transcript_nrow: 38098
| exon_nrow: 230201
| cds_nrow: 204683
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2011-01-06 11:55:44 -0500 (Thu, 06 Jan 2011)
| GenomicFeatures version at creation time: 1.2.3
| RSQLite version at creation time: 0.9-4
| DBSCHEMAVERSION: 1.0
>
> #voir pour exonsBy() ?? la place de tx
> txExon = exonsBy(txdb, by= "tx")
> txExon
GRangesListof length 38098
$1
GRanges with 25 ranges and 3 elementMetadata values
seqnames ranges strand
| exon_id exon_name exon_rank
<rle> <iranges> <rle> | <integer>
<character>
<integer>
[1] chr1 [66999825, 67000051] + | 1
NA
1
[2] chr1 [67091530, 67091593] + | 2
NA
2
[3] chr1 [67098753, 67098777] + | 3
NA
3
[4] chr1 [67101627, 67101698] + | 4
NA
4
[5] chr1 [67105460, 67105516] + | 5
NA
5
[6] chr1 [67108493, 67108547] + | 6
NA
6
[7] chr1 [67109227, 67109402] + | 7
NA
7
[8] chr1 [67126196, 67126207] + | 8
NA
8
[9] chr1 [67133213, 67133224] + | 9
NA
9
... ... ... ... ... ...
...
...
[17] chr1 [67155873, 67155999] + | 17
NA
17
[18] chr1 [67161117, 67161176] + | 18
NA
18
[19] chr1 [67184977, 67185088] + | 19
NA
19
[20] chr1 [67194947, 67195102] + | 20
NA
20
[21] chr1 [67199431, 67199563] + | 21
NA
21
[22] chr1 [67205018, 67205220] + | 22
NA
22
[23] chr1 [67206341, 67206405] + | 23
NA
23
[24] chr1 [67206955, 67207119] + | 24
NA
24
[25] chr1 [67208756, 67210768] + | 25
NA
25
...
<38097 more elements>
seqlengths
chr1 chr2 ...
chr18_gl000207_random
249250621 243199373 ...
4262
>
> tx <- transcripts(txdb, columns=c("tx_id", "tx_name", "gene_id"))
> tx # a GRanges object with 1 transcript per row
GRangeswith 38098 ranges and 3 elementMetadata values
seqnames ranges strand | tx_id
tx_name
<rle> <iranges> <rle> | <integer>
<character>
[1] chr1 [ 69091, 70008] + | 1021
NM_001005484
[2] chr1 [323892, 328581] + | 1023
NR_028327
[3] chr1 [323892, 328581] + | 1024
NR_028322
[4] chr1 [323892, 328581] + | 1025
NR_028325
[5] chr1 [367659, 368597] + | 1022
NM_001005277
[6] chr1 [367659, 368597] + | 1026
NM_001005221
[7] chr1 [367659, 368597] + | 1027
NM_001005224
[8] chr1 [763064, 789740] + | 174
NR_015368
[9] chr1 [861121, 879961] + | 1035
NM_152486
... ... ... ... ... ...
...
[38090] chrY [27177050, 27198251] - | 18991
NM_004678
[38091] chrY [27177050, 27198251] - | 18992
NM_001002760
[38092] chrY [27177050, 27198251] - | 18993
NM_001002761
[38093] chrY [27209230, 27246039] - | 18994
NR_001525
[38094] chrY [27209230, 27246039] - | 18995
NR_002177
[38095] chrY [27209230, 27246039] - | 18996
NR_002178
[38096] chrY [27329790, 27330920] - | 18997
NR_001526
[38097] chrY [27329790, 27330920] - | 18998
NR_002179
[38098] chrY [27329790, 27330920] - | 18999
NR_002180
gene_id
<compressedcharacterlist>
[1] 79501
[2] 100133331
[3] 100132287
[4] 100132062
[5] 81399
[6] 729759
[7] 26683
[8] 643837
[9] 148398
... ...
[38090] 9083
[38091] 442867
[38092] 442868
[38093] 114761
[38094] 474150
[38095] 474149
[38096] 252949
[38097] 474152
[38098] 474151
seqlengths
chr1 chr2 ...
chr18_gl000207_random
249250621 243199373 ...
4262
>
> seqnames(mysnps) <- sub("ch", "chr", seqnames(mysnps))
> map <- as.matrix(findOverlaps(mysnps, tx))
Message d'avis :
In .local(query, subject, maxgap, minoverlap, type, select, ...) :
No seqnames from the 'query' and 'subject' were identical
> mapExon <- as.matrix(findOverlaps(mysnps, txExon))
Message d'avis :
In .local(query, subject, maxgap, minoverlap, type, select, ...) :
No seqnames from the 'query' and 'subject' were identical
>
> mapped_genes <- values(tx)$gene_id[map[, 2]]
> mapped_snps <- rep.int(values(mysnps)$RefSNP_id[map[,
1]],
elementLengths(mapped_genes))
> snp2gene <-
unique(data.frame(snp_id=mapped_snps,
gene_id=unlist(mapped_genes)))
> rownames(snp2gene) <- NULL
> snp2gene[1:4, ]
snp_id gene_id
NA <na> <na>
NA.1 <na> <na>
NA.2 <na> <na>
NA.3 <na> <na>
On our super computer, I can't update manualy the package. They
are suposed
to update automatically once per month according to
our
informactician. The result of sessionInfo() is :
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attachedbase packages:
[1] stats graphics grDevices utils datasets methods
base
otherattached packages:
[1] GenomicFeatures_1.2.3
[2] SNPlocs.Hsapiens.dbSNP.20101109_0.99.2
[3] GenomicRanges_1.2.2
[4] IRanges_1.8.7
loadedvia a namespace (and not attached):
[1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2
BSgenome_1.18.
2
[5] DBI_0.2-5 RCurl_1.5-0
RSQLite_0.9-4 rtracklayer_1.10.6
[9] tools_2.12.0 XML_3.2-0
On my computer , something stragge happen. The package get
unstalled when
I run
source([1]"http://bioconductor.org/biocLite.R")
update.packages(repos=biocinstallRepos(), ask=FALSE,
checkBuilt=TRUE)
And when I try to reinstall SNPlocs.Hsapiens.dbSNP.20101109, I now
get
an error message...
> source("[2]http://www.bioconductor.org/biocLite.R")
BioC_mirror= [3]http://www.bioconductor.org
Change using chooseBioCmirror().
> biocLite("SNPlocs.Hsapiens.dbSNP.20101109")
Using R version 2.12.0, biocinstall version 2.7.4.
InstallingBioconductor version 2.7 packages:
[1] "SNPlocs.Hsapiens.dbSNP.20101109"
Pleasewait...
Message d'avis :
In getDependencies(pkgs, dependencies, available, lib) :
package ???SNPlocs.Hsapiens.dbSNP.20101109??? is not available
But for sessionInfo, it's :
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=French_Canada.1252 LC_CTYPE=French_Canada.1252
[3] LC_MONETARY=French_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=French_Canada.1252
attachedbase packages:
[1] stats graphics grDevices utils datasets methods
base
otherattached packages:
[1] GenomicFeatures_1.2.3 GenomicRanges_1.2.2 IRanges_1.8.8
loadedvia a namespace (and not attached):
[1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.2
BSgenome_1.18.2
[5] DBI_0.2-5 RCurl_1.5-0.1
RSQLite_0.9-4
rtracklayer_1.10.6
[9] XML_3.2-0.2
Now I have a super computer that can't run the program because
of
a subscript out of bounds and a personal computer than have a
missing
library... So now nothing work:( Other problem : why
when
I install again IRanges I got a version older than your for
exemple?
Simon No??l
CdeC
References
1. http://bioconductor.org/biocLite.R
2. http://www.bioconductor.org/biocLite.R
3. http://www.bioconductor.org/