Entering edit mode
Hi
I'm trying to use extractUpstreamSeqs from the GenomicsFeatures package with a custom genome using the following commands:
txdb <- makeTxDbFromGFF("test.gff3",format="gff3") genome<-FaFile("test_genome.fasta", index=sprintf("%s.fai", "test_genome.fasta")) open(genome) promoter_transcripts<-extractUpstreamSeqs(genome, transcripts(txdb), width=500)
I get an ouput like this (yellow shaded):
A DNAStringSet instance of length 17660
width seq names
[1] 500 TCTCTTTTTTTTCTCCTTCTACTGAC...AATGAATGGAGAAGAAAAAATTTAA scaffold_1
[2] 500 CTTTATATTAGTAAATGAATAATAAC...AAAAGAGATCCATACTTTTCAAGAA scaffold_1
[3] 500 AGTCAAAGGTTATCTTCTTGGGAACT...ATAGGCTTTGTTGGATGCCTTATTA scaffold_1
[4] 500 AATCTTTTTTTGCAAAGCAGCCAATA...CCTAACCTAACCTCTTTGTAAAACA scaffold_1
[5] 500 TCTTCAGCTTACAACCATCTTCTAAT...TTGATTTGTTGATTATTTATCCCTT scaffold_1
but without gene/transcript_ids and with only the scaffold IDs being shown (see also mcols output)
mcols(promoter_transcripts)
DataFrame with 17660 rows and 3 columns
gene_seqnames gene_strand gene_TSS
<Rle> <Rle> <integer>
1 scaffold_1 + 6883
2 scaffold_1 + 23525
3 scaffold_1 + 29225
4 scaffold_1 + 31774
5 scaffold_1 + 38937
I would like to keep the geneIDs linked to the extracted upstream sequences
Any help would be much appreciated!
W
below you can find the txdb information:
# Db type: TxDb # Supporting package: GenomicFeatures # Data source: test.gff3 # Organism: NA # Taxonomy ID: NA # miRBase build ID: NA # Genome: NA # transcript_nrow: 17660 # exon_nrow: 66022 # cds_nrow: 62514 # Db created by: GenomicFeatures package from Bioconductor # Creation time: 2016-04-14 12:03:14 +0200 (Thu, 14 Apr 2016) # GenomicFeatures version at creation time: 1.22.13 # RSQLite version at creation time: 1.0.0 # DBSCHEMAVERSION: 1.1
Â