Question

extractUpstreamSeqs with custom genome, no gene_id shown

0

Entering edit mode

wd ▴ 30

@wd-7410

Last seen 4.7 years ago

Germany

Hi

I'm trying to use extractUpstreamSeqs from the GenomicsFeatures package with a custom genome using the following commands:

txdb <- makeTxDbFromGFF("test.gff3",format="gff3")
genome<-FaFile("test_genome.fasta", index=sprintf("%s.fai", "test_genome.fasta"))
open(genome)
promoter_transcripts<-extractUpstreamSeqs(genome, transcripts(txdb), width=500)

I get an ouput like this (yellow shaded):

A DNAStringSet instance of length 17660
        width seq                                                    names               
    [1]   500 TCTCTTTTTTTTCTCCTTCTACTGAC...AATGAATGGAGAAGAAAAAATTTAA scaffold_1
    [2]   500 CTTTATATTAGTAAATGAATAATAAC...AAAAGAGATCCATACTTTTCAAGAA scaffold_1
    [3]   500 AGTCAAAGGTTATCTTCTTGGGAACT...ATAGGCTTTGTTGGATGCCTTATTA scaffold_1
    [4]   500 AATCTTTTTTTGCAAAGCAGCCAATA...CCTAACCTAACCTCTTTGTAAAACA scaffold_1
    [5]   500 TCTTCAGCTTACAACCATCTTCTAAT...TTGATTTGTTGATTATTTATCCCTT scaffold_1

but without gene/transcript_ids and with only the scaffold IDs being shown (see also mcols output)

mcols(promoter_transcripts)

DataFrame with 17660 rows and 3 columns
      gene_seqnames gene_strand  gene_TSS
              <Rle>       <Rle> <integer>
1        scaffold_1           +      6883
2        scaffold_1           +     23525
3        scaffold_1           +     29225
4        scaffold_1           +     31774
5        scaffold_1           +     38937

I would like to keep the geneIDs linked to the extracted upstream sequences

Any help would be much appreciated!

W

below you can find the txdb information:

# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: test.gff3
# Organism: NA
# Taxonomy ID: NA
# miRBase build ID: NA
# Genome: NA
# transcript_nrow: 17660
# exon_nrow: 66022
# cds_nrow: 62514
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2016-04-14 12:03:14 +0200 (Thu, 14 Apr 2016)
# GenomicFeatures version at creation time: 1.22.13
# RSQLite version at creation time: 1.0.0
# DBSCHEMAVERSION: 1.1

genomicranges txdb genomicfeatures featureextraction • 1.2k views

ADD COMMENT • link updated 8.6 years ago by Martin Morgan 25k • written 8.6 years ago by wd ▴ 30