I'm having trouble understanding the export function in rtracklayer. I have created an object of 3'-UTRs and can export it as GFF, but I would also like to include the gene names in the GFF file. They are present in the Name column, but I don't know how to get them into the GFF file.
My script:
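library(GenomicFeatures)  # makeTxDbFromGFF(), threeUTRsByTranscript()
library(rtracklayer)      # asGFF(), export()

txdb = makeTxDbFromGFF(file = "transcripts.gtf", format = "gtf")
txdb
utr = threeUTRsByTranscript(txdb, use.names = TRUE)
asGFF(utr)
export(utr, "3_UTR.gff", format = "GFF")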
I would like the gene names in the Name column to appear in my GFF file as well:
> asGFF(utr)
GRanges object with 14983 ranges and 7 metadata columns:
                 seqnames         ranges strand |        type          ID
                    <Rle>      <IRanges>  <Rle> | <character> <character>
    [1]   Supercontig_1.1 [ 8533,  8777]      + |        mRNA       mRNA1
    [2]   Supercontig_1.1 [22263, 22394]      + |        mRNA       mRNA2
    [3]   Supercontig_1.1 [34734, 34943]      + |        mRNA       mRNA3
    [4]   Supercontig_1.1 [56043, 56381]      + |        mRNA       mRNA4
    [5]   Supercontig_1.1 [70457, 70507]      + |        mRNA       mRNA5
    ...               ...            ...    ... .         ...         ...
[14979] Supercontig_1.994 [ 8764,  9020]      - |        exon    exon7593
[14980] Supercontig_1.995 [23924, 24302]      + |        exon    exon7594
[14981] Supercontig_1.995 [25442, 25477]      - |        exon    exon7595
[14982] Supercontig_1.997 [18118, 19806]      - |        exon    exon7596
[14983] Supercontig_1.998 [22519, 23126]      - |        exon    exon7597
                Name   exon_id   exon_name exon_rank      Parent
         <character> <integer> <character> <integer> <character>
    [1] SARC_00001T0      <NA>        <NA>      <NA>        <NA>
    [2] SARC_00003T0      <NA>        <NA>      <NA>        <NA>
    [3] SARC_00004T0      <NA>        <NA>      <NA>        <NA>
    [4] SARC_00008T0      <NA>        <NA>      <NA>        <NA>
    [5] SARC_00011T0      <NA>        <NA>      <NA>        <NA>
    ...          ...       ...         ...       ...         ...
[14979]         <NA>     65051        <NA>         6    mRNA7382
[14980]         <NA>     65078        <NA>         5    mRNA7383
[14981]         <NA>     65079        <NA>         1    mRNA7384
[14982]         <NA>     65115        <NA>         4    mRNA7385
[14983]         <NA>     65133        <NA>         5    mRNA7386
  -------
  seqinfo: 7751 sequences from an unspecified genome; no seqlengths
Output of my current gff-file:
##gff-version 1
##source-version rtracklayer 1.28.10
##date 2016-03-03
Supercontig_1.1	rtracklayer	mRNA	8533	8777	.	+	.	Supercontig_1.1
Supercontig_1.1	rtracklayer	mRNA	22263	22394	.	+	.	Supercontig_1.1
Supercontig_1.1	rtracklayer	mRNA	34734	34943	.	+	.	Supercontig_1.1
Supercontig_1.1	rtracklayer	mRNA	56043	56381	.	+	.	Supercontig_1.1
Supercontig_1.1	rtracklayer	mRNA	70457	70507	.	+	.	Supercontig_1.1
Supercontig_1.1	rtracklayer	mRNA	86426	87450	.	+	.	Supercontig_1.1
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
[1] rtracklayer_1.32.2     GenomicFeatures_1.24.5 AnnotationDbi_1.34.4
[4] Biobase_2.32.0         GenomicRanges_1.24.3   GenomeInfoDb_1.8.7
[7] IRanges_2.6.1          S4Vectors_0.10.3       BiocGenerics_0.18.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8                XVector_0.12.1             zlibbioc_1.18.0
 [4] GenomicAlignments_1.8.4    BiocParallel_1.6.6         tools_3.3.2
 [7] SummarizedExperiment_1.2.3 DBI_0.5-1                  digest_0.6.10
[10] bitops_1.0-6               RCurl_1.95-4.8             biomaRt_2.28.0
[13] memoise_1.0.0              RSQLite_1.1                Biostrings_2.40.2
[16] Rsamtools_1.24.0           XML_3.98-1.5
That solved it, thanks!