make TxDb from list of genes
rtwest • 0

I am relatively new to bioinformatics. I have learned a lot from this site, but I can't find a solution to an issue I am having.

I am trying to create a TxDb from a certain list of genes. 

I have tried several options to no avail. I converted the list to a GRanges object, but could never create the TxDb from it because no metadata was captured. I also tried creating it from Ensembl and, most recently, from BioMart directly.

I entered the list, converted it to Ensembl IDs, converted those to a character vector, and finally tried to create the TxDb, but I get an error saying the transcript IDs are invalid.

Long story short: how can I create a custom TxDb from a specific list of genes?

Also (second question): from the list of 70 genes, almost 700 Ensembl IDs are generated, and I am not sure why.

Below is my code after loading packages:

list <- c("AMOTL2",
             "ANKRD1",
             "ANLN",
             "ARHGAP29",
             "AXL",
             "NA",
             "BIRC5",
             "CCRN4L",
             "CDC20",
             "CDK6",
             "CDKN2C",
             "CENPF",
             "COL4A3",
             "CRIM1",
             "CTGF",
             "CYR61",
             "CYR61",
             "DAB2",
             "DDAH1",
             "ASAP1",
             "DLC1",
             "DUSP1",
             "DUT",
             "ECT2",
             "EMP2",
             "ETV5",
             "FGF2",
             "FLNA",
             "FSCN1",
             "FSTL1",
             "GADD45B",
             "GAS2L3",
             "GAS6",
             "GGH",
             "GKAP1",
             "GLIS2",
             "GLS",
             "HEXB",
             "HMMR",
             "AGFG2",
             "ITGB2",
             "ITGB5",
             "LHFP",
             "MACF1",
             "MARCKS",
             "MDFIC",
             "MSRB3",
             "MYO1C",
             "NDRG1",
             "PDLIM2",
             "PHGDH",
             "PMP22",
             "SCHIP1",
             "SDPR",
             "SERPINE1",
             "SERTAD4",
             "SFRS2IP",
             "SGK1",
             "SH2D4A",
             "SHCBP1",
             "SLIT2",
             "STMN1",
             "TGFB2",
             "TGM2",
             "THBS1",
             "TK1",
             "TNNT2",
             "TNS1",
             "TOP2A",
             "TSPAN3")

 

ids <- getBM(attributes="ensembl_transcript_id", filters = "hgnc_symbol", values = list, mart= ensembl)
ids
ids.c <- as.character(ids)
ids.c
yap_taz.c <- as.character(yap_taz)

txdb_YT <- makeTxDbFromBiomart(biomart="ensembl",
                               dataset="hsapiens_gene_ensembl",
                               transcript_ids=ids.c,  
                               circ_seqs=NULL,
                               host="www.ensembl.org",
                               port=80,
                               taxonomyId=NA,
                               miRBaseBuild=NA)

 

 

Download and preprocess the 'transcripts' data frame ... Error in .makeBiomartTranscripts(filter, mart, transcript_ids, recognized_attribs,  : 
  invalid transcript ids:
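
One likely cause of the `invalid transcript ids` error can be sketched in base R. `getBM()` returns a data frame, and calling `as.character()` on a whole data frame deparses each column into a single string instead of returning the individual IDs. The IDs below are made-up placeholders, not real Ensembl transcripts, and the data frame is only a stand-in for the `getBM()` result.

```r
# Stand-in for the data frame returned by getBM(); the IDs are placeholders
ids <- data.frame(
  ensembl_transcript_id = c("ENST00000000001", "ENST00000000002"),
  stringsAsFactors = FALSE
)

# as.character() on the whole data frame collapses the column into ONE string
bad <- as.character(ids)
length(bad)

# Extracting the column keeps one element per transcript ID
good <- ids$ensembl_transcript_id
length(good)
```

Passing something like `ids$ensembl_transcript_id` as `transcript_ids` would therefore be the first thing to try. Separately, the jump from about 70 genes to almost 700 IDs is expected when the query asks for `ensembl_transcript_id`, since most genes have several transcripts; requesting `ensembl_gene_id` instead should give roughly one ID per gene.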


Why do you want a TxDb for just a set of genes? It's simple enough to use a full-sized one and subset after the fact.
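
Subsetting a full TxDb after the fact might look like the sketch below. It assumes the UCSC hg38 knownGene TxDb and `org.Hs.eg.db` are installed; neither package is named in the thread, and any full human TxDb could be swapped in. Only a handful of symbols from the question's list are used, for illustration.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)  # assumption: any full human TxDb would do
library(org.Hs.eg.db)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# A few symbols from the question's list, for illustration
symbols <- c("AMOTL2", "ANKRD1", "ANLN", "AXL", "BIRC5")

# This TxDb keys genes by Entrez ID, so map the symbols first
entrez <- mapIds(org.Hs.eg.db, keys = symbols,
                 keytype = "SYMBOL", column = "ENTREZID")
entrez <- unname(entrez[!is.na(entrez)])

# Gene ranges restricted to the genes of interest
gene_ranges <- genes(txdb, filter = list(gene_id = entrez))

# Transcripts grouped by gene, then subset by gene ID
tx_by_gene <- transcriptsBy(txdb, by = "gene")
tx_subset <- tx_by_gene[names(tx_by_gene) %in% entrez]
```

The same pattern should carry over to `exonsBy()` and similar extractors: pull the full set once, then keep only the entries whose names match the gene IDs of interest.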


I agree with James. There is no need to create a new TxDb database. You could simply subset an EnsDb database to your list of input genes (even better if you have Ensembl IDs). Assuming you have your Ensembl gene IDs in a variable called ensids:

library(EnsDb.Hsapiens.v86)
edb <- filter(EnsDb.Hsapiens.v86, filter = ~ gene_id == ensids)

On that edb you can then call the same functions you would use on a TxDb (such as genes, exonsBy, etc.), and you will always get results only for the genes you provided.

The package I loaded above contains annotations from Ensembl version 86. If you want more recent annotations, you can download the corresponding EnsDb from AnnotationHub.


I know I am almost five years too late to this, but I was hoping to get this to work with kpPlotGenes from karyoploteR and ended up realizing that the data has to be a TxDb object.

> filterIds <- ranges.df[ranges.df$coverage == 0,]$names
> edb <- filter(EnsDb.Hsapiens.v75, filter = ~ gene_id == filterIds)
> kpPlotGenes(kp, edb, r0 = 0.2, r1 = 0.3, gene.name.cex = 0.8, data.panel = 2, gene.margin = 0, col = "darkblue", gene.names.col = "black", gene.name.position = "top", avoid.overlapping = TRUE, plot.transcripts.structure = FALSE, plot.transcripts = FALSE)

Error in data$genes: $ operator not defined for this S4 class

Is it possible to filter the TxDb object directly?


Thank you. I was overthinking it and going about it the wrong way. I appreciate the guidance.
