I am relatively new to bioinformatics. I have learned a lot from this site, but I can't find a solution to an issue I am having.
I am trying to create a TxDb from a certain list of genes.
I have tried several approaches to no avail. I converted the list to a GRanges object, but could never create the TxDb from it because no metadata was captured. I then tried building it from Ensembl and, most recently, from BioMart directly.
I entered the list, converted the gene symbols to Ensembl IDs, coerced the result to a character vector, and finally tried to create the TxDb, but the transcript IDs are reported as invalid.
Long story short: how can I create a custom TxDb from a specific list of genes?
Also (second question): from the list of 70 genes, almost 700 Ensembl IDs are generated. I am not sure why that is either.
Below is my code after loading packages:
list <- c("AMOTL2",
"ANKRD1",
"ANLN",
"ARHGAP29",
"AXL",
"NA",
"BIRC5",
"CCRN4L",
"CDC20",
"CDK6",
"CDKN2C",
"CENPF",
"COL4A3",
"CRIM1",
"CTGF",
"CYR61",
"CYR61",
"DAB2",
"DDAH1",
"ASAP1",
"DLC1",
"DUSP1",
"DUT",
"ECT2",
"EMP2",
"ETV5",
"FGF2",
"FLNA",
"FSCN1",
"FSTL1",
"GADD45B",
"GAS2L3",
"GAS6",
"GGH",
"GKAP1",
"GLIS2",
"GLS",
"HEXB",
"HMMR",
"AGFG2",
"ITGB2",
"ITGB5",
"LHFP",
"MACF1",
"MARCKS",
"MDFIC",
"MSRB3",
"MYO1C",
"NDRG1",
"PDLIM2",
"PHGDH",
"PMP22",
"SCHIP1",
"SDPR",
"SERPINE1",
"SERTAD4",
"SFRS2IP",
"SGK1",
"SH2D4A",
"SHCBP1",
"SLIT2",
"STMN1",
"TGFB2",
"TGM2",
"THBS1",
"TK1",
"TNNT2",
"TNS1",
"TOP2A",
"TSPAN3")
ids <- getBM(attributes="ensembl_transcript_id", filters = "hgnc_symbol", values = list, mart= ensembl)
ids
ids.c <- as.character(ids)
ids.c
yap_taz.c <- as.character(yap_taz)
txdb_YT <- makeTxDbFromBiomart(biomart="ensembl",
dataset="hsapiens_gene_ensembl",
transcript_ids=ids.c,
circ_seqs=NULL,
host="www.ensembl.org",
port=80,
taxonomyId=NA,
miRBaseBuild=NA)
Download and preprocess the 'transcripts' data frame ... Error in .makeBiomartTranscripts(filter, mart, transcript_ids, recognized_attribs, :
invalid transcript ids:
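A likely cause of the error above, offered as a reading of the posted code rather than a confirmed diagnosis: getBM() returns a data frame, and calling as.character() on a data frame deparses each column into a single string instead of returning the individual IDs. The sketch below reproduces this in base R with a toy data frame; the transcript IDs are placeholders, not real Ensembl records.

```r
# Toy stand-in for the getBM() result: a one-column data frame.
# The IDs are placeholders for illustration only.
ids <- data.frame(
  ensembl_transcript_id = c("ENST_PLACEHOLDER_1",
                            "ENST_PLACEHOLDER_2",
                            "ENST_PLACEHOLDER_3"),
  stringsAsFactors = FALSE
)

# as.character() on a data frame turns each whole column into one string,
# so the three IDs collapse into a single element.
ids_collapsed <- as.character(ids)
length(ids_collapsed)

# Extracting the column keeps one element per transcript ID.
ids_vector <- ids$ensembl_transcript_id
length(ids_vector)
```

If that is what happened here, passing ids$ensembl_transcript_id as transcript_ids would be the thing to try. On the second question: the query asks for ensembl_transcript_id, and a gene usually has several annotated transcripts, so far more IDs than genes is expected.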
Why do you want a TxDb for just a set of genes? It's simple enough to use a full sized one and subset after the fact.
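Subsetting after the fact could look like the sketch below, which uses an EnsDb as the next reply suggests. It is a reconstruction, not the original poster's code: the package EnsDb.Hsapiens.v86 is assumed from the mention of Ensembl version 86, and the two gene IDs (BIRC5 and TOP2A) are only examples.

```r
# Assumed package: Ensembl release 86 annotations for human.
library(EnsDb.Hsapiens.v86)

# Example Ensembl gene IDs (BIRC5, TOP2A); replace with your own vector.
ensids <- c("ENSG00000089685", "ENSG00000131747")

# Attach a gene ID filter to the database. ensembldb::filter is spelled
# out because other packages (e.g. dplyr) also export a filter().
edb <- ensembldb::filter(EnsDb.Hsapiens.v86, filter = GeneIdFilter(ensids))

# Extractors on the filtered object only return the selected genes.
gns <- genes(edb)
exs <- exonsBy(edb, by = "tx")
```

The filtered edb keeps the filter attached, so later calls do not need to repeat it.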
I agree with James. No need to create a new TxDb database. You could simply subset an EnsDb database to your list of input genes (even better if you have Ensembl IDs): assuming you have your Ensembl gene IDs in a variable called ensids:
On that edb you can then call the same functions you would use on a TxDb (such as genes, exonsBy etc.) and you would always just get the results for the genes you provided. The package I loaded above contains annotations from Ensembl version 86; if you want more recent annotations you would want to download the EnsDb from AnnotationHub.
I know I am almost five years too late to this, but I was hoping to get this to work with kpPlotGenes from karyoploteR and ended up realizing that the data type has to be a TxDb object.
I was wondering if we could filter the TxDb object directly?
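One option for this, sketched under assumptions: the GenomicFeatures extractors (transcripts, exons, genes) accept a filter argument, a named list such as list(gene_id = ...). The TxDb package and the Entrez gene IDs below (332 for BIRC5, 7153 for TOP2A) are illustrative choices and do not come from this thread.

```r
# Assumed annotation package; UCSC knownGene TxDbs use Entrez gene IDs.
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Example Entrez gene IDs (BIRC5, TOP2A); replace with your own.
entrez_ids <- c("332", "7153")

# Restrict the extractors to the genes of interest.
tx_sub <- transcripts(txdb,
                      filter = list(gene_id = entrez_ids),
                      columns = c("tx_name", "gene_id"))
ex_sub <- exons(txdb, filter = list(gene_id = entrez_ids))
```

Note that this returns GRanges objects restricted to those genes rather than a new TxDb, so whether it is enough for kpPlotGenes is not settled here.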
Thank you. I was overthinking it and going about it the wrong way. I appreciate the guidance.