Hello all,
I'm trying to build a custom CDF (or, more precisely, a pdInfo package for use with the oligo R package) for the HTA-2.0 array. Since Brainarray has discontinued creating such files and clear tutorials for building them seem scarce, I'm seeking guidance on building one with updated transcript annotation based on a current GTF file.
I have seen a related post on the Bioconductor support site:
Annotation package generation for HTA-2_0 by pdInfoBuilder fails
It discusses the creation of custom CDFs but does not appear to utilize a GTF file for updating transcript annotations.
Files I have downloaded from the ThermoFisher website:
HTA-2_0.r3.clf
HTA-2_0.r3.pgf
HTA-2_0.r3.Psrs.mps
HTA-2_0_MappingFile.r1.map
HTA-2_0.na36.hg19.probeset.csv
HTA-2_0.r3.na36.hg19.a1.transcript.csv
I'm considering using the GTF available from GENCODE Genes.
Which GTF file would be the best to download for this purpose?
- Basic gene annotation
- Comprehensive gene annotation
Could anyone please advise whether there is a step-by-step tutorial (preferably from Brainarray, https://www.biostars.org/p/49557/) that explains how to:
- Generate a custom CDF using a GTF file for updated annotations?
- Convert the GTF to the format required by pdInfoBuilder or similar tools?
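To make the GTF question concrete, here is roughly how I picture the first step: pulling exon coordinates per transcript out of the GTF. This is only a sketch in Python, not something pdInfoBuilder consumes directly. The function name `exons_by_transcript` and the three-line sample are made up for illustration; the only thing taken from the GTF format itself is the nine tab-separated columns with `gene_id` and `transcript_id` in the attribute column.

```python
import io
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
# Unquoted attributes (e.g. `level 2`) are skipped by this pattern.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def exons_by_transcript(handle):
    """Return {transcript_id: [(chrom, start, end, strand, gene_id), ...]}.

    Coordinates are kept as in the GTF (1-based, inclusive).
    `exons_by_transcript` is a hypothetical helper name, not a library API.
    """
    exons = defaultdict(list)
    for line in handle:
        if line.startswith("#"):  # header / comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        tx = attrs.get("transcript_id")
        if tx is None:
            continue
        exons[tx].append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("gene_id"))
        )
    return dict(exons)


# Made-up lines in GTF layout (not real annotation), just to exercise the parser.
SAMPLE = (
    'chr1\tDEMO\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n'
    'chr1\tDEMO\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; level 2;\n'
    'chr1\tDEMO\texon\t400\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; level 2;\n'
)

table = exons_by_transcript(io.StringIO(SAMPLE))
print(table["T1"])
```

What I don't know is how to get from a table like this to whatever pdInfoBuilder expects, which is the part I'm asking about.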
I would also appreciate input on the following:
- Should probes be remapped to a newer genome build as part of the process? If so, how should this be done in R or from the command line?
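For context, this is how I imagine the grouping step would work once probe coordinates on the new build are available (the remapping itself is not shown here, since that is what I'm asking about). A rough Python sketch: keep a probe only if it lies entirely inside an exon, and drop probes that hit exons of more than one gene. The function name, the tuple layouts and the example coordinates are all my own assumptions, strand is ignored for simplicity, and the "one gene only" rule is my guess at a sensible filter rather than a documented procedure.

```python
from collections import defaultdict


def group_probes_by_gene(probes, exons):
    """Group probes by the single gene whose exon fully contains them.

    probes: {probe_id: (chrom, start, end)}         # hypothetical layout
    exons:  [(chrom, start, end, gene_id), ...]     # hypothetical layout
    Coordinates are assumed 1-based and inclusive on both sides.
    """
    hits = defaultdict(set)
    for probe_id, (p_chrom, p_start, p_end) in probes.items():
        for chrom, start, end, gene_id in exons:
            # Require the whole probe to sit inside the exon.
            if chrom == p_chrom and start <= p_start and p_end <= end:
                hits[probe_id].add(gene_id)

    groups = defaultdict(list)
    for probe_id, genes in hits.items():
        if len(genes) == 1:  # drop probes that are ambiguous between genes
            groups[next(iter(genes))].append(probe_id)
    return {gene: sorted(ids) for gene, ids in groups.items()}


# Made-up coordinates for illustration only.
demo_exons = [
    ("chr1", 100, 200, "G1"),
    ("chr1", 400, 500, "G1"),
    ("chr1", 405, 450, "G2"),
]
demo_probes = {
    "p1": ("chr1", 110, 134),  # inside a G1 exon only
    "p2": ("chr1", 410, 434),  # inside both a G1 and a G2 exon
    "p3": ("chr1", 190, 214),  # runs past the exon end
    "p4": ("chr2", 10, 34),    # no exon on this chromosome
}

groups = group_probes_by_gene(demo_probes, demo_exons)
print(groups)
```

The nested loop is fine for a toy example but would be far too slow for a full array, so I assume a real workflow uses an interval-overlap tool instead.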
I think that if we could make such CDFs, it would help others too, for example when you want to compare array data with RNA-seq data and need both mapped using the same annotations.
Any guidance would be much appreciated.
Thanks in advance for your help!