Dear all,
I would like to make a count table to use with DESeq. I've tried to
use easyRNASeq, but I have a problem with the annotation file. I
downloaded the file Gallus_gallus.Galgal4.73.gtf from Ensembl. Because I
ran into the error "Error in .doBasicCount(obj) : The
genomicAnnotation slot is empty", I modified the file and added "chr"
before each chromosome number. The next problem was this:
Your gtf file: Gallus_gallus.Galgal4.73.gtf does not contain all the
required fields: gene_id, transcript_id, exon_number, gene_name.
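One way to see which required attributes are actually missing, and on how many lines, is to scan the attribute column (column 9) of the GTF. The following is a minimal Python sketch, independent of easyRNASeq, assuming a tab-separated nine-column GTF with `key "value";` attributes as in the Ensembl file. The helper names parse_attributes and missing_fields are hypothetical, and the sketch does not reproduce easyRNASeq's own check, which may be stricter.

```python
import re
from collections import Counter

# Attributes named in the easyRNASeq error message.
REQUIRED = ("gene_id", "transcript_id", "exon_number", "gene_name")

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(attr_column):
    """Return the attributes of one GTF line as a dict (insertion-ordered)."""
    return dict(ATTR_RE.findall(attr_column))


def missing_fields(lines):
    """Count how many GTF lines lack each required attribute."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        columns = line.rstrip("\n").split("\t")
        # Lines without a ninth column count as missing every attribute.
        attrs = parse_attributes(columns[8]) if len(columns) >= 9 else {}
        for field in REQUIRED:
            if field not in attrs:
                counts[field] += 1
    return counts


# Hypothetical input: columns 1-8 are placeholders; the attributes are
# the first example entry from the post (the one without gene_name).
example = ('1\tprotein_coding\texon\t100\t200\t.\t+\t.\t'
           'gene_id "ENSGALG00000009771"; transcript_id "ENSGALT00000015891"; '
           'exon_number "1"; gene_biotype "protein_coding"; '
           'exon_id "ENSGALE00000301221";\n')
print(missing_fields([example]))
```

On the whole file, the same function can be called on an open file handle, since iterating over it yields one line at a time.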
To solve this problem:
- I deleted all the entries without gene_name. Of the two example
entries below, the first has no gene_name and the second does:
gene_id "ENSGALG00000009771"; transcript_id "ENSGALT00000015891"; exon_number "1"; gene_biotype "protein_coding"; exon_id "ENSGALE00000301221";
gene_id "ENSGALG00000009783"; transcript_id "ENSGALT00000015914"; exon_number "2"; gene_name "GOLGB1"; gene_biotype "protein_coding"; transcript_name "GOLGB1-201"; exon_id "ENSGALE00000105891";
- I checked the chromosome names and deleted the entries that
didn't match any chromosome in BSgenome.Ggallus.UCSC.galGal4. (I
can't find any entry corresponding to chr32 in the
Gallus_gallus.Galgal4.73.gtf file; I don't know whether that is a problem.)
- I searched the gene names for semicolons and single quotes ('),
but found none in the final file.
- I deleted all the columns after gene_name.
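The editing steps listed above (adding the "chr" prefix, dropping unknown chromosomes, dropping entries without gene_name, and truncating the attributes after gene_name) can be sketched as a single filtering pass. This is a hypothetical Python helper, not part of easyRNASeq. allowed_chroms is assumed to hold the chromosome names of BSgenome.Ggallus.UCSC.galGal4 (for example exported from R with seqnames() on the genome object), and dropping lines that lack any of the four required attributes is a slight generalization of the manual steps.

```python
import re

ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')
# The four attributes easyRNASeq asks for, in the order they are written out.
KEEP = ("gene_id", "transcript_id", "exon_number", "gene_name")


def clean_gtf_line(line, allowed_chroms):
    """Apply the manual editing steps to one GTF line.

    Returns the rewritten line, or None if the line should be dropped.
    """
    if line.startswith("#") or not line.strip():
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 9:
        return None
    # Add the UCSC-style "chr" prefix (e.g. "1" -> "chr1") if it is missing.
    chrom = columns[0]
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    # Drop sequences that are not chromosomes of the reference genome.
    if chrom not in allowed_chroms:
        return None
    attrs = dict(ATTR_RE.findall(columns[8]))
    # Drop entries lacking gene_name or any other required attribute.
    if any(key not in attrs for key in KEEP):
        return None
    # Keep only the four required attributes, in a fixed order.
    attr_text = " ".join(f'{key} "{attrs[key]}";' for key in KEEP)
    return "\t".join([chrom] + columns[1:8] + [attr_text]) + "\n"


def clean_gtf(in_path, out_path, allowed_chroms):
    """Write a filtered copy of a GTF file; return the number of kept lines."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            cleaned = clean_gtf_line(line, allowed_chroms)
            if cleaned is not None:
                dst.write(cleaned)
                kept += 1
    return kept


# Hypothetical input: columns 1-8 are placeholders; the attributes are
# the GOLGB1 example entry from the post.
raw = ('1\tprotein_coding\texon\t100\t200\t.\t+\t.\t'
       'gene_id "ENSGALG00000009783"; transcript_id "ENSGALT00000015914"; '
       'exon_number "2"; gene_name "GOLGB1"; gene_biotype "protein_coding"; '
       'transcript_name "GOLGB1-201"; exon_id "ENSGALE00000105891";\n')
print(clean_gtf_line(raw, {"chr1"}), end="")
```

Note that simply prefixing "chr" turns Ensembl's mitochondrial sequence "MT" into "chrMT", whereas UCSC calls it "chrM", so those lines would be dropped by this filter unless the name is mapped explicitly.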
So finally the annotation file entries look like this:
chr1 protein_coding exon 19962541 19963992 . + . gene_id "ENSGALG00000000003"; transcript_id "ENSGALT00000000003"; exon_number "2"; gene_name "PANX2";
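The entry above is wrapped by the mailing list. In the file itself the nine columns should be tab-separated, and after the edits every line should carry exactly gene_id, transcript_id, exon_number and gene_name, in that order. A small Python check for those properties is sketched below; whether a deviation from either is what triggers the easyRNASeq error is an assumption to test, not a diagnosis. check_final_gtf is a hypothetical name.

```python
import re

ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')
EXPECTED_KEYS = ["gene_id", "transcript_id", "exon_number", "gene_name"]


def check_final_gtf(lines):
    """Return (line_number, problem) pairs for lines that deviate from the
    intended final format: 9 tab-separated columns, a "chr"-prefixed
    sequence name, and exactly the four required attributes in order."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # comments and blank lines are not checked
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(columns)}"))
            continue
        if not columns[0].startswith("chr"):
            problems.append((number, "sequence name lacks 'chr' prefix"))
        keys = [key for key, _ in ATTR_RE.findall(columns[8])]
        if keys != EXPECTED_KEYS:
            problems.append((number, "attributes: " + ", ".join(keys)))
    return problems
```

It can be run on the edited file by passing an open file handle; an empty list means every line passed both checks.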
Nothing works; the error message is always the same, and I don't
know what else I can do. Could you please help me?
Thank you in advance!
Cheers
Natalia
Here is my code:
> count.table <- easyRNASeq("/RNAseqGallus", organism="Ggallus",
chrSizes="chrSizes", annotationMethod="gtf",
annotationFile="Gallus_gallus.Galgal4.73.gtf", count="genes",
summarization="geneModels", format="bam", gapped=TRUE,
filenames=c("NS1gallus.bam","NS2gallus.bam"), outputFormat="DESeq",
conditions=conditions)
Checking arguments...
Fetching annotations...
Read 334620 records
Error in .getGtfRange(organismName(obj), filename = filename,
ignoreWarnings = ignoreWarnings, :
Your gtf file: Gallus_gallus.Galgal4.73.gtf does not contain all the
required fields: gene_id, transcript_id, exon_number, gene_name.
In addition: Warning messages:
1: In easyRNASeq("/RNAseqGallus", organism = "Ggallus", chrSizes =
"chrSizes", :
Your organism has no mapping defined to perform the validity check
for the UCSC compliance of the chromosome name.
Defined organism's mapping can be listed using the 'knownOrganisms'
function.
To benefit from the validity check, you can provide a 'chr.map' to
your 'easyRNASeq' function call.
As you did not do so, 'validity.check' is turned off
2: In .Method(..., deparse.level = deparse.level) :
number of columns of result is not a multiple of vector length (arg
1)
> traceback()
6: stop(paste("Your gtf file: ", filename, " does not contain all the
required fields: ",
paste(fields, collapse = ", "), ".", sep = ""))
5: .getGtfRange(organismName(obj), filename = filename, ignoreWarnings
= ignoreWarnings,
...)
4: fetchAnnotation(obj, method = annotationMethod, filename =
annotationFile,
ignoreWarnings = ignoreWarnings, ...)
3: fetchAnnotation(obj, method = annotationMethod, filename =
annotationFile,
ignoreWarnings = ignoreWarnings, ...)
2: easyRNASeq("/RNAseqGallus", organism = "Ggallus", chrSizes =
"chrSizes",
annotationMethod = "gtf", annotationFile = "Gallus_gallus.Galgal4.73.gtf",
count = "genes", summarization = "geneModels", format = "bam",
gapped = TRUE, filenames = c("NS1gallus.bam", "NS2gallus.bam"),
outputFormat = "DESeq", conditions = conditions)
1: easyRNASeq("/RNAseqGallus", organism = "Ggallus", chrSizes =
"chrSizes",
annotationMethod = "gtf", annotationFile = "Gallus_gallus.Galgal4.73.gtf",
count = "genes", summarization = "geneModels", format = "bam",
gapped = TRUE, filenames = c("NS1gallus.bam", "NS2gallus.bam"),
outputFormat = "DESeq", conditions = conditions)
-- output of sessionInfo():
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] BSgenome.Ggallus.UCSC.galGal4_1.3.18 BSgenome_1.30.0
easyRNASeq_1.8.1 ShortRead_1.20.0
[5] Rsamtools_1.14.1 GenomicRanges_1.14.3
DESeq_1.14.0 lattice_0.20-23
[9] locfit_1.5-9.1 Biostrings_2.30.0
XVector_0.2.0 IRanges_1.20.4
[13] edgeR_3.4.0 limma_3.18.2
biomaRt_2.18.0 Biobase_2.22.0
[17] genomeIntervals_1.18.0 BiocGenerics_0.8.0
intervals_0.14.0 BiocInstaller_1.12.0
loaded via a namespace (and not attached):
[1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6
DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0
grid_3.0.2
[8] hwriter_1.3 latticeExtra_0.6-26 LSD_2.5
RColorBrewer_1.0-5 RCurl_1.95-4.1 RSQLite_0.11.4
splines_3.0.2
[15] stats4_3.0.2 survival_2.37-4 tools_3.0.2
XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0
--
Sent via the guest posting facility at bioconductor.org.