Hi,
I got over my errors using SGSeq's analyzeFeatures() but I am stumbling on something else (or is it?). Following Leonard's suggestions, I did this:
seqlevels_in_bam <- seqlevels(BamFile(data.r$file_bam[1])) sgfdb <- keepSeqlevels(sgfdb, seqlevels_in_bam)
But I get this warning message:
#Warning message: #In keepSeqlevels(sgfdb, seqlevels_in_bam) : # invalid seqlevels'chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr14_KI270722v1_random', 'chr14_KI270723v1_random', 'chr14_KI270724v1_random', 'chr14_KI270725v1_random', 'chr15', 'chr16', 'chr17', 'chr17_KI270729v1_random', 'chr17_KI270730v1_random', 'chr18', 'chr19', 'chr1_KI270707v1_random', 'chr1_KI270708v1_random', 'chr1_KI270709v1_random', 'chr1_KI270710v1_random', 'chr2', 'chr20', 'chr21', 'chr22', 'chr22_KI270732v1_random', 'chr22_KI270735v1_random', 'chr22_KI270736v1_random', 'chr22_KI270737v1_random', 'chr22_KI270738v1_random', 'chr22_KI270739v1_random', 'chr2_KI270715v1_random', 'chr2_KI270716v1_random', 'chr3', 'chr4', 'chr5', 'chr5_GL000208v1_random', 'chr6', 'chr7', 'chr8', 'chr9', 'chr9_KI270717v1_random', 'chr9_KI270718v1_random', 'chr9_KI270719v1_random', 'chr9_KI270720v1_random', 'chrEBV', 'chrM', 'chrUn_GL000216v2', 'chrUn_GL000226v1', 'chrUn_KI270302v1', 'chrUn_KI270303v1', 'chrUn_KI270304v1', 'chrUn_KI270305v1', 'chrUn_KI270310v1', 'chrUn_KI270311v1', 'chrUn [... truncated]
Still related to the original issues of non-present chromosomes in my BAM files? After this, I try the following:
sgfc_ncbi<-analyzeFeatures(data.r,features=sgfdb,cores=4) sgfc_ncbi # class: SGFeatureCounts # dim: 1555 24 # metadata(0): # assays(2): counts FPKM # rownames: NULL # rowData names(0): # colnames(24): MONOCYTES_169377 MONOCYTES_201678 ... puit1_984968 # puit1_999046 # colData names(6): sample_name file_bam ... frag_length lib_size lmna <- plotFeatures(sgfc_ncbi,geneID=4000) lmna # NULL
So the plotFeatures()
method seems to fail... Any idea? FYI, I am running the latest R-3.3.1/SGSeq combo.
Best regards
S
Hi,
Sorry for the late answer: I was out and offline for a while... OK, here goes: I did this bit of code:
> txdb<-importTranscripts("/path/toward/Annotation/used/for/alignment/genes.gtf")
> txdb<-convertToTxFeatures(txdb)
seqlevelsStyle(txdb) <- "NCBI"
> sgfdb<-convertToSGFeatures(txdb)
# See C: Issues with SGSeq: analyzeFeatures() returns errors on BAM files
> seqlevels_in_bam <- seqlevels(BamFile(data.r$file_bam[1]))
# See A: Issues with SGSeq: plotFeatures method returns NULL
> seqlevels_in_bam <- intersect(seqlevels_in_bam, seqlevels(sgfdb))
> sgfdb <- keepSeqlevels(sgfdb, seqlevels_in_bam
I get no more warnings... I than do this:
> sgfc_ncbi<-analyzeFeatures(data.r,features=sgfdb,cores=8)
> plotFeatures(sgfc_ncbi,geneName="4000")
NULL
> plotFeatures(sgfc_ncbi)
Error in getExonCoordinates(g, toscale) :
features must be on the same chromosome and strand
How can I see what I can use from sgfc_ncbi?
Thanks in advance
S
Hi Sylvain,
You need to provide a gene name that can be found in your annotation. If you import annotation from GTF format, by default transcript names and gene names are obtained from GTF attribute tags 'transcript_id' and 'gene_id' respectively. If you are unsure you can print the SGFeatures object using e.g.
> sgfdb
or
> rowRanges(sgfc_ncbi)
Column 'geneName' contains the gene names for your annotation, and these can be used with the 'geneName' argument in plotFeatures(). I hope this helps
Leonard
Hi,
Apologies for the lengthy delay at feedback... School is back up and I had a pretty heavy teaching agenda :-) Ok, I went back on what I got and came back kind of puzzled... The seqlevels_in_bam line above seems to be creating some issues: when I perform the intersect with its outcome, I only get 35 locations, mostly of the type chrz_blablabla_random or chrUn_blablabla. So it seems that I did my analysis only on the ranges that were + not + in my BAM files. If I try to use dropSeqlevels instead of keepSeqlevels, I get this error:
sgfc_ncbi<-analyzeFeatures(data.r,features=sgfdb,cores=4)
Obtain counts...
Error:
Error in getSGFeatureCountsPerSample for MONOCYTES_169377:
Error in .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file), :
seqlevels(param) not in BAM header:
seqlevels: '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y', 'MT', 'chr1_GL383518v1_alt', 'chr1_GL383519v1_alt', 'chr1_GL383520v2_alt', 'chr1_KI270759v1_alt', 'chr1_KI270761v1_alt', 'chr1_KI270762v1_alt', 'chr1_KI270763v1_alt', 'chr1_KI270765v1_alt', 'chr1_KI270766v1_alt', 'chr1_KI270892v1_alt', 'chr2_GL383521v1_alt', 'chr2_GL383522v1_alt', 'chr2_GL582966v2_alt', 'chr2_KI270767v1_alt', 'chr2_KI270768v1_alt', 'chr2_KI270769v1_alt', 'chr2_KI270770v1_alt', 'chr2_KI270772v1_alt', 'chr2_KI270773v1_alt', 'chr2_KI270774v1_alt', 'chr2_KI270776v1_alt', 'chr2_KI270893v1_alt', 'chr2_KI270894v1_alt', 'chr3_GL383526v1_alt', 'chr3_JH636055v2_alt', 'chr3_KI270777v1_alt', 'chr3_KI270779v1_alt', 'chr3_KI270780v1_alt', 'chr3_KI270782v1_a
In addition: Warning message:
In mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule, :
24 function calls resulted in an error
I apologize if these questions seem totally newbie, but I always find that some R/Bioconductor operations are rather counterintuitive for biologists... Which is why I am doing it to teach other members in my lab ;-)
Best regards
S
Hi Sylvain,
The seqlevels (or chromosome names) in your annotation have to match the reference sequence names in your BAM files. Can you post the output for the following commands:
Thanks,
Leonard
Hi,
Well, I finally got it sorted out: I did make a mistake in the creation of my txdb object by setting its style to NCBI. This made all my chrZ entries to be turned into Z only, therefore the mistake... Duh! Switching to UCSC fixed the problem. Just to be sure not to repeat these mistakes, I wrote down the recipe in our lab's wiki.
I started two analysis (two different types of stimulations), one turned out ok, the other one through me a new error after quite some time:
Error in `colnames<-`(`*tmp*`, value = c("puit1_169377", "puit1_201678", :
length of 'dimnames' [2] not equal to array extent
Calls: analyzeFeatures ... getSGFeatureCounts -> makeSGFeatureCounts -> colnames<- -> colnames<-
What did I did wrong this time?
Thanks in advance
S