Hello everyone,
I am using IsoformSwitchAnalyzeR and I am stuck at the step that creates the switchAnalyzeRlist. According to the error message, the problem seems to be in my GTF file. I would really appreciate help finding a GTF file that contains the haplotype information the message asks for: """You need to supply the <Ensembl_version>.chr_patch_hapl_scaff.gtf file - NOT the <Ensembl_version>.chr.gtf"""
Here is the error output:
"""
# Create switchAnalyzeRlist
aSwitchList <- importRdata(
    isoformCountMatrix            = Quant$counts,
    isoformRepExpression          = Quant$abundance,
    designMatrix                  = myDesign,
    isoformExonAnnoation          = "../../Kallisto/GCF_016699485.2_bGalGal1.mat.broiler.GRCg7b_genomic.gtf",
    isoformNtFasta                = "../../Kallisto/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.cdna.all.fa",
    fixStringTieAnnotationProblem = TRUE,
    showProgress                  = FALSE
)
Step 1 of 7: Checking data...
Step 2 of 7: Obtaining annotation...
    importing GTF (this may take a while)...
Error in importRdata(isoformCountMatrix = Quant$counts, isoformRepExpression = Quant$abundance, :
  The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925).
  Either isforoms found in the annotation are not quantifed or vise versa.
  Specifically:
  44937 isoforms were quantified.
  85704 isoforms are annotated.
  Only 0 overlap.
  44937 isoforms quantifed had no corresponding annoation
This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.
If there is no overlap (as in zero or close) there are two options:
  1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
  2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
  Examples from expression matrix are : ENSGALT00010031805.1, ENSGALT00010003870.1, ENSGALT00010064047.1
  Examples of annoation are : XM_046899296.1, XR_005859820.2, XM_015278322.4
  Examples of isoforms which were only found im the quantification are : ENSGALT00010063466.1, ENSGALT00010031621.1, ENSGALT00010000926.1
If there is a large overlap but still far from complete there are 3 possibilites:
  1) The files do not fit together (e.g different databases versions etc.) (no fix except using propperly paired files).
  2) If you are using Ensembl data you have supplied the GTF without phaplotyps. You need to supply the <Ensembl_version>.chr_patch_hapl_scaff.gtf file - NOT the <Ensembl_version>.chr.gtf
  3) One file could contain non-chanonical chromosomes while the other do not (might be solved using the 'removeNonConvensionalChr' argument.)
  4) It is somthing to do with how a subset of the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
"""
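For anyone who wants to reproduce the overlap numbers outside of importRdata, here is a minimal base R sketch. The helper id_overlap is hypothetical (it is not part of IsoformSwitchAnalyzeR), and the example IDs are simply the ones printed in the error message above. It only illustrates the set comparison, not the package's internal check.

```r
# Hypothetical helper: compare two sets of isoform IDs (base R only).
id_overlap <- function(quant_ids, annot_ids) {
  quant_ids <- unique(quant_ids)
  annot_ids <- unique(annot_ids)
  shared    <- intersect(quant_ids, annot_ids)
  all_ids   <- union(quant_ids, annot_ids)
  list(
    n_quantified  = length(quant_ids),
    n_annotated   = length(annot_ids),
    n_shared      = length(shared),
    only_in_quant = setdiff(quant_ids, annot_ids),
    # Jaccard similarity = |intersection| / |union|
    jaccard       = length(shared) / length(all_ids)
  )
}

# Example IDs copied from the error message
quant_ids <- c("ENSGALT00010031805.1", "ENSGALT00010003870.1", "ENSGALT00010064047.1")
annot_ids <- c("XM_046899296.1", "XR_005859820.2", "XM_015278322.4")

res <- id_overlap(quant_ids, annot_ids)
print(res$n_shared)
print(res$jaccard)
```

In a real session, quant_ids would come from whichever column or row names hold the isoform IDs in Quant$counts, and annot_ids from the transcript IDs in the GTF.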
You are definitely right. I checked the reference files (cDNA FASTA and GTF), and the IDs are different, even though I downloaded both the GTF and the cDNA FASTA file from the same source.
How can I solve this problem? Would it work if I changed "ENSGA" to "XM_" in the FASTA file?
Best regards
These are the first lines of the FASTA file:
I just double checked and the fasta files found here should work. Please also take a look at this if you want to use Ensembl.
OK, have you looked at the cDNA FASTA file for chicken (maternal broiler, Gallus gallus)? I am using Kallisto for alignment and quantification, so I need to use the cDNA FASTA.
Oh sorry, I missed the species. Yes, they also match for chicken :-) (both use "ENSGALT..." IDs). The other file, the one with the XM_/XR_ IDs, is from NCBI.
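A quick way to spot this kind of mismatch is to look at the ID prefixes in each file. The sketch below is only a rough heuristic in base R: guess_id_source is a hypothetical helper, and the patterns cover the Ensembl transcript style ("ENS...T" plus digits) and the common RefSeq transcript prefixes (NM_, NR_, XM_, XR_), nothing else.

```r
# Hypothetical helper: guess the source database from a transcript ID prefix.
guess_id_source <- function(ids) {
  ifelse(
    grepl("^ENS[A-Z]*T[0-9]+", ids), "Ensembl",
    ifelse(grepl("^[NX][MR]_[0-9]+", ids), "RefSeq (NCBI)", "unknown")
  )
}

# IDs taken from the error message earlier in this thread
ids <- c("ENSGALT00010031805.1", "XM_046899296.1", "XR_005859820.2")
sources <- guess_id_source(ids)
print(data.frame(id = ids, source = sources))
```

If the quantification IDs and the annotation IDs come out as different sources, renaming IDs will not help; the FASTA and GTF need to come from the same database and release.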
Great, now the IDs are the same. But I still get the error listing the four possibilities when running IsoformSwitchAnalyzeR. Any idea where the problem is?
aSwitchList <- importRdata(
    isoformCountMatrix            = Quant$counts,
    isoformRepExpression          = Quant$abundance,
    designMatrix                  = myDesign,
    isoformExonAnnoation          = "../../Kallisto/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.107.chr.gtf.gz",
    isoformNtFasta                = "../../Kallisto/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.cdna.all.fa.gz",
    removeNonConvensionalChr      = TRUE,
    fixStringTieAnnotationProblem = TRUE,
    ignoreAfterBar                = TRUE,
    ignoreAfterSpace              = TRUE,
    ignoreAfterPeriod             = TRUE
)
Step 2 of 7: Obtaining annotation...
    importing GTF (this may take a while)...
Error in importRdata(isoformCountMatrix = Quant$counts, isoformRepExpression = Quant$abundance, :
  The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925).
  Either isforoms found in the annotation are not quantifed or vise versa.
  Specifically:
  44937 isoforms were quantified.
  44362 isoforms are annotated.
  Only 44362 overlap.
  575 isoforms quantifed had no corresponding annoation
This combination cannot be analyzed since it will cause discrepencies between quantification and annotation thereby skewing all analysis.
If there is no overlap (as in zero or close) there are two options:
  1) The files do not fit together (e.g. different databases, versions, etc) (no fix except using propperly paired files).
  2) It is somthing to do with how the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
  Examples from expression matrix are : ENSGALT00010007474, ENSGALT00010044269, ENSGALT00010044286
  Examples of annoation are : ENSGALT00010021236, ENSGALT00010047725, ENSGALT00010061756
  Examples of isoforms which were only found im the quantification are : ENSGALT00010004235, ENSGALT00010003686, ENSGALT00010000320
If there is a large overlap but still far from complete there are 3 possibilites:
  1) The files do not fit together (e.g different databases versions etc.) (no fix except using propperly paired files).
  2) If you are using Ensembl data you have supplied the GTF without phaplotyps. You need to supply the <Ensembl_version>.chr_patch_hapl_scaff.gtf file - NOT the <Ensembl_version>.chr.gtf
  3) One file could contain non-chanonical chromosomes while the other do not (might be solved using the 'removeNonConvensionalChr' argument.)
  4) It is somthing to do with how a subset of the isoform ids are stored in the different files. This problem might be solvable using some of the 'ignoreAfterBar', 'ignoreAfterSpace' or 'ignoreAfterPeriod' arguments.
That sounds like the version of the annotation does not match the version you used for the quantification?
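To see exactly which transcripts are affected, it can help to pull the IDs out of both files and compare them directly. Below is a rough base R sketch on toy input: the two helper functions are hypothetical, the GTF attribute strings and FASTA headers are made up for illustration (only the transcript IDs are taken from the error message), and the sketch assumes Ensembl-style files where the GTF stores the bare ID and the FASTA header carries a ".version" suffix.

```r
# Toy GTF attribute strings (gene IDs are made up for illustration)
gtf_attr <- c(
  'gene_id "ENSGALG00010000001"; transcript_id "ENSGALT00010021236"; transcript_version "1";',
  'gene_id "ENSGALG00010000002"; transcript_id "ENSGALT00010047725"; transcript_version "1";'
)

# Toy FASTA header lines (text after the ID is made up)
fasta_hdr <- c(
  ">ENSGALT00010021236.1 cdna",
  ">ENSGALT00010047725.1 cdna",
  ">ENSGALT00010004235.1 cdna"
)

# Hypothetical helper: extract the transcript_id value from GTF attributes
extract_gtf_tx <- function(attr) {
  m <- regmatches(attr, regexpr('transcript_id "[^"]+"', attr))
  sub('^transcript_id "([^"]+)"$', "\\1", m)
}

# Hypothetical helper: reduce a FASTA header to a bare transcript ID
extract_fasta_tx <- function(hdr) {
  id <- sub("^>", "", hdr)      # drop the leading ">"
  id <- sub("[ |].*$", "", id)  # drop everything after the first space or bar
  sub("\\.[0-9]+$", "", id)     # drop the version suffix
}

gtf_tx   <- unique(extract_gtf_tx(gtf_attr))
fasta_tx <- unique(extract_fasta_tx(fasta_hdr))

# Transcripts that were quantified (present in the FASTA) but lack annotation
missing_in_gtf <- setdiff(fasta_tx, gtf_tx)
print(missing_in_gtf)
```

On the real files, looking up where the missing transcripts sit in the FASTA headers (for example chromosome versus scaffold) should show whether the GTF is a different release or just a subset.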