I keep running into a problem while using customProDB to generate a database for the mm10 genome. First, it gives a warning while preparing the annotation:
Warning message: In .extract_cds_locs_from_UCSC_txtable(ucsc_txtable) : UCSC data anomaly in 119 transcript(s): the cds cumulative length is not a multiple of 3 for transcripts ‘NM_011633’ ‘NM_198024’ ‘NM_001160424’ ‘NM_009268’ ‘NM_001190454’ ‘NM_001290729’ ‘NM_025576’ ‘NM_001177397’ ‘NM_001081960’ ‘NM_010974’ ‘NM_001128086’ ‘NM_001142737’ ‘NM_001289428’ ‘NM_001267808’ ‘NM_001301307’ ‘NM_001109684’ ‘NM_021466’ ‘NM_025988’ ‘NM_016901’ ‘NM_001347054’ ‘NM_011261’ ‘NM_001142760’ ‘NM_011022’ ‘NM_008848’ ‘NM_024470’ ‘NM_010707’ ‘NM_001346422’ ‘NM_001301034’ ‘NM_001301737’ ‘NM_010039’ ‘NM_008264’ ‘NM_010646’ ‘NM_001347053’ ‘NM_001206926’ ‘NM_001177396’ ‘NM_009046’ ‘NM_207683’ ‘NM_146484’ ‘NM_001277980’ ‘NM_001114347’ ‘NM_001277958’ ‘NM_001130175’ ‘NM_001277959’ ‘NM_144531’ ‘NM_181398’ ‘NM_001177416’ ‘NM_001033980’ ‘NM_001358490’ ‘NM_008653’ ‘NM_00 [... truncated]
When I ignore the warning and run the easyRun function, it gives me this error:
Calculate RPKMs and Output proteins pass the cutoff into FASTA file ... Error in keepSeqlevels(anno, seqlevels(galn), pruning.mode = "coarse") : invalid seqlevels: NC_000067.7, NT_166280.1, NT_166281.1, NT_166282.1, NT_162750.1, NW_023337852.1, NT_166338.1, NC_000068.8, NC_000069.7, NC_000070.7, NT_187055.1, NC_000071.7, NT_187056.1, NT_187057.1, NT_187058.1, NT_166438.1, NT_187059.1, NC_000072.7, NC_000073.7, NT_166307.1, NC_000074.7, NC_000075.7, NC_000076.7, NC_000077.7, NC_000078.7, NC_000079.7, NC_000080.7, NC_000081.7, NC_000082.7, NC_000083.7, NC_000084.7, NC_000085.7, NC_000086.8, NT_165789.3, NC_000087.8, NT_187060.1, NT_187061.1, NT_187062.1, NT_187063.1, NT_166451.1, NT_166462.1, NT_166465.1, NT_166466.1, NT_166467.1, NT_166469.1, NT_166474.1, NT_166476.1, NT_166478.1, NT_166443.1, NT_166444.1, NT_166480.1, NT_166456.1, NT_166471.1, NT_166473.1, NT_166454.1, NT_166463.1, NT_166450.1, NT_166452.1, NT_187064.1, NW_023337853.1, NC_005089.1
I've searched everywhere and still cannot figure out how to get past this. It is urgent, so I would be really grateful for any help.
Thank you all in advance.
It is impossible to help if you simply provide the error messages. Unless we know what you are doing, what functions you are using, and how you are calling those functions, the only thing we can say is probably not that helpful.
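What I can say is that keepSeqlevels has two arguments that matter here. The first is the object you want to reduce, and the second is the set of seqlevels you want to keep. The error is telling you that you are asking to keep a bunch of seqlevels that don't exist in the object you are trying to subset: in the failing call, the seqlevels of the alignments (galn) are being kept in the annotation (anno), and the ones listed are not present there. They are also not what I would normally expect for mm10 (UCSC names such as chr1 or chrM, which is presumably what the UCSC-based annotation uses). Instead they are NCBI RefSeq accessions for chromosomes, contigs and scaffolds (NC_, NT_ and NW_ prefixes; NC_005089.1, for example, is the mouse mitochondrial genome). So your alignments and your annotation use different naming conventions for the same kind of thing.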
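To make the mismatch concrete, here is a minimal base-R sketch of the check keepSeqlevels performs: every seqlevel passed as the second argument must already exist in the first. The short vectors are illustrative stand-ins for seqlevels(galn) and seqlevels(anno), and the RefSeq-to-UCSC lookup table is hypothetical; a real one should be built from the NCBI assembly report for the build the reads were aligned to.

```r
# Illustrative stand-ins (assumed values, not taken from real objects):
# bam_levels  ~ seqlevels(galn), i.e. names in the BAM header
# anno_levels ~ seqlevels(anno), i.e. names in the UCSC-based annotation
bam_levels  <- c("NC_000067.7", "NC_000068.8", "NC_005089.1")
anno_levels <- c("chr1", "chr2", "chrM")

# Mirrors the validation in keepSeqlevels(x, value): return the requested
# seqlevels that are absent from x. These are what get reported as
# "invalid seqlevels".
invalid_seqlevels <- function(x_levels, value) {
  setdiff(value, x_levels)
}

print(invalid_seqlevels(anno_levels, bam_levels))

# Hypothetical lookup table (old name = new name). Illustrative only:
# verify every pair against the assembly report before using anything similar.
refseq_to_ucsc <- c(
  "NC_000067.7" = "chr1",
  "NC_000068.8" = "chr2",
  "NC_005089.1" = "chrM"
)

# Translate the BAM names; stop if any name has no mapping.
renamed <- unname(refseq_to_ucsc[bam_levels])
stopifnot(!anyNA(renamed))

# After renaming, nothing should be reported as invalid.
print(invalid_seqlevels(anno_levels, renamed))
```

On the real objects, the equivalent step would be something like renameSeqlevels(galn, refseq_to_ucsc) from GenomeInfoDb before calling keepSeqlevels. One caution: renaming only helps if both sides come from the same assembly. mm10 is GRCm38, so the accession version suffixes in the BAM should be checked against the GRCm38 assembly report; if the reads were aligned to a different mouse build, the coordinates differ and the reads need to be realigned (or the annotation rebuilt) rather than just renamed.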