Hi all,
I am trying to build a BSgenome data package for the updated genome of the model plant Arabidopsis thaliana (TAIR10). I am using the function forgeBSgenomeDataPkgFromNCBI(), but I am running into an error saying that the sequences contain unsupported ambiguity characters. I used Biostrings::replaceAmbiguities() as the error message suggests, but I am not sure how to save the updated sequences, and I don't know how to proceed from that point.
forgeBSgenomeDataPkgFromNCBI(assembly_accession="GCF_000001735.4", pkg_maintainer="Bruno Guillotin", organism="Arabidopsis thaliana", destdir=tempdir())
Warning in .extract_NCBI_assembly_info(assembly_accession, chrominfo, organism = organism, :
"GCF_000001735.4" is a registered NCBI assembly for organism
"Arabidopsis thaliana" --> ignoring supplied 'organism' argument
trying URL 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.fna.gz'
Content type 'application/x-gzip' length 37482399 bytes (35.7 MB)
==================================================
downloaded 35.7 MB
Error in .local(object, con, format, ...) :
One or more strings contain unsupported ambiguity characters.
Strings can contain only A, C, G, T or N.
See Biostrings::replaceAmbiguities().
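#### What I did
library(BSgenomeForge)  # provides downloadGenomicSequencesFromNCBI()
library(Biostrings)     # provides readDNAStringSet() and replaceAmbiguities()
# Download the FASTA file for the assembly and read it in
filepath <- downloadGenomicSequencesFromNCBI("GCF_000001735.4", destdir=tempdir())
genomic_sequences <- readDNAStringSet(filepath)
genomic_sequences
# Replace every IUPAC ambiguity code other than N with N
genomic_sequences2 <- replaceAmbiguities(genomic_sequences, new="N")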
#Then ?....
I would also like to rename the sequences in the DNAStringSet, since each chromosome has a name such as NC_003070.9 rather than chr1, chr2, etc.
Thanks in advance, and sorry if this is an obvious question. Bruno
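To make the "Then ?" step concrete, here is a minimal sketch of what saving and renaming might look like. It is not a confirmed solution. It uses a toy two-sequence DNAStringSet instead of the real download so it runs on its own. The second sequence name and the output file name are hypothetical. Renaming by position assumes the sequences are in chromosome order, and the real file may also contain organelle sequences that would need different names.

```r
library(Biostrings)

# Toy stand-in for the downloaded genome. "NC_003070.9" is the name
# mentioned above; the second name is a hypothetical placeholder.
genomic_sequences <- DNAStringSet(c(
  "NC_003070.9"              = "ACGTRYACGT",
  "hypothetical_accession_2" = "ACGTNNKMAC"
))

# Replace IUPAC ambiguity codes (R, Y, K, M, ...) with N.
genomic_sequences2 <- replaceAmbiguities(genomic_sequences, new = "N")

# Rename by position (assumes the sequences are in chromosome order).
names(genomic_sequences2) <- paste0("chr", seq_along(genomic_sequences2))

# Save the cleaned, renamed sequences as a FASTA file.
out_file <- file.path(tempdir(), "TAIR10_clean.fa")  # hypothetical file name
writeXStringSet(genomic_sequences2, out_file)
```

With the real data, `genomic_sequences` would be the object returned by readDNAStringSet(filepath), and the saved file could then serve as input for whatever forging step comes next.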