Is it possible to match lower-case nucleotides specifically (e.g. agct)? When a genome is repeat-masked it can be soft-masked, which leaves the masked regions in lower case. In some cases those regions may be of interest in their own right, as opposed to the non-masked regions.
Example:
>random
AGAGTAGTagtAGT
Can Biostrings account for this, or is everything automatically converted to upper case under the hood for convenience?
DNAString and DNAStringSet objects in Biostrings don't keep track of case: lower-case letters are converted to upper case when the object is created.
Note that we provide "masked genomes" for some organisms (e.g. BSgenome.Hsapiens.UCSC.hg38.masked), where the chromosome sequences carry various masks (e.g. a RepeatMasker mask, among others). You can use these if you need string-matching tools like matchPattern() to ignore the masked regions.
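To illustrate the idea of turning soft-masking into a mask and then ignoring masked regions during matching, here is a small plain-Python sketch. It is not Biostrings code and makes its own assumptions: any lower-case letter counts as masked, ranges are 1-based and inclusive (as in R), and a hit is dropped if it overlaps a masked run at all. Biostrings' own masked-sequence objects may treat partially overlapping hits differently. The helper names soft_masked_ranges and match_outside_mask are hypothetical.

```python
import re


def soft_masked_ranges(seq):
    """Return 1-based, inclusive (start, end) ranges of lower-case runs."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[a-z]+", seq)]


def match_outside_mask(pattern, seq):
    """Case-insensitive exact matches (overlaps allowed) that do not touch
    any soft-masked run. Returns 1-based start positions."""
    masked = soft_masked_ranges(seq)
    upper, pat = seq.upper(), pattern.upper()
    hits = []
    i = upper.find(pat)
    while i != -1:
        start, end = i + 1, i + len(pat)
        # Keep the hit only if it lies entirely outside every masked range.
        if all(end < ms or start > me for ms, me in masked):
            hits.append(start)
        i = upper.find(pat, i + 1)  # step by one so overlapping hits are found
    return hits


seq = "AGAGTAGTagtAGT"  # example sequence from the question
print(soft_masked_ranges(seq))         # [(9, 11)]
print(match_outside_mask("AGT", seq))  # [3, 6, 12]
```

The copy of AGT at positions 9 to 11 is skipped because it falls inside the soft-masked run.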
Another approach is to use BString/BStringSet objects instead of DNAString/DNAStringSet objects. Unlike the latter, the former preserve case. (BStringSet is the general-purpose string container in Biostrings, so it is analogous to an ordinary character vector in base R.) Note that some matchPattern() functionality specific to DNAString/DNAStringSet objects won't work with BString/BStringSet objects (e.g. fixed=FALSE).
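The difference between keeping and discarding case can be shown with a plain-Python sketch (not Biostrings code). Searching the original string plays the role of a case-preserving BString, and searching an upper-cased copy plays the role of a DNAString. The helper find_all is hypothetical. It reports 1-based start positions of all exact, possibly overlapping matches.

```python
import re

seq = "AGAGTAGTagtAGT"  # example sequence from the question


def find_all(pattern, text):
    """1-based start positions of all (possibly overlapping) exact matches."""
    # A zero-width lookahead lets re.finditer report overlapping hits.
    return [m.start() + 1 for m in re.finditer(f"(?=({re.escape(pattern)}))", text)]


# Case preserved (like a BString): only the soft-masked copy matches "agt".
print(find_all("agt", seq))          # [9]
# Case discarded (like a DNAString): every copy of AGT matches.
print(find_all("AGT", seq.upper()))  # [3, 6, 9, 12]
```

Because the match is exact and case-sensitive, a lower-case pattern only finds hits inside soft-masked regions.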