How to use DECIPHER to design group-specific primers
hoonhuiyi • 0
@hoonhuiyi-23342

Hi, I'm using DECIPHER in R to design group-specific primers that target one group out of other non-target groups. According to the DECIPHER tutorial, before designing the primers we need to run a command that defines groups in the sequence database. How is DECIPHER able to define the groups correctly? Should a phylogenetic classification be carried out first to define them?


The vignette for DesignPrimers(), "Design Group-Specific Primers", has a section on "Defining Groups". As it says, it is up to users how they wish to define non-target groups. For example, this can be done automatically with IdClusters() or assigned manually based on a taxonomy.


I would like to design primers that target the Accumulibacter group out of other non-target bacterial groups. In the "Defining Groups" section, how do I create one identifier for all sequences belonging to the Accumulibacter group? At the moment each sequence is given its own identifier instead.

My FASTA files have these sequence descriptions:

NR1234.1 Bacteria;Proteobacteria;Accumulibacter;accumulibacter phosphatis
NR1234.2 Bacteria;Proteobacteria;Accumulibacter;accumulibacter uncultured
NR1234.3 Bacteria;Proteobacteria;Accumulibacter;accumulibacter strain A234
NR2345.1 Bacteria; gammabacteri;Escherichia;Escherichia coli
NR2345.1 Bacteria; gammabacteri;Escherichia;Escherichia pyroli


The sequence descriptions are stored in the description field within the sequence database. So it is possible to query that field, extract the genus name, and then use that as the identifier:

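# Pull the description of every sequence from the Seqs table
x <- dbGetQuery(dbConn, "select description from Seqs")$description
# Keep the last semicolon-delimited field (e.g. "accumulibacter phosphatis")
x <- strsplit(x, ";", fixed=TRUE)
x <- sapply(x, tail, n=1)
# Keep the first word of that field, which is the genus name here
x <- strsplit(x, " ", fixed=TRUE)
x <- sapply(x, head, n=1)
# Store that name as the identifier of each sequence in the database
Add2DB(data.frame(identifier=x, stringsAsFactors=FALSE), dbConn)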
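To see what the parsing steps above produce without touching a database, here is a minimal base R sketch. It assumes the five descriptions quoted in the question; the names `desc`, `last_field` and `ids` are only for illustration.

```r
# Toy descriptions copied from the question; no database is needed here.
desc <- c(
  "NR1234.1 Bacteria;Proteobacteria;Accumulibacter;accumulibacter phosphatis",
  "NR1234.2 Bacteria;Proteobacteria;Accumulibacter;accumulibacter uncultured",
  "NR1234.3 Bacteria;Proteobacteria;Accumulibacter;accumulibacter strain A234",
  "NR2345.1 Bacteria; gammabacteri;Escherichia;Escherichia coli",
  "NR2345.1 Bacteria; gammabacteri;Escherichia;Escherichia pyroli"
)

# Last semicolon-delimited field, e.g. "accumulibacter phosphatis"
last_field <- sapply(strsplit(desc, ";", fixed = TRUE), tail, n = 1)

# First word of that field, used as the group identifier
ids <- sapply(strsplit(last_field, " ", fixed = TRUE), head, n = 1)

print(ids)
print(table(ids))  # number of sequences per identifier
```

With these descriptions the three Accumulibacter records collapse to the single identifier accumulibacter, in lowercase because that is how the last field is written, so any identifier passed to later steps would presumably need to match that spelling.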

Do I replace the text in "select description from Seqs" with Accumulibacter? I get a syntax error when I try to do so:

x <- dbGetQuery(dbConn, "Accumulibacter")$description
Error: near "Accumulibacter": syntax error


No, the code should work directly as written. Have you tried it?
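For context, the second argument of dbGetQuery() is a complete SQL statement, which is why a bare word such as "Accumulibacter" gives a syntax error. The sketch below illustrates this on a small in-memory SQLite table. It assumes the DBI and RSQLite packages are installed; the toy `Seqs` table and the connection name `con` are stand-ins, not the actual DECIPHER database.

```r
library(DBI)

# Hypothetical in-memory database standing in for the real sequence database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
seqs <- data.frame(
  description = c(
    "NR1234.1 Bacteria;Proteobacteria;Accumulibacter;accumulibacter phosphatis",
    "NR1234.2 Bacteria;Proteobacteria;Accumulibacter;accumulibacter uncultured",
    "NR2345.1 Bacteria; gammabacteri;Escherichia;Escherichia coli"
  ),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "Seqs", seqs)

# A full SQL statement returns every description
all_desc <- dbGetQuery(con, "select description from Seqs")$description

# Filtering belongs inside the SQL statement, e.g. with a WHERE clause
hits <- dbGetQuery(
  con,
  "select description from Seqs where description like '%Accumulibacter%'"
)$description

print(length(all_desc))
print(hits)

dbDisconnect(con)
```

The filter is shown only to explain the syntax; the code in the answer above needs all descriptions, so it is run unchanged.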


Do I insert Accumulibacter as the identifier? I get the error below:

primers <- DesignPrimers(tiles, identifier="Accumulibacter", minCoverage=1, minGroupCoverage=1)

Accumulibacter
Error in system(paste("hybrid-min -n DNA -t", temp, "-T", temp, "-N", : 'CreateProcess' failed to run 'C:\PROGRA~2\OLIGOA~1\bin\HYBRID~1.EXE -n DNA -t 64 -T 64 -N 0.224783720074173 -E -q TCTGTGAGCAGGAAAGC GCTTTCCTGCTCACAGA CTGTGAGCAGGAAAGCA TGCTTTCCTGCTCACAG TGTGAGCAGGAAAGCAG CTGCTTTCCTGCTCACA GTGAGCAGGAAAGCAGG CCTGCTTTCCTGCTCAC TGAGCAGGAAAGCAGGG CCCTGCTTTCCTGCTCA GAGCAGGAAAGCAGGGG CCCCTGCTTTCCTGCTC AGCAGGAAAGCAGGGGA TCCCCTGCTTTCCTGCT GCAGGAAAGCAGGGGAT ATCCCCTGCTTTCCTGC CAGGAAAGCAGGGGATC GATCCCCTGCTTTCCTG AGGAAAGCAGGGGATCG CGATCCCCTGCTTTCCT GGAAAGCAGGGGATCGC GCGATC


This looks like an issue with accessing OligoArrayAux from R. What happens when you run the following?

system("C:/PROGRA~2/OLIGOA~1/bin/HYBRID~1.EXE -h", intern=TRUE)
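A note on the path in that command: backslashes start escape sequences in R string literals, so a Windows path has to be written with forward slashes or with doubled backslashes. The following sketch only compares the two spellings and does not run the executable; the variable names are illustrative.

```r
# Option 1: forward slashes, which R accepts in Windows paths
cmd_forward <- "C:/PROGRA~2/OLIGOA~1/bin/HYBRID~1.EXE -h"

# Option 2: doubled backslashes, each pair standing for one literal backslash
cmd_escaped <- "C:\\PROGRA~2\\OLIGOA~1\\bin\\HYBRID~1.EXE -h"

# Both spell the same command once the separators are normalised
same <- identical(chartr("\\", "/", cmd_escaped), cmd_forward)
print(same)

# On the Windows machine in question one would then run, for example:
# system(cmd_forward, intern = TRUE)
```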

system("C:/PROGRA~2/OLIGOA~1/bin/HYBRID~1.EXE -h", intern=TRUE)
 [1] "Usage: hybrid-min [options] file1 file2"
 [2] ""
 [3] "Options:"
 [4] "-V, --version"
 [5] "-h, --help"
 [6] "-n, --NA=(RNA | DNA) (defaults to RNA)"
 [7] "-t, --tmin=<minimum temperature> (defaults to 37)"
 [8] "-i, --tinc=<temperature increment> (defaults to 1)"
 [9] "-T, --tmax=<maximum temperature> (defaults to 37)"
[10] "-s, --suffix=<free energy suffix>"
[11] "-o, --output=<prefix>"
[12] "-N, --sodium=<[Na+] in M> (defaults to 1)"
[13] "-M, --magnesium=<[Mg++] in M> (defaults to 0)"
[14] "-p, --polymer"
[15] "-r, --prohibit=<i,j,k>"
[16] "-E, --energyOnly"
[17] "-I, --noisolate"
[18] "-z, --zip"
[19] "-F, --mfold[=<p,w,max>] (defaults to 5,*,100; W determined by sequence length)"
[20] "-q, --quiet"
[21] "-c, --constraints[=<name of constraint file>] (defaults to prefix.aux)"
[22] "-b, --basepairs=<name of basepairs file>"
[23] ""
[24] "Obscure options:"
[25] " --allpairs"
[26] " --maxloop=<maximum bulge/interior loop size> (defaults to 30)"
[27] " --nodangle"
[28] " --prefilter=<value1, value2>"
[29] " --stream"
[30] ""
[31] "Report bugs to markhn@rpi.edu"


Try restarting R after installing OligoArrayAux. Also, try specifying batchSize=100 in DesignPrimers().


Thank you, I have managed to design primers for my sample file of 100 sequences.

However, for the SILVA SSU NR Ref database with 500,000 sequences, my computer's 8 GB of RAM is insufficient to align the RNA sequences:

AA <- AlignTranslation(dna, type="AAStringSet") # align the translation
Error: memory exhausted (limit reached?)

Any idea how much RAM I would need to upgrade to, or would you have the aligned SILVA SSU sequences?


You can download the aligned version of the SILVA database.


I have imported the massive 25 GB SILVA database and defined the groups!

When creating the tiles, the following error message is displayed:

tiles <- TileSeqs(dbConn, add2tbl="Tiles", minCoverage=1)
0%
Error: cannot allocate vector of size 49 Kb
tiles <- TileSeqs(dbConn, add2tbl="Tiles", minCoverage=0.9)
0%
Error: cannot allocate vector of size 49 Kb

Also, what are the usual recommended minCoverage and minGroupCoverage values when targeting a particular genus (e.g. Escherichia) or a particular kingdom (e.g. fungi) out of the 500,000 sequences of Eukaryota, Archaea and Bacteria in the database?


How were your groups defined? Family level groups should not be too large to process.

Error: cannot allocate vector of size 49 Kb

I have never observed that error for such a small amount of memory. Could you provide the output of .traceback()?

what are the usual recommended minCoverage and minGroupCoverage values when targeting a particular genus (e.g. Escherichia) or a particular kingdom (e.g. fungi) out of the 500,000 sequences of Eukaryota, Archaea and Bacteria in the database?

The defaults are recommended unless you have a specific reason to change them.


I defined the groups like this:

x <- dbGetQuery(dbConn, "select description from Seqs")$description
x <- strsplit(x, ";", fixed=TRUE)
x <- sapply(x, tail, n=1)
x <- strsplit(x, " ", fixed=TRUE)
x <- sapply(x, head, n=1)
Add2DB(data.frame(identifier=x, stringsAsFactors=FALSE), dbConn)

Then created tiles:

tiles <- TileSeqs(dbConn, add2tbl="Tiles", minCoverage=0.9)
| | 0%
error: cannot allocate vector of size 49kb
.traceback()
[[1]]
[1] "Codec(searchResult$sequence, processors = processors)"

[[2]]
[1] "SearchDB(dbConn, tblName = tblName, type = \"DNAStringSet\", identifier = identifier[k], "
[2] " processors = processors, verbose = FALSE, ...)"


What is the distribution of group sizes? That is, how many sequences are there per identifier? For example: sort(table(x))
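As a small illustration of what sort(table(x)) reports, the sketch below uses a made-up identifier vector. The values in `x` are hypothetical and far smaller than the real groups.

```r
# Hypothetical identifiers, one per sequence
x <- c("Bacillus", "Bacillus", "Bacillus",
       "uncultured", "uncultured", "Centropages")

# Count sequences per identifier and sort from smallest to largest group
sizes <- sort(table(x))
print(sizes)

# The last entry is the largest group, the one most likely to strain memory
largest <- names(sizes)[length(sizes)]
print(largest)
```

Because the table is sorted in increasing order, tail(sizes) would show the largest groups when there are many identifiers.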


From sort(table(x)), the group sizes range from about 4 sequences (e.g. centropages) through 14,000 (e.g. bacillus) to 28,000 (e.g. uncultured) sequences per identifier. The table shows that the identifiers are genus to species names.

The SILVA sequences that I have look like this:

AF171093.1.1786 Eukaryota;Amorphea;Obazoa;Opisthokonta;Nucletmycea;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Russulales;Hericiaceae;Hericium;Hericium coralloides
MF062673.1.1434 Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Acidovorax;Acidovorax soli

Also, I would like to define groups so that I can design primers that target not only genus-level groups but also fungi!
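One possible way to express such mixed-rank groups, sketched in base R on the two descriptions above: sequences whose lineage contains a Fungi rank share one identifier, and all others are labelled by genus. This is only an assumption about how the grouping could be encoded, and it relies on the genus being the second-to-last field, which holds for these two examples but may not hold for every SILVA entry. The resulting vector would still have to be written to the database in the same way as in the earlier answer.

```r
# The two SILVA-style descriptions quoted above
desc <- c(
  "AF171093.1.1786 Eukaryota;Amorphea;Obazoa;Opisthokonta;Nucletmycea;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Russulales;Hericiaceae;Hericium;Hericium coralloides",
  "MF062673.1.1434 Bacteria;Proteobacteria;Gammaproteobacteria;Burkholderiales;Comamonadaceae;Acidovorax;Acidovorax soli"
)

# Split each lineage into its ranks
ranks <- strsplit(desc, ";", fixed = TRUE)

# Genus taken as the second-to-last rank (an assumption about the format)
genus <- sapply(ranks, function(r) r[length(r) - 1])

# TRUE where one of the ranks is exactly "Fungi"
is_fungi <- sapply(ranks, function(r) "Fungi" %in% r)

# One shared identifier for fungi, genus-level identifiers otherwise
identifier <- ifelse(is_fungi, "Fungi", genus)
print(identifier)
```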


It is difficult to investigate the issue without a reproducible example. Could you please send a reproducible example? Thanks.


I guess I shouldn't have used such a huge database in the first place. I have now obtained some primers using a smaller set of sequences.


Hi Erik,

I have the following issues with the DesignProbes() function:

probes <- DesignProbes(tiles, identifier="Streptococcus", start=120, end=1450, batchSize=100, numProbeSets=5)

Streptococcus
Warning message: In DesignProbes(tiles, identifier = "Streptococcus", start = 120, : No target sites met the specified constraints: Streptococcus

probes <- DesignProbes(tiles, identifier="Pseudomonas", start=120, end=1450, batchSize=100, numProbeSets=5)

Pseudomonas
Warning message: In DesignProbes(tiles, identifier = "Streptococcus", start = 120, : All target sites have too many permutations: Pseudomonas

What do these warnings mean, and how can I resolve them?


See my reply in your other posting. Please post the same comment only once to avoid confusion.
