Hello all,
I don't have a problem with a specific command. My problem is that I am stuck.
What I wish to do is this:
I have a list of sequences (42, each approx. 300-500 bp long) in a FASTA file. I want to BLAST these at NCBI (blastn) and get a list of top hits corresponding to my list.
(Nothing very advanced; I am completely new to R and to other programming languages.)
I could do this BLAST manually (and am doing so at the moment); however, there will be many more sequences in the future, and it feels so inefficient to do it by hand when I know(!) a simple command could do this for me.
I started out following the description by Kevin Keenan here: http://rstudio-pubs-static.s3.amazonaws.com/12097_1352791b169f423f910d93222a4c2d85.html
which is very user-friendly; however, he ends with the note: "For further information on how to proceed with this data structure, see the Biostrings package in Bioconductor."
- I have tried searching for what to do next for hours now... So I decided to ask for help here:
Has anyone done the same and can guide me through it?
Will be much appreciated.
All the best,
Mathilde
What to do next depends on your specific research question and the purpose of your analysis. Regarding Biostrings, the best place to learn what it can do and how to use it is the package web page here. Take a look at the vignettes, possibly starting with the one named "Biostrings Quick Overview".
No, but the result, "the list of IDs for the best hit", is not produced at the end of K. Keenan's "guide".
I know what to do with it once I have the info :)
I have this now:
[[1]]
[[1]][[1]]
DNAMultipleAlignment with 2 rows and 324 columns
aln
[1] CTGAATCTGCGAACGGCTCCGCACACCAGTTGCAAA...GATGGCAGCGTAAGGGACATGCTATGGTTACAACGG
[2] CTGAATCTGCGTACGGCTCCGCACACCAGTTGCAAA...GATGGCAGCGTAAGGGACATGCTATGG-TACAACGG
[[1]][[2]]
DNAMultipleAlignment with 2 rows and 324 columns
aln
[1] CTGAATCTGCGAACGGCTCCGCACACCAGTTGCAAA...GATGGCAGCGTAAGGGACATGCTATGGTTACAACGG
[2] CTGAATCTGCGTACGGCTCCGCACACCAGTTGCAAA...GATGGCAGCGTAAGGGACATGCTATGG-TACAACGG
[[2]]
[[2]][[1]]
DNAMultipleAlignment with 2 rows and 307 columns
aln
[1] TTCTCTCTGAATCTGCGAACGGCTCCGCAAACCAGT...ACCTATCAACTAGACGGCAGCGTAATGGACATGCTA
[2] TTCTCTCTGAATCTGCGAACGGCTCCGCAAACCAGT...ACCTATCAACTAGACGGCAGCGTAATGGACATGCTA
[[2]][[2]]
DNAMultipleAlignment with 2 rows and 307 columns
aln
[1] TTCTCTCTGAATCTGCGAACGGCTCCGCAAACCAGT...ACCTATCAACTAGACGGCAGCGTAATGGACATGCTA
[2] TTCTCTCTGAATCTGCGAACGGCTCCGCAAACCAGT...ACCTATCAACTAGACGGCAGCGTAATGGACATGCTA
[[3]]
[[3]][[1]]
DNAMultipleAlignment with 2 rows and 519 columns
aln
[1] TGATCCTGCCAGTAGTGTATGCTTCTCCTAAAGACT...CCCTTGGGCAATGCCCGAGGGCGTTAGGGG-ACATA
[2] TGATCCTGCCAGTAGTGTATGCTTCTCCTAAAGACT...CCCTTGGGCAATGCCCGAGGGCGTTAGGGGCACATA
[[3]][[2]]
DNAMultipleAlignment with 2 rows and 502 columns
aln
[1] TATGCTTCTCCTAAAGACTAAGCCATGCATGCCTTC...CCCTTGGGCAATGCCCGAGGGCGTTAGGGG-ACATA
[2] TATGCTTCTCCTAAAGACTAAGCCATGCATGCCTTC...CCCTTGGGCAATGCCCGAGGGCGTTAGGGGCACATA
And so on....
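For the "list of IDs for the best hit" itself, one possible route is to skip the alignment objects and work from tabular BLAST output instead. The sketch below is in Python rather than R and assumes the search results were saved in the standard 12-column tabular format (what BLAST+ calls `-outfmt 6`). The function name `top_hits` and the example rows are made up for illustration; the rows are placeholders, not real BLAST results. It keeps the hit with the highest bit score for each query.

```python
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(lines):
    """Return {query_id: hit_row}, keeping the highest-bitscore hit per query.

    `lines` is any iterable of text lines, e.g. an open file.
    (Hypothetical helper written for this example.)
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (present in some tabular variants).
        if not line or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Made-up example rows with placeholder IDs (NOT real BLAST results).
example = (
    "query1\tsubjectA\t99.1\t324\t3\t0\t1\t324\t10\t333\t1e-160\t580\n"
    "query1\tsubjectB\t95.0\t320\t16\t0\t1\t320\t5\t324\t1e-140\t500\n"
    "query2\tsubjectC\t100.0\t307\t0\t0\t1\t307\t1\t307\t1e-158\t568\n"
)

hits = top_hits(io.StringIO(example))
for query, row in hits.items():
    # One line per query: query ID, best subject ID, % identity, e-value.
    print(query, row["sseqid"], row["pident"], row["evalue"], sep="\t")
```

To use it on a real results file, replace `io.StringIO(example)` with an open file handle, for example `open("results.tsv")` (a hypothetical file name). Ranking by bit score is a design choice here; ranking by e-value or percent identity would only need a change to the comparison line.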