Norman Pavelka
Hi Lingsheng,
On 15 Nov 2005, at 19:05, Lingsheng Dong wrote:
> Hi, Norman,
>
> Nice to see you are working on a project similar to mine.
>
> Another bug I found was in the function "get.RNA.IDs":
> get.RNA.IDs <- function(x) {
>   reg <- regexpr("(Hs#|NM)[^[:blank:]|]+", x)
>   r <- substr(my.entries$headers, reg, reg + attr(reg, "match.length") - 1)
>   return(r)
> }
> I am not sure how to correct it yet. But it couldn't get ID for
> sequences without a "NMxxxxxx" ID in the header.
I wouldn't call that a bug. You simply have to change the regular
expression so that it matches the IDs in your particular FASTA file.
I'm using the following function, which takes the first string after
the ">" sign in a FASTA header and strips away the first space
character together with everything that comes after it. This way you
get the ID regardless of how it begins. You only have to check whether
the space character is the right separator in your situation, or
whether another one would be more appropriate. Often "|" or ";" signs
are used to separate the different pieces of information in a FASTA
header.
get.transcript.ids <- function(x) {
  # Drop the leading ">" of the FASTA header
  tmpstring <- sub("^>", "", x)
  # Drop the first space and everything after it
  tmpstring <- sub(" .*", "", tmpstring)
  return(tmpstring)
}
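If the space is not the right separator for your headers, the same idea can be generalised. The following is only a sketch: the function name get.ids.by.sep, its seps argument and the example headers are made up for illustration, and it assumes the separators are plain characters such as " ", "|" or ";" (characters like "]", "^" or "-" would need escaping inside the bracket expression).

```r
# Hypothetical helper: return everything between the leading ">" and the
# first occurrence of any of the separator characters in `seps`.
get.ids.by.sep <- function(x, seps = c(" ", "|", ";")) {
  ids <- sub("^>", "", x)
  # Inside a bracket expression "|" is a literal character, not "or"
  pattern <- paste0("[", paste(seps, collapse = ""), "].*$")
  sub(pattern, "", ids)
}

# Made-up example headers, one per separator style
headers <- c(">NM_000546.5 Homo sapiens tumor protein p53",
             ">ENST00000269305|TP53|protein_coding",
             ">probe_17;chr1;+")
ids <- get.ids.by.sep(headers)
print(ids)
```

Note that for headers where the ID is not the first field (for example ">gi|...|ref|..."), cutting at the first "|" will not return the accession you want, so always check the result against a few of your own headers.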
> Still another problem you may want to consider:
> The "matchprobes" function gives all possible matches. In my case, a
> lot of probes match hundreds of target sequences. It means there will
> be too many cross-hybridizing probes if you put all probes
> matching a target sequence into one probe set.
> I couldn't find a ready-to-use function to solve this problem yet. I
> am thinking of exporting the matching results into database software and
> manually deleting the cross-hybridizing probes.
> Not sure if this is a quick solution.
> Hope you can give some suggestions.
I also thought of that problem, but Laurent Gautier already gave some
clues in his BMC Bioinformatics paper on how to handle this situation.
Although I haven't tried it yet, I guess that everything could be done
very quickly inside R, without the need to export into an external
database. If you like, I can share my experience with you as soon as I
have done some trials...
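Just to illustrate why I think this can stay inside R, here is a rough, untested-on-real-data sketch. It assumes the matches have already been rearranged into a plain data frame with one row per probe/target pair; the data frame, its column names and the toy values are invented for the example and are not the output format of "matchprobes". It simply keeps the probes that match exactly one target, which is not necessarily the procedure described in the paper.

```r
# Invented toy data: one row per (probe, target) match
matches <- data.frame(
  probe  = c("p1", "p2", "p2", "p3", "p4", "p4", "p4"),
  target = c("tA", "tA", "tB", "tB", "tA", "tB", "tC"),
  stringsAsFactors = FALSE
)

# Number of distinct targets matched by each probe
n.targets <- tapply(matches$target, matches$probe,
                    function(t) length(unique(t)))

# Keep only probes that match a single target (no cross-hybridization)
specific <- names(n.targets)[n.targets == 1]
unique.matches <- matches[matches$probe %in% specific, ]

# Group the remaining probes into one probe set per target
probe.sets <- split(unique.matches$probe, unique.matches$target)
print(probe.sets)
```

Whether "exactly one target" is the right cut-off, or whether a few extra matches can be tolerated, is something you would have to decide for your own array.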
> Thanks.
> Lingsheng
Good luck!
Norman