Dear all i'd like to use iupac2meme (https://meme-suite.org/meme/doc/iupac2meme.html). So far i've been using universalmotifs() which has basicall all functions of MEME suite. Apparently except iupac2meme, which is pretty useful. I've tried to use it in bash and i get an output in MEME4 format which cannot be used with other motifs in MEME5 format with the compare_motifs() analysis, which is what i want to accomplish.
Thanks in advance for any help.
Unfortunately the package does not support regular expressions for consensus sequence input in create_motif(). However in this case, [AT] is the same as W, so you could simply do create_motif("AAAGNBGGW") to achieve the same result.
Hi Benjamin, it is still unclear to me if scan_sequences() uses a background to correct the results as in FIMO.
I am trying to replicate an analysis im doing with FIMO on core promoters (DNA seqs) using curated motifs from PlantTFDB. With universalmotifs im getting three times more motifs mapped and i would like to understand how to understand why this happens. Here following the code:
scan_sequences() works by converting your motifs, whatever format they be in, into position weight matrices (PWMs) (also known as log-odds matrices) which are constructed taking into account the background frequencies (see my "Introduction to sequence motifs" vignette). So yes, the scanning is background-corrected.
As for why Fimo is getting more hits: I'm not entirely sure. The Fimo documentation says it uses a "dynamic programming algorithm to convert log-odds scores into p-values", which is kind of vague (whereas scan_sequences() only uses log-odds scores; you can generate a scanning log-odds threshold from a p-value, and afterwards calculate a p-value from a motif hit log-odds score in scan_sequences(), but I'm not sure if that's the same thing). I would have to look through the Fimo source code to be sure.
The resulting object from scan_sequences() is rather comprehensive, so you can check the motif hits for yourself if you are curious as to how different thresholds affect motif hit sensitivity, and compare the results at different thresholds by changing the threshold option or by filtering afterwards using the score column (or the calc.pvals P-values as I see you are also calculating). It is not difficult to increase the stringency of scan_sequences() as you wish, so perhaps with the right combination you will be able to replicate your Fimo analysis.
Maybe that was not a very helpful answer, but ultimately I am limited by my meagre understanding of Fimo.
Thank you very much! You indeed clarified to me some important points. Ideally that could be cool to compare the scan_sequences() and FIMO outputs for TFs tested with Chip-seq or DAP-seq data.
You could try using runFimo from memes to compare them. My guess as to why FIMO detects more matches in your example is you're setting the --thresh argument to 0.05 which I would consider to be rather liberal unless you are scanning a small number of sequences.
Is there any way to import also consensus written like this e.g. "AAAGNBGG[AT]". Thanks in advance! Amazing package.
Is there any way to import also consensus written like this e.g. "AAAGNBGG[AT]". Thanks in advance! Amazing package.
Unfortunately the package does not support regular expressions for consensus sequence input in
create_motif()
. However in this case, [AT] is the same as W, so you could simply docreate_motif("AAAGNBGGW")
to achieve the same result.Hi Benjamin, it is still unclear to me if scan_sequences() uses a background to correct the results as in FIMO.
I am trying to replicate an analysis im doing with FIMO on core promoters (DNA seqs) using curated motifs from PlantTFDB. With universalmotifs im getting three times more motifs mapped and i would like to understand how to understand why this happens. Here following the code:
Sorry if i post the question here. Also, it would be cool if you organize a webinar on such a great tool. Just an idea!
No worries, happy to help.
scan_sequences()
works by converting your motifs, whatever format they be in, into position weight matrices (PWMs) (also known as log-odds matrices) which are constructed taking into account the background frequencies (see my "Introduction to sequence motifs" vignette). So yes, the scanning is background-corrected.As for why Fimo is getting more hits: I'm not entirely sure. The Fimo documentation says it uses a "dynamic programming algorithm to convert log-odds scores into p-values", which is kind of vague (whereas
scan_sequences()
only uses log-odds scores; you can generate a scanning log-odds threshold from a p-value, and afterwards calculate a p-value from a motif hit log-odds score inscan_sequences()
, but I'm not sure if that's the same thing). I would have to look through the Fimo source code to be sure.The resulting object from
scan_sequences()
is rather comprehensive, so you can check the motif hits for yourself if you are curious as to how different thresholds affect motif hit sensitivity, and compare the results at different thresholds by changing thethreshold
option or by filtering afterwards using the score column (or thecalc.pvals
P-values as I see you are also calculating). It is not difficult to increase the stringency ofscan_sequences()
as you wish, so perhaps with the right combination you will be able to replicate your Fimo analysis.Maybe that was not a very helpful answer, but ultimately I am limited by my meagre understanding of Fimo.
Thank you very much! You indeed clarified to me some important points. Ideally that could be cool to compare the scan_sequences() and FIMO outputs for TFs tested with Chip-seq or DAP-seq data.
You could try using
runFimo
from memes to compare them. My guess as to why FIMO detects more matches in your example is you're setting the--thresh
argument to 0.05 which I would consider to be rather liberal unless you are scanning a small number of sequences.