Thank you for putting together a nice package. I'm looking forward to the v4 release.
In the meantime, I'd like to confirm exactly which MetaPhlAn database was used in cMD 3. It looks like cMD 3 used MetaPhlAn v3.0, and based on this GitHub page, I assume this means mpa_v30_CHOCOPhlAn_201901. Can you please confirm?
Hi Lev, you are correct. Here are additional details about the pre-processing of curatedMetagenomicData v3, in case they are useful:
The MetaPhlAn version was 3.0. The HUMAnN3 version was v3.0.0.alpha.3.
We didn't run any preprocessing, as we simply downloaded the data from NCBI. Data that originated in our lab were preprocessed with a pipeline similar to KneadData, but no general preprocessing method was adopted for the included studies as a whole (we rely on the original authors for this, including ourselves).
We always ran MetaPhlAn and HUMAnN with default settings. HUMAnN took the MetaPhlAn profile described in this post as its taxonomic profile, with the added parameter --metaphlan-options -t rel_ab --index mpa_v30_CHOCOPhlAn_201901. The protein database was uniref90_201901, and the nucleotide database was ChocoPhlAn version 201901. Both HUMAnN and MetaPhlAn were called from their conda environments, so HUMAnN used DIAMOND version 2.0.4 and Bowtie2 version 2.4.1.
MetaPhlAn was run first, specifying --index mpa_v30_CHOCOPhlAn_201901 and using Bowtie2 version 2.3.4.3.
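For anyone trying to reproduce this, the two steps could be sketched roughly as follows. This is only an illustration, not the exact commands used for cMD 3: the helper names, file names, and database directories are hypothetical placeholders, and --input_type fastq is an assumption about the input format. Only the index name and the --metaphlan-options string come from the description in this thread.

```python
import shlex

# Index name as stated in the thread.
MPA_INDEX = "mpa_v30_CHOCOPhlAn_201901"


def metaphlan_cmd(reads, profile_out):
    """Build the argument list for the MetaPhlAn step (run first).

    `reads` and `profile_out` are hypothetical file names; the input
    type is assumed to be FASTQ. Everything else is left at defaults.
    """
    return [
        "metaphlan", reads,
        "--input_type", "fastq",   # assumption, not stated in the thread
        "--index", MPA_INDEX,
        "-o", profile_out,
    ]


def humann_cmd(reads, profile, out_dir, nucleotide_db, protein_db):
    """Build the argument list for the HUMAnN step.

    The MetaPhlAn profile is passed as the taxonomic profile. The two
    database arguments are placeholder directories that would hold the
    ChocoPhlAn 201901 and uniref90_201901 databases.
    """
    return [
        "humann",
        "--input", reads,
        "--output", out_dir,
        "--taxonomic-profile", profile,
        "--metaphlan-options", f"-t rel_ab --index {MPA_INDEX}",
        "--nucleotide-database", nucleotide_db,
        "--protein-database", protein_db,
    ]


if __name__ == "__main__":
    # Print shell-ready commands; nothing is executed here.
    print(shlex.join(metaphlan_cmd("sample.fastq", "sample_profile.tsv")))
    print(shlex.join(humann_cmd(
        "sample.fastq", "sample_profile.tsv", "humann_out",
        "/path/to/chocophlan", "/path/to/uniref90",
    )))
```

The functions only assemble the commands, so they can be inspected or passed to subprocess.run from inside the matching conda environment.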