1 Introduction

This R Markdown document compares the accuracy of FROGS, MOTHUR, UPARSE and QIIME on simulated microbial communities. We consider two variants of each of MOTHUR, UPARSE and QIIME, called SOP and MA. SOP corresponds to the Standard Operating Procedures (program guidelines), whereas MA stands for Multi-Affiliation and corresponds to a strategy that propagates uncertainty when several equally good affiliations are found for a given OTU. Multi-affiliation is the default strategy in FROGS.

Throughout, QIIME (MA) refers to QIIME with the Multi-Affiliation strategy, QIIME (SOP) to QIIME with the affiliation strategy recommended in the QIIME SOP, and QIIME to both variants.
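To illustrate the MA idea, the Python sketch below merges several equally good affiliations of one OTU rank by rank: a rank keeps its name when all hits agree on it and receives a placeholder label otherwise, so the ambiguity is reported instead of being resolved arbitrarily. This is a simplified illustration under stated assumptions, not the FROGS implementation; the function name `multi_affiliation`, the tuple-of-ranks representation and the `"Multi-affiliation"` label are hypothetical.

```python
from typing import Sequence, Tuple

Lineage = Tuple[str, ...]  # rank names, highest rank first (hypothetical layout)


def multi_affiliation(lineages: Sequence[Lineage], placeholder: str = "Multi-affiliation") -> Lineage:
    """Merge equally good lineages for one OTU into a single lineage.

    Each rank keeps its name when all candidate lineages agree on it and is
    replaced by `placeholder` otherwise (a simplifying assumption: ranks are
    checked independently of one another).
    """
    if not lineages:
        raise ValueError("at least one lineage is required")
    depth = len(lineages[0])
    if any(len(lin) != depth for lin in lineages):
        raise ValueError("all lineages must have the same number of ranks")
    merged = []
    for rank in range(depth):
        names = {lin[rank] for lin in lineages}
        merged.append(names.pop() if len(names) == 1 else placeholder)
    return tuple(merged)


# Toy example: two equally good hits that disagree only at the species rank.
hits = [
    ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales", "Streptococcaceae",
     "Streptococcus", "Streptococcus mitis"),
    ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales", "Streptococcaceae",
     "Streptococcus", "Streptococcus oralis"),
]
print(multi_affiliation(hits)[-2:])  # ('Streptococcus', 'Multi-affiliation')
```

Keeping the shared upper ranks means the OTU still contributes correctly at Genus and above, while the uncertain rank is flagged rather than guessed.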

1.1 Metrics used for comparison

The results of FROGS, MOTHUR, UPARSE and QIIME are compared using three metrics:

  1. Divergence: Bray-Curtis distance (in percent) between the true taxonomic composition of the community and the composition inferred by the OTU-picking tool. The divergence is measured at every taxonomic level from Phylum down to either Genus (Utax) or Species (Silva);
  2. FN: Number of false negative taxa, i.e. taxa present in the original bacterial community but not detected by the OTU-picking method;
  3. FP: Number of false positive taxa, i.e. taxa detected by the OTU-picking method but absent from the original bacterial community.
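A minimal Python sketch of the three metrics, assuming each composition is a dictionary mapping a lineage (a tuple of rank names, Phylum first) to a relative abundance. The divergence uses the standard Bray-Curtis form \(BC = 1 - 2\sum_t \min(x_t, y_t) / (\sum_t x_t + \sum_t y_t)\), expressed in percent. The helper `collapse` aggregates lineages at a given depth so that divergence, FN and FP can be computed rank by rank; counting FN and FP at the rank being evaluated is an assumption, and all function names and toy lineages are hypothetical.

```python
from typing import Dict, Set, Tuple

Lineage = Tuple[str, ...]
Composition = Dict[Lineage, float]  # lineage -> relative abundance


def collapse(comp: Composition, depth: int) -> Composition:
    """Sum abundances of lineages that share their first `depth` ranks."""
    out: Composition = {}
    for lineage, abundance in comp.items():
        key = lineage[:depth]
        out[key] = out.get(key, 0.0) + abundance
    return out


def divergence(true_comp: Composition, inferred_comp: Composition) -> float:
    """Bray-Curtis dissimilarity between two compositions, in percent."""
    taxa = set(true_comp) | set(inferred_comp)  # missing taxa count as zero
    shared = sum(min(true_comp.get(t, 0.0), inferred_comp.get(t, 0.0)) for t in taxa)
    total = sum(true_comp.values()) + sum(inferred_comp.values())
    return 100.0 * (1.0 - 2.0 * shared / total)


def false_negatives(true_comp: Composition, inferred_comp: Composition) -> Set[Lineage]:
    """Taxa present in the true community but absent from the inferred one."""
    return {t for t, a in true_comp.items() if a > 0 and inferred_comp.get(t, 0.0) == 0}


def false_positives(true_comp: Composition, inferred_comp: Composition) -> Set[Lineage]:
    """Taxa present in the inferred community but absent from the true one."""
    return {t for t, a in inferred_comp.items() if a > 0 and true_comp.get(t, 0.0) == 0}


# Toy compositions described at Phylum and Class level (not study data).
truth = {
    ("Firmicutes", "Bacilli"): 0.5,
    ("Firmicutes", "Clostridia"): 0.3,
    ("Proteobacteria", "Gammaproteobacteria"): 0.2,
}
inferred = {
    ("Firmicutes", "Bacilli"): 0.6,
    ("Firmicutes", "Clostridia"): 0.3,
    ("Proteobacteria", "Alphaproteobacteria"): 0.1,
}
for depth, rank in [(1, "Phylum"), (2, "Class")]:
    t, i = collapse(truth, depth), collapse(inferred, depth)
    print(rank, round(divergence(t, i), 2), len(false_negatives(t, i)), len(false_positives(t, i)))
# Phylum 10.0 0 0
# Class 20.0 1 1
```

The toy example shows why divergence grows at finer ranks: the misassigned class is invisible at Phylum level but produces one FN, one FP and a larger divergence at Class level.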

1.2 Experimental design of simulated bacterial communities

The simulated communities were built according to a full-factorial design with the following factors:

  1. Databank: reference databank from which taxa were drawn to construct the theoretical communities, either Silva (silva) or Utax (utax);
  2. Number of OTUs: 20, 100, 200, 500 or 1000;
  3. Abundance distribution: OTU abundances were either uniform (uniform) or sampled from a power-law distribution (power_law);
  4. Dataset: theoretical communities. Each dataset (5 for each combination of abundance distribution and number of OTUs) corresponds to a unique ideal bacterial community, specified by its own taxa set and vector of relative abundances;
  5. Set Number: biological replicates (10 for each dataset), i.e. communities created by sampling organisms with replacement from the theoretical community;
  6. Amplicon: variable region of the 16S rRNA gene used to produce the amplicon sequences, either the V3-V4 region (V3V4) or the V4 region (V4).

This results in a total of 2 databanks \(\times\) 5 community sizes \(\times\) 2 abundance distributions \(\times\) 5 theoretical communities \(\times\) 10 replicates per theoretical community \(\times\) 2 amplicons \(=\) 2000 samples (1000 per databank), each with 100 000 sequences.
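The sample count can be checked by enumerating the design with `itertools.product`. The sketch assumes 5 theoretical communities per cell, as in the definition of Dataset, and treats `dataset` and `set_number` as indices nested within each cell, so the Cartesian product still counts samples correctly even though these two factors are not crossed with the others in the biological sense. The dictionary keys are hypothetical names; the levels are those listed in the design.

```python
from collections import Counter
from itertools import product

# Factor levels from the design; `dataset` and `set_number` are nested
# indices (5 theoretical communities per cell, 10 replicates each).
FACTORS = {
    "databank": ["silva", "utax"],
    "n_otus": [20, 100, 200, 500, 1000],
    "distribution": ["uniform", "power_law"],
    "dataset": range(1, 6),
    "set_number": range(1, 11),
    "amplicon": ["V3V4", "V4"],
}
SEQUENCES_PER_SAMPLE = 100_000


def full_factorial(factors):
    """Yield one dict per combination of factor levels."""
    names = list(factors)
    for levels in product(*factors.values()):
        yield dict(zip(names, levels))


samples = list(full_factorial(FACTORS))
print(len(samples))                                   # 2000
print(dict(Counter(s["databank"] for s in samples)))  # {'silva': 1000, 'utax': 1000}
print(len(samples) * SEQUENCES_PER_SAMPLE)            # 200000000
```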

2 Materials and Methods for the Statistical Analysis

For each of the three metrics (divergence, FN and FP), we performed two-sided paired tests, either parametric (paired t-test) or non-parametric (Wilcoxon signed-rank test, the paired counterpart of the Mann-Whitney test), to assess the difference in accuracy between FROGS and each of its competitors.
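With 10 replicates, an exact two-sided signed-rank p-value is cheap to compute by brute force. The Python sketch below is an illustration, not the analysis code: zero differences are dropped, tied absolute differences receive average ranks, and the null distribution of \(W^+\) is obtained by enumerating all \(2^n\) sign assignments. In R the same test is available as `wilcox.test(x, y, paired = TRUE)`, which switches to a normal approximation when ties are present, so p-values may differ from this sketch in that case; the source does not state which implementation or options were used. The paired t-test is omitted because the Student t distribution is not in the Python standard library (`scipy.stats.ttest_rel` provides it). Function names and toy values are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`; tied values share the average of their ranks.

    Ties are decided by exact float equality, so round inputs first if needed.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def signed_rank_test(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples x, y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    if not diffs:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2  # expected value of W+ under H0
    observed = abs(w_plus - centre)
    # Count sign assignments at least as extreme as the observed one.
    extreme = sum(
        1
        for signs in product((False, True), repeat=len(ranks))
        if abs(sum(r for r, s in zip(ranks, signs) if s) - centre) >= observed - 1e-9
    )
    return extreme / 2 ** len(ranks)


# Toy divergences (percent) for 10 replicates of one theoretical community.
frogs = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.0, 4.2, 4.6, 4.3]
other = [5.0, 4.9, 5.2, 5.1, 4.5, 5.9, 4.8, 5.5, 5.3, 6.0]
print(signed_rank_test(frogs, other))  # 0.001953125 (= 2 / 1024)
```

Because every toy replicate favors the same method, only the two all-same-sign assignments are as extreme, which gives the smallest p-value attainable with 10 pairs.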

The tests were performed at the level of theoretical communities (dataset), using biological replicates (set_number) as replicates. We chose this level because it is the finest one at which replication is available. Pooling different theoretical communities and/or abundance distributions to compare the methods at coarser levels (e.g. community size \(\times\) amplicon) would blur the signal, as a method may outperform the others with uniform abundances but perform worse under other abundance distributions.
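Testing at the theoretical-community level amounts to grouping sample-level results by community and matching FROGS and its competitor replicate by replicate. A Python sketch, assuming results are stored as records with hypothetical keys (`databank`, `n_otus`, `distribution`, `amplicon`, `method` and one column per metric); only `dataset` and `set_number` come from the text. Including `amplicon` in the grouping key is also an assumption, consistent with amplicon being one of the aggregation conditions.

```python
from collections import defaultdict

# Hypothetical column names identifying one theoretical community.
COMMUNITY_KEYS = ("databank", "n_otus", "distribution", "amplicon", "dataset")


def paired_vectors(records, metric, competitor, reference="FROGS"):
    """Pair reference and competitor values replicate by replicate.

    Returns {community_key: (reference_values, competitor_values)}, both
    ordered by set_number; replicates missing either method are skipped.
    """
    cells = defaultdict(lambda: defaultdict(dict))
    for r in records:
        key = tuple(r[k] for k in COMMUNITY_KEYS)
        cells[key][r["set_number"]][r["method"]] = r[metric]
    paired = {}
    for key, replicates in cells.items():
        ref, comp = [], []
        for set_number in sorted(replicates):
            values = replicates[set_number]
            if reference in values and competitor in values:
                ref.append(values[reference])
                comp.append(values[competitor])
        paired[key] = (ref, comp)
    return paired


# Toy records: two replicates of one theoretical community.
records = [
    {"databank": "utax", "n_otus": 100, "distribution": "uniform", "amplicon": "V4",
     "dataset": 1, "set_number": s, "method": m, "divergence": d}
    for s, m, d in [(1, "FROGS", 3.2), (1, "UPARSE", 3.9), (2, "FROGS", 2.8), (2, "UPARSE", 3.1)]
]
print(paired_vectors(records, "divergence", "UPARSE"))
# {('utax', 100, 'uniform', 'V4', 1): ([3.2, 2.8], [3.9, 3.1])}
```

Ordering both vectors by `set_number` is what makes the test paired: the i-th FROGS value and the i-th competitor value come from the same biological replicate.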

For each theoretical community, FROGS was declared better (resp. worse) than its competitor when the test was significant at the 0.05 level and FROGS had a lower (resp. higher) metric than its competitor. When the test was not significant, the methods were declared tied. Finally, for each condition (community size \(\times\) abundance distribution \(\times\) amplicon), we counted the number of theoretical communities favoring FROGS, favoring its competitor, or favoring neither.
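The decision rule and the aggregation step can be sketched in Python as follows. The sketch assumes the p-value has already been computed, reads the direction of a significant difference from the mean paired difference (the source only says "lower" or "higher"), and treats a significant result with a zero mean difference as a tie. Function names and toy inputs are hypothetical and are not results from the study.

```python
from collections import Counter, defaultdict
from statistics import mean

ALPHA = 0.05


def verdict(frogs, competitor, p_value, alpha=ALPHA):
    """Classify one theoretical community as 'better', 'worse' or 'tie' for FROGS.

    Lower is better for divergence, FN and FP alike.
    """
    if p_value >= alpha:
        return "tie"
    diff = mean(f - c for f, c in zip(frogs, competitor))
    if diff < 0:
        return "better"
    if diff > 0:
        return "worse"
    return "tie"


def tally(results):
    """Count verdicts per condition.

    `results` yields (condition, verdict) pairs, one per theoretical community,
    where condition = (community size, abundance distribution, amplicon).
    """
    counts = defaultdict(Counter)
    for condition, v in results:
        counts[condition][v] += 1
    return counts


# Toy inputs: three theoretical communities of one condition, two replicates each.
condition = (20, "power_law", "V3V4")
results = [
    (condition, verdict([1.0, 1.2], [1.5, 1.6], p_value=0.01)),
    (condition, verdict([1.0, 1.2], [1.1, 1.1], p_value=0.40)),
    (condition, verdict([2.0, 2.2], [1.5, 1.6], p_value=0.03)),
]
for cond, counts in tally(results).items():
    print(cond, {k: counts[k] for k in ("better", "tie", "worse")})
# (20, 'power_law', 'V3V4') {'better': 1, 'tie': 1, 'worse': 1}
```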


3 Simulated data from the UTAX databank

3.1 Visualization

3.1.1 Divergence

The sample-level scatterplots of divergence show that, on average, FROGS performs comparably to, but slightly better than, MOTHUR, UPARSE and QIIME (SOP): most samples lie in the upper-left half of the plot (the region where the divergence of FROGS is lower than that of the competitor), but not far from the first diagonal (grey line). FROGS also performs comparably to QIIME (MA), but under certain conditions the samples are not mostly contained in the upper-left half of the plot, meaning that QIIME (MA) outperforms FROGS for those parameter values.
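The reading of the scatterplots can be made quantitative with a small Python sketch, assuming FROGS is on the x-axis and the competitor on the y-axis (the orientation implied by the upper-left half corresponding to a lower FROGS divergence). It reports the share of samples on each side of the first diagonal and their mean perpendicular distance to it, \(|y - x| / \sqrt{2}\), as one possible measure of "not far from the diagonal". The function name and toy values are hypothetical.

```python
from math import sqrt


def diagonal_summary(frogs, competitor):
    """Summarise a FROGS-vs-competitor scatterplot of per-sample divergences.

    A point (f, c) lies in the upper-left half when c > f, i.e. when FROGS
    has the lower divergence on that sample.
    """
    pairs = list(zip(frogs, competitor))
    n = len(pairs)
    upper_left = sum(c > f for f, c in pairs)
    lower_right = sum(c < f for f, c in pairs)
    # Mean perpendicular distance from (f, c) to the line y = x.
    mean_distance = sum(abs(c - f) for f, c in pairs) / (n * sqrt(2))
    return {
        "upper_left": upper_left / n,
        "lower_right": lower_right / n,
        "on_diagonal": (n - upper_left - lower_right) / n,
        "mean_distance": mean_distance,
    }


# Toy per-sample divergences (percent), not study data.
summary = diagonal_summary([2.0, 3.0, 5.0, 4.0], [2.5, 3.5, 4.0, 4.0])
print(summary)
```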


A more traditional representation, boxplots of the excess divergence of FROGS with samples from all theoretical communities pooled together, confirms these results: for the vast majority of samples, FROGS has a divergence similar to that of UPARSE and lower than that of MOTHUR. Note that the y-axis range was reduced from \([-85, 31]\) to \([-15, 3]\) to exclude outliers (4% of samples have an excess divergence below -15 and 0.02% above 3) and zoom in on the boxes.

As expected, all methods perform similarly up to the Order level, and the main differences appear at the Family and Genus levels, where MOTHUR and QIIME (SOP) produce much larger divergences than the competing methods. The only configuration in which FROGS is consistently outperformed is that of complex communities (more than 200 OTUs, i.e. 500 or 1000) with uniform abundances sequenced on the V4 region, where QIIME (MA) outperforms FROGS.
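The excess divergence and the effect of the zoomed y-range on the boxplots can be sketched in Python as follows. The sign convention (FROGS minus competitor, so negative values favor FROGS) is an assumption consistent with the outlier description; the function names are hypothetical and the toy values are not study data.

```python
from statistics import median


def excess_divergence(frogs, competitor):
    """Per-sample excess divergence of FROGS (negative = FROGS closer to the truth)."""
    return [f - c for f, c in zip(frogs, competitor)]


def zoom_window_report(excess, lower=-15.0, upper=3.0):
    """Fractions of values outside the plotting window [lower, upper], plus the median."""
    n = len(excess)
    return {
        "below": sum(e < lower for e in excess) / n,
        "above": sum(e > upper for e in excess) / n,
        "median": median(excess),
    }


# Toy per-sample divergences (percent) at one taxonomic level.
frogs = [2.0, 10.0, 4.5, 30.0, 6.0]
competitor = [3.0, 9.0, 4.5, 50.0, 2.0]
excess = excess_divergence(frogs, competitor)
print(excess)                      # [-1.0, 1.0, 0.0, -20.0, 4.0]
print(zoom_window_report(excess))  # {'below': 0.2, 'above': 0.2, 'median': 0.0}
```

Reporting the share of values outside the window, as the text does, keeps the zoomed boxplots honest: the clipped points are excluded from view but not from the summary.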