Command lines

General

FROGS, MOTHUR, USEARCH and QIIME were run on all the datasets by the script <FROGS_dir>/assessment/bin/assessment.py.

The script is available at: github.com/geraldinepascal/FROGS/tree/master/assessment/bin

This script is called with the following command:

<FROGS_dir>/assessment/bin/assessment.py \
	--nb-cpus 1 \
	--datasets-directory /save/frogs/assessment_datasets/datasets_utax \
	--frogs-directory <FROGS_dir> \
	--affiliation-databank-fasta /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa \
	--affiliation-databank-tax /save/frogs/assessment_datasets/databank/uparse/refdb_clean.tax \
	--affiliation-databank-udb /save/frogs/assessment_datasets/databank/uparse/refdb.udb \
	--mothur-databank /save/frogs/assessment_datasets/databank/mothur/silva.bacteria.fasta \
	--mothur-taxonomy /save/frogs/assessment_datasets/databank/mothur/silva.bacteria.rdp6.tax \
	--output-directory /work/frogs/assessment_datasets_utax/last/results

FROGS

Version: 1.4.0

Protocol: guidelines as of September 2016

Example command lines:

preprocess.py illumina \
	--nb-cpus 1 \
	--min-amplicon-size 350 --max-amplicon-size 550 \
	--without-primers --already-contiged \
	--input-R1 reads/sample01-20sp-Powerlaw.fastq reads/sample02-20sp-Powerlaw.fastq reads/sample03-20sp-Powerlaw.fastq reads/sample04-20sp-Powerlaw.fastq reads/sample05-20sp-Powerlaw.fastq reads/sample06-20sp-Powerlaw.fastq reads/sample07-20sp-Powerlaw.fastq reads/sample08-20sp-Powerlaw.fastq reads/sample09-20sp-Powerlaw.fastq reads/sample10-20sp-Powerlaw.fastq \
	--output-dereplicated frogs/prepro.fasta \
	--output-count frogs/prepro.tsv \
	--summary frogs/prepro_summary.html \
	--log-file frogs/prepro_log.txt

clustering.py \
	--nb-cpus 1 \
	--input-fasta frogs/prepro.fasta \
	--input-count frogs/prepro.tsv \
	--output-biom frogs/clustering.biom \
	--output-fasta frogs/clustering.fasta \
	--output-compo frogs/clustering_compo.tsv \
	--log-file frogs/clustering_log.txt \
	--distance 3 --denoising

remove_chimera.py \
	--nb-cpus 1 \
	--input-fasta frogs/clustering.fasta \
	--input-biom frogs/clustering.biom \
	--non-chimera frogs/removeChimera.fasta \
	--out-abundance frogs/removeChimera.biom \
	--summary frogs/removeChimera_summary.html \
	--log-file frogs/removeChimera_log.txt

filters.py \
	--input-biom frogs/removeChimera.biom \
	--input-fasta frogs/removeChimera.fasta \
	--output-fasta frogs/frogs.fasta \
	--output-biom frogs/filters.biom \
	--excluded frogs/filters_excluded.txt \
	--summary frogs/filters_summary.html \
	--log-file frogs/filters_log.txt \
	--min-abundance 0.00005
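
As a rough illustration of what a threshold of 0.00005 means, the sketch below converts it into a sequence count. It assumes the value is interpreted as a proportion of the total number of sequences; the helper name and the total of 1000000 sequences are made up for the example and do not come from the assessment datasets.

```shell
# Hypothetical helper: sequence count matching a proportional abundance threshold.
# Assumption: --min-abundance is read as a proportion of all sequences.
abundance_threshold() {
    # $1: total number of sequences, $2: minimum abundance (proportion)
    awk -v total="$1" -v prop="$2" 'BEGIN { printf "%.1f\n", total * prop }'
}

abundance_threshold 1000000 0.00005   # prints 50.0
```

Under that assumption, an OTU would need around 50 sequences out of one million to be kept.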

affiliation_OTU.py \
	--nb-cpus 1 \
	--reference /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa \
	--input-fasta frogs/frogs.fasta \
	--input-biom frogs/filters.biom \
	--output-biom frogs/frogs.biom \
	--summary frogs/affiliationOTU_summary.html \
	--log-file frogs/affiliationOTU_log.txt \
	--java-mem 20

Usearch

Version: v8.1.1861_i86linux32

Protocol: guidelines as of September 2016, from http://drive5.com/usearch/manual/uparse_pipeline.html

Example command lines:

usearch -fastq_filter reads/sample02-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_1.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample03-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_2.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample04-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_3.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample05-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_4.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample06-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_5.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample07-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_6.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample08-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_7.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample09-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_8.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample10-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_9.fastq -threads 1 -relabel @
usearch -fastq_filter reads/sample01-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_10.fastq -threads 1 -relabel @
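
The ten per-sample calls above follow one pattern, so they can be generated by a loop. The sketch below is a dry run: it only prints the commands, in the same sample order as the listing. The function name is hypothetical.

```shell
# Dry run: print one relabel command per sample instead of executing usearch.
# The sample order (02..10, then 01) mirrors the listing above.
relabel_commands() {
    i=1
    for n in 02 03 04 05 06 07 08 09 10 01; do
        echo "usearch -fastq_filter reads/sample${n}-20sp-Powerlaw.fastq -fastqout uparse/uparse_tmp_spl_${i}.fastq -threads 1 -relabel @"
        i=$((i + 1))
    done
}

relabel_commands
```

Piping the output to sh would execute the commands, assuming usearch is on the PATH.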

cat uparse/uparse_tmp_spl_1.fastq uparse/uparse_tmp_spl_2.fastq uparse/uparse_tmp_spl_3.fastq uparse/uparse_tmp_spl_4.fastq uparse/uparse_tmp_spl_5.fastq uparse/uparse_tmp_spl_6.fastq uparse/uparse_tmp_spl_7.fastq uparse/uparse_tmp_spl_8.fastq uparse/uparse_tmp_spl_9.fastq uparse/uparse_tmp_spl_10.fastq > uparse/uparse_tmp_merged.fastq 

usearch \
	-fastq_filter uparse/uparse_tmp_merged.fastq \
	-fastaout uparse/uparse_tmp_filtered.fasta \
	-threads 1 \
	-fastq_maxee 1.0 -fastq_maxns 0
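
The -fastq_maxee 1.0 option discards reads whose expected number of errors exceeds 1.0. The expected error count of a read is the sum over its bases of the error probability 10^(-Q/10), where Q is the Phred score. The sketch below computes this value for a single quality string; it is an illustration and not part of the pipeline. The helper name is hypothetical and Phred+33 encoding is assumed.

```shell
# Hypothetical helper: expected number of errors for one FASTQ quality string.
# Assumption: qualities are encoded as Phred+33.
expected_errors() {
    # $1: quality string
    printf '%s\n' "$1" | awk '
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        {
            ee = 0
            n = length($0)
            for (j = 1; j <= n; j++) {
                q = ord[substr($0, j, 1)] - 33   # Phred score of base j
                ee += 10 ^ (-q / 10)             # error probability of base j
            }
            printf "%.4f\n", ee
        }'
}

expected_errors IIII   # four bases at Q40, prints 0.0004
```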

usearch \
	-derep_fulllength uparse/uparse_tmp_filtered.fasta \
	-fastaout uparse/uparse_tmp_uniques.fasta \
	-threads 1 \
	-sizeout 

usearch \
	-sortbysize uparse/uparse_tmp_uniques.fasta \
	-fastaout uparse/uparse_tmp_sorted \
	-minsize 2
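
With -minsize 2, unique sequences seen only once (singletons) are dropped before clustering. The sketch below mimics only that filtering step on the ";size=N;" annotations written by -sizeout; it does not sort the records by size. The function name and the two-record input are made up for the example.

```shell
# Sketch: keep FASTA records whose ";size=N;" annotation is at least a minimum.
# Records without a size annotation are dropped.
filter_minsize() {
    # $1: minimum size; FASTA is read from standard input
    awk -v min="$1" '
        /^>/ {
            keep = 0
            if (match($0, /size=[0-9]+/)) {
                size = substr($0, RSTART + 5, RLENGTH - 5) + 0
                keep = (size >= min)
            }
        }
        keep { print }'
}

printf '>a;size=3;\nACGT\n>b;size=1;\nTTTT\n' | filter_minsize 2
```

Here only record a, with its sequence line, is printed.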

usearch \
	-cluster_otus uparse/uparse_tmp_sorted \
	-otus uparse/uparse_tmp_seeds.fasta \
	-uparseout uparse/uparse_tmp_clusters.txt \
	-relabel Cluster_ \
	-sizein -sizeout

usearch \
	-utax uparse/uparse_tmp_seeds.fasta \
	-db /save/frogs/assessment_datasets/databank/uparse/refdb.udb \
	-fastaout uparse/uparse_tmp_affiliation.fasta \
	-strand both \
	-threads 1

usearch \
	-usearch_global uparse/uparse_tmp_merged.fastq \
	-db uparse/uparse_tmp_seeds.fasta \
	-biomout uparse/uparse_tmp_woAffi.biom \
	-strand both -id 0.97 \
	-threads 1

addUtaxFromFasta.py \
	--input-fasta uparse/uparse_tmp_affiliation.fasta \
	--input-biom uparse/uparse_tmp_woAffi.biom \
	--output-biom uparse/uparse.biom \
	--taxonomy-tag taxonomy

addSeedsRef.py \
	--seeds-fasta uparse/uparse_tmp_seeds.fasta \
	--reads reads/sample02-20sp-Powerlaw.fastq reads/sample03-20sp-Powerlaw.fastq reads/sample04-20sp-Powerlaw.fastq reads/sample05-20sp-Powerlaw.fastq reads/sample06-20sp-Powerlaw.fastq reads/sample07-20sp-Powerlaw.fastq reads/sample08-20sp-Powerlaw.fastq reads/sample09-20sp-Powerlaw.fastq reads/sample10-20sp-Powerlaw.fastq reads/sample01-20sp-Powerlaw.fastq \
	--annotated-fasta uparse/uparse.fasta

Mothur

Version: v.1.33.1

Protocol: guidelines as of September 2016, from http://www.mothur.org/wiki/MiSeq_SOP

rvc.py --input reads/sample10-20sp-Powerlaw.fastq --output reads/sample10-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample06-20sp-Powerlaw.fastq --output reads/sample06-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample03-20sp-Powerlaw.fastq --output reads/sample03-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample05-20sp-Powerlaw.fastq --output reads/sample05-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample01-20sp-Powerlaw.fastq --output reads/sample01-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample08-20sp-Powerlaw.fastq --output reads/sample08-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample02-20sp-Powerlaw.fastq --output reads/sample02-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample07-20sp-Powerlaw.fastq --output reads/sample07-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample04-20sp-Powerlaw.fastq --output reads/sample04-20sp-Powerlaw.fastq.RC
rvc.py --input reads/sample09-20sp-Powerlaw.fastq --output reads/sample09-20sp-Powerlaw.fastq.RC

mothur "#make.contigs(file=stability.file, processors=1)"
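
make.contigs reads a whitespace-separated file with three columns: group name, forward reads, reverse reads. The content of stability.file is not listed here; the sketch below shows how such a file could be written. It assumes that the .RC file produced by rvc.py serves as the reverse read and that groups are named sampleNN. The function name is hypothetical.

```shell
# Sketch: print the lines of a three-column make.contigs file.
# Assumptions: the .RC file from rvc.py is the reverse read,
# and group names take the form sampleNN.
stability_lines() {
    # arguments: sample numbers, e.g. 01 02 03
    for n in "$@"; do
        fq="reads/sample${n}-20sp-Powerlaw.fastq"
        printf 'sample%s\t%s\t%s\n' "$n" "$fq" "$fq.RC"
    done
}

stability_lines 01 02
```

Redirecting the output of stability_lines, called with all ten sample numbers, to stability.file would produce the input expected by the command above.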

mothur "#screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, maxambig=0, maxn=0, minlength=350, maxlength=550, processors=1)"

mothur "#unique.seqs(fasta=stability.trim.contigs.good.fasta)"

mothur "#count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)"

cp /save/frogs/assessment_datasets/databank/mothur/silva.bacteria.fasta restriction_db.fasta

mothur "#pcr.seqs(fasta=restriction_db.fasta, keepprimer=T, start=6000, end=27000, keepdots=F, processors=1)"

mothur "#align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=restriction_db.pcr.fasta, processors=1)"

mothur "#summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, processors=1)"

mothur "#screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=428, end=16451, processors=1)"

mothur "#filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=., processors=1)"

mothur "#unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)"

mothur "#pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)"

mothur "#chimera.uchime(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t, processors=1)"

mothur "#remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)"

cp /save/frogs/assessment_datasets/databank/mothur/silva.bacteria.rdp6.tax restriction_db.tax

mothur "#classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=restriction_db.fasta, taxonomy=restriction_db.tax, cutoff=80)"

mothur "#remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.restriction_db.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)"

mothur "#dist.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20, processors=1)"

mothur "#cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table)"

cp /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa affiliation_db.fasta

cp /save/frogs/assessment_datasets/databank/uparse/refdb_clean.tax affiliation_db.tax

mothur "#classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=affiliation_db.fasta, taxonomy=affiliation_db.tax, cutoff=0)"

mothur "#classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.affiliation_db.wang.taxonomy, label=0.03)"

mothur "#make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, label=0.03)"

mothur "#make.biom(shared=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.shared, constaxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy)"

ln -sf stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.biom mothur/mothur.biom

mothur "#get.oturep(method=abundance, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.03)"

mothurDeGapSeeds.py \
	--input stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.rep.fasta \
	--output stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.rep.degap.fasta

mothurAddSeedRef.py \
	--input stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.rep.degap.fasta \
	--reads reads/sample10-20sp-Powerlaw.fastq reads/sample06-20sp-Powerlaw.fastq reads/sample03-20sp-Powerlaw.fastq reads/sample05-20sp-Powerlaw.fastq reads/sample01-20sp-Powerlaw.fastq reads/sample08-20sp-Powerlaw.fastq reads/sample02-20sp-Powerlaw.fastq reads/sample07-20sp-Powerlaw.fastq reads/sample04-20sp-Powerlaw.fastq reads/sample09-20sp-Powerlaw.fastq \
	--trimmed-reads stability.trim.contigs.good.unique.good.filter.fasta \
	--output mothur/mothur.fasta

Qiime

Version: v.1.9.0

Protocol: guidelines as of September 2016, from http://qiime.org/tutorials/otu_picking.html and from Rideout et al. (2014), with de novo chimera removal.

Example command lines:

qiime; split_libraries_fastq.py \
	-i reads/sample01-20sp-Powerlaw.fastq,reads/sample02-20sp-Powerlaw.fastq,reads/sample03-20sp-Powerlaw.fastq,reads/sample04-20sp-Powerlaw.fastq,reads/sample05-20sp-Powerlaw.fastq,reads/sample06-20sp-Powerlaw.fastq,reads/sample07-20sp-Powerlaw.fastq,reads/sample08-20sp-Powerlaw.fastq,reads/sample09-20sp-Powerlaw.fastq,reads/sample10-20sp-Powerlaw.fastq \
	--sample_ids sample01,sample02,sample03,sample04,sample05,sample06,sample07,sample08,sample09,sample10 \
	-o qiime/qiime_workdir/qiime_preprocess \
	--barcode_type 'not-barcoded'  \
	--phred_offset 33

qiime; identify_chimeric_seqs.py \
	-i qiime/qiime_workdir/qiime_preprocess/seqs.fna \
	-m usearch61 \
	--suppress_usearch61_ref \
	-o qiime/qiime_workdir/usearch61_chimeras

qiime; filter_fasta.py \
	-f qiime/qiime_workdir/qiime_preprocess/seqs.fna \
	-o qiime/qiime_workdir/usearch61_chimeras/seqs_chimeras_filtered.fna \
	-s qiime/qiime_workdir/usearch61_chimeras/chimeras.txt \
	-n

qiime; pick_open_reference_otus.py \
	-i qiime/qiime_workdir/usearch61_chimeras/seqs_chimeras_filtered.fna \
	-o qiime/qiime_workdir/pick_open_reference_otus \
	-r /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa \
	--suppress_align_and_tree \
	--suppress_taxonomy_assignment

qiime; assign_taxonomy.py \
	-o qiime/qiime_workdir/uclust_assigned_taxonomy \
	-i qiime/qiime_workdir/pick_open_reference_otus/rep_set.fna \
	-t /save/frogs/assessment_datasets/databank/uparse/refdb_clean.tax \
	-r /save/frogs/assessment_datasets/databank/uparse/refdb_clean.fa

qiime; completTax.py \
	-i qiime/qiime_workdir/uclust_assigned_taxonomy/rep_set_tax_assignments.txt  \
	-o qiime/qiime_workdir/uclust_assigned_taxonomy/rep_set_completeTax_assignments.txt

biom add-metadata \
	-i qiime/qiime_workdir/pick_open_reference_otus/otu_table_mc2.biom \
	--observation-metadata-fp qiime/qiime_workdir/uclust_assigned_taxonomy/rep_set_completeTax_assignments.txt \
	-o qiime/qiime_workdir/otu_table_mc2_w_tax.biom \
	--sc-separated taxonomy \
	--observation-header OTUID,taxonomy 

biom convert \
	-i qiime/qiime_workdir/otu_table_mc2_w_tax.biom \
	-o qiime/qiime.biom \
	--table-type="OTU table" \
	--to-json

addSeedsRef.py \
	-s qiime/qiime_workdir/pick_open_reference_otus/rep_set.fna \
	-r reads/sample01-20sp-Powerlaw.fastq reads/sample02-20sp-Powerlaw.fastq reads/sample03-20sp-Powerlaw.fastq reads/sample04-20sp-Powerlaw.fastq reads/sample05-20sp-Powerlaw.fastq reads/sample06-20sp-Powerlaw.fastq reads/sample07-20sp-Powerlaw.fastq reads/sample08-20sp-Powerlaw.fastq reads/sample09-20sp-Powerlaw.fastq reads/sample10-20sp-Powerlaw.fastq \
	-a qiime/qiime.fasta

Execution time

For this evaluation, the FROGS pipeline was run on an Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz.