FROGSFUNC_3_pathways

Context

PICRUSt2 is a software tool for predicting functional abundances based only on marker gene sequences. It is integrated into the FROGS suite as the FROGSFUNC tools, which are split into 3 steps:

FROGSFUNC_1_placeseqs_copynumber: Places the ASVs into a reference phylogenetic tree and predicts the copy numbers of the marker gene (16S, ITS or 18S).
FROGSFUNC_2_functions: Predicts the copy number of each function in each ASV, then calculates function abundances in each sample and ASV abundances normalized by marker copy number.
FROGSFUNC_3_pathways: Calculates pathway abundances in each sample.

These predictions can be useful for generating hypotheses, but they should always be interpreted cautiously, especially when focusing on a single function or on predictions for a single ASV.

PICRUSt2 supports only 3 markers: 16S, ITS and 18S. If you used another one (rpob, 23S, coi, ef1, etc.), you cannot use these 3 tools.

What it does

FROGSFUNC_3_pathways is the last step of PICRUSt2. It infers MetaCyc/KEGG pathway abundances based on EC/KO number abundances. There are three steps performed at this stage:

Regroups EC or KO numbers into MetaCyc or KEGG reactions, depending on the unstratified abundance input file.
Infers that MetaCyc/KEGG pathways are present based on these reactions with MinPath.
Calculates and returns the abundance of pathways identified as present.
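
As a rough illustration of how these three stages chain together, here is a minimal Python sketch on toy data. Everything in it is hypothetical: the EC numbers, the reaction and pathway identifiers, and both mapping tables are invented for the example. The presence rule used here (every reaction of the pathway must be observed) is only a stand-in for MinPath, and taking the least abundant reaction as the pathway abundance is an assumption made for readability, not the formula used by PICRUSt2.

```python
from collections import defaultdict

# Hypothetical EC abundances for one sample (not real data).
ec_abund = {"EC:1.1.1.1": 10.0, "EC:2.7.1.1": 4.0, "EC:4.1.1.1": 6.0}

# Hypothetical mapping tables (not taken from MetaCyc or KEGG).
ec_to_reactions = {
    "EC:1.1.1.1": ["RXN-A"],
    "EC:2.7.1.1": ["RXN-B"],
    "EC:4.1.1.1": ["RXN-A", "RXN-C"],
}
pathway_to_reactions = {
    "PWY-TOY-1": ["RXN-A", "RXN-B"],
    "PWY-TOY-2": ["RXN-C", "RXN-D"],
}


def regroup_to_reactions(ec_abund, ec_to_reactions):
    """Stage 1: sum EC abundances over the reactions they map to."""
    rxn_abund = defaultdict(float)
    for ec, abund in ec_abund.items():
        for rxn in ec_to_reactions.get(ec, []):
            rxn_abund[rxn] += abund
    return dict(rxn_abund)


def present_pathways(rxn_abund, pathway_to_reactions):
    """Stage 2 (stand-in for MinPath): keep pathways whose reactions are all observed."""
    return [
        pwy
        for pwy, rxns in pathway_to_reactions.items()
        if all(rxn_abund.get(rxn, 0.0) > 0 for rxn in rxns)
    ]


def pathway_abundances(rxn_abund, pathway_to_reactions):
    """Stage 3 (assumed rule): abundance of a pathway = its least abundant reaction."""
    return {
        pwy: min(rxn_abund[rxn] for rxn in pathway_to_reactions[pwy])
        for pwy in present_pathways(rxn_abund, pathway_to_reactions)
    }


rxn_abund = regroup_to_reactions(ec_abund, ec_to_reactions)
result = pathway_abundances(rxn_abund, pathway_to_reactions)
print(result)
```

In this toy case PWY-TOY-2 is dropped because RXN-D is never observed, which mirrors the idea that only pathways identified as present receive an abundance.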

FROGSFUNC_3_pathways tool summary

Command line

v4.1.0

usage: frogsfunc_pathways.py [-h] [--debug] [--per-sequence-contrib] -i
                             INPUT_FILE [-m MAP]
                             [--per-sequence-abun PER_SEQUENCE_ABUN]
                             [--per-sequence-function PER_SEQUENCE_FUNCTION]
                             [--hierarchy-ranks [HIERARCHY_RANKS [HIERARCHY_RANKS ...]]]
                             [--normalisation] [-o OUTPUT_PATHWAYS_ABUND]
                             [--output-pathways-contrib OUTPUT_PATHWAYS_CONTRIB]
                             [--output-pathways-predictions OUTPUT_PATHWAYS_PREDICTIONS]
                             [--output-pathways-abund-per-seq OUTPUT_PATHWAYS_ABUND_PER_SEQ]
                             [-v] [-l LOG_FILE] [-t SUMMARY]

Infer the presence and abundances of pathways based on gene family abundances
in a sample.

optional arguments:
  -h, --help            show this help message and exit
  --debug               Keep temporary files to debug program.
  --per-sequence-contrib
                        If stratified option is activated, a new table is
                        built. It will contain the abundances of each function
                        of each OTU in each sample. (in contrast to the
                        default stratified output, which is the contribution
                        to the community-wide pathway abundances.) Options
                        --per-sequence-abun and --per-sequence-function need
                        to be set when this option is used (default: False)

Inputs:
  -i INPUT_FILE, --input-file INPUT_FILE
                        Input TSV function abundances table from
                        FROGSFUNC_step3_function (unstratified table :
                        frogsfunc_functions_unstrat.tsv).
  -m MAP, --map MAP     File required if you are not analyzing 16S sequences
                        with the Metacyc ("EC" function in the previous step)
                        database. IF MARKER STUDYED STILL 16S: it must
                        indicate the path to the PICRUSt2 KEGG pathways
                        mapfile, if you chose "KO" in the previous step (the
                        mapfile is available here : $PICRUSt2_PATH/default_fil
                        es/pathway_mapfiles/KEGG_pathways_to_KO.tsv) IF MARKER
                        STUDYED IS ITS OR 18S: Path to mapping file of
                        pathways to fungi reactions (the mapfile is available
                        here : $PICRUSt2_PATH/default_files/pathway_mapfiles/m
                        etacyc_path2rxn_struc_filt_fungi.txt ).
  --per-sequence-abun PER_SEQUENCE_ABUN
                        Path to table of sequence abundances across samples
                        normalized by marker copy number (typically the
                        normalized sequence abundance table output at the
                        metagenome pipeline step:
                        frogsfunc_functions_marker_norm.tsv by default). This
                        input is required when the --per-sequence-contrib
                        option is set. (default: None).
  --per-sequence-function PER_SEQUENCE_FUNCTION
                        Path to table of function abundances per sequence,
                        which was outputted at the hidden-state prediction
                        step (frogsfunc_copynumbers_predicted_functions.tsv by
                        default). This input is required when the --per-
                        sequence-contrib option is set. Note that this file
                        should be the same input table as used for the
                        metagenome pipeline step (default: None).
  --hierarchy-ranks [HIERARCHY_RANKS [HIERARCHY_RANKS ...]]
                        The ordered ranks levels used in the metadata
                        hierarchy pathways. [Default: ['Level1', 'Level2',
                        'Level3', 'Pathway']]
  --normalisation       To normalise data after analysis. Values are divided
                        by sum of columns , then multiplied by 10^6 (CPM
                        values). [Default: False]

Outputs:
  -o OUTPUT_PATHWAYS_ABUND, --output-pathways-abund OUTPUT_PATHWAYS_ABUND
                        Pathway abundance file output. Default:
                        frogsfunc_pathways_unstrat.tsv]
  --output-pathways-contrib OUTPUT_PATHWAYS_CONTRIB
                        Stratified output corresponding to contribution of
                        predicted gene family abundances within each predicted
                        genome.
  --output-pathways-predictions OUTPUT_PATHWAYS_PREDICTIONS
                        Stratified output corresponding to contribution of
                        predicted gene family abundances within each predicted
                        genome.
  --output-pathways-abund-per-seq OUTPUT_PATHWAYS_ABUND_PER_SEQ
                        Pathway abundance file output per sequences (if --per-
                        sequence-contrib set)
  -v, --version         show programs version number and exit
  -l LOG_FILE, --log-file LOG_FILE
                        This output file will contain several information on
                        executed commands.
  -t SUMMARY, --summary SUMMARY
                        Path to store resulting html file. [Default:
                        frogsfunc_pathways_summary.html]

Example of command line:

./frogsfunc_pathways.py \
    --input-file frogsfunc_functions_unstrat_EC.tsv \
    --normalisation \
    --per-sequence-contrib \
    --per-sequence-abun frogsfunc_functions_marker_norm.tsv \
    --per-sequence-function EC_copynumbers_predicted.tsv \
    --output-pathways-abund frogsfunc_pathways_unstrat.tsv \
    --output-pathways-contrib frogsfunc_pathways_strat.tsv \
    --output-pathways-predictions frogsfunc_pathways_predictions.tsv \
    --output-pathways-abund-per-seq frogsfunc_pathways_unstrat_per_seq.tsv \
    --summary frogsfunc_pathways_summary.html

The stratified output (the --per-sequence-contrib, --per-sequence-abun and --per-sequence-function parameters) is optional.
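
The help text above describes when --map is needed: not for 16S with MetaCyc ("EC"), but required for 16S with KEGG ("KO") and for ITS or 18S. The following Python sketch encodes that rule when assembling the argument list. The helper name build_pathways_command and its parameters are hypothetical and not part of FROGS; only the option names and mapfile paths are taken from the help text, and picrust2_path stands for the value of $PICRUSt2_PATH.

```python
def build_pathways_command(input_file, marker, function_db, picrust2_path):
    """Return an argument list for frogsfunc_pathways.py (hypothetical helper).

    marker:      "16S", "ITS" or "18S"
    function_db: "EC" (MetaCyc) or "KO" (KEGG), as chosen in the previous step
    """
    mapfile_dir = f"{picrust2_path}/default_files/pathway_mapfiles"
    cmd = ["frogsfunc_pathways.py", "--input-file", input_file]

    if marker == "16S":
        if function_db == "KO":
            # 16S with KEGG: the KEGG pathway mapfile must be given.
            cmd += ["--map", f"{mapfile_dir}/KEGG_pathways_to_KO.tsv"]
        elif function_db != "EC":
            raise ValueError(f"unknown function database: {function_db}")
        # 16S with MetaCyc ("EC"): no --map needed.
    elif marker in ("ITS", "18S"):
        if function_db != "EC":
            raise ValueError("only MetaCyc (EC) is valid for ITS or 18S")
        # ITS or 18S: the fungi-specific MetaCyc mapfile must be given.
        cmd += ["--map", f"{mapfile_dir}/metacyc_path2rxn_struc_filt_fungi.txt"]
    else:
        raise ValueError(f"unsupported marker: {marker}")
    return cmd


print(build_pathways_command("frogsfunc_functions_unstrat_KO.tsv", "16S", "KO", "/path/to/picrust2"))
```

The returned list can be passed to subprocess.run; building a list rather than a single string avoids shell quoting problems with file paths.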

Galaxy

Function abundance file:
TSV function abundances table from FROGSFUNC_2_functions tool, frogsfunc_functions_unstrat_EC.tsv or frogsfunc_functions_unstrat_KO.tsv (unstratified table).

This input must be the unstratified table (the default table).

Taxonomic marker:
Output table of predicted marker gene copy numbers per sequence from the FROGSFUNC_1_placeseqs_copynumber tool.

Pathway reference:
Mapping of pathways to reactions.

For 16S marker, choose Metacyc or KEGG in accordance with your choice in the FROGSFUNC_2_functions tool. If you want both, run this tool twice.
For ITS or 18S marker, Metacyc is the only valid option.

Do you want to normalize the final output table?
If this option is set, the pathway abundance file (frogsfunc_pathways_unstrat.tsv) is normalized: values are divided by the sum of their column, then multiplied by 10^6 (counts per million, CPM).

This normalization makes samples comparable with one another. However, for more precise statistical analysis, some tools such as DESeq2 need the non-normalized abundance table because they perform their own normalization. Be careful to choose the right table for downstream analysis.
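
The CPM computation described above can be sketched in a few lines of Python. This is an illustration on invented numbers, not the code used by FROGS: the function name cpm_normalise, the nested-dictionary layout and the pathway and sample names are all assumptions made for the example.

```python
def cpm_normalise(table):
    """Divide each value by its column (sample) sum, then multiply by 10^6.

    table: dict mapping pathway -> dict mapping sample -> abundance.
    """
    samples = sorted({sample for row in table.values() for sample in row})
    totals = {
        sample: sum(row.get(sample, 0.0) for row in table.values())
        for sample in samples
    }
    return {
        pathway: {
            # A sample with a zero total is left at 0 to avoid dividing by zero.
            sample: (row.get(sample, 0.0) * 1_000_000 / totals[sample]) if totals[sample] else 0.0
            for sample in samples
        }
        for pathway, row in table.items()
    }


# Hypothetical abundances for two pathways in two samples.
toy = {
    "PWY-TOY-1": {"sampleA": 30.0, "sampleB": 5.0},
    "PWY-TOY-2": {"sampleA": 70.0, "sampleB": 15.0},
}
cpm = cpm_normalise(toy)
print(cpm)
```

After normalization every sample column sums to 10^6, whatever its original sequencing depth, which is what makes the columns directly comparable.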

Outputs

HTML report

The HTML file summarizes information about pathway abundances within each sample.

What is the distribution of pathway abundances in the samples ?

Samples: Mean of NSTI values of ASVs present in the sample, normalized by their abundances.
Nb pathway retrieved: Number of pathways present in the sample.
The “Display global distribution” button shows the distribution of pathway abundances across all samples.

To view this distribution for a subset of samples only, check the boxes of those samples (first column of the table above) and click the “Show distribution” button at the bottom of the table.

pathway distribution for selected samples

The innermost circle represents the highest hierarchical level of pathways according to the MetaCyc or KEGG database. Each circle further out represents a more specific hierarchical level, and the outermost circle gives the pathway identifier.

For example:
Generation of Precursor Metabolites and Energy > Fermentation > Fermentation of Pyruvate > PWY-6588

For more details on a pathway, double-click on the name of the pathway of interest.

Pathway abundance tables

Pathway abundances table - “unstratified”.

This table contains the predicted pathway abundances of the metagenome, per sample.

Classification column: the hierarchical classification of the pathway.
db_link column: the URL of the database entry for the pathway accession ID (observation_name).
observation_name: the accession identifier of the pathway.
last columns: abundances of the pathway in each sample.
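
To show how this layout can be consumed downstream, here is a minimal Python sketch that reads such a table and reports the most abundant pathway in each sample. It is only an illustration: the exact header spellings ("classification", "db_link", "observation_name"), the toy rows, the "NA" placeholder links and the helper name top_pathway_per_sample are assumptions, so check them against your own file.

```python
import csv
import io

# Assumed spellings of the three descriptive columns; every other column is a sample.
METADATA_COLUMNS = ("classification", "db_link", "observation_name")

# Hypothetical table content standing in for frogsfunc_pathways_unstrat.tsv.
TOY_TSV = (
    "classification\tdb_link\tobservation_name\tsampleA\tsampleB\n"
    "toy;hierarchy;one\tNA\tPWY-TOY-1\t12.5\t0.0\n"
    "toy;hierarchy;two\tNA\tPWY-TOY-2\t3.0\t8.0\n"
)


def top_pathway_per_sample(handle):
    """Return {sample: (pathway accession, abundance)} for the most abundant pathway."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [col for col in reader.fieldnames if col not in METADATA_COLUMNS]
    best = {}
    for row in reader:
        for sample in samples:
            abundance = float(row[sample])
            if sample not in best or abundance > best[sample][1]:
                best[sample] = (row["observation_name"], abundance)
    return best


top = top_pathway_per_sample(io.StringIO(TOY_TSV))
print(top)
```

With a real file, replace io.StringIO(TOY_TSV) with open("frogsfunc_pathways_unstrat.tsv", newline="").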

Pathway abundances table - stratified (optional and command only).

This table is optional and is only produced on the command line; it is not available in the Galaxy version.

This default stratified pathway abundance table represents how much each ASV is contributing to the community-wide pathway abundance and not what the pathway abundance is predicted to be within the predicted genome of that ASV alone.

N.B.: In the example above, the first N lines of the file correspond to the N ASVs in the sample SC1703-104TTGCCC-B6TMLL001R, and so on for each sample.

Please note that requesting the stratified output files increases the processing time. The file is also very large: it contains as many lines as there are samples x ASVs x pathways.

sample: sample name.
function: accession ID from the pathway database.
taxon: ASV name.
taxon_abun: number of sequences of the ASV in the sample divided by its marker copy number.
taxon_rel_abun: the same as the “taxon_abun” column, but expressed as a relative abundance (so that the sum of all ASV abundances per sample is 100).
genome_function_count: predicted copy number of this pathway per ASV.
taxon_function_abun: the “taxon_abun” column multiplied by the “genome_function_count” column.
taxon_rel_function_abun: the “taxon_rel_abun” column multiplied by the “genome_function_count” column.
norm_taxon_function_contrib: the same as the “taxon_rel_function_abun” column, but expressed as a relative abundance in the sample (so that the values of this column sum to 1).
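
The relationships between these columns can be checked on a small example. The Python sketch below applies the column definitions above to two invented ASVs in a single sample and for a single pathway, so that the normalization of the last column is unambiguous. The ASV names, counts, copy numbers and the helper name stratified_rows are all hypothetical.

```python
def stratified_rows(asvs):
    """Compute the stratified columns for one sample and one pathway.

    asvs: dict mapping ASV name -> (sequence count, marker copy number,
          genome_function_count). All input values here are invented.
    """
    # taxon_abun: sequence count divided by marker copy number.
    taxon_abun = {name: count / copies for name, (count, copies, _) in asvs.items()}
    total_abun = sum(taxon_abun.values())

    rows = {}
    for name, (_, _, gfc) in asvs.items():
        rel_abun = taxon_abun[name] / total_abun * 100  # sums to 100 per sample
        rows[name] = {
            "taxon_abun": taxon_abun[name],
            "taxon_rel_abun": rel_abun,
            "genome_function_count": gfc,
            "taxon_function_abun": taxon_abun[name] * gfc,
            "taxon_rel_function_abun": rel_abun * gfc,
        }

    # norm_taxon_function_contrib: taxon_rel_function_abun rescaled to sum to 1.
    total_rel_function = sum(r["taxon_rel_function_abun"] for r in rows.values())
    for r in rows.values():
        r["norm_taxon_function_contrib"] = r["taxon_rel_function_abun"] / total_rel_function
    return rows


toy_asvs = {
    "ASV_1": (60, 2, 1),  # 60 sequences, 2 marker copies, 1 pathway copy
    "ASV_2": (10, 1, 2),  # 10 sequences, 1 marker copy, 2 pathway copies
}
rows = stratified_rows(toy_asvs)
for name, row in rows.items():
    print(name, row)
```

Here ASV_1 has six times more sequences than ASV_2, yet after correcting for marker copy number and pathway copy number it contributes 60% of the pathway abundance rather than six sevenths.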

Abundance table of pathways per ASV (only with stratified option).

A work by FROGS team