OpenMS
Epifany

EPIFANY (Efficient protein inference for any peptide-protein network) is a Bayesian protein inference engine. It uses PSM (posterior) probabilities from Percolator, OpenMS' IDPosteriorErrorProbability or similar tools to calculate posterior probabilities for proteins and protein groups.

Experimental classes:
This tool is work in progress and usage and input requirements might change.
pot. predecessor tools         → Epifany →   pot. successor tools
PercolatorAdapter                            IDFilter
IDPosteriorErrorProbability

Epifany is a protein inference engine based on a Bayesian network. It currently uses the same model as Fido, with the main parameters alpha (pep_emission), beta (pep_spurious_emission) and gamma (prot_prior). If not specified, these parameters are trained via a grid search: the tool runs with several possible combinations and evaluates each one by its classification performance and calibration. Unless you see very extreme output probabilities (e.g. many close to 1.0) or you know good parameters (e.g. from an earlier run), grid search is recommended, although it is slower.

The tool merges multiple idXML files (union of proteins and concatenation of PSMs) when given more than one. It assumes one search engine run per input file but might work on more. Proteins need to be indexed by OpenMS's PeptideIndexer, but this is usually done before Percolator/IDPEP since target/decoy associations are already needed there. Make sure that the input PSM probabilities are not too extreme already (garbage in, garbage out).

After merging, the input probabilities are preprocessed with a low posterior probability cutoff to neglect very unreliable matches. Then the probabilities are aggregated with the maximum per peptide, and the graph is built and split into connected components.

When compiled with the OpenMP flag (enabled by default in the release binaries), the tool is multi-threaded, which can be activated at runtime with the threads parameter. Note that peak memory requirements may rise significantly when processing multiple components of the graph at the same time.
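The preprocessing steps described above (probability cutoff, maximum per peptide, split into connected components) can be illustrated with a minimal Python sketch. This is not Epifany's implementation: the tuple layout `(peptide, probability, proteins)` and the function names are assumptions made for illustration, and only the default cutoff of 1.0e-03 is taken from the `psm_probability_cutoff` parameter.

```python
from collections import defaultdict


def preprocess(psms, cutoff=1e-3):
    """Drop PSMs below the cutoff and keep the maximum probability per peptide.

    psms: iterable of (peptide, probability, proteins) tuples (hypothetical layout).
    Returns (best probability per peptide, parent proteins per peptide).
    """
    best = {}
    parents = defaultdict(set)
    for peptide, prob, proteins in psms:
        if prob < cutoff:  # neglect very unreliable matches
            continue
        best[peptide] = max(prob, best.get(peptide, 0.0))
        parents[peptide].update(proteins)
    return best, dict(parents)


def connected_components(parents):
    """Split the bipartite peptide-protein graph into components (union-find)."""
    root = {}

    def find(node):
        root.setdefault(node, node)
        while root[node] != node:
            root[node] = root[root[node]]  # path halving
            node = root[node]
        return node

    for peptide, proteins in parents.items():
        find(("pep", peptide))  # register peptides even without proteins
        for protein in proteins:
            root[find(("prot", protein))] = find(("pep", peptide))

    groups = defaultdict(set)
    for node in list(root):
        groups[find(node)].add(node)
    return list(groups.values())
```

Each resulting component could then be processed independently, which is consistent with the note above that handling several components at once raises peak memory.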

The command line parameters of this tool are:

Epifany -- Runs a Bayesian protein inference.
Full documentation: http://www.openms.de/doxygen/release/3.2.0/html/TOPP_Epifany.html
Version: 3.2.0 Sep 18 2024, 16:00:56, Revision: e231942
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
   spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  Epifany <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option

Options (mandatory options marked with '*'):
  -in <file>*                            Input: identification results (valid formats: 'idXML',
                                         'consensusXML')
  -exp_design <file>                     (Currently unused) Input: experimental design (valid formats: 'tsv')

  -out <file>*                           Output: identification results with scored/grouped proteins (valid 
                                         formats: 'idXML', 'consensusXML')
  -out_type <file>                       Output type: auto detected by file extension but can be overwritten 
                                         here. (valid: 'idXML', 'consensusXML')
  -protein_fdr <option>                  Additionally calculate the target-decoy FDR on protein-level based 
                                         on the posteriors (default: 'false') (valid: 'true', 'false')
  -greedy_group_resolution <option>      Post-process inference output with greedy resolution of shared
                                         peptides based on the parent protein probabilities. Also adds the
                                         resolved ambiguity groups to output. (default: 'none') (valid: 'none',
                                         'remove_associations_only', 'remove_proteins_wo_evidence')
  -max_psms_extreme_probability <float>  Set PSMs with probability higher than this to this maximum
                                         probability. (default: '1.0')

Common TOPP options:
  -ini <file>                            Use the given TOPP INI file
  -threads <n>                           Sets the number of threads allowed to be used by the TOPP tool
                                         (default: '1')
  -write_ini <file>                      Writes the default configuration file
  --help                                 Shows options
  --helphelp                             Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   Parameters for the Algorithm section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
  - http://www.openms.de/doxygen/release/3.2.0/html/TOPP_Epifany.html
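
As a usage illustration, the options listed in the help text above might be combined as in the following Python sketch, which only assembles and prints a command line. The file names are hypothetical, and passing several files after a single `-in` is assumed here (the tool accepts more than one input file according to the description); the flags themselves are taken from the help output.

```python
import shlex

# Values accepted by -greedy_group_resolution according to the help text.
VALID_RESOLUTION = ("none", "remove_associations_only", "remove_proteins_wo_evidence")


def build_epifany_command(inputs, output, threads=1, protein_fdr=False,
                          greedy_group_resolution="none"):
    """Assemble an Epifany argument list from the documented options."""
    if not inputs:
        raise ValueError("at least one input file is required (-in is mandatory)")
    if greedy_group_resolution not in VALID_RESOLUTION:
        raise ValueError(f"invalid greedy_group_resolution: {greedy_group_resolution}")
    return [
        "Epifany",
        "-in", *inputs,
        "-out", output,
        "-protein_fdr", "true" if protein_fdr else "false",
        "-greedy_group_resolution", greedy_group_resolution,
        "-threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    cmd = build_epifany_command(["run1.idXML", "run2.idXML"], "inferred.idXML",
                                threads=4, protein_fdr=True)
    print(shlex.join(cmd))
```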

INI file documentation of this tool:

Legend:
required parameter
advanced parameter
+Epifany: Runs a Bayesian protein inference.
version (default: '3.2.0'): Version of the tool that generated this parameters file.
++1: Instance '1' section for 'Epifany'
in (default: '[]'): Input: identification results (input file; valid formats: '*.idXML', '*.consensusXML')
exp_design: (Currently unused) Input: experimental design (input file; valid formats: '*.tsv')
out: Output: identification results with scored/grouped proteins (output file; valid formats: '*.idXML', '*.consensusXML')
out_type: Output type: auto detected by file extension but can be overwritten here. (valid: 'idXML', 'consensusXML')
protein_fdr (default: 'false'): Additionally calculate the target-decoy FDR on protein-level based on the posteriors (valid: 'true', 'false')
conservative_fdr (default: 'true'): Use (D+1)/(T) instead of (D+1)/(T+D) for reporting protein FDRs. (valid: 'true', 'false')
picked_fdr (default: 'true'): Use picked protein FDRs. (valid: 'true', 'false')
picked_decoy_string: If using picked protein FDRs, which decoy string was used? Leave blank for auto-detection.
picked_decoy_prefix (default: 'prefix'): If using picked protein FDRs, was the decoy string a prefix or suffix? Ignored during auto-detection. (valid: 'prefix', 'suffix')
greedy_group_resolution (default: 'none'): Post-process inference output with greedy resolution of shared peptides based on the parent protein probabilities. Also adds the resolved ambiguity groups to output. (valid: 'none', 'remove_associations_only', 'remove_proteins_wo_evidence')
min_psms_extreme_probability (default: '0.0'): Set PSMs with probability lower than this to this minimum probability.
max_psms_extreme_probability (default: '1.0'): Set PSMs with probability higher than this to this maximum probability.
log: Name of log file (created only when specified)
debug (default: '0'): Sets the debug level
threads (default: '1'): Sets the number of threads allowed to be used by the TOPP tool
no_progress (default: 'false'): Disables progress logging to command line (valid: 'true', 'false')
force (default: 'false'): Overrides tool-specific checks (valid: 'true', 'false')
test (default: 'false'): Enables the test mode (needed for internal use only) (valid: 'true', 'false')
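
The two FDR formulas named in the conservative_fdr description above can be written out as a small Python sketch. Reading D and T as the numbers of decoy and target proteins accepted at a given score threshold is an assumption here, and the function name is hypothetical.

```python
def protein_fdr_estimate(targets, decoys, conservative=True):
    """Estimate the FDR at one score threshold.

    conservative=True  uses (D+1)/T
    conservative=False uses (D+1)/(T+D)
    where T and D are assumed to be the accepted target and decoy counts.
    """
    if targets <= 0:
        raise ValueError("need at least one target to estimate an FDR")
    denominator = targets if conservative else targets + decoys
    return (decoys + 1) / denominator
```

With the same counts, the (D+1)/(T) variant is never smaller than the (D+1)/(T+D) variant, which is why it is the more conservative choice.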
+++algorithm: Parameters for the Algorithm section
psm_probability_cutoff (default: '1.0e-03'): Remove PSMs with probabilities less than this cutoff (valid: 0.0:1.0)
top_PSMs (default: '1'): Consider only top X PSMs per spectrum. 0 considers all. (valid: 0:∞)
keep_best_PSM_only (default: 'true'): Epifany uses the best PSM per peptide for inference. Discard the rest (true) or keep them, e.g. for quantification/reporting? (valid: 'true', 'false')
update_PSM_probabilities (default: 'true'): (Experimental:) Update PSM probabilities with their posteriors under consideration of the protein probabilities. (valid: 'true', 'false')
user_defined_priors (default: 'false'): (Experimental:) Uses the current protein scores as user-defined priors. (valid: 'true', 'false')
annotate_group_probabilities (default: 'true'): Annotates group probabilities for indistinguishable protein groups (indistinguishable by experimentally observed PSMs). (valid: 'true', 'false')
use_ids_outside_features (default: 'false'): (Only consensusXML) Also use IDs without associated features for inference? (valid: 'true', 'false')
++++model_parameters: Model parameters for the Bayesian network
prot_prior (default: '-1.0'): Protein prior probability ('gamma' parameter). Negative values enable grid search for this param. (valid: -1.0:1.0)
pep_emission (default: '-1.0'): Peptide emission probability ('alpha' parameter). Negative values enable grid search for this param. (valid: -1.0:1.0)
pep_spurious_emission (default: '-1.0'): Spurious peptide identification probability ('beta' parameter). Usually much smaller than emission from proteins. Negative values enable grid search for this param. (valid: -1.0:1.0)
pep_prior (default: '0.1'): Peptide prior probability (experimental, should be covered by combinations of the other params). (valid: 0.0:1.0)
regularize (default: 'false'): Regularize the number of proteins that produce a peptide together (experimental, should be activated when using higher p-norms). (valid: 'true', 'false')
extended_model (default: 'false'): Uses information from different peptidoforms also across runs (automatically activated if an experimental design is given!) (valid: 'true', 'false')
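
To make the roles of alpha and beta concrete, the following Python sketch uses the noisy-OR emission form published for the Fido model: a peptide is observed unless every present parent protein fails to emit it and it is also not a spurious identification. Whether Epifany's factors match this term for term is an assumption; the function names are hypothetical. The second helper mirrors the documented convention that negative parameter values enable grid search.

```python
def peptide_presence_probability(n_present_parents, alpha, beta):
    """Noisy-OR emission: 1 - (1 - beta) * (1 - alpha)^n.

    alpha: pep_emission, beta: pep_spurious_emission,
    n_present_parents: number of parent proteins assumed present.
    """
    if n_present_parents < 0:
        raise ValueError("number of parent proteins cannot be negative")
    return 1.0 - (1.0 - beta) * (1.0 - alpha) ** n_present_parents


def uses_grid_search(parameter_value):
    """Negative values for alpha, beta or gamma request grid search."""
    return parameter_value < 0.0
```

With no present parent protein the expression reduces to beta, the spurious identification probability, and it grows towards 1 as more parent proteins are present.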
++++loopy_belief_propagation: Settings for the loopy belief propagation algorithm.
scheduling_type (default: 'priority'): (Not used yet) How to pick the next message: priority = based on difference to last message (higher = more important). fifo = first in first out. subtree = message passing follows a random spanning tree in each iteration (valid: 'priority', 'fifo', 'subtree')
convergence_threshold (default: '1.0e-05'): Initial threshold on the MSE difference below which a message is considered converged. (valid: 1.0e-09:1.0)
dampening_lambda (default: '1.0e-03'): Initial value for how strongly messages should be updated in each step. 0 = new message overwrites old completely (no dampening; only recommended for trees), 0.5 = equal contribution of old and new message (stay below that). In between, it will be a convex combination of both. Prevents oscillations but hinders convergence. (valid: 0.0:0.49999)
max_nr_iterations (default: '2147483647'): (Usually auto-determined by estimation, but you can set a hard limit here.) If not all messages converge, how many iterations should be done at most per connected component?
p_norm_inference (default: '1.0'): P-norm used for marginalization of multidimensional factors. 1 == sum-product inference (all configurations vote equally) (default), <= 0 == infinity = max-product inference (only best configurations propagate). The higher the value, the more important high-probability configurations get.
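
The dampening_lambda and p_norm_inference descriptions above can be illustrated with a short Python sketch. It follows the documented semantics (0 means the new message replaces the old one, 1 means sum, values <= 0 mean max), but the function names, the list-based message representation and the omission of any normalization step are assumptions, not Epifany's actual code.

```python
def dampen(old_message, new_message, dampening_lambda=1e-3):
    """Convex combination of old and new message values.

    dampening_lambda = 0 keeps only the new message; values approaching 0.5
    give old and new almost equal weight.
    """
    if not 0.0 <= dampening_lambda < 0.5:
        raise ValueError("dampening_lambda must lie in [0, 0.5)")
    return [dampening_lambda * old + (1.0 - dampening_lambda) * new
            for old, new in zip(old_message, new_message)]


def p_norm_marginal(values, p=1.0):
    """Combine configuration values with a p-norm.

    p = 1 sums the values (sum-product); p <= 0 is treated as infinity and
    returns the maximum (max-product); larger p weights large values more.
    """
    if p <= 0.0:
        return max(values)
    return sum(v ** p for v in values) ** (1.0 / p)
```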
++++param_optimize: Settings for the parameter optimization.
aucweight (default: '0.3'): How important is target decoy AUC vs calibration of the posteriors? 0 = maximize calibration only, 1 = maximize AUC only, between = convex combination. (valid: 0.0:1.0)
conservative_fdr (default: 'true'): Use (D+1)/(T) instead of (D+1)/(T+D) for parameter estimation. (valid: 'true', 'false')
regularized_fdr (default: 'true'): Use a regularized FDR for proteins without unique peptides. (valid: 'true', 'false')
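
The aucweight description suggests a grid search objective of the following shape, sketched in Python. The candidate grids, the `evaluate` callback and the assumption that both AUC and calibration are scores in [0, 1] where higher is better are all hypothetical; only the convex combination and the default weight of 0.3 come from the parameter description.

```python
from itertools import product


def combined_score(auc, calibration, aucweight=0.3):
    """Convex combination: 0 = calibration only, 1 = AUC only."""
    if not 0.0 <= aucweight <= 1.0:
        raise ValueError("aucweight must lie in [0, 1]")
    return aucweight * auc + (1.0 - aucweight) * calibration


def grid_search(evaluate, alphas, betas, gammas, aucweight=0.3):
    """Try every (alpha, beta, gamma) combination and keep the best one.

    evaluate(alpha, beta, gamma) is a user-supplied callback (hypothetical)
    that runs inference and returns (auc, calibration).
    """
    best_score, best_params = None, None
    for alpha, beta, gamma in product(alphas, betas, gammas):
        auc, calibration = evaluate(alpha, beta, gamma)
        score = combined_score(auc, calibration, aucweight)
        if best_score is None or score > best_score:
            best_score, best_params = score, (alpha, beta, gamma)
    return best_score, best_params
```

Because every combination requires a full inference run, the cost grows with the product of the grid sizes, which matches the remark that grid search is slower than supplying known parameters.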