OpenMS
Calculates false discovery rates (FDR) from identifications. More...
#include <OpenMS/ANALYSIS/ID/FalseDiscoveryRate.h>
Classes | |
class | DecoyStringHelper |
Finds decoy strings in ProteinIdentification runs. More... | |
Public Member Functions | |
FalseDiscoveryRate () | |
Default constructor. More... | |
void | apply (std::vector< PeptideIdentification > &fwd_ids, std::vector< PeptideIdentification > &rev_ids) const |
Calculates the FDR of two runs, a forward run and a decoy run on peptide level. More... | |
void | apply (std::vector< PeptideIdentification > &id, bool annotate_peptide_fdr=false) const |
Calculates the FDR of one run from a concatenated sequence DB search. More... | |
void | apply (std::vector< ProteinIdentification > &fwd_ids, std::vector< ProteinIdentification > &rev_ids) const |
Calculates the FDR of two runs, a forward run and decoy run on protein level. More... | |
void | apply (std::vector< ProteinIdentification > &ids) const |
Calculate the FDR of one run from a concatenated sequence db search. More... | |
void | applyEstimated (std::vector< ProteinIdentification > &ids) const |
Calculates the FDR based on PEPs or PPs (if present) and modifies the IDs in place. More... | |
double | applyEvaluateProteinIDs (const std::vector< ProteinIdentification > &ids, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const |
Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives). More... | |
double | applyEvaluateProteinIDs (const ProteinIdentification &ids, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const |
Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives). More... | |
double | applyEvaluateProteinIDs (ScoreToTgtDecLabelPairs &score_to_tgt_dec_fraction_pairs, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const |
Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives). More... | |
void | applyBasic (const std::vector< ProteinIdentification > &run_info, std::vector< PeptideIdentification > &ids) |
Simpler reimplementation of the apply function above for PSMs, using charge and identifier information from run_info. More... | |
void | applyBasic (std::vector< PeptideIdentification > &ids, bool higher_score_better, int charge=0, String identifier="", bool only_best_per_pep=false) |
simpler reimplementation of the apply function above for PSMs or peptides. More... | |
void | applyBasicPeptideLevel (std::vector< PeptideIdentification > &ids) |
void | applyBasicPeptideLevel (ConsensusMap &ids, bool use_unassigned_peptides=true) |
void | applyBasic (ConsensusMap &cmap, bool use_unassigned_peptides=true) |
simpler reimplementation of the apply function above for peptides in ConsensusMaps. More... | |
void | applyBasic (ProteinIdentification &id, bool groups_too=true) |
simpler reimplementation of the apply function above for proteins. More... | |
void | applyPickedProteinFDR (ProteinIdentification &id, String decoy_string="", bool prefix=true, bool groups_too=true) |
Applies a picked protein FDR. Behaves like a normal target-decoy FDR where only the score of the best protein per target-decoy pair is used. A pair is calculated by checking accession equality after removing the decoy string. If decoy_string is empty, we try to guess it. If you set decoy_string you should also set prefix and say if the string is a prefix (true) or suffix (false). groups_too decides if also a (indistinguishable) group-level FDR will be calculated. Here a group score will be taken if not ALL proteins in the group were picked already. Targets preferred. More... | |
double | rocN (const std::vector< PeptideIdentification > &ids, Size fp_cutoff) const |
double | rocN (const std::vector< PeptideIdentification > &ids, Size fp_cutoff, const String &identifier) const |
double | rocN (const ConsensusMap &ids, Size fp_cutoff, bool include_unassigned_peptides=false) const |
double | rocN (const ConsensusMap &ids, Size fp_cutoff, const String &identifier, bool include_unassigned_peptides=false) const |
double | diffEstimatedEmpirical (const ScoreToTgtDecLabelPairs &scores_labels, double pepCutoff=1.0) const |
calculates the area of the difference between estimated and empirical FDR on the fly. Does not store results. More... | |
double | rocN (const ScoreToTgtDecLabelPairs &scores_labels, Size fpCutoff=50) const |
IdentificationData::ScoreTypeRef | applyToObservationMatches (IdentificationData &id_data, IdentificationData::ScoreTypeRef score_ref) const |
Calculate FDR on the level of observation matches (e.g. peptide-spectrum matches) for "general" identification data. More... | |
Public Member Functions inherited from DefaultParamHandler | |
DefaultParamHandler (const String &name) | |
Constructor with name that is displayed in error messages. More... | |
DefaultParamHandler (const DefaultParamHandler &rhs) | |
Copy constructor. More... | |
virtual | ~DefaultParamHandler () |
Destructor. More... | |
DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) |
Assignment operator. More... | |
virtual bool | operator== (const DefaultParamHandler &rhs) const |
Equality operator. More... | |
void | setParameters (const Param &param) |
Sets the parameters. More... | |
const Param & | getParameters () const |
Non-mutable access to the parameters. More... | |
const Param & | getDefaults () const |
Non-mutable access to the default parameters. More... | |
const String & | getName () const |
Non-mutable access to the name. More... | |
void | setName (const String &name) |
Mutable access to the name. More... | |
const std::vector< String > & | getSubsections () const |
Non-mutable access to the registered subsections. More... | |
Private Member Functions | |
FalseDiscoveryRate (const FalseDiscoveryRate &) | |
Not implemented. More... | |
FalseDiscoveryRate & | operator= (const FalseDiscoveryRate &) |
Not implemented. More... | |
void | calculateFDRs_ (std::map< double, double > &score_to_fdr, std::vector< double > &target_scores, std::vector< double > &decoy_scores, bool q_value, bool higher_score_better) const |
calculates the FDR, given two vectors of scores More... | |
void | handleObservationMatch_ (IdentificationData::ObservationMatchRef match_ref, IdentificationData::ScoreTypeRef score_ref, std::vector< double > &target_scores, std::vector< double > &decoy_scores, std::map< IdentificationData::IdentifiedMolecule, bool > &molecule_to_decoy, std::map< IdentificationData::ObservationMatchRef, double > &match_to_score) const |
Helper function for applyToObservationMatches() More... | |
void | calculateEstimatedQVal_ (std::map< double, double > &scores_to_FDR, ScoreToTgtDecLabelPairs &scores_labels, bool higher_score_better) const |
void | calculateFDRBasic_ (std::map< double, double > &scores_to_FDR, ScoreToTgtDecLabelPairs &scores_labels, bool qvalue, bool higher_score_better) const |
double | trapezoidal_area_xEqy (double exp1, double exp2, double act1, double act2) const |
double | trapezoidal_area (double x1, double x2, double y1, double y2) const |
calculates the trapezoidal area for a trapezoid with a flat horizontal base e.g. for an AUC More... | |
Additional Inherited Members | |
Static Public Member Functions inherited from DefaultParamHandler | |
static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="") |
Writes all parameters to meta values. More... | |
Protected Member Functions inherited from DefaultParamHandler | |
virtual void | updateMembers_ () |
This method is used to update extra member variables at the end of the setParameters() method. More... | |
void | defaultsToParam_ () |
Updates the parameters after the defaults have been set in the constructor. More... | |
Protected Attributes inherited from DefaultParamHandler | |
Param | param_ |
Container for current parameters. More... | |
Param | defaults_ |
Container for default parameters. This member should be filled in the constructor of derived classes! More... | |
std::vector< String > | subsections_ |
Container for registered subsections. This member should be filled in the constructor of derived classes! More... | |
String | error_name_ |
Name that is displayed in error messages during the parameter checking. More... | |
bool | check_defaults_ |
If this member is set to false, no checking of parameters is done. More... | |
bool | warn_empty_defaults_ |
If this member is set to false, no warning is emitted when defaults are empty. More... | |
Calculates false discovery rates (FDR) from identifications.
Either two runs of forward and decoy database identification or one run containing both (with annotations) can be used to annotate each of the peptide hits with an FDR or q-value.
q-values are essentially adjusted p-values: they also range from 0 to 1, and lower values are better. In a list of hits ordered by q-value, a q-value of x means that x*100 percent of the hits with a q-value <= x are expected to be false positives.
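Only simple target-decoy FDRs are supported. The formula depends on the "conservative" parameter: (D+1)/T if it is 'true' (the default) and (D+1)/(T+D) otherwise, with D and T the numbers of decoy and target hits.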
For peptide hits, a hit is also considered a target if it maps to both a target and a decoy protein, i.e. if its "target_decoy" metavalue (e.g. annotated by PeptideIndexer) is "target+decoy".
Name | Type | Default | Restrictions | Description |
---|---|---|---|---|
no_qvalues | string | false | true, false | If 'true' strict FDRs will be calculated instead of q-values (the default) |
use_all_hits | string | false | true, false | If 'true' not only the first hit, but all are used (peptides only) |
split_charge_variants | string | false | true, false | If 'true' charge variants are treated separately (for peptides of combined target/decoy searches only). |
treat_runs_separately | string | false | true, false | If 'true' different search runs are treated separately (for peptides of combined target/decoy searches only). |
add_decoy_peptides | string | false | true, false | If 'true' decoy peptides will be written to output file, too. The q-value is set to the closest target score. |
add_decoy_proteins | string | false | true, false | If 'true' decoy proteins will be written to output file, too. The q-value is set to the closest target score. |
conservative | string | true | true, false | If 'true' (D+1)/T instead of (D+1)/(T+D) is used as a formula. |
FalseDiscoveryRate ()
Default constructor.
FalseDiscoveryRate (const FalseDiscoveryRate &) | private |
Not implemented.
void apply (std::vector< PeptideIdentification > &fwd_ids, std::vector< PeptideIdentification > &rev_ids) const
Calculates the FDR of two runs, a forward run and a decoy run on peptide level.
fwd_ids | forward peptide identifications |
rev_ids | reverse peptide identifications |
void apply (std::vector< PeptideIdentification > &id, bool annotate_peptide_fdr=false) const
Calculates the FDR of one run from a concatenated sequence DB search.
id | peptide identifications, containing target and decoy hits |
annotate_peptide_fdr | adds the peptide q-value or peptide fdr meta value to each PSM. Calculation uses best PSM per peptide. |
void apply (std::vector< ProteinIdentification > &fwd_ids, std::vector< ProteinIdentification > &rev_ids) const
Calculates the FDR of two runs, a forward run and decoy run on protein level.
fwd_ids | forward protein identifications |
rev_ids | reverse protein identifications |
void apply (std::vector< ProteinIdentification > &ids) const
Calculate the FDR of one run from a concatenated sequence db search.
ids | protein identifications, containing target and decoy hits |
void applyBasic (ConsensusMap &cmap, bool use_unassigned_peptides=true)
simpler reimplementation of the apply function above for peptides in ConsensusMaps.
void applyBasic (const std::vector< ProteinIdentification > &run_info, std::vector< PeptideIdentification > &ids)
Simpler reimplementation of the apply function above for PSMs, using charge and identifier information from run_info.
void applyBasic (ProteinIdentification &id, bool groups_too=true)
simpler reimplementation of the apply function above for proteins.
void applyBasic (std::vector< PeptideIdentification > &ids, bool higher_score_better, int charge=0, String identifier="", bool only_best_per_pep=false)
simpler reimplementation of the apply function above for PSMs or peptides.
void applyBasicPeptideLevel (ConsensusMap &ids, bool use_unassigned_peptides=true)
Like applyBasic with only_best_per_pep, but assigns a score to EVERY PSM sharing the peptide sequence with the best representative. Useful if all hits need a peptide score (e.g., for an mzTab report). No support for specific charges, runs, etc. yet.
void applyBasicPeptideLevel (std::vector< PeptideIdentification > &ids)
Like applyBasic with only_best_per_pep, but assigns a score to EVERY PSM sharing the peptide sequence with the best representative. Useful if all hits need a peptide score (e.g., for an mzTab report). No support for specific charges, runs, etc. yet.
void applyEstimated (std::vector< ProteinIdentification > &ids) const
Calculates the FDR based on PEPs or PPs (if present) and modifies the IDs in place.
ids | protein identifications containing PEP scores; a target/decoy annotation is not necessarily required. |
double applyEvaluateProteinIDs (const ProteinIdentification &ids, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const
Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).
ids | protein identifications, containing PEP scores annotated with target decoy. |
pepCutoff | up to which PEP should the differences between the two FDRs be calculated |
fpCutoff | up to which nr. of false positives should the target-decoy AUC be evaluated |
diffWeight | which weight should the difference get. The ROC-N value gets 1 - this weight. |
double applyEvaluateProteinIDs (const std::vector< ProteinIdentification > &ids, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const
Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).
ids | protein identifications, containing PEP scores annotated with target decoy. Only first run will be evaluated. |
pepCutoff | up to which PEP should the differences between the two FDRs be calculated |
fpCutoff | up to which nr. of false positives should the target-decoy AUC be evaluated |
diffWeight | which weight should the difference get. The ROC-N value gets 1 - this weight. |
double applyEvaluateProteinIDs (ScoreToTgtDecLabelPairs &score_to_tgt_dec_fraction_pairs, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2) const
Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).
score_to_tgt_dec_fraction_pairs | extracted scores of protein(group) identifications, containing PEP scores annotated with target decoy fractions. Simple case target=1, decoy=0. |
pepCutoff | up to which PEP should the differences between the two FDRs be calculated |
fpCutoff | up to which nr. of false positives should the target-decoy AUC be evaluated |
diffWeight | which weight should the difference get. The ROC-N value gets 1 - this weight. |
void applyPickedProteinFDR (ProteinIdentification &id, String decoy_string="", bool prefix=true, bool groups_too=true)
Applies a picked protein FDR. Behaves like a normal target-decoy FDR where only the score of the best protein per target-decoy pair is used. A pair is calculated by checking accession equality after removing the decoy string. If decoy_string is empty, we try to guess it. If you set decoy_string you should also set prefix and say if the string is a prefix (true) or suffix (false). groups_too decides if also a (indistinguishable) group-level FDR will be calculated. Here a group score will be taken if not ALL proteins in the group were picked already. Targets preferred.
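The pairing step described above (strip the decoy string, compare accessions, keep the best protein per pair, prefer targets on ties) can be sketched as follows. This is a simplified illustration under stated assumptions, not the OpenMS code: `ProteinScore`, `stripDecoy` and `pickBestPerPair` are hypothetical names, higher scores are assumed to be better, and neither decoy-string guessing nor the group-level handling is shown.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a protein hit (not an OpenMS type).
struct ProteinScore
{
  std::string accession;
  double score; // assumption: higher is better
};

// Removes the decoy string from the accession and reports whether it was present.
std::string stripDecoy(const std::string& acc, const std::string& decoy_string,
                       bool prefix, bool& is_decoy)
{
  is_decoy = false;
  const std::size_t n = decoy_string.size();
  if (n == 0 || acc.size() < n) return acc;
  if (prefix && acc.compare(0, n, decoy_string) == 0)
  {
    is_decoy = true;
    return acc.substr(n);
  }
  if (!prefix && acc.compare(acc.size() - n, n, decoy_string) == 0)
  {
    is_decoy = true;
    return acc.substr(0, acc.size() - n);
  }
  return acc;
}

// Keeps one winner per target-decoy pair; on equal scores the target wins.
std::vector<ProteinScore> pickBestPerPair(const std::vector<ProteinScore>& hits,
                                          const std::string& decoy_string, bool prefix)
{
  struct Winner { ProteinScore hit; bool is_decoy; };
  std::unordered_map<std::string, Winner> best;
  std::vector<std::string> order; // first-seen order of pairs, for stable output
  for (const ProteinScore& h : hits)
  {
    bool is_decoy = false;
    const std::string key = stripDecoy(h.accession, decoy_string, prefix, is_decoy);
    auto it = best.find(key);
    if (it == best.end())
    {
      best.emplace(key, Winner{h, is_decoy});
      order.push_back(key);
      continue;
    }
    Winner& w = it->second;
    const bool better = h.score > w.hit.score;
    const bool tie_target_wins = h.score == w.hit.score && w.is_decoy && !is_decoy;
    if (better || tie_target_wins) w = Winner{h, is_decoy};
  }
  std::vector<ProteinScore> picked;
  for (const std::string& key : order) picked.push_back(best.at(key).hit);
  return picked;
}
```

The picked hits would then go through an ordinary target-decoy FDR calculation, which is why the description calls the behaviour otherwise "normal".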
IdentificationData::ScoreTypeRef applyToObservationMatches (IdentificationData &id_data, IdentificationData::ScoreTypeRef score_ref) const
Calculate FDR on the level of observation matches (e.g. peptide-spectrum matches) for "general" identification data.
id_data | Identification data |
score_ref | Key of the score to use for FDR calculation |
Referenced by NucleicAcidSearchEngine::calculateAndFilterFDR_().
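void calculateEstimatedQVal_ (std::map< double, double > &scores_to_FDR, ScoreToTgtDecLabelPairs &scores_labels, bool higher_score_better) const | private |
Calculates an estimated FDR (based on P(E)Ps) given a vector of score value pairs and fills a map for lookup in scores_to_FDR.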
void calculateFDRBasic_ (std::map< double, double > &scores_to_FDR, ScoreToTgtDecLabelPairs &scores_labels, bool qvalue, bool higher_score_better) const | private |
Calculates the FDR with a basic and faster algorithm: it goes through the sorted scores, counts the number of decoys and targets, and annotates the FDR for each score as it goes. Q-values are optionally annotated by calculating the cumulative minimum in reversed order afterwards. The difference to the other algorithm is not documented.
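void calculateFDRs_ (std::map< double, double > &score_to_fdr, std::vector< double > &target_scores, std::vector< double > &decoy_scores, bool q_value, bool higher_score_better) const | private |
Calculates the FDR, given two vectors of scores.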
double diffEstimatedEmpirical (const ScoreToTgtDecLabelPairs &scores_labels, double pepCutoff=1.0) const
calculates the area of the difference between estimated and empirical FDR on the fly. Does not store results.
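void handleObservationMatch_ (IdentificationData::ObservationMatchRef match_ref, IdentificationData::ScoreTypeRef score_ref, std::vector< double > &target_scores, std::vector< double > &decoy_scores, std::map< IdentificationData::IdentifiedMolecule, bool > &molecule_to_decoy, std::map< IdentificationData::ObservationMatchRef, double > &match_to_score) const | private |
Helper function for applyToObservationMatches()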
FalseDiscoveryRate & operator= (const FalseDiscoveryRate &) | private |
Not implemented.
double rocN (const ConsensusMap &ids, Size fp_cutoff, bool include_unassigned_peptides=false) const
Calculates the AUC up to the first fp_cutoff false positive peptide IDs (takes all runs together). If fp_cutoff = 0, the full AUC is calculated.
double rocN (const ConsensusMap &ids, Size fp_cutoff, const String &identifier, bool include_unassigned_peptides=false) const
Calculates the AUC up to the first fp_cutoff false positive peptide IDs. If fp_cutoff = 0, the full AUC is calculated. Restricted to IDs from the specific ID run with the given identifier.
double rocN (const ScoreToTgtDecLabelPairs &scores_labels, Size fpCutoff=50) const
Calculates the AUC of the empirical FDR up to the first fpCutoff false positives on the fly. Does not store results. Use e.g. fpCutoff = scores_labels.size() for the complete AUC.
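A ROC-N style value of this kind can be sketched as below. This is an illustration under assumptions and not the OpenMS code: `rocNSketch` and `ScoreLabel` are hypothetical names, higher scores are assumed to be better, the curve is treated as a step function, and the normalisation (number of false positives seen times total number of targets) follows the common ROC-N definition rather than anything stated in this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// (score, is_target) pairs; a hypothetical stand-in for ScoreToTgtDecLabelPairs.
using ScoreLabel = std::pair<double, bool>;

// Area under the true-positive vs. false-positive step curve, truncated after
// fp_cutoff false positives and normalised to the range [0, 1].
double rocNSketch(std::vector<ScoreLabel> hits, std::size_t fp_cutoff)
{
  std::sort(hits.begin(), hits.end(),
            [](const ScoreLabel& a, const ScoreLabel& b) { return a.first > b.first; });

  std::size_t total_targets = 0;
  for (const ScoreLabel& h : hits)
  {
    if (h.second) ++total_targets;
  }

  std::size_t tp = 0;
  std::size_t fp = 0;
  double area = 0.0;
  for (const ScoreLabel& h : hits)
  {
    if (fp >= fp_cutoff) break; // stop once the cutoff is reached
    if (h.second)
    {
      ++tp;
    }
    else
    {
      ++fp;
      area += static_cast<double>(tp); // one column of width 1 and height tp
    }
  }

  if (total_targets == 0) return 0.0; // assumption: nothing to rank
  if (fp == 0) return 1.0;            // assumption: no false positive seen
  return area / (static_cast<double>(fp) * static_cast<double>(total_targets));
}
```

Passing a cutoff at least as large as the number of hits, as the description suggests with scores_labels.size(), walks the whole list and yields the complete AUC.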
double rocN (const std::vector< PeptideIdentification > &ids, Size fp_cutoff) const
Calculates the AUC up to the first fp_cutoff false positive peptide IDs (currently only takes all runs together). If fp_cutoff = 0, the full AUC is calculated.
double rocN (const std::vector< PeptideIdentification > &ids, Size fp_cutoff, const String &identifier) const
Calculates the AUC up to the first fp_cutoff false positive peptide IDs. If fp_cutoff = 0, the full AUC is calculated. Restricted to IDs from the specific ID run with the given identifier.
double trapezoidal_area (double x1, double x2, double y1, double y2) const | private |
Calculates the trapezoidal area for a trapezoid with a flat horizontal base, e.g. for an AUC.
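For a trapezoid whose base lies on the x-axis, the area is the width times the mean of the two heights. A minimal standalone sketch, with the hypothetical name `trapezoidArea` and the same parameter order as the signature above (taking the absolute width is an assumption made here):

```cpp
#include <cmath>

// Area between the x-axis and the straight line from (x1, y1) to (x2, y2).
double trapezoidArea(double x1, double x2, double y1, double y2)
{
  const double width = std::fabs(x2 - x1);     // assumption: order of x1, x2 does not matter
  const double mean_height = (y1 + y2) / 2.0;  // average of the two parallel sides
  return width * mean_height;
}
```

Summing this over consecutive points of a curve gives the usual trapezoidal-rule estimate of an AUC.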
double trapezoidal_area_xEqy (double exp1, double exp2, double act1, double act2) const | private |
Calculates the error area around the x=y line between two consecutive pairs of expected and actual values. Assumes exp2 > exp1.