|
double | getDecoyDiff_ (const PeptideIdentification &pep_id) const |
| Calculates the xcorr difference between the top two hits marked as decoy. More...
|
|
double | getDecoyCutOff_ (const std::vector< PeptideIdentification > &pep_ids, double reranking_cutoff_percentile) const |
| Calculates an xcorr cut-off based on decoy hits. More...
|
|
bool | isNovoHit_ (const PeptideHit &hit) const |
| Tests if a PeptideHit is considered a deNovo hit. More...
|
|
bool | checkScoreBetterThanThreshold_ (const PeptideHit &hit, double threshold, bool higher_score_better) const |
| Tests if a PeptideHit has a score better than the given threshold. More...
|
|
std::pair< String, Param > | extractSearchAdapterInfoFromMetaValues_ (const ProteinIdentification::SearchParameters &meta_values) const |
| Looks through meta values of SearchParameters to find out which search adapter was used. More...
|
|
void | writeIniFile_ (const Param &parameters, const String &filename) const |
| Writes parameters into a given file. More...
|
|
std::vector< PeptideIdentification > | runIdentificationSearch_ (const MSExperiment &exp, const std::vector< FASTAFile::FASTAEntry > &fasta_data, const String &adapter_name, Param &parameters) const |
| Executes the workflow from search adapter, followed by PeptideIndexer and finishes with FDR. More...
|
|
std::vector< FASTAFile::FASTAEntry > | getSubsampledFasta_ (const std::vector< FASTAFile::FASTAEntry > &fasta_data, double subsampling_rate) const |
| Creates a subsampled fasta with the given subsampling rate. More...
|
|
void | calculateSuitability_ (const std::vector< PeptideIdentification > &pep_ids, SuitabilityData &data) const |
| Calculates all suitability data from a combined deNovo+database search. More...
|
|
void | appendDecoys_ (std::vector< FASTAFile::FASTAEntry > &fasta) const |
| Calculates and appends decoys to a given vector of FASTAEntry. More...
|
|
double | extractScore_ (const PeptideHit &pep_hit) const |
| Returns the cross correlation score normalized by MW if it exists; otherwise, if the 'force' flag is set, returns the current main score. More...
|
|
double | calculateCorrectionFactor_ (const SuitabilityData &data, const SuitabilityData &data_sampled, double sampling_rate) const |
| Calculates the correction factor from two suitability calculations. More...
|
|
UInt | numberOfUniqueProteins_ (const std::vector< PeptideIdentification > &peps, UInt number_of_hits=1) const |
| Determines the number of unique proteins found in the protein accessions of PeptideIdentifications. More...
|
|
Size | getIndexWithMedianNovoHits_ (const std::vector< SuitabilityData > &data) const |
| Finds the SuitabilityData object with the median number of de novo hits. More...
|
|
double | getScoreMatchingFDR_ (const std::vector< PeptideIdentification > &pep_ids, double FDR, const String &score_name, bool higher_score_better) const |
| Extracts the worst score that still passes an FDR (q-value) threshold. More...
|
|
This class holds the functionality of calculating the database suitability.
To calculate the suitability of a database for identifying the spectra of a specific mzML file, a combined deNovo+database identification search is required: the database must be extended with an additional entry derived from the concatenated deNovo sequences of that mzML. Currently only Comet search is supported.
This class will calculate q-values by itself and will throw an error if any q-value calculation was done beforehand.
The algorithm parameters can be set using setParams().
The compute function can be called multiple times. The result of each call is stored internally in a vector, so earlier results are not overwritten by a new call. This vector can be retrieved using getResults().
This class serves as the library representation of DatabaseSuitability.
Calculates all suitability data from a combined deNovo+database search.
Counts top database and top deNovo hits.
Calculates a decoy score cut-off to compare high scoring deNovo hits with lower scoring database hits. If the score difference is smaller than the cut-off the database hit is counted and the deNovo hit ignored.
Suitability is calculated: # database hits / # all hits
- Parameters
-
pep_ids | peptide identifications coming from the combined search, each peptide identification should be sorted |
data | SuitabilityData object where the result should be written into |
- Exceptions
-
MissingInformation | if no target/decoy annotation is found on pep_ids |
MissingInformation | if no xcorr is found; this happens when an adapter other than CometAdapter was used |
Computes suitability of a database used to search a mzML.
Top deNovo and top database hits from a combined deNovo+database search are counted. The ratio of db hits to all hits yields the suitability. To re-rank cases where a de novo peptide scores just higher than the database peptide, a decoy cut-off is calculated. This functionality can be turned off, which results in an underestimated suitability but can solve problems such as a different search engine or too few decoy hits.
Parameters can be set using the functionality of DefaultParamHandler. Parameters are:
no_rerank - re-ranking can be turned off with this (will be set automatically if no cross correlation score is found)
reranking_cutoff_percentile - percentile that determines which cut-off will be returned
FDR - q-value that should be filtered for. Preliminary tests have shown that database suitability is rather stable across common FDR thresholds from 0 - 5 %
keep_search_files - temporary files created for and by the internal ID search are kept
disable_correction - disables corrected suitability calculations
force - forces re-ranking to be done even without a cross correlation score, in which case the default main score is used
An attempt is then made to correct the calculated suitability. For this, a correction factor for the number of top deNovo hits is calculated by performing an additional combined identification search with a smaller sample of the database. It was observed that the numbers of top deNovo and top db hits behave linearly with respect to the sampling ratio of the database. This can be used to extrapolate the number of database hits that would be needed to get a suitability of 1. This number, in combination with the maximum number of deNovo hits (found with an identification search where only deNovo is used as a database), can be used to calculate a correction factor like this:
correction factor = #database hits for suitability of 1 / #maximum deNovo hits
This formula can be simplified so that the maximum number of deNovo hits isn't needed:
correction factor = -(database hits slope) / (deNovo hits slope)
Both of these values can easily be calculated from the original suitability data in conjunction with the one sampled search.
Correcting the number of top deNovo hits with this factor makes them more comparable to the top database hits. This in turn results in a more linear behaviour of the suitability with respect to the sampling ratio. The corrected suitability reflects what sampling ratio your database represents relative to the theoretical 'perfect' database. In other words: your database needs to be (1 - corrected suitability) bigger to get a suitability of 1.
Both the original suitability as well as the corrected one are reported in the result.
Since q-values need to be calculated, the identifications are taken by copy. Since decoys need to be calculated for the fasta input, the fasta entries are taken by copy as well.
Result is appended to the result member. This allows for multiple usage.
- Parameters
-
pep_ids | vector containing pepIDs with target/decoy annotation coming from a deNovo+database identification search without FDR (Comet is recommended; to use other search engines either disable reranking or set the '-force' flag). The vector is modified internally and is thus copied. |
exp | MSExperiment that was searched to produce the identifications given in pep_ids |
original_fasta | FASTAEntries of the database used for the ID search (without decoys) |
novo_fasta | FASTAEntry derived from deNovo peptides |
search_params | SearchParameters object containing information which adapter was used with which settings for the identification search that resulted in pep_ids |
- Exceptions
-
MissingInformation | if no target/decoy annotation is found on pep_ids |
MissingInformation | if no xcorr is found; this happens when an adapter other than CometAdapter was used |
Precondition | if a q-value is found in pep_ids |
Calculates the xcorr difference between the top two hits marked as decoy.
Searches for the top two decoy hits and returns their score difference. By default the xcorr from Comet is used. If no xcorr can be found and the 'force' flag is set, the main score from the peptide hit is used; otherwise an error is thrown.
If there aren't two decoys, DBL_MAX is returned.
- Parameters
-
pep_id | pepID from where the decoy difference will be calculated |
- Returns
- xcorr difference
- Exceptions
-
MissingInformation | if no target/decoy annotation is found |
MissingInformation | if no xcorr is found |
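The search for the top two decoys can be sketched as follows. This is a simplified illustration, not the library code: the `Hit` struct and the function `decoyDiff` are hypothetical, the hits are assumed to be sorted best first with the score and target/decoy flag already extracted, and the error cases listed above are left out.

```cpp
#include <cfloat>
#include <cmath>
#include <vector>

// Hypothetical, reduced view of a PeptideHit.
struct Hit
{
  double xcorr;
  bool is_decoy;
};

// Returns the score difference between the two best decoy hits,
// or DBL_MAX if fewer than two decoys are present.
double decoyDiff(const std::vector<Hit>& hits_sorted_best_first)
{
  bool have_first = false;
  double first_score = 0.0;
  for (const Hit& h : hits_sorted_best_first)
  {
    if (!h.is_decoy)
    {
      continue; // target hits are skipped
    }
    if (!have_first)
    {
      first_score = h.xcorr;
      have_first = true;
    }
    else
    {
      return std::fabs(first_score - h.xcorr); // second decoy found
    }
  }
  return DBL_MAX;
}
```

Returning DBL_MAX as a sentinel lets a caller recognise and skip spectra that do not contribute a usable difference.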
Tests if a PeptideHit is considered a deNovo hit.
To test this the function looks into the protein accessions. If only the deNovo protein is found, 'true' is returned. If at least one database protein is found, 'false' is returned.
This function also uses boost::regex_search to make sure the deNovo accession doesn't contain a decoy string. This is needed for 'target+decoy' hits.
- Parameters
-
- Returns
- true/false
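The accession test can be sketched like this. It is one possible reading of the description above, not the library implementation: `isNovoHit`, the marker string and the decoy pattern are hypothetical, `std::regex_search` is swapped in for `boost::regex_search` to keep the example self-contained, and an accession is assumed to count as deNovo only if it contains the marker and does not match the decoy pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns true only if every accession belongs to the deNovo entry.
// novo_marker and decoy_pattern are illustrative assumptions.
bool isNovoHit(const std::vector<std::string>& accessions,
               const std::string& novo_marker,
               const std::regex& decoy_pattern)
{
  if (accessions.empty())
  {
    return false; // no deNovo protein found at all
  }
  for (const std::string& acc : accessions)
  {
    const bool has_marker = acc.find(novo_marker) != std::string::npos;
    const bool has_decoy = std::regex_search(acc, decoy_pattern);
    if (!has_marker || has_decoy)
    {
      return false; // a non-deNovo (or decoy) accession was found
    }
  }
  return true;
}
```

Checking the decoy pattern in addition to the marker keeps a decoy version of the deNovo entry from being mistaken for a genuine deNovo hit.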
Executes the workflow from search adapter, followed by PeptideIndexer and finishes with FDR.
Which adapter should run with which parameters can be controlled. Make sure the search adapter you wish to use is built on your system and the executable is on your PATH variable.
Indexing and FDR are always done the same way.
The inputs are stored in temporary files to execute the Adapter. (MSExperiment -> .mzML, vector<FASTAEntry> -> .fasta, Param -> .INI)
- Parameters
-
exp | MSExperiment that will be searched |
fasta_data | represents the database that should be used to search |
adapter_name | name of the adapter to search with |
parameters | parameters for the adapter |
- Returns
- peptide identifications with annotated q-values
- Exceptions
-
MissingInformation | if no adapter name is given |
InvalidParameter | if an unsupported adapter name is given |
InternalToolError | if any error occurs while running the adapter |
InternalToolError | if any error occurs while running PeptideIndexer functionalities |
InvalidParameter | if the needed FDR parameters are not found |