OpenMS
MZTrafoModel Class Reference

Create and apply models of a mass recalibration function.

#include <OpenMS/PROCESSING/CALIBRATION/MZTrafoModel.h>

Classes

struct  RTLess
 Comparator by position. As this class has dimension 1, this is basically an alias for MZLess.
 

Public Types

enum  MODELTYPE {
  LINEAR , LINEAR_WEIGHTED , QUADRATIC , QUADRATIC_WEIGHTED ,
  SIZE_OF_MODELTYPE
}
 

Public Member Functions

 MZTrafoModel ()
 Default constructor.
 
 MZTrafoModel (bool ppm_model)
 Constructor taking the ppm setting of the model.
 
bool isTrained () const
 Does the model have coefficients (i.e. was trained successfully).
 
double getRT () const
 Get RT associated with the model (training region)
 
double predict (double mz) const
 Apply the model to an uncalibrated m/z value.
 
bool train (const CalibrationData &cd, MODELTYPE md, bool use_RANSAC, double rt_left=-std::numeric_limits< double >::max(), double rt_right=std::numeric_limits< double >::max())
 Train a model using calibrant data.
 
bool train (std::vector< double > error_mz, std::vector< double > theo_mz, std::vector< double > weights, MODELTYPE md, bool use_RANSAC)
 Train a model using calibrant data.
 
void getCoefficients (double &intercept, double &slope, double &power)
 Get model coefficients.
 
void setCoefficients (const MZTrafoModel &rhs)
 Copy model coefficients from another model.
 
void setCoefficients (double intercept, double slope, double power)
 Manually set model coefficients.
 
String toString () const
 String representation of the model parameters.
 

Static Public Member Functions

static MODELTYPE nameToEnum (const std::string &name)
 Convert string to enum.
 
static const std::string & enumToName (MODELTYPE mt)
 Convert enum to string.
 
static void setRANSACParams (const Math::RANSACParam &p)
 Set the global (program wide) parameters for RANSAC.
 
static void setRANSACSeed (int seed)
 Set RANSAC seed.
 
static void setCoefficientLimits (double offset, double scale, double power)
 Set coefficient limits that the model coefficients must not exceed for the model to be considered valid.
 
static bool isValidModel (const MZTrafoModel &trafo)
 Predicate to decide if the model has valid parameters, i.e. coefficients.
 
static Size findNearest (const std::vector< MZTrafoModel > &tms, double rt)
 Binary search for the model nearest to a specific RT.
 

Static Public Attributes

static const std::string names_of_modeltype []
 strings corresponding to enum MODELTYPE
 

Private Attributes

std::vector< double > coeff_
 Model coefficients (for both linear and quadratic models), estimated from the data.
 
bool use_ppm_
 during training, the model is built on absolute or relative (ppm) predictions. predict(), i.e. applying the model, requires this information too
 
double rt_
 retention time associated with the model (i.e. where the calibrant data was taken from)
 

Static Private Attributes

static Math::RANSACParam * ransac_params_
 global pointer, init to NULL at startup; set class-global RANSAC params
 
static int ransac_seed_
 seed used for all RANSAC invocations
 
static double limit_offset_
 acceptable boundary for the estimated offset; if estimated offset is larger (absolute) the model does not validate (isValidModel())
 
static double limit_scale_
 acceptable boundary for the estimated scale; if estimated scale is larger (absolute) the model does not validate (isValidModel())
 
static double limit_power_
 acceptable boundary for the estimated power; if estimated power is larger (absolute) the model does not validate (isValidModel())
 

Detailed Description

Create and apply models of a mass recalibration function.

The input is a list of calibration points (ideally spanning a wide m/z range to prevent extrapolation when applying the model).

Models (LINEAR, LINEAR_WEIGHTED, QUADRATIC, QUADRATIC_WEIGHTED) can be trained using CalibrationData points (or a subset of them). Calibration points can have different retention times, and a model should be built such that it captures the local (in time) decalibration of the instrument, i.e. choose appropriate time windows along RT to calibrate the spectra in each RT region. From the available calibrant data, a model is built. Later, any uncalibrated m/z value can be fed to the model to obtain a calibrated m/z.

The input domain can either be absolute mass differences in [Th], or relative differences in [ppm]. The models are built based on this input.
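The two input domains are related by the usual parts-per-million definition. The following is a minimal sketch of that relation only; the helper names thToPpm and ppmToTh are hypothetical and not part of the OpenMS API.

```cpp
#include <cassert>

// Hypothetical helpers (not part of OpenMS): convert a mass deviation
// between the absolute domain [Th] and the relative domain [ppm].

// Relative deviation in ppm of an absolute deviation 'delta_th' at 'mz_ref'.
inline double thToPpm(double delta_th, double mz_ref)
{
  assert(mz_ref > 0.0); // m/z values are strictly positive
  return delta_th / mz_ref * 1e6;
}

// Absolute deviation in Th of a relative deviation 'delta_ppm' at 'mz_ref'.
inline double ppmToTh(double delta_ppm, double mz_ref)
{
  assert(mz_ref > 0.0);
  return delta_ppm * mz_ref / 1e6;
}
```

For example, a deviation of 0.001 Th at m/z 500 corresponds to about 2 ppm.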

Outlier detection before model building via the RANSAC algorithm is supported for LINEAR and QUADRATIC models.

Member Enumeration Documentation

◆ MODELTYPE

enum MODELTYPE
Enumerator
LINEAR 
LINEAR_WEIGHTED 
QUADRATIC 
QUADRATIC_WEIGHTED 
SIZE_OF_MODELTYPE 

Constructor & Destructor Documentation

◆ MZTrafoModel() [1/2]

Default constructor.

◆ MZTrafoModel() [2/2]

MZTrafoModel ( bool  ppm_model)

Constructor taking the ppm setting of the model.

If you have external coefficients, use this constructor and the setCoefficients() method to build a 'manual' model. Afterwards, use applyTransformation() or predict() to calibrate your data. If you call train(), the ppm-setting will be overwritten, depending on the type of training data.

Parameters
ppm_model  Are the coefficients derived from ppm calibration data, or from absolute deltas?

Member Function Documentation

◆ enumToName()

static const std::string& enumToName ( MODELTYPE  mt)
static

Convert enum to string.

Parameters
mt  The enum value
Returns
Stringified version

◆ findNearest()

static Size findNearest ( const std::vector< MZTrafoModel > &  tms,
double  rt 
)
static

Binary search for the model nearest to a specific RT.

Parameters
tms  Vector of models, sorted by RT
rt  The target retention time
Returns
Returns the index into 'tms' with the closest RT.
Note
Make sure the vector is sorted with respect to RT! Otherwise the result is undefined.
Exceptions
Exception::Precondition  is thrown if the vector is empty (not only in debug mode)
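A nearest-RT lookup on a sorted vector can be sketched with std::lower_bound as shown below. This is an illustration of the technique, not the OpenMS implementation: the RtModel struct and the function name are hypothetical, std::invalid_argument stands in for Exception::Precondition, and the tie-breaking rule (prefer the left neighbour) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a model; only the RT matters here.
struct RtModel
{
  double rt;
};

// Index of the element whose RT is closest to 'rt'.
// Precondition: 'models' is sorted by ascending RT and not empty.
std::size_t findNearestByRT(const std::vector<RtModel>& models, double rt)
{
  if (models.empty())
  {
    throw std::invalid_argument("findNearestByRT: empty model list");
  }
  // Debug-only check of the sortedness precondition
  assert(std::is_sorted(models.begin(), models.end(),
                        [](const RtModel& a, const RtModel& b) { return a.rt < b.rt; }));
  // First element with RT >= rt
  auto it = std::lower_bound(models.begin(), models.end(), rt,
                             [](const RtModel& m, double value) { return m.rt < value; });
  if (it == models.begin()) return 0;               // target lies left of all models
  if (it == models.end()) return models.size() - 1; // target lies right of all models
  auto prev = it - 1;
  // Pick the closer neighbour; ties go to the left one (assumption)
  return (rt - prev->rt <= it->rt - rt) ? static_cast<std::size_t>(prev - models.begin())
                                        : static_cast<std::size_t>(it - models.begin());
}
```

The binary search keeps the lookup logarithmic in the number of models, which is why the input must be sorted by RT.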

◆ getCoefficients()

void getCoefficients ( double &  intercept,
double &  slope,
double &  power 
)

Get model coefficients.

Parameters will be filled with internal model parameters. The model must be trained beforehand; otherwise an exception is thrown.

Parameters
intercept  The intercept
slope  The slope
power  The coefficient for x*x (will be 0 for linear models)
Exceptions
Exception::Precondition  if model is not trained yet

◆ getRT()

double getRT ( ) const

Get RT associated with the model (training region)

◆ isTrained()

bool isTrained ( ) const

Does the model have coefficients (i.e. was trained successfully).

Having coefficients does not mean the model is valid (see isValidModel(); the coefficients might be too large).

◆ isValidModel()

static bool isValidModel ( const MZTrafoModel &  trafo)
static

Predicate to decide if the model has valid parameters, i.e. coefficients.

If the model coefficients are empty, no model was trained yet (or training was unsuccessful), causing a return value of 'false'.

Also, if the model has coefficients, we check if they are within the acceptable boundaries (if boundaries were given via setCoefficientLimits()).
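The check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library code: CoefficientLimits, makeLimits and withinLimits are hypothetical names, and the coefficient order {intercept, slope, power} is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical limits, mirroring the idea of setCoefficientLimits();
// max() means "no limit".
struct CoefficientLimits
{
  double offset = std::numeric_limits<double>::max();
  double scale = std::numeric_limits<double>::max();
  double power = std::numeric_limits<double>::max();
};

// Store limits as absolute values, since the comparison is done in absolute terms.
CoefficientLimits makeLimits(double offset, double scale, double power)
{
  CoefficientLimits limits;
  limits.offset = std::fabs(offset);
  limits.scale = std::fabs(scale);
  limits.power = std::fabs(power);
  return limits;
}

// 'coeff' holds {intercept, slope, power}, or is empty for an untrained model.
bool withinLimits(const std::vector<double>& coeff, const CoefficientLimits& limits)
{
  if (coeff.empty()) return false; // not trained, or training failed
  assert(coeff.size() == 3);
  return std::fabs(coeff[0]) <= limits.offset
      && std::fabs(coeff[1]) <= limits.scale
      && std::fabs(coeff[2]) <= limits.power;
}
```

With default limits every trained model passes; only an empty coefficient list is rejected.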

◆ nameToEnum()

static MODELTYPE nameToEnum ( const std::string &  name)
static

Convert string to enum.

Returns 'SIZE_OF_MODELTYPE' if string is unknown.

Parameters
name  A string from names_of_modeltype[].
Returns
The corresponding enum value.
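The name/enum round trip with a sentinel for unknown names can be sketched as below. The enum and functions here are a hypothetical mirror written for illustration, and the concrete strings in the name table are an assumption rather than the values stored in names_of_modeltype[].

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the enum and its name table; the actual strings
// used by the library are an assumption here.
enum ModelType { LINEAR, LINEAR_WEIGHTED, QUADRATIC, QUADRATIC_WEIGHTED, SIZE_OF_MODELTYPE };

const std::string kModelNames[SIZE_OF_MODELTYPE] = {
  "linear", "linear_weighted", "quadratic", "quadratic_weighted"};

// Enum to string; the sentinel SIZE_OF_MODELTYPE has no name.
const std::string& enumToNameSketch(ModelType mt)
{
  assert(mt < SIZE_OF_MODELTYPE);
  return kModelNames[mt];
}

// String to enum; returns SIZE_OF_MODELTYPE if the name is unknown.
ModelType nameToEnumSketch(const std::string& name)
{
  for (int i = 0; i < SIZE_OF_MODELTYPE; ++i)
  {
    if (kModelNames[i] == name) return static_cast<ModelType>(i);
  }
  return SIZE_OF_MODELTYPE;
}
```

Callers should compare the result against SIZE_OF_MODELTYPE before using it, since that value signals an unknown name.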

◆ predict()

double predict ( double  mz) const

Apply the model to an uncalibrated m/z value.

Make sure the model was trained (train()) and is valid (isValidModel()) before calling this function!

Applies the function y = intercept + slope*mz + power*mz^2 and returns y.

Parameters
mz  The uncalibrated m/z value
Returns
The calibrated m/z value
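The polynomial stated above can be evaluated as in this sketch. It covers only the formula y = intercept + slope*mz + power*mz^2; any handling of the Th versus ppm setting inside predict() is not shown, and the function name evalQuadratic is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Evaluate y = intercept + slope*x + power*x^2 for coeff = {intercept, slope, power}.
double evalQuadratic(const std::vector<double>& coeff, double x)
{
  assert(coeff.size() == 3); // exactly three coefficients are expected
  // Horner's scheme: intercept + x * (slope + x * power)
  return coeff[0] + x * (coeff[1] + x * coeff[2]);
}
```

Setting the third coefficient to zero yields a linear function, and setting the second to zero as well yields a constant.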

◆ setCoefficientLimits()

static void setCoefficientLimits ( double  offset,
double  scale,
double  power 
)
static

Set coefficient limits that the model coefficients must not exceed for the model to be considered valid.

Use std::numeric_limits<double>::max() for no limit (default). These limits are checked when isValidModel() is called. Negative inputs are passed through fabs() to obtain positive values (since the comparison is done in absolute terms).

◆ setCoefficients() [1/2]

void setCoefficients ( const MZTrafoModel &  rhs)

Copy model coefficients from another model.

◆ setCoefficients() [2/2]

void setCoefficients ( double  intercept,
double  slope,
double  power 
)

Manually set model coefficients.

Can be used instead of train() to set the coefficients manually. Exactly three values must be given. If you want a linear model, set 'power' to zero. If you want a constant model, additionally set 'slope' to zero.

Parameters
intercept  The offset
slope  The slope
power  The x*x coefficient (for quadratic models)

◆ setRANSACParams()

static void setRANSACParams ( const Math::RANSACParam &  p)
static

Set the global (program wide) parameters for RANSAC.

This is not done via a member, to keep the memory footprint small: hundreds of MZTrafoModels are expected to be built at the same time, and the RANSAC params should be identical for all of them.

Parameters
p  RANSAC params

◆ setRANSACSeed()

static void setRANSACSeed ( int  seed)
static

Set RANSAC seed.

◆ toString()

String toString ( ) const

String representation of the model parameters.

Empty if model is not trained.

◆ train() [1/2]

bool train ( const CalibrationData &  cd,
MODELTYPE  md,
bool  use_RANSAC,
double  rt_left = -std::numeric_limits< double >::max(),
double  rt_right = std::numeric_limits< double >::max() 
)

Train a model using calibrant data.

If the CalibrationData was created using peak groups (usually corresponding to mass traces), the median for each group is used as a group representative. This is more robust, and reduces the number of data points drastically, i.e. one value per group.

Internally, these steps take place:

  • apply RT filter
  • [compute median per group] (only if groups were given in 'cd')
  • set Model's rt position
  • call train() (see overloaded method)
Parameters
cd  List of calibrants
md  Type of model (linear, quadratic, ...)
use_RANSAC  Remove outliers before computing the model?
rt_left  Filter 'cd' by RT; all calibrants with RT < 'rt_left' are removed
rt_right  Filter 'cd' by RT; all calibrants with RT > 'rt_right' are removed
Returns
True if model was built, false otherwise
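The first two internal steps listed above (RT filter, median per group) can be sketched as follows. This is an illustration under assumptions, not the OpenMS implementation: the Calibrant struct, its field names and the helper functions are hypothetical, and the median of an even-sized group is taken as the mean of the two central values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical calibrant record; field names are illustrative only.
struct Calibrant
{
  double rt;    // retention time
  double error; // observed mass error (Th or ppm)
  int group;    // peak group id
};

// Step 1: keep only calibrants with rt_left <= RT <= rt_right.
std::vector<Calibrant> filterByRT(const std::vector<Calibrant>& cd, double rt_left, double rt_right)
{
  assert(rt_left <= rt_right);
  std::vector<Calibrant> kept;
  std::copy_if(cd.begin(), cd.end(), std::back_inserter(kept),
               [=](const Calibrant& c) { return c.rt >= rt_left && c.rt <= rt_right; });
  return kept;
}

// Median of a non-empty list (mean of the two central values for even sizes).
double median(std::vector<double> v)
{
  assert(!v.empty());
  std::sort(v.begin(), v.end());
  const std::size_t mid = v.size() / 2;
  return (v.size() % 2 == 1) ? v[mid] : 0.5 * (v[mid - 1] + v[mid]);
}

// Step 2: one representative error per group (the group median).
std::map<int, double> medianErrorPerGroup(const std::vector<Calibrant>& cd)
{
  std::map<int, std::vector<double>> by_group;
  for (const Calibrant& c : cd) by_group[c.group].push_back(c.error);
  std::map<int, double> result;
  for (const auto& kv : by_group) result[kv.first] = median(kv.second);
  return result;
}
```

Reducing each group to its median makes the later fit less sensitive to single noisy peaks within a mass trace.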

◆ train() [2/2]

bool train ( std::vector< double >  error_mz,
std::vector< double >  theo_mz,
std::vector< double >  weights,
MODELTYPE  md,
bool  use_RANSAC 
)

Train a model using calibrant data.

Given theoretical m/z values and observed mass errors (and corresponding weights), a model (linear, quadratic, ...) is built. If 'use_RANSAC' is set, outlier removal is applied beforehand. The 'error_mz' values can be given either as absolute deviations in [Th] or as relative deviations in [ppm]. The MZTrafoModel must be constructed accordingly (see constructor). This has no influence on the model building itself, but rather on how 'predict()' works internally.

Outlier detection before model building via the RANSAC algorithm is supported for LINEAR and QUADRATIC models.

Internally, these steps take place:

  • [apply RANSAC] (depending on 'use_RANSAC')
  • build model and store its parameters internally
Parameters
error_mz  Observed mass error (in ppm or Th)
theo_mz  Theoretical m/z values, corresponding to 'error_mz'
weights  For weighted models only: weight of calibrants; ignored otherwise
md  Type of model (linear, quadratic, ...)
use_RANSAC  Remove outliers before computing the model?
Returns
True if model was built, false otherwise
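The model-building step for the linear case can be illustrated with a weighted least-squares line fit of the error against m/z. This is a generic textbook sketch, not the regression routine OpenMS uses; LineFit and fitLine are hypothetical names, and the RANSAC step and the quadratic case are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the fit; 'ok' is false if no line could be determined.
struct LineFit
{
  bool ok;
  double intercept;
  double slope;
};

// Weighted least-squares fit of y = intercept + slope * x.
// Pass all weights as 1.0 for the unweighted case.
LineFit fitLine(const std::vector<double>& x, const std::vector<double>& y, const std::vector<double>& w)
{
  assert(x.size() == y.size() && x.size() == w.size());
  double sw = 0.0, swx = 0.0, swy = 0.0, swxx = 0.0, swxy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sw += w[i];
    swx += w[i] * x[i];
    swy += w[i] * y[i];
    swxx += w[i] * x[i] * x[i];
    swxy += w[i] * x[i] * y[i];
  }
  const double denom = sw * swxx - swx * swx;
  // Fewer than two points, or all x identical: the line is underdetermined
  if (x.size() < 2 || denom == 0.0) return {false, 0.0, 0.0};
  const double slope = (sw * swxy - swx * swy) / denom;
  const double intercept = (swy - slope * swx) / sw;
  return {true, intercept, slope};
}
```

Here x would hold the m/z values and y the corresponding errors; a weight of zero removes a point from the fit, which is how a weighted model can down-weight unreliable calibrants.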

Member Data Documentation

◆ coeff_

std::vector<double> coeff_
private

Model coefficients (for both linear and quadratic models), estimated from the data.

◆ limit_offset_

double limit_offset_
static private

acceptable boundary for the estimated offset; if estimated offset is larger (absolute) the model does not validate (isValidModel())

◆ limit_power_

double limit_power_
static private

acceptable boundary for the estimated power; if estimated power is larger (absolute) the model does not validate (isValidModel())

◆ limit_scale_

double limit_scale_
static private

acceptable boundary for the estimated scale; if estimated scale is larger (absolute) the model does not validate (isValidModel())

◆ names_of_modeltype

const std::string names_of_modeltype[]
static

strings corresponding to enum MODELTYPE

◆ ransac_params_

Math::RANSACParam* ransac_params_
static private

global pointer, init to NULL at startup; set class-global RANSAC params

◆ ransac_seed_

int ransac_seed_
static private

seed used for all RANSAC invocations

◆ rt_

double rt_
private

retention time associated with the model (i.e. where the calibrant data was taken from)

Referenced by MZTrafoModel::RTLess::operator()().

◆ use_ppm_

bool use_ppm_
private

during training, the model is built on absolute or relative (ppm) predictions. predict(), i.e. applying the model, requires this information too