OpenMS
AASequence Class Reference

Representation of a peptide/protein sequence. More...

#include <OpenMS/CHEMISTRY/AASequence.h>

Classes

class  ConstIterator
 ConstIterator for AASequence. More...
 
class  Iterator
 Iterator class for AASequence. More...
 

Public Member Functions

Constructors and Destructors
 AASequence ()
 Default constructor. More...
 
 AASequence (const AASequence &)=default
 Copy constructor. More...
 
 AASequence (AASequence &&) noexcept=default
 Move constructor. More...
 
virtual ~AASequence ()
 Destructor. More...
 
AASequence & operator= (const AASequence &)=default
 Assignment operator. More...
 
AASequence & operator= (AASequence &&)=default
 Move assignment operator. More...
 
bool empty () const
 check if sequence is empty More...
 
Accessors
String toString () const
 returns the peptide as string with modifications embedded in brackets More...
 
String toUnmodifiedString () const
 returns the peptide as string without any modifications (e.g., "PEPTIDER") More...
 
String toUniModString () const
 returns the peptide as string with UniMod-style modifications embedded in brackets More...
 
String toBracketString (bool integer_mass=true, bool mass_delta=false, const std::vector< String > &fixed_modifications=std::vector< String >()) const
 create a TPP compatible string of the modified sequence using bracket notation. More...
 
void setModification (Size index, const String &modification)
 
void setModification (Size index, const Residue *modification)
 sets the modification of AA at index by providing a residue that may already be modified More...
 
void setModification (Size index, const ResidueModification *modification)
 sets the modification of AA at index by providing a pointer to a ResidueModification object found in the ModificationsDB More...
 
void setModification (Size index, const ResidueModification &modification)
 
void setModificationByDiffMonoMass (Size index, double diffMonoMass)
 modifies the residue at index in the sequence and potentially in the ResidueDB More...
 
void setNTerminalModification (const String &modification)
 
void setNTerminalModification (const ResidueModification *modification)
 sets the N-terminal modification More...
 
void setNTerminalModification (const ResidueModification &mod)
 sets the N-terminal modification (copies and adds to database if not present) More...
 
void setNTerminalModificationByDiffMonoMass (double diffMonoMass, bool protein_term)
 sets the N-terminal modification by the monoisotopic mass difference it introduces (creates a "user-defined" mod if not present) More...
 
const String & getNTerminalModificationName () const
 returns the name (ID) of the N-terminal modification, or an empty string if none is set More...
 
const ResidueModification * getNTerminalModification () const
 returns a pointer to the N-terminal modification, or zero if none is set More...
 
void setCTerminalModification (const String &modification)
 
void setCTerminalModification (const ResidueModification *modification)
 sets the C-terminal modification (must be present in the database) More...
 
void setCTerminalModification (const ResidueModification &mod)
 sets the C-terminal modification (copies and adds to database if not present) More...
 
void setCTerminalModificationByDiffMonoMass (double diffMonoMass, bool protein_term)
 sets the C-terminal modification by the monoisotopic mass difference it introduces (creates a "user-defined" mod if not present) More...
 
const String & getCTerminalModificationName () const
 returns the name (ID) of the C-terminal modification, or an empty string if none is set More...
 
const ResidueModification * getCTerminalModification () const
 returns a pointer to the C-terminal modification, or zero if none is set More...
 
const Residue & getResidue (Size index) const
 returns a reference to the residue at position index More...
 
EmpiricalFormula getFormula (Residue::ResidueType type=Residue::Full, Int charge=0) const
 returns the formula of the peptide More...
 
double getAverageWeight (Residue::ResidueType type=Residue::Full, Int charge=0) const
 returns the average weight of the peptide More...
 
double getMonoWeight (Residue::ResidueType type=Residue::Full, Int charge=0) const
 
double getMZ (Int charge, Residue::ResidueType type=Residue::Full) const
 
const Residue & operator[] (Size index) const
 returns a reference to the residue at given position More...
 
AASequence operator+ (const AASequence &peptide) const
 adds the residues of the peptide More...
 
AASequence & operator+= (const AASequence &)
 adds the residues of a peptide More...
 
AASequence operator+ (const Residue *residue) const
 adds the given residue to the peptide More...
 
AASequence & operator+= (const Residue *)
 adds the given residue to the peptide More...
 
Size size () const
 returns the number of residues More...
 
AASequence getPrefix (Size index) const
 returns a peptide sequence of the first index residues More...
 
AASequence getSuffix (Size index) const
 returns a peptide sequence of the last index residues More...
 
AASequence getSubsequence (Size index, UInt number) const
 returns a peptide sequence of number residues, beginning at position index More...
 
void getAAFrequencies (std::map< String, Size > &frequency_table) const
 compute frequency table of amino acids More...
 
Predicates
bool has (const Residue &residue) const
 returns true if the peptide contains the given residue More...
 
bool hasSubsequence (const AASequence &peptide) const
 
bool hasPrefix (const AASequence &peptide) const
 
bool hasSuffix (const AASequence &peptide) const
 
bool hasNTerminalModification () const
 predicate which is true if the peptide is N-term modified More...
 
bool hasCTerminalModification () const
 predicate which is true if the peptide is C-term modified More...
 
bool isModified () const
 returns true if any of the residues or termini are modified More...
 
bool operator== (const AASequence &rhs) const
 equality operator. Two sequences are equal iff all amino acids including PTMs are equal More...
 
bool operator< (const AASequence &rhs) const
 less-than operator which compares the C-term mods, sequence including PTMs and N-term mods; can be used for maps More...
 
bool operator!= (const AASequence &rhs) const
 inequality operator. Complement of equality operator. More...
 
Iterators
Iterator begin ()
 
ConstIterator begin () const
 
Iterator end ()
 
ConstIterator end () const
 

Protected Attributes

std::vector< const Residue * > peptide_
 
const ResidueModification * n_term_mod_
 
const ResidueModification * c_term_mod_
 

Stream operators
std::ostream & operator<< (std::ostream &os, const AASequence &peptide)
 writes a peptide to an output stream More...
 
std::istream & operator>> (std::istream &is, const AASequence &peptide)
 reads a peptide from an input stream More...
 

Static Public Member Functions

static AASequence fromString (const String &s, bool permissive=true)
 create AASequence object by parsing an OpenMS string More...
 
static AASequence fromString (const char *s, bool permissive=true)
 create AASequence object by parsing a C string (character array) More...
 

Static Protected Member Functions

static String::ConstIterator parseModRoundBrackets_ (const String::ConstIterator str_it, const String &str, AASequence &aas, const ResidueModification::TermSpecificity &specificity)
 Parses modifications in round brackets (an identifier) More...
 
static String::ConstIterator parseModSquareBrackets_ (const String::ConstIterator str_it, const String &str, AASequence &aas, const ResidueModification::TermSpecificity &specificity)
 Parses modifications in square brackets (a mass) More...
 
static void parseString_ (const String &peptide, AASequence &aas, bool permissive=true)
 

Detailed Description

Representation of a peptide/protein sequence.

This class represents amino acid sequences in OpenMS. An AASequence instance primarily contains a sequence of residues. The sequence is represented as a vector of pointers to instances of Residue. Each amino acid has only one instance, which is accessible using the ResidueDB instance (singleton).

To create an AASequence instance for a specific amino acid sequence, use the AASequence::fromString function. For example, AASequence::fromString(".DFPIANGER.") produces an instance of AASequence for the peptide "DFPIANGER". Please note that both the N- and the C-terminus are explicitly represented by dots.

A critical property of amino acid sequences is that they can be modified, meaning that one or more amino acids are chemically modified, e.g. oxidized. This is represented via Residue instances which carry a ResidueModification object. This is also handled in the ResidueDB.

Modifications are specified either by a unique string identifier from the ModificationsDB, given in round brackets after the modified amino acid, or by a mass (absolute residue mass or signed mass difference) given in square brackets. For example, AASequence::fromString(".DFPIAM(Oxidation)GER.") creates an instance of the peptide "DFPIAMGER" with an oxidized methionine; AASequence::fromString(".DFPIAM(UniMod:35)GER."), AASequence::fromString(".DFPIAM[+16]GER.") and AASequence::fromString(".DFPIAM[147]GER.") are all equivalent to it. N- and C-terminal modifications are represented by brackets to the right of the dots terminating the sequence. For example, ".(Dimethyl)DFPIAMGER." and ".DFPIAMGER.(Label:18O(2))" represent the labelling of the N- and C-terminus respectively, but ".DFPIAMGER(Phospho)." will be interpreted as a phosphorylation of the last arginine at its side chain.
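
To make the notation concrete, the following standalone C++ sketch removes terminal dots and bracketed modifications from such a string, leaving only the residue letters. It does not use OpenMS and is not how AASequence parses its input; stripModifications is a hypothetical helper, and the sketch assumes that round brackets may nest (as in "Label:18O(2)") while square brackets do not.

```cpp
#include <string>

// Hypothetical helper (not part of OpenMS): drop terminal dots and every
// bracketed modification from a string such as ".DFPIAM(Oxidation)GER.".
std::string stripModifications(const std::string& s)
{
  std::string out;
  int round_depth = 0;     // nesting level of '(' ... ')'
  bool in_square = false;  // true while inside '[' ... ']'
  for (char c : s)
  {
    if (in_square)
    {
      if (c == ']') in_square = false;  // end of a mass tag
    }
    else if (c == '(')
    {
      ++round_depth;
    }
    else if (c == ')')
    {
      if (round_depth > 0) --round_depth;
    }
    else if (round_depth > 0)
    {
      // character belongs to a modification identifier: skip it
    }
    else if (c == '[')
    {
      in_square = true;
    }
    else if (c != '.')
    {
      out += c;  // plain residue letter
    }
  }
  return out;
}
```

With this sketch, ".DFPIAM(Oxidation)GER.", ".DFPIAM[+16]GER." and ".DFPIAMGER.(Label:18O(2))" all reduce to "DFPIAMGER".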

Note there is a subtle difference between AASequence::fromString(".DFPIAM[+16]GER.") and AASequence::fromString(".DFPIAM[+15.9949]GER."): the former will try to find the first modification matching a mass difference of 16 +/- 0.5, whereas the latter will try to find the modification closest to the exact mass. The exact-mass form usually gives the intended result, while the integer form may not.
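
The difference between the two lookup strategies can be illustrated with a small standalone sketch. It is not the OpenMS implementation: ModEntry, firstWithin and closestTo are hypothetical names, and any table passed in is the caller's own illustration rather than the contents of the ModificationsDB. The sketch also assumes no upper bound on the distance for the closest-match lookup, which the real lookup may well impose.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical table entry: a modification name and its monoisotopic mass shift.
struct ModEntry
{
  std::string name;
  double diff_mono_mass;
};

// Integer-style lookup: name of the first entry within +/- tolerance of mass,
// or an empty string if no entry qualifies.
std::string firstWithin(const std::vector<ModEntry>& table, double mass, double tolerance)
{
  for (const ModEntry& m : table)
  {
    if (std::fabs(m.diff_mono_mass - mass) <= tolerance) return m.name;
  }
  return "";
}

// Exact-style lookup: name of the entry whose mass shift is closest to mass,
// or an empty string if the table is empty.
std::string closestTo(const std::vector<ModEntry>& table, double mass)
{
  const ModEntry* best = nullptr;
  for (const ModEntry& m : table)
  {
    if (best == nullptr ||
        std::fabs(m.diff_mono_mass - mass) < std::fabs(best->diff_mono_mass - mass))
    {
      best = &m;
    }
  }
  return best ? best->name : "";
}
```

Given a table that lists a made-up modification of +16.3 before Oxidation (+15.994915), firstWithin(table, 16.0, 0.5) picks the made-up entry because it comes first, whereas closestTo(table, 15.9949) picks Oxidation.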

Arbitrary/unknown amino acids (usually due to an unknown modification) can be specified using tags preceded by X: "X[weight]". This indicates a new amino acid ("X") with the specified weight, e.g. "RX[148.5]T". Note that this tag does not alter the amino acids to the left (R) or right (T). Rather, X represents an amino acid on its own. Be careful when converting such AASequence objects to an EmpiricalFormula using getFormula(), as tags will not be considered in this case (there exists no formula for them). However, they do influence getMonoWeight() and getAverageWeight()!

Note
For C/N terminal modifications, the absolute mass is assumed to be 1 (H) for the N-terminus and 17 (OH) for the C-terminus, therefore a modification specified as absolute n[43]PEPTIDE would translate to n[+42]PEPTIDE using relative masses. Note that there can be ambiguity in cases where the loss includes the terminal amino acids.
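
The arithmetic in this note can be written out as a minimal sketch. The helper and constant names are hypothetical; only the nominal masses 1 (H) and 17 (OH) come from the note itself.

```cpp
// Nominal terminal masses assumed by the note: H at the N-terminus, OH at the C-terminus.
constexpr int kNTermNominalMass = 1;
constexpr int kCTermNominalMass = 17;

// Hypothetical helpers: convert an absolute terminal mass to a relative mass delta.
constexpr int nTermDelta(int absolute_mass) { return absolute_mass - kNTermNominalMass; }
constexpr int cTermDelta(int absolute_mass) { return absolute_mass - kCTermNominalMass; }
```

Here nTermDelta(43) gives 42 and cTermDelta(16) gives -1, which matches the n[43]/n[+42] and c[16]/c[-1] pairs in the toBracketString() example.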

Constructor & Destructor Documentation

◆ AASequence() [1/3]

Default constructor.

◆ AASequence() [2/3]

AASequence ( const AASequence &  )
default

Copy constructor.

◆ AASequence() [3/3]

AASequence ( AASequence &&  )
default noexcept

Move constructor.

◆ ~AASequence()

virtual ~AASequence ( )
virtual

Destructor.

Member Function Documentation

◆ begin() [1/2]

Iterator begin ( )
inline

◆ begin() [2/2]

ConstIterator begin ( ) const
inline

◆ empty()

bool empty ( ) const

check if sequence is empty

Referenced by OPXLDataStructs::ProteinProteinCrossLink::getType().

◆ end() [1/2]

Iterator end ( )
inline

◆ end() [2/2]

ConstIterator end ( ) const
inline

◆ fromString() [1/2]

static AASequence fromString ( const char *  s,
bool  permissive = true 
)
static

create AASequence object by parsing a C string (character array)

Parameters
s  Input string
permissive  If set, skip spaces and replace stop codon symbols ("*", "#", "+") by "X" (unknown amino acid) during parsing
Exceptions
Exception::ParseError  if an invalid string representation of an AA sequence is passed

◆ fromString() [2/2]

static AASequence fromString ( const String &  s,
bool  permissive = true 
)
static

create AASequence object by parsing an OpenMS string

Parameters
s  Input string
permissive  If set, skip spaces and replace stop codon symbols ("*", "#", "+") by "X" (unknown amino acid) during parsing
Exceptions
Exception::ParseError  if an invalid string representation of an AA sequence is passed

Referenced by IDFilter::DigestionFilter::operator()().

◆ getAAFrequencies()

void getAAFrequencies ( std::map< String, Size > &  frequency_table) const

compute frequency table of amino acids

◆ getAverageWeight()

double getAverageWeight ( Residue::ResidueType  type = Residue::Full,
Int  charge = 0 
) const

returns the average weight of the peptide

◆ getCTerminalModification()

const ResidueModification* getCTerminalModification ( ) const

returns a pointer to the C-terminal modification, or zero if none is set

◆ getCTerminalModificationName()

const String& getCTerminalModificationName ( ) const

returns the name (ID) of the C-terminal modification, or an empty string if none is set

◆ getFormula()

EmpiricalFormula getFormula ( Residue::ResidueType  type = Residue::Full,
Int  charge = 0 
) const

returns the formula of the peptide

◆ getMonoWeight()

double getMonoWeight ( Residue::ResidueType  type = Residue::Full,
Int  charge = 0 
) const

returns the monoisotopic weight of the peptide in the given ionic form

Note
Does not (and cannot) check whether the requested ion can exist (e.g. x/c ions for monomers): it performs no fragmentation, it only adds or removes the atoms needed to convert the sequence to the requested ionic form.

◆ getMZ()

double getMZ ( Int  charge,
Residue::ResidueType  type = Residue::Full 
) const

returns mass-to-charge ratio of the peptide in the given ionic form

Note
Does not (and cannot) check whether the requested ion can exist (e.g. x/c ions for monomers): it performs no fragmentation, it only adds or removes the atoms needed to convert the sequence to the requested ionic form.
Exceptions
Exception::InvalidValue  if charge==0

◆ getNTerminalModification()

const ResidueModification* getNTerminalModification ( ) const

returns a pointer to the N-terminal modification, or zero if none is set

◆ getNTerminalModificationName()

const String& getNTerminalModificationName ( ) const

returns the name (ID) of the N-terminal modification, or an empty string if none is set

◆ getPrefix()

AASequence getPrefix ( Size  index) const

returns a peptide sequence of the first index residues

◆ getResidue()

const Residue& getResidue ( Size  index) const

returns a reference to the residue at position index

◆ getSubsequence()

AASequence getSubsequence ( Size  index,
UInt  number 
) const

returns a peptide sequence of number residues, beginning at position index

◆ getSuffix()

AASequence getSuffix ( Size  index) const

returns a peptide sequence of the last index residues

◆ has()

bool has ( const Residue &  residue) const

returns true if the peptide contains the given residue

◆ hasCTerminalModification()

bool hasCTerminalModification ( ) const

predicate which is true if the peptide is C-term modified

◆ hasNTerminalModification()

bool hasNTerminalModification ( ) const

predicate which is true if the peptide is N-term modified

◆ hasPrefix()

bool hasPrefix ( const AASequence &  peptide) const

returns true if the peptide has the given prefix; the N-terminal modification is also checked (and the C-terminal one as well, if the prefix has the same length as the peptide)

◆ hasSubsequence()

bool hasSubsequence ( const AASequence &  peptide) const

returns true if the peptide contains the given peptide

Note
C-terminal and N-terminal modifications are ignored

◆ hasSuffix()

bool hasSuffix ( const AASequence &  peptide) const

returns true if the peptide has the given suffix; the C-terminal modification is also checked (and the N-terminal one as well, if the suffix has the same length as the peptide)

◆ isModified()

bool isModified ( ) const

returns true if any of the residues or termini are modified

◆ operator!=()

bool operator!= ( const AASequence &  rhs) const

inequality operator. Complement of equality operator.

◆ operator+() [1/2]

AASequence operator+ ( const AASequence &  peptide) const

adds the residues of the peptide

◆ operator+() [2/2]

AASequence operator+ ( const Residue *  residue) const

adds the given residue to the peptide

◆ operator+=() [1/2]

AASequence& operator+= ( const AASequence &  )

adds the residues of a peptide

◆ operator+=() [2/2]

AASequence& operator+= ( const Residue *  )

adds the given residue to the peptide

◆ operator<()

bool operator< ( const AASequence &  rhs) const

less-than operator which compares the C-term mods, sequence including PTMs and N-term mods; can be used for maps

◆ operator=() [1/2]

AASequence& operator= ( AASequence &&  )
default

Move assignment operator.

◆ operator=() [2/2]

AASequence& operator= ( const AASequence &  )
default

Assignment operator.

◆ operator==()

bool operator== ( const AASequence &  rhs) const

equality operator. Two sequences are equal iff all amino acids including PTMs are equal

◆ operator[]()

const Residue& operator[] ( Size  index) const

returns a reference to the residue at given position

◆ parseModRoundBrackets_()

static String::ConstIterator parseModRoundBrackets_ ( const String::ConstIterator  str_it,
const String &  str,
AASequence &  aas,
const ResidueModification::TermSpecificity &  specificity 
)
static protected

Parses modifications in round brackets (an identifier)

If dot notation is used, the C-terminal ambiguity is resolved based on the presence of the dot.

Parameters
str_it  Current position in the string to be parsed
str  Full input string
aas  Current AASequence object (will be modified with the correct residue added)
specificity  Whether the current modification should be interpreted as N- or C-terminal
Returns
Position at which to continue parsing

◆ parseModSquareBrackets_()

static String::ConstIterator parseModSquareBrackets_ ( const String::ConstIterator  str_it,
const String &  str,
AASequence &  aas,
const ResidueModification::TermSpecificity &  specificity 
)
static protected

Parses modifications in square brackets (a mass)

If dot notation is used, the C-terminal ambiguity is resolved based on the presence of the dot.

Parameters
str_it  Current position in the string to be parsed
str  Full input string
aas  Current AASequence object (will be modified with the correct residue added)
specificity  Whether the current modification should be interpreted as N- or C-terminal
Returns
Position at which to continue parsing

◆ parseString_()

static void parseString_ ( const String &  peptide,
AASequence &  aas,
bool  permissive = true 
)
static protected

◆ setCTerminalModification() [1/3]

void setCTerminalModification ( const ResidueModification &  mod)

sets the C-terminal modification (copies and adds to database if not present)

◆ setCTerminalModification() [2/3]

void setCTerminalModification ( const ResidueModification *  modification)

sets the C-terminal modification (must be present in the database)

◆ setCTerminalModification() [3/3]

void setCTerminalModification ( const String &  modification)

sets the C-terminal modification by lookup in the modification names of the ModificationsDB; throws if nothing is found (since the name alone is not enough information to create a new modification)

◆ setCTerminalModificationByDiffMonoMass()

void setCTerminalModificationByDiffMonoMass ( double  diffMonoMass,
bool  protein_term 
)

sets the C-terminal modification by the monoisotopic mass difference it introduces (creates a "user-defined" mod if not present)

◆ setModification() [1/4]

void setModification ( Size  index,
const Residue *  modification 
)

sets the modification of AA at index by providing a residue that may already be modified

◆ setModification() [2/4]

void setModification ( Size  index,
const ResidueModification &  modification 
)

sets the modification of AA at index by providing a ResidueModification object; this is stricter than just looking up the name, and the modification is added to the DB if not present

◆ setModification() [3/4]

void setModification ( Size  index,
const ResidueModification *  modification 
)

sets the modification of AA at index by providing a pointer to a ResidueModification object found in the ModificationsDB

◆ setModification() [4/4]

void setModification ( Size  index,
const String &  modification 
)

sets the modification of the residue at position index; if an empty string is passed, the residue is replaced with its unmodified version

◆ setModificationByDiffMonoMass()

void setModificationByDiffMonoMass ( Size  index,
double  diffMonoMass 
)

modifies the residue at index in the sequence and potentially in the ResidueDB

◆ setNTerminalModification() [1/3]

void setNTerminalModification ( const ResidueModification &  mod)

sets the N-terminal modification (copies and adds to database if not present)

◆ setNTerminalModification() [2/3]

void setNTerminalModification ( const ResidueModification *  modification)

sets the N-terminal modification

◆ setNTerminalModification() [3/3]

void setNTerminalModification ( const String &  modification)

sets the N-terminal modification by lookup in the modification names of the ModificationsDB; throws if nothing is found (since the name alone is not enough information to create a new modification)

◆ setNTerminalModificationByDiffMonoMass()

void setNTerminalModificationByDiffMonoMass ( double  diffMonoMass,
bool  protein_term 
)

sets the N-terminal modification by the monoisotopic mass difference it introduces (creates a "user-defined" mod if not present)

◆ size()

Size size ( ) const

returns the number of residues

Referenced by AAIndex::calculateGB().

◆ toBracketString()

String toBracketString ( bool  integer_mass = true,
bool  mass_delta = false,
const std::vector< String > &  fixed_modifications = std::vector< String >() 
) const

create a TPP compatible string of the modified sequence using bracket notation.

Instead of using the modification names, it writes the modification masses in brackets

e.g.:

  • n[43]PEPC[160]PEPM[147]PEPRc[16]
  • n[+42]PEPC[+57]PEPM[+16]PEPRc[-1]

will be produced, depending on whether relative or absolute masses are used.

Parameters
integer_mass  Whether to use integer masses in brackets (default is true, if false, accurate masses will be written)
mass_delta  Whether to write absolute masses M[147] or relative mass deltas M[+16] (default is false)
fixed_modifications  Optional list of fixed modifications that should not be added to the output (they are considered to be present in all cases)
Note
Using integer masses may mean that there could be multiple modifications mapping to the same mass
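
The effect of mass_delta on a single modified residue can be sketched as follows, for the integer_mass = true case only. This is a standalone illustration rather than the OpenMS implementation: bracketNotation is a hypothetical helper, and rounding with std::lround is an assumption about how integer masses are derived.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: format one modified residue in bracket notation.
// residue_mass: monoisotopic mass of the unmodified residue
// delta_mass:   monoisotopic mass shift introduced by the modification
// mass_delta:   false -> absolute mass (M[147]); true -> signed delta (M[+16])
std::string bracketNotation(char residue, double residue_mass, double delta_mass, bool mass_delta)
{
  const long value = std::lround(mass_delta ? delta_mass : residue_mass + delta_mass);
  std::string label = std::to_string(value);
  if (mass_delta && value >= 0) label = "+" + label;  // deltas carry an explicit sign
  return std::string(1, residue) + "[" + label + "]";
}
```

With a methionine residue mass of 131.040485 and an oxidation shift of 15.994915, the sketch produces M[147] for absolute masses and M[+16] for mass deltas.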

◆ toString()

String toString ( ) const

returns the peptide as string with modifications embedded in brackets

Uses round brackets when possible (id is known) or square brackets for unknown modifications where only the mass is known.

e.g.: .[43]PEPC(Carbamidomethyl)PEPM[147]PEPR.[-1]

Note
For unknown modifications, the function will attempt to use the exact same format used in the input

Referenced by IDFilter::annotateBestPerPeptideWithData(), and IDBoostGraph::LabelVisitor::operator()().

◆ toUniModString()

String toUniModString ( ) const

returns the peptide as string with UniMod-style modifications embedded in brackets

Annotates modification with UniMod identifier (when identifier is known) and uses square brackets for unknown modifications (only mass is known).

e.g.: .[43]PEPC(UniMod:4)PEPM[147]PEPR.[16]

◆ toUnmodifiedString()

String toUnmodifiedString ( ) const

returns the peptide as string without any modifications (e.g., "PEPTIDER")

Referenced by IDFilter::annotateBestPerPeptideWithData(), IDFilter::PeptideDigestionFilter::operator()(), and IDBoostGraph::PrintAddressVisitor< CharT >::operator()().

Friends And Related Function Documentation

◆ operator<<

std::ostream& operator<< ( std::ostream &  os,
const AASequence &  peptide 
)
friend

writes a peptide to an output stream

◆ operator>>

std::istream& operator>> ( std::istream &  is,
const AASequence &  peptide 
)
friend

reads a peptide from an input stream

Member Data Documentation

◆ c_term_mod_

const ResidueModification* c_term_mod_
protected

◆ n_term_mod_

const ResidueModification* n_term_mod_
protected

◆ peptide_

std::vector<const Residue*> peptide_
protected