OpenMS
Matches tandem mass spectra to nucleic acid sequences.
Its inputs are a FASTA file containing RNA sequences (and optionally decoys) and an mzML file from a nucleic acid mass spectrometry experiment.
Output is in the form of an mzTab-like text file containing the search results. Optionally, an idXML file suitable for visualizing search results in TOPPView (parameter id_out) and a "target coordinates" file for label-free quantification using FeatureFinderMetaboIdent (parameter lfq_out) can be generated.
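The inputs and optional outputs described above map directly onto command-line flags. The following minimal Python sketch assembles such a call using only flags listed in the tool's help output (-in, -database, -out, -id_out, -lfq_out, -threads). The helper build_command and all file names are hypothetical placeholders, and the snippet only builds and prints the argument list; it does not launch the tool.

```python
import shlex

def build_command(spectra, database, out, id_out=None, lfq_out=None, threads=1):
    """Assemble a NucleicAcidSearchEngine argument list (hypothetical helper)."""
    cmd = ["NucleicAcidSearchEngine",
           "-in", spectra,          # mandatory: spectra in mzML format
           "-database", database,   # FASTA with target (and optionally decoy) sequences
           "-out", out,             # mandatory: mzTab search results
           "-threads", str(threads)]
    if id_out:                      # optional idXML for visualization in TOPPView
        cmd += ["-id_out", id_out]
    if lfq_out:                     # optional targets for FeatureFinderMetaboIdent
        cmd += ["-lfq_out", lfq_out]
    return cmd

cmd = build_command("sample.mzML", "rna.fasta", "results.mzTab",
                    id_out="results.idXML", lfq_out="targets.tsv")
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run; keeping it as a list avoids shell-quoting problems with unusual file names.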
Modified ribonucleotides can either be specified in the FASTA input file (as fixed modifications), or set as variable modifications in the tool options. Information on available modifications is taken from the Modomics database (http://modomics.genesilico.pl/). In addition to these "standard" modifications, OpenMS defines "generic" and "ambiguous" ones:
A generic modification represents a group of modifications that cannot be distinguished by tandem mass spectrometry. For example, "mA" stands for any methyladenosine (could be "m1A", "m2A", "m6A" or "m8A"), "mmA" for any dimethyladenosine (with two methyl groups on the base), and "mAm" for any 2'-O-dimethyladenosine (with one methyl group each on base and ribose). There is no technical difference between searching for "mA" and, e.g., "m1A", but the generic code better conveys that no statement can be made about the position of the methyl group on the base.
In contrast, an ambiguous modification represents two isobaric modifications (or modification groups) with a methyl group on either the base or the ribose, which could in principle be distinguished based on a-B ions. For example, "mA?" stands for methyladenosine ("mA", see above) or 2'-O-methyladenosine ("Am"). When using ambiguous modifications in a search, NucleicAcidSearchEngine can optionally try to assign the alternative that generates better a-B ion matches in a spectrum (see parameter modifications:resolve_ambiguities).
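Why a-B ions can separate the two alternatives: the a(n)-B ion contains residues 1..n but has lost the base of residue n. A methyl group on that base leaves with it, whereas a 2'-O-methyl group on the ribose stays, so the two candidates differ by one methyl mass (CH2, about 14.01565 Da) exactly at n equal to the modified position. The Python sketch below illustrates this with a simple, hypothetical scoring rule (count predicted a-B peaks matched within a ppm tolerance, keep the better alternative). It is not NucleicAcidSearchEngine's actual scoring; the function names, the assumption of singly charged ions, and all m/z values are made up for illustration.

```python
METHYL = 14.01565  # monoisotopic mass of CH2 (Da), the net gain from one methylation

def ab_offsets(length, pos, on_base):
    """Mass offset of each a(n)-B ion (n = 1 .. length-1) caused by one methyl
    group on residue `pos` (1-based) of an oligo with `length` residues.

    a(n)-B contains residues 1..n but has lost the base of residue n, so a base
    methyl on residue n leaves with that base, while a 2'-O-methyl stays."""
    offsets = []
    for n in range(1, length):
        kept = pos < n or (pos == n and not on_base)
        offsets.append(METHYL if kept else 0.0)
    return offsets

def count_matches(predicted, observed, tol_ppm=10.0):
    """Number of predicted m/z values with an observed peak within tol_ppm."""
    return sum(
        any(abs(obs - mz) <= mz * tol_ppm * 1e-6 for obs in observed)
        for mz in predicted
    )

def resolve_ambiguity(unmod_ab, observed, pos, tol_ppm=10.0):
    """Choose 'base' (e.g. mA) or 'ribose' (e.g. Am) for the methyl group on
    residue `pos`. unmod_ab[n-1] is the unmodified a(n)-B m/z (singly charged).
    Returns (choice, scores); choice is None on a tie."""
    scores = {}
    for label, on_base in (("base", True), ("ribose", False)):
        offsets = ab_offsets(len(unmod_ab) + 1, pos, on_base)
        predicted = [mz + d for mz, d in zip(unmod_ab, offsets)]
        scores[label] = count_matches(predicted, observed, tol_ppm)
    if scores["base"] == scores["ribose"]:
        return None, scores
    return max(scores, key=scores.get), scores

# Made-up numbers for illustration only: a 4-mer with the methyl on residue 2.
choice, scores = resolve_ambiguity([300.0, 600.0, 900.0],
                                   [300.0, 614.0157, 914.0157], pos=2)
print(choice, scores)
```

Only the a(2)-B ion differs between the two hypotheses here, which is why resolving ambiguities requires a-B ions in the theoretical spectra and costs extra time: both alternatives must be scored.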
The command line parameters of this tool are:
NucleicAcidSearchEngine -- Annotate nucleic acid identifications to MS/MS spectra.
Full documentation: http://www.openms.de/doxygen/release/3.2.0/html/TOPP_NucleicAcidSearchEngine.html
Version: 3.2.0 Sep 18 2024, 16:00:56, Revision: e231942

To cite OpenMS:
  + Pfeuffer, J., Bielow, C., Wein, S. et al.. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  NucleicAcidSearchEngine <options>

Options (mandatory options marked with '*'):
  -in <file>*  Input file: spectra (valid formats: 'mzML')
  -database <file>  Input file: sequence database. Required unless 'digest' is set. (valid formats: 'fasta')
  -digest <file>  Input file: pre-digested sequence database. Can be used instead of 'database'. Sets all 'oligo:...' parameters. (valid formats: 'oms')
  -out <file>*  Output file: mzTab (valid formats: 'mzTab')
  -id_out <file>  Output file: idXML (for visualization in TOPPView) (valid formats: 'idXML')
  -db_out <file>  Output file: oms (SQLite database) (valid formats: 'oms')
  -digest_out <file>  Output file: sequence database digest. Ignored if 'digest' input is used. (valid formats: 'oms')
  -lfq_out <file>  Output file: targets for label-free quantification using FeatureFinderMetaboIdent ('id' input) (valid formats: 'tsv')

Precursor (parent ion) options:
  -precursor:mass_tolerance <tolerance>  Precursor mass tolerance (+/- around uncharged precursor mass) (default: '10.0')
  -precursor:mass_tolerance_unit <unit>  Unit of precursor mass tolerance (default: 'ppm') (valid: 'Da', 'ppm')
  -precursor:min_charge <num>  Minimum precursor charge to be considered (default: '-1')
  -precursor:max_charge <num>  Maximum precursor charge to be considered (default: '-20')
  -precursor:include_unknown_charge  Include MS2 spectra with unknown precursor charge - try to match them in any possible charge between 'min_charge' and 'max_charge', at the risk of a higher error rate
  -precursor:use_avg_mass  Use average instead of monoisotopic precursor masses (appropriate for low-resolution instruments)
  -precursor:use_adducts  Consider possible salt adducts (see 'precursor:potential_adducts') when matching precursor masses
  -precursor:potential_adducts <list>  Adducts considered to explain mass differences. Format: 'Element:Charge(+/-)', i.e. the number of '+' or '-' indicates the charge, e.g. 'Ca:++' indicates +2. Only used if 'precursor:use_adducts' is set. (default: '[Na:+]')
  -precursor:isotopes <list>  Correct for mono-isotopic peak misassignments. E.g.: 1 = precursor may be misassigned to the first isotopic peak. Ignored if 'use_avg_mass' is set. (default: '[0 1 2 3 4]')

Fragment (Product Ion) Options:
  -fragment:mass_tolerance <tolerance>  Fragment mass tolerance (+/- around fragment m/z) (default: '10.0')
  -fragment:mass_tolerance_unit <unit>  Unit of fragment mass tolerance (default: 'ppm') (valid: 'Da', 'ppm')
  -fragment:ions <choice>  Fragment ions to include in theoretical spectra (default: '[a-B a b c d w x y z]') (valid: 'a-B', 'a', 'b', 'c', 'd', 'w', 'x', 'y', 'z')

Modification options:
  -modifications:variable <mods>  Variable modifications (valid: 'io6A', 's2U', 'k2C', 'm2Gm', 'Ym', 'f5Cm', 'Qbase', 'ac4Cm', 'imG-14', 'cm5s2U', 'mnm5s2U', 'm227G', 'yW-58', 'I', 'g6A', 'nm5U', 'm7G', 's2Um', 'Y', 'hm5C', 'm5U', 'preQ0', 'o2yW', 'm5Um', 'preQ1', 'm66Am', 'ac6A', 'ms2io6A', 'Am', 'Im', 'mnm5U', 'm22G', 't6A', 'm8A', 'm7GpppN', 'm27GpppN', 'm227GpppN', 'mpppN', 'm28A', 'acp3D', 'acp3Y', 'imG', 'D', 'N', 'C+', 'm27Gm', 'ho5C', 'inm5U', 'inm5Um', 'inm5s2U', 'pppN', 'GpppN', 'CoApN', 'm44C', 'acCoApN', 'mal ... , 'dT*')
  -modifications:variable_max_per_oligo <num>  Maximum number of residues carrying a variable modification per candidate oligonucleotide (default: '2')
  -modifications:resolve_ambiguities  Attempt to resolve ambiguous modifications (e.g. 'mA?' for 'mA'/'Am') based on a-B ions. This incurs a performance cost because two modifications have to be considered for each case. Requires a-B ions to be enabled in parameter 'fragment:ions'.

Oligonucleotide (digestion) options (ignored if 'digest' input is used):
  -oligo:min_size <num>  Minimum size an oligonucleotide must have after digestion to be considered in the search (default: '5')
  -oligo:max_size <num>  Maximum size an oligonucleotide must have after digestion to be considered in the search, leave at 0 for no limit (default: '0')
  -oligo:missed_cleavages <num>  Number of missed cleavages (default: '1')
  -oligo:enzyme <choice>  The enzyme used for RNA digestion (default: 'no cleavage') (valid: 'unspecific cleavage', 'RNase_T1_Phosphatase', 'RNase_U2', 'RNase_MC1', 'RNase_H', 'RNase_4', 'RNase_4p', 'RNase_T1', 'RNase_A', 'mazF', 'colicin_E5', 'no cleavage', 'cusativin')

False Discovery Rate options:
  -fdr:decoy_pattern <string>  String used as part of the accession to annotate decoy sequences (e.g. 'DECOY_'). Leave empty to skip the FDR/q-value calculation.
  -fdr:cutoff <value>  Cut-off for FDR filtering; search hits with higher q-values will be removed (default: '1.0') (min: '0.0' max: '1.0')
  -fdr:remove_decoys  Do not score hits to decoy sequences and remove them when filtering

Common TOPP options:
  -ini <file>  Use the given TOPP INI file
  -threads <n>  Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>  Writes the default configuration file
  --help  Shows options
  --helphelp  Shows all options (including advanced)
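The precursor options combine a +/- tolerance around the uncharged precursor mass (in ppm or Da), negative charge states by default, and a correction for picking a heavier isotopic peak instead of the monoisotopic one. The Python sketch below shows one plausible way these pieces fit together, plus a parser for the documented 'Element:Charge(+/-)' adduct format. It is an illustration under stated assumptions, not OpenMS code: the function names are hypothetical, the isotope spacing is assumed to be the 13C-12C mass difference, and the adduct mass arithmetic itself is left out.

```python
PROTON = 1.00727646688   # proton mass (Da)
C13_C12 = 1.0033548378   # 13C - 12C mass difference (Da), assumed isotope spacing

def neutral_mass(mz, charge):
    """Uncharged mass from m/z; a negative charge means a deprotonated ion [M - nH]n-."""
    return abs(charge) * mz - charge * PROTON

def within_tolerance(observed, theoretical, tol, unit="ppm"):
    """+/- window around the theoretical mass, in ppm or Da."""
    window = theoretical * tol * 1e-6 if unit == "ppm" else tol
    return abs(observed - theoretical) <= window

def match_precursor(observed, candidate, tol=10.0, unit="ppm", isotopes=(0, 1, 2, 3, 4)):
    """Return the first isotope offset k for which observed - k * C13_C12 matches
    the candidate mass, or None if no offset fits."""
    for k in isotopes:
        if within_tolerance(observed - k * C13_C12, candidate, tol, unit):
            return k
    return None

def parse_adduct(spec):
    """Parse 'Element:Charge(+/-)' strings such as 'Na:+' or 'Ca:++'."""
    element, signs = spec.split(":")
    if not signs or set(signs) not in ({"+"}, {"-"}):
        raise ValueError(f"invalid adduct: {spec!r}")
    return element, signs.count("+") - signs.count("-")

print(match_precursor(1000.0 + C13_C12, 1000.0))  # precursor picked on the first isotope
print(parse_adduct("Ca:++"))
```

Trying isotope offsets in ascending order prefers the monoisotopic interpretation whenever it fits; with use_avg_mass set, the help text says this correction is skipped.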
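Variable modifications (modifications:variable) are placed combinatorially on each candidate, with modifications:variable_max_per_oligo (default 2) capping how many residues may carry one. The Python sketch below enumerates such variants. It is a simplified, hypothetical model: residues are plain codes in a list, the mapping from unmodified residue to allowed modified codes is made up for illustration, and fixed modifications specified in the FASTA input are not handled.

```python
from itertools import combinations, product

def variable_mod_variants(seq, variable_mods, max_per_oligo=2):
    """Enumerate variants of `seq` with at most `max_per_oligo` modified residues.

    seq: list of residue codes, e.g. ["A", "U", "G"].
    variable_mods: hypothetical mapping from an unmodified residue to the
    modified codes it may carry, e.g. {"A": ["m6A"], "G": ["m7G"]}."""
    sites = [i for i, res in enumerate(seq) if res in variable_mods]
    variants = []
    for k in range(max_per_oligo + 1):          # k = number of modified residues
        for chosen in combinations(sites, k):   # which residues get modified
            # every combination of allowed codes at the chosen residues
            for mods in product(*(variable_mods[seq[i]] for i in chosen)):
                variant = list(seq)
                for i, mod in zip(chosen, mods):
                    variant[i] = mod
                variants.append(variant)
    return variants

for v in variable_mod_variants(["A", "U", "A", "G"], {"A": ["m6A"], "G": ["m7G"]}):
    print(v)
```

The number of candidates grows quickly with the number of modifiable sites and allowed codes, which is why a small cap per oligonucleotide keeps the search space manageable.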
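The digestion options interact as follows: the enzyme defines cleavage sites, missed_cleavages allows adjacent products to stay joined, and min_size/max_size filter the results (max_size 0 meaning no limit). The Python sketch below illustrates this with RNase T1-like specificity (cleavage 3' of every G, one of the listed 'oligo:enzyme' choices). It is a simplified, hypothetical model rather than OpenMS's enzyme definitions: sequences are plain one-letter strings without modifications, and terminal chemistry (e.g. the 3'-phosphate left by RNase T1, or the 'RNase_T1_Phosphatase' variant) is ignored.

```python
def digest_after(seq, site="G", missed_cleavages=1, min_size=5, max_size=0):
    """Cleave after every `site` residue and return products with up to
    `missed_cleavages` uncleaved sites, filtered by length (max_size=0: no limit)."""
    # Fully cleaved pieces; the cleavage residue stays at the 3' end of its piece.
    pieces, start = [], 0
    for i, base in enumerate(seq):
        if base == site:
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):            # trailing piece without a cleavage site
        pieces.append(seq[start:])

    products = []
    for i in range(len(pieces)):
        # join pieces i..j, i.e. j - i missed cleavages
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            oligo = "".join(pieces[i:j + 1])
            if len(oligo) >= min_size and (max_size == 0 or len(oligo) <= max_size):
                products.append(oligo)
    return products

print(digest_after("AUGCCGUAAG", missed_cleavages=0, min_size=1))
print(digest_after("AUGCCGUAAG"))  # defaults: 1 missed cleavage, min_size 5
```

With the documented defaults (one missed cleavage, minimum size 5), short fully cleaved products drop out and only joined products remain in this toy example.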
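The FDR options rely on a target-decoy approach: hits whose accession contains fdr:decoy_pattern count as decoys, q-values are computed, and hits above fdr:cutoff are removed (decoys too, with fdr:remove_decoys). The Python sketch below uses a common estimator, FDR = decoys / targets at each score threshold, with the q-value taken as the minimum FDR at that or any less stringent threshold. The estimator NucleicAcidSearchEngine actually uses may differ; the assumption that higher scores are better, the tuple layout, and the function names are hypothetical, and tied scores are not treated specially.

```python
def q_values(hits, decoy_pattern="DECOY_"):
    """hits: list of (accession, score), higher score = better (assumption).
    Returns (accession, score, is_decoy, q) tuples sorted by descending score."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    rows, targets, decoys = [], 0, 0
    for acc, score in ranked:
        is_decoy = decoy_pattern in acc
        decoys += is_decoy
        targets += not is_decoy
        fdr = decoys / targets if targets else 1.0
        rows.append([acc, score, is_decoy, fdr])
    # q-value: smallest FDR at this or any lower score threshold
    running = float("inf")
    for row in reversed(rows):
        running = min(running, row[3])
        row[3] = running
    return [tuple(r) for r in rows]

def filter_hits(rows, cutoff=1.0, remove_decoys=False):
    """Drop hits with q > cutoff, and decoy hits if remove_decoys is set."""
    return [r for r in rows if r[3] <= cutoff and not (remove_decoys and r[2])]

rows = q_values([("t1", 10), ("t2", 9), ("DECOY_d1", 8), ("t3", 7), ("DECOY_d2", 6)])
print([(acc, round(q, 3)) for acc, _, _, q in rows])
print([r[0] for r in filter_hits(rows, cutoff=0.5, remove_decoys=True)])
```

Taking the running minimum from the bottom up makes q-values monotone in score, so a cutoff never keeps a lower-scoring hit while discarding a higher-scoring one.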
INI file documentation of this tool: