This implements the OpenSWATH workflow as described in Rost and Rosenberger et al. (Nature Biotechnology, 2014) and provides a complete, integrated analysis tool without the need to run multiple tools consecutively. See also http://openswath.org/ for additional documentation.
See below or have a look at the INI file (via "OpenSwathWorkflow -write_ini myini.ini") for available parameters and more functionality.
SWATH maps can be provided as mzML files. One option is a single file taken directly from the instrument. This assumes that the SWATH method has one MS1 spectrum followed by n MS2 spectra, ordered the same way in every cycle. For example, MS1, MS2 [400-425], MS2 [425-450], MS1, MS2 [400-425], MS2 [425-450] is a valid method, while MS1, MS2 [400-425], MS2 [425-450], MS1, MS2 [425-450], MS2 [400-425] is invalid. Here MS2 [xx-yy] denotes an MS2 scan with an isolation window from xx to yy. OpenSwathWorkflow will try to read the SWATH windows from the data. If this is not possible, please provide a tab-separated list of the correct windows using the -swath_windows_file parameter (this is recommended). Note that the extraction windows (i.e. which peptides to extract from which window) must not overlap; otherwise, peptides will be extracted from two different windows.
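Two requirements above can be checked before running the tool: every cycle must use the same MS2 window order, and the extraction windows must not overlap. The following Python sketch is illustrative only. The (ms_level, lower, upper) scan tuples and the helpers cycle_order_is_consistent, parse_swath_windows and overlapping_pairs are hypothetical and not part of OpenMS. The windows file is assumed to contain two tab-separated numeric columns after its header line.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical scan representation: (ms_level, window_lower, window_upper);
# MS1 scans carry None for both bounds.
Scan = Tuple[int, Optional[float], Optional[float]]
Window = Tuple[float, float]


def cycle_order_is_consistent(scans: Sequence[Scan]) -> bool:
    """True if every cycle is one MS1 scan followed by the same MS2 windows in the same order."""
    cycles: List[List[Tuple[Optional[float], Optional[float]]]] = []
    for level, lower, upper in scans:
        if level == 1:
            cycles.append([])  # each MS1 scan starts a new cycle
        elif not cycles:
            return False  # MS2 scan before the first MS1 scan
        else:
            cycles[-1].append((lower, upper))
    # All cycles must repeat the window order of the first cycle exactly.
    return bool(cycles) and all(cycle == cycles[0] for cycle in cycles)


def parse_swath_windows(text: str) -> List[Window]:
    """Parse tab-separated lower/upper bounds; the first line is a header and is skipped."""
    windows: List[Window] = []
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue  # ignore blank lines
        lower, upper = line.split("\t")[:2]
        windows.append((float(lower), float(upper)))
    return windows


def overlapping_pairs(windows: Sequence[Window]) -> List[Tuple[Window, Window]]:
    """Overlapping neighbours after sorting by lower bound.

    The result is non-empty exactly when some pair of windows overlaps;
    windows that merely touch (upper == next lower) are not reported.
    """
    ordered = sorted(windows)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] < a[1]]
```

Checking only sorted neighbours is sufficient. If any window overlaps a later one, it must also overlap the window that immediately follows it in sorted order.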
Alternatively, a set of split files (n+1 mzML files) can be provided, each containing one SWATH map (or MS1 map).
Since the file size can become rather large, it is recommended not to load the whole file into memory. Instead, cache it on disk using a fast-access data format. This can be specified using the -readOptions cache parameter (this is recommended!). In that case, also set -tempDirectory.
The current parameters are optimized for 2-hour gradients on SCIEX 5600/6600 TripleTOF instruments with a peak width of around 30 seconds using iRT peptides. If your chromatography differs, please consider adjusting -Scoring:TransitionGroupPicker:min_peak_width to allow for smaller or larger peaks. You can also adjust -rt_extraction_window to use a different extraction window for the retention time. In the m/z domain, consider adjusting -mz_extraction_window to your instrument resolution. This window can be given in Th or ppm (see -mz_extraction_window_unit).
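The sketch below illustrates how these windows might translate into extraction bounds. It rests on two assumptions. First, like rt_extraction_window (where 600 means +/- 300 s), the m/z window value is taken to be a full width centered on the target. Second, a ppm value is taken to be relative to the target m/z. The function names are hypothetical, and this is not the OpenMS extraction code.

```python
from typing import Optional, Tuple


def rt_bounds(expected_rt: float, rt_window: float) -> Optional[Tuple[float, float]]:
    """RT bounds (seconds) for a full-width window centred on the expected RT.

    A negative window (the documented value is -1) means no RT restriction,
    which this sketch signals by returning None.
    """
    if rt_window < 0:
        return None
    half = rt_window / 2.0
    return expected_rt - half, expected_rt + half


def mz_bounds(target_mz: float, window: float, unit: str = "ppm") -> Tuple[float, float]:
    """m/z bounds, assuming `window` is the full width centred on target_mz."""
    if unit == "ppm":
        width = target_mz * window * 1e-6  # ppm is relative to the target m/z
    elif unit == "Th":
        width = window
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return target_mz - width / 2.0, target_mz + width / 2.0
```

Under these assumptions, the default of 50 ppm corresponds to a total width of 0.025 Th for a fragment at m/z 500, i.e. about 499.9875 to 500.0125. Ppm windows therefore widen in Th as m/z increases, while Th windows stay constant.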
If you encounter issues with peak picking, try disabling peak filtering by setting -Scoring:TransitionGroupPicker:compute_peak_quality to false (already the default in the parameter list below). This disables the filtering of peaks by chromatographic quality. You can also adjust the smoothing used for peak picking. Either change -Scoring:TransitionGroupPicker:PeakPickerChromatogram:sgolay_frame_length, or switch to Gaussian smoothing based on your estimated peak width (use_gauss and gauss_width in the same section). Adjusting the signal-to-noise threshold (signal_to_noise in the same section) will make the peaks wider or narrower.
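The following is a minimal pure-Python Savitzky-Golay smoother, showing what sgolay_frame_length and sgolay_polynomial_order control. Each interior point is replaced by the value at the window center of a least-squares polynomial fit. The sketch mirrors the documented rule that an even frame length is increased by 1, but it is not the OpenMS filter. Leaving the edge points unsmoothed is a simplification chosen here.

```python
from typing import List, Sequence


def normalize_frame_length(frame_length: int) -> int:
    """Mirror the documented rule: an even frame length is increased by 1."""
    return frame_length if frame_length % 2 == 1 else frame_length + 1


def _solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def savgol_weights(frame_length: int, order: int) -> List[float]:
    """Convolution weights giving the centre value of a least-squares polynomial fit."""
    half = normalize_frame_length(frame_length) // 2
    if order >= 2 * half + 1:
        raise ValueError("polynomial order must be smaller than the frame length")
    offsets = range(-half, half + 1)
    # Normal equations: (A^T A) c = e0 with A[i][j] = offset_i ** j.
    ata = [[float(sum(x ** (j + k) for x in offsets)) for k in range(order + 1)]
           for j in range(order + 1)]
    c = _solve(ata, [1.0] + [0.0] * order)
    # weight_i = sum_j c_j * offset_i ** j  (row i of A times c)
    return [sum(c[j] * x ** j for j in range(order + 1)) for x in offsets]


def savgol_smooth(intensities: Sequence[float], frame_length: int = 11,
                  order: int = 3) -> List[float]:
    """Smooth interior points; the outer `half` points on each side are left unchanged."""
    weights = savgol_weights(frame_length, order)
    half = len(weights) // 2
    out = list(intensities)
    for i in range(half, len(intensities) - half):
        window = intensities[i - half:i + half + 1]
        out[i] = sum(w * y for w, y in zip(weights, window))
    return out
```

A larger frame smooths more aggressively but can flatten narrow peaks. A higher polynomial order preserves peak shape better but removes less noise.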
The output of OpenSwathWorkflow is a feature list, written either as featureXML (use -out_features) or as TSV (use -out_tsv). The TSV output is more memory-friendly and can be used directly as input to downstream tools such as mProphet or pyProphet. For large datasets, it is recommended to use only -out_tsv and not -out_features.
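The recommendations above can be combined into a single invocation, as follows. This is a hypothetical Python helper that only builds the argument list (e.g. for subprocess.run). The file names are placeholders, and only parameters listed in this document are used.

```python
import shlex
from typing import List, Sequence


def build_openswath_command(mzml_files: Sequence[str], library: str, irt_library: str,
                            windows_file: str, out_tsv: str,
                            temp_dir: str = "/tmp") -> List[str]:
    """Build an OpenSwathWorkflow argument list (hypothetical helper, nothing is executed)."""
    return [
        "OpenSwathWorkflow",
        "-in", *mzml_files,                    # list parameter: values separated by blanks
        "-tr", library,                        # assay library (TraML, tsv or pqp)
        "-tr_irt", irt_library,                # iRT peptides for RT normalization
        "-swath_windows_file", windows_file,   # explicit windows instead of reading them from data
        "-readOptions", "cache",               # cache to disk rather than load everything
        "-tempDirectory", temp_dir,            # location of the cached files
        "-out_tsv", out_tsv,                   # TSV only, no -out_features
    ]


cmd = build_openswath_command(["run1.mzML"], "library.pqp", "irt.traML",
                              "windows.tsv", "results.tsv")
print(shlex.join(cmd))
```

Assuming OpenSwathWorkflow is on the PATH, passing the list to subprocess.run(cmd, check=True) would launch the tool. Building a list rather than a single string avoids shell quoting problems with file names.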
+ OpenSwathWorkflow: Complete workflow to run OpenSWATH
version (default: 3.2.0)
Version of the tool that generated this parameters file.
++ 1: Instance '1' section for 'OpenSwathWorkflow'
in (default: [])
Input files, separated by blanks. [input file: *.mzML, *.mzXML, *.sqMass]
tr
Transition file ('TraML', 'tsv', 'pqp'). [input file: *.traML, *.tsv, *.pqp]
tr_type
Input file type (default: determined from file extension or content). [valid: traML, tsv, pqp]
tr_irt
Transition file ('TraML'). [input file: *.traML, *.tsv, *.pqp]
tr_irt_nonlinear
Additional nonlinear transition file ('TraML'). [input file: *.traML, *.tsv, *.pqp]
rt_norm
RT normalization file (how to map the RTs of this run to the ones stored in the library). If set, tr_irt may be omitted. [input file: *.trafoXML]
swath_windows_file
Optional tab-separated file containing the SWATH windows for extraction (columns: lower_offset, upper_offset). Note that the first line is a header and will be skipped. [input file]
sort_swath_maps (default: false)
Sort input SWATH files when matching to SWATH windows from swath_windows_file. [valid: true, false]
enable_ms1 (default: true)
Extract the precursor ion trace(s) and use them for scoring if present. [valid: true, false]
enable_ipf (default: true)
Enable additional scoring of identification assays using IPF (see online documentation). [valid: true, false]
out_features
featureXML output file. [output file: *.featureXML]
out_tsv
TSV output file (mProphet-compatible TSV file). [output file: *.tsv]
out_osw
OSW output file (PyProphet-compatible SQLite file). [output file: *.osw]
out_chrom
Also output all computed chromatograms in mzML (chrom.mzML) or sqMass (SQLite format). [output file: *.mzML, *.sqMass]
out_qc
Optional QC metadata (charge distribution in MS1). Only works with mzML input files. [output file: *.json]
min_upper_edge_dist (default: 0.0)
Minimal distance to the upper edge of a SWATH window to still consider a precursor, in Thomson.
sonar (default: false)
Data is scanning SWATH data. [valid: true, false]
pasef (default: false)
Data is PASEF data. [valid: true, false]
rt_extraction_window (default: 600.0)
Only extract RT around this value (-1 means extract over the whole range; a value of 600 means to extract around +/- 300 s of the expected elution).
extra_rt_extraction_window (default: 0.0)
Output an XIC with an RT window that is larger by this amount (e.g. to visually inspect a larger area of the chromatogram). [range: 0.0:∞]
ion_mobility_window (default: -1.0)
Extraction window in the ion mobility dimension (in 1/K0 or milliseconds, depending on the library). This is the full window size, e.g. a value of 10 milliseconds would extract 5 milliseconds on either side. -1 means extract over the whole range or ion mobility is not present. (Default for diaPASEF data: 0.06 1/K0)
mz_extraction_window (default: 50.0)
Extraction window in Thomson or ppm (see mz_extraction_window_unit). [range: 0.0:∞]
mz_extraction_window_unit (default: ppm)
Unit for m/z extraction. [valid: Th, ppm]
mz_extraction_window_ms1 (default: 50.0)
Extraction window used in MS1 in Thomson or ppm (see mz_extraction_window_ms1_unit). [range: 0.0:∞]
mz_extraction_window_ms1_unit (default: ppm)
Unit of the MS1 m/z extraction window. [valid: ppm, Th]
im_extraction_window_ms1 (default: -1.0)
Extraction window in the ion mobility dimension for MS1 (in 1/K0 or milliseconds, depending on the library). -1 means this is not ion mobility data.
use_ms1_ion_mobility (default: true)
Also perform precursor extraction using the same ion mobility window as for fragment ion extraction. [valid: true, false]
matching_window_only (default: false)
Assume the input data is targeted / PRM-like data with potentially overlapping DIA windows. Will only attempt to extract each assay from the *best* matching DIA window (instead of all matching windows). [valid: true, false]
irt_mz_extraction_window (default: 50.0)
Extraction window used for iRT and m/z correction in Thomson or ppm (see irt_mz_extraction_window_unit). [range: 0.0:∞]
irt_mz_extraction_window_unit (default: ppm)
Unit for m/z extraction. [valid: Th, ppm]
irt_im_extraction_window (default: -1.0)
Ion mobility extraction window used for iRT (in 1/K0 or milliseconds, depending on the library). -1 means do not perform ion mobility calibration.
min_rsq (default: 0.95)
Minimum r-squared of the RT peptide regression.
min_coverage (default: 0.6)
Minimum relative amount of RT peptides to keep.
split_file_input (default: false)
The input files each contain one single SWATH (alternatively: all SWATHs are in separate files). [valid: true, false]
use_elution_model_score (default: false)
Turn on the elution model score (EMG fit to peak). [valid: true, false]
readOptions (default: normal)
Whether to run OpenSWATH directly on the input data, cache data to disk first, or perform a data reduction step first. If you choose cache, make sure to also set tempDirectory. [valid: normal, cache, cacheWorkingInMemory, workingInMemory]
mz_correction_function (default: none)
Use the retention time normalization peptide MS2 masses to perform a mass correction (linear, intensity-weighted linear, or quadratic) of all spectra. [valid: none, regression_delta_ppm, unweighted_regression, weighted_regression, quadratic_regression, weighted_quadratic_regression, weighted_quadratic_regression_delta_ppm, quadratic_regression_delta_ppm]
tempDirectory (default: /tmp)
Temporary directory, e.g. for storing cached files.
extraction_function (default: tophat)
Function used to extract the signal. [valid: tophat, bartlett]
batchSize (default: 1000)
The batch size of chromatograms to process (0 means to have only one batch; sensible values are around 250-1000). [range: 0:∞]
outer_loop_threads (default: -1)
How many threads should be used for the outer loop (-1: use all threads; use 4 to analyze 4 SWATH windows in memory at once).
ms1_isotopes (default: 3)
The number of MS1 isotopes used for extraction. [range: 0:∞]
log
Name of log file (created only when specified).
debug (default: 0)
Sets the debug level.
threads (default: 1)
Sets the number of threads allowed to be used by the TOPP tool.
no_progress (default: false)
Disables progress logging to command line. [valid: true, false]
force (default: false)
Overrides tool-specific checks. [valid: true, false]
test (default: false)
Enables the test mode (needed for internal use only). [valid: true, false]
+++ Debugging: Debugging
irt_mzml
Chromatogram mzML containing the iRT peptides. [output file: *.mzML]
irt_trafo
Transformation file for RT transform. [output file: *.trafoXML]
+++ Calibration: Parameters for the m/z and ion mobility calibration.
ms1_im_calibration (default: false)
Whether to use MS1 precursor data for the ion mobility calibration (default = false, uses MS2 / fragment ions for calibration). [valid: true, false]
im_correction_function (default: linear)
Type of normalization function for IM calibration. [valid: none, linear]
debug_im_file
Debug file for ion mobility calibration.
debug_mz_file
Debug file for m/z calibration.
+++ Library: Library parameters section
retentionTimeInterpretation (default: iRT)
How to interpret the provided retention time (the retention time column can be interpreted to be in iRT, minutes, or seconds). [valid: iRT, seconds, minutes]
override_group_label_check (default: false)
Override an internal check that ensures that all members of the same PeptideGroupLabel have the same PeptideSequence (this ensures that only different isotopic forms of the same peptide can be grouped together in the same label group). Only enable this override if you know what you are doing. [valid: true, false]
force_invalid_mods (default: false)
Force reading even if invalid modifications are encountered (OpenMS may not recognize the modification). [valid: true, false]
+++ RTNormalization: Parameters for the RT normalization using iRT peptides. This specifies how the RT alignment is performed and how outlier detection is applied. Outlier detection can be done iteratively (the default), removing one outlier per iteration, or using the RANSAC algorithm.
alignmentMethod (default: linear)
How to perform the alignment to the normalized RT space using anchor points. 'linear': perform linear regression (for few anchor points). 'interpolated': interpolate between anchor points (for few, noise-free anchor points). 'lowess': use local regression (for many, noisy anchor points). 'b_spline': use B-splines for smoothing. [valid: linear, interpolated, lowess, b_spline]
outlierMethod (default: iter_residual)
Which outlier detection method to use. Iterative methods remove one outlier at a time. The jackknife approach optimizes for maximum r-squared improvement, while 'iter_residual' removes the data point with the largest residual error (removal by residual is computationally cheaper; use it with many peptides). [valid: iter_residual, iter_jackknife, ransac, none]
useIterativeChauvenet (default: false)
Whether to use Chauvenet's criterion when using iterative methods. This should be used if the algorithm removes too many data points, but it may lead to true outliers being retained. [valid: true, false]
RANSACMaxIterations (default: 1000)
Maximum iterations for the RANSAC outlier detection algorithm.
RANSACMaxPercentRTThreshold (default: 3)
Maximum threshold in the RT dimension for the RANSAC outlier detection algorithm (in percent of the total gradient). The default of 3% corresponds to roughly +/- 4 minutes on a 120-minute gradient.
RANSACSamplingSize (default: 10)
Sampling size of data points per iteration for the RANSAC outlier detection algorithm.
estimateBestPeptides (default: false)
Whether the algorithm should try to choose the best peptides for normalization based on their peak shape. Use this option if you do not expect all your peptides to be detected in a sample and too many 'bad' peptides enter the outlier removal step (e.g. because they are endogenous peptides or come from a less curated list of peptides). [valid: true, false]
InitialQualityCutoff (default: 0.5)
The initial overall quality cutoff for a peak to be scored (range ca. -2 to 2).
OverallQualityCutoff (default: 5.5)
The overall quality cutoff for a peak to go into the retention time estimation (range ca. 0 to 10).
NrRTBins (default: 10)
Number of RT bins used to compute coverage. This option should be used to ensure complete coverage of the RT space (it should detect cases where only part of the RT gradient is actually covered by normalization peptides).
MinPeptidesPerBin (default: 1)
Minimal number of peptides required for a bin to be counted as 'covered'.
MinBinsFilled (default: 8)
Minimal number of bins required to be covered.
++++ lowess
span (default: 0.05)
Span parameter for lowess. [range: 0.0:1.0]
++++ b_spline
num_nodes (default: 5)
Number of nodes for the B-spline. [range: 0:∞]
+++ Scoring: Scoring parameters section
stop_report_after_feature (default: 5)
Stop reporting after feature (ordered by quality; -1 means do not stop).
rt_normalization_factor (default: 100.0)
The normalized RT is expected to be between 0 and 1. If your normalized RT has a different range, pass this here (e.g. if it goes from 0 to 100, set this value to 100).
quantification_cutoff (default: 0.0)
Cutoff in m/z below which peaks should no longer be used for quantification. [range: 0.0:∞]
write_convex_hull (default: false)
Whether to write out all points of all features into the featureXML. [valid: true, false]
spectrum_addition_method (default: simple)
For spectrum addition, either use simple concatenation or use peak resampling. [valid: simple, resample]
add_up_spectra (default: 1)
Add up spectra around the peak apex (needs to be an odd integer). [range: 1:∞]
spacing_for_spectra_resampling (default: 5.0e-03)
If spectra are to be added, use this spacing to add them up. [range: 0.0:∞]
uis_threshold_sn (default: -1)
S/N threshold to consider identification transition (set to -1 to consider all).
uis_threshold_peak_area (default: 0)
Peak area threshold to consider identification transition (set to -1 to consider all).
scoring_model (default: default)
Scoring model to use. [valid: default, single_transition]
im_extra_drift (default: 0.0)
Extra drift time to extract for IM scoring (as a fraction, e.g. 0.25 means 25% extra on each side). [range: 0.0:∞]
strict (default: true)
Whether to error (true) or skip (false) if a transition in a transition group does not have a corresponding chromatogram. [valid: true, false]
use_ms1_ion_mobility (default: true)
Performs ion mobility extraction in MS1. Set to false if MS1 spectra do not contain ion mobility.
++++ TransitionGroupPicker
stop_after_feature (default: -1)
Stop finding after feature (ordered by intensity; -1 means do not stop).
min_peak_width (default: -1.0)
Minimal peak width (s); discard all peaks below this value (-1 means no action).
peak_integration (default: original)
Calculate the peak area and height from either the smoothed or the raw chromatogram data. [valid: original, smoothed]
background_subtraction (default: none)
Remove background from peak signal using estimated noise levels. The 'original' method is only provided for historical purposes; please use the 'exact' method and set parameters using the PeakIntegrator: settings. The same original or smoothed chromatogram specified by peak_integration will be used for background estimation. [valid: none, original, exact]
recalculate_peaks (default: true)
Tries to get better peak picking by looking at the peak consistency of all picked peaks. Tries to use the consensus (median) peak border if the variation within the picked peaks is too large. [valid: true, false]
use_precursors (default: false)
Use the precursor chromatogram for peak picking (note that this may lead to the precursor signal driving the peak picking). [valid: true, false]
use_consensus (default: true)
Use consensus peak boundaries when computing transition group picking (if false, compute independent peak boundaries for each transition). [valid: true, false]
recalculate_peaks_max_z (default: 0.75)
Determines the maximal Z-score (difference measured in standard deviations) that is considered too large for peak boundaries. If the Z-score is above this value, the median is used for the peak boundaries.
minimal_quality (default: -1.5)
Only used if compute_peak_quality is set: peaks below this quality threshold are not considered.
resample_boundary (default: 15.0)
For computing peak quality, how many extra seconds should be sampled left and right of the actual peak.
compute_peak_quality (default: false)
Tries to compute a quality value for each peak group and detect outlier transitions. The resulting score is centered around zero; values above 0 are generally good, and values below -1 or -2 are usually bad. [valid: true, false]
compute_peak_shape_metrics (default: false)
Calculates various peak shape metrics (e.g., tailing) that can be used for downstream QC/QA. [valid: true, false]
compute_total_mi (default: false)
Compute mutual information metrics for individual transitions that can be used for OpenSWATH/IPF scoring. [valid: true, false]
boundary_selection_method (default: largest)
Method to use when selecting the best boundaries for peaks. [valid: largest, widest]
+++++ PeakPickerChromatogram
sgolay_frame_length (default: 11)
The number of subsequent data points used for smoothing. This number has to be odd; if it is not, 1 will be added.
sgolay_polynomial_order (default: 3)
Order of the polynomial that is fitted.
gauss_width (default: 30.0)
Gaussian width in seconds, estimated peak size.
use_gauss (default: false)
Use a Gaussian filter for smoothing (the alternative is the Savitzky-Golay filter). [valid: false, true]
peak_width (default: -1.0)
Force a certain minimal peak_width on the data (e.g. extend the peak at least by this amount on both sides), in seconds. -1 turns this feature off.
signal_to_noise (default: 0.1)
Signal-to-noise threshold at which a peak will not be extended any more. Note that setting this too high (e.g. 1.0) can lead to peaks whose flanks are not fully captured. [range: 0.0:∞]
write_sn_log_messages (default: false)
Write out log messages of the signal-to-noise estimator in case of sparse windows or median in rightmost histogram bin. [valid: true, false]
remove_overlapping_peaks (default: true)
Try to remove overlapping peaks during peak picking. [valid: false, true]
method (default: corrected)
Which method to choose for chromatographic peak picking (OpenSWATH legacy on raw data, corrected picking on smoothed chromatogram, or Crawdad on smoothed chromatogram). [valid: legacy, corrected, crawdad]
+++++ PeakIntegrator
integration_type (default: intensity_sum)
The integration technique to use in integratePeak() and estimateBackground(): either the summed intensity, integration by Simpson's rule, or trapezoidal integration. [valid: intensity_sum, simpson, trapezoid]
baseline_type (default: base_to_base)
The baseline type to use in estimateBackground() based on the peak boundaries. A rectangular baseline shape is computed based either on the minimal intensity of the peak boundaries, the maximum intensity, or the average intensity (base_to_base). [valid: base_to_base, vertical_division, vertical_division_min, vertical_division_max]
fit_EMG (default: false)
Fit the chromatogram/spectrum to the EMG peak model. [valid: false, true]
++++ DIAScoring
dia_extraction_window (default: 0.05)
DIA extraction window in Th or ppm. [range: 0.0:∞]
dia_extraction_unit (default: Th)
DIA extraction window unit. [valid: Th, ppm]
dia_centroided (default: false)
Use centroided DIA data. [valid: true, false]
dia_byseries_intensity_min (default: 300.0)
DIA b/y series minimum intensity to consider. [range: 0.0:∞]
dia_byseries_ppm_diff (default: 10.0)
DIA b/y series minimal difference in ppm to consider. [range: 0.0:∞]
dia_nr_isotopes (default: 4)
DIA number of isotopes to consider. [range: 0:∞]
dia_nr_charges (default: 4)
DIA number of charges to consider. [range: 0:∞]
peak_before_mono_max_ppm_diff (default: 20.0)
DIA maximal difference in ppm to count a peak at lower m/z when searching for evidence that a peak might not be monoisotopic. [range: 0.0:∞]
++++ EMGScoring
max_iteration (default: 10)
Maximum number of iterations used by the Levenberg-Marquardt algorithm.
init_mom (default: false)
Initialize parameters using method of moments estimators. [valid: true, false]
++++ Scores
use_shape_score (default: true)
Use the shape score (this score measures the similarity in shape of the transitions using a cross-correlation). [valid: true, false]
use_coelution_score (default: true)
Use the coelution score (this score measures the similarity in coelution of the transitions using a cross-correlation). [valid: true, false]
use_rt_score (default: true)
Use the retention time score (this score measures the difference in retention time). [valid: true, false]
use_library_score (default: true)
Use the library score. [valid: true, false]
use_intensity_score (default: true)
Use the intensity score. [valid: true, false]
use_nr_peaks_score (default: true)
Use the number of peaks score. [valid: true, false]
use_total_xic_score (default: true)
Use the total XIC score. [valid: true, false]
use_total_mi_score (default: false)
Use the total MI score. [valid: true, false]
use_sn_score (default: true)
Use the SN (signal-to-noise) score. [valid: true, false]
use_mi_score (default: true)
Use the MI (mutual information) score. [valid: true, false]
use_dia_scores (default: true)
Use the DIA (SWATH) scores. If turned off, fragment ion spectra will not be used for scoring. [valid: true, false]
use_ms1_correlation (default: false)
Use the correlation scores with the MS1 elution profiles. [valid: true, false]
use_sonar_scores (default: false)
Use the scores for SONAR scans (scanning SWATH). [valid: true, false]
use_ion_mobility_scores (default: false)
Use the scores for ion mobility scans. [valid: true, false]
use_ms1_fullscan (default: false)
Use the full MS1 scan at the peak apex for scoring (ppm accuracy of precursor and isotopic pattern). [valid: true, false]
use_ms1_mi (default: true)
Use the MS1 MI score. [valid: true, false]
use_uis_scores (default: false)
Use UIS scores for peptidoform identification. [valid: true, false]
use_peak_shape_metrics (default: false)
Use peak shape metrics for scoring.
use_ionseries_scores (default: true)
Use MS2-level b/y ion-series scores for peptidoform identification. [valid: true, false]
use_ms2_isotope_scores (default: true)
Use MS2-level isotope scores (Pearson & Manhattan) across product transitions (based on ID if annotated or averagine). [valid: true, false]
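The default RTNormalization outlier method, 'iter_residual', is described in the RTNormalization section as repeatedly removing the data point with the largest residual. Below is a simplified Python sketch of that idea, tied to min_rsq and min_coverage. It fits a plain least-squares line to (x, y) anchor pairs (e.g. observed RT vs. library iRT) and stops once r-squared reaches min_rsq. It also refuses to keep fewer than min_coverage of the original points. These stopping rules and the helper names are assumptions of this sketch, and the exact behavior of the OpenMS implementation may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_line(points: Sequence[Point]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(points: Sequence[Point], slope: float, intercept: float) -> float:
    """Coefficient of determination (assumes the y values are not all equal)."""
    my = sum(y for _, y in points) / len(points)
    ss_tot = sum((y - my) ** 2 for _, y in points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    return 1.0 - ss_res / ss_tot


def remove_outliers_iter_residual(points: Sequence[Point], min_rsq: float = 0.95,
                                  min_coverage: float = 0.6) -> List[Point]:
    """Drop the largest-residual point until r^2 >= min_rsq (simplified sketch)."""
    kept = list(points)
    min_points = max(2, math.ceil(min_coverage * len(points)))
    while True:
        slope, intercept = fit_line(kept)
        rsq = r_squared(kept, slope, intercept)
        if rsq >= min_rsq:
            return kept
        if len(kept) - 1 < min_points:
            # Removing another point would violate the coverage requirement.
            raise ValueError(f"r^2 = {rsq:.3f} < {min_rsq} with {len(kept)} points left")
        residuals = [abs(y - (slope * x + intercept)) for x, y in kept]
        del kept[residuals.index(max(residuals))]  # remove the worst-fitting anchor
```

Removing one point per iteration and refitting each time matters because a single gross outlier distorts the fit and can inflate the residuals of otherwise good anchors.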