OpenMS
FileMerger

Merges several files into one. Several output formats are supported, depending on the input format.

potential predecessor tools → FileMerger → potential successor tools
  predecessors: any tool/instrument producing mergeable files
  successors:   any tool operating on merged files (e.g. XTandemAdapter for mzML, ProteinQuantifier for consensusXML)

Pay special attention to the append_method parameter when merging consensusXML files. In a consensusXML file, one column corresponds to one channel/label plus raw file, and rows are quantified, linked features. The use cases are described in more detail in the parameter description.
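The difference between the two append methods can be pictured with a toy model. The sketch below is illustrative only: the dictionary layout and the functions append_rows and append_cols are assumptions made for this example and are not the OpenMS data structures or API. It also assumes that inputs merged row-wise share the same columns.

```python
# Toy model (an assumption, not the OpenMS ConsensusMap API): a map is a list
# of column names plus a list of rows; each row maps column index -> intensity.

def append_rows(first, second):
    """Stack the rows of both maps; the columns are taken from the first map."""
    return {
        "columns": list(first["columns"]),
        "rows": [dict(r) for r in first["rows"]] + [dict(r) for r in second["rows"]],
    }

def append_cols(first, second):
    """Put the second map's columns after the first's and shift its indices."""
    offset = len(first["columns"])
    shifted = [{col + offset: val for col, val in row.items()}
               for row in second["rows"]]
    return {
        "columns": list(first["columns"]) + list(second["columns"]),
        "rows": [dict(r) for r in first["rows"]] + shifted,
    }

# Two pieces of the same MS run: same column, more rows.
part_a = {"columns": ["run1"], "rows": [{0: 1200.0}, {0: 880.0}]}
part_b = {"columns": ["run1"], "rows": [{0: 430.0}]}
same_run = append_rows(part_a, part_b)

# Two fractions: each contributes its own columns.
fraction_1 = {"columns": ["f1_light", "f1_heavy"], "rows": [{0: 10.0, 1: 20.0}]}
fraction_2 = {"columns": ["f2_light", "f2_heavy"], "rows": [{0: 30.0, 1: 40.0}]}
fractions = append_cols(fraction_1, fraction_2)

print(same_run["columns"], len(same_run["rows"]))
print(fractions["columns"], fractions["rows"])
```

In this picture, append_rows leaves the number of columns unchanged, while append_cols grows the column list so that each fraction keeps its own channels.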

When merging non-consensusXML files, or consensusXML files with append_rows, the meta information that is valid for the whole experiment (e.g. MS instrument and sample) is taken from the first file only.

For spectrum-containing formats (i.e. not featureXML or consensusXML), the retention times of the individual scans are obtained in one of the following ways:

  • from the input file meta data (e.g. mzML)
  • from the input file names (the name must contain 'rt' directly followed by a number, e.g. 'myscan_rt3892.98_MS2.dta')
  • from a user-supplied list (one RT for each file)
  • by auto-generation (starting at 1 with 1 second increment).
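The filename convention and the auto-generated values can be sketched in a few lines. This is a minimal illustration, not FileMerger's own parsing code: the helper names rt_from_filename and auto_rts are hypothetical, and the regular expression simply takes the first 'rt' that is directly followed by a number, which may be more permissive than the tool itself.

```python
import re

# 'rt' directly followed by an integer or floating point number.
_RT_PATTERN = re.compile(r"rt(\d+(?:\.\d+)?)")

def rt_from_filename(name):
    """Return the retention time encoded in a file name, or None if absent."""
    match = _RT_PATTERN.search(name)
    return float(match.group(1)) if match else None

def auto_rts(n_files):
    """Auto-generated retention times: 1, 2, 3, ... seconds, one per file."""
    return [float(i) for i in range(1, n_files + 1)]

print(rt_from_filename("myscan_rt3892.98_MS2.dta"))
print(rt_from_filename("scan_without_time.dta"))
print(auto_rts(3))
```

Note that a name such as 'part1_rt35.dta' would match at 'rt1' in this simplified pattern, so a stricter expression may be needed for real file names.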

The command line parameters of this tool are:

FileMerger -- Merges several MS files into one file.
Full documentation: http://www.openms.de/doxygen/release/3.2.0/html/TOPP_FileMerger.html
Version: 3.2.0 Sep 18 2024, 16:00:56, Revision: e231942
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
   spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  FileMerger <options>

Options (mandatory options marked with '*'):
  -in <files>*                  Input files separated by blank (valid formats: 'mzData', 'mzXML', 'mzML', 
                                'dta', 'dta2d', 'mgf', 'featureXML', 'consensusXML', 'fid', 'traML', 'fasta')

  -in_type <type>               Input file type (default: determined from file extension or content) (valid:
                                'mzData', 'mzXML', 'mzML', 'dta', 'dta2d', 'mgf', 'featureXML',
                                'consensusXML', 'fid', 'traML', 'fasta')
  -out <file>*                  Output file (valid formats: 'mzML', 'featureXML', 'consensusXML', 'traML', 
                                'fasta')
  -annotate_file_origin         Store the original filename in each feature using meta value "file_origin" 
                                (for featureXML and consensusXML only).
  -append_method <choice>       (ConsensusXML-only) Append quantitative information about features row-wise 
                                or column-wise.
                                - 'append_rows' is usually used when the inputs come from the same MS run 
                                (e.g. caused by manual splitting or multiple algorithms on the same file)
                                - 'append_cols' when you want to combine consensusXMLs from e.g. different
                                fractions to be summarized in ProteinQuantifier or jointly exported with
                                MzTabExporter. (default: 'append_rows') (valid: 'append_rows', 'append_cols')

Options for concatenating files in the retention time (RT) dimension. The RT ranges of inputs are adjusted 
so they don't overlap in the merged file (traML input not supported):
  -rt_concat:gap <sec>          The amount of gap (in seconds) to insert between the RT ranges of different 
                                input files. RT concatenation is enabled if a value > 0 is set. (default: 
                                '0.0')
  -rt_concat:trafo_out <files>  Output of retention time transformations that were applied to the input
                                files to produce non-overlapping RT ranges. If used, one output file per
                                input file is required. (valid formats: 'trafoXML')

Options for raw data input/output (primarily for DTA files):
  -raw:rt_auto                  Assign retention times automatically (integers starting at 1)
  -raw:rt_custom <rts>          List of custom retention times that are assigned to the files. The number of 
                                given retention times must be equal to the number of input files.
  -raw:rt_filename              Try to guess the retention time of a file based on the filename. This option
                                is useful for merging DTA files, where filenames should contain the string
                                'rt' directly followed by a floating point number, e.g.
                                'my_spectrum_rt2795.15.dta'
  -raw:ms_level <num>           If 1 or higher, this number is assigned to spectra as the MS level. This
                                option is useful for DTA files which do not contain MS level information.
                                (default: '0')

                                
Common TOPP options:
  -ini <file>                   Use the given TOPP INI file
  -threads <n>                  Sets the number of threads allowed to be used by the TOPP tool (default: '1')

  -write_ini <file>             Writes the default configuration file
  --help                        Shows options
  --helphelp                    Shows all options (including advanced)
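As a usage illustration, the options listed above can be assembled into a command line programmatically. The sketch below only uses flags that appear in the help text; the helper filemerger_command and the file names are hypothetical, and the command is printed rather than executed, so no OpenMS installation is assumed.

```python
import shlex

def filemerger_command(inputs, output, append_method=None, rt_gap=None,
                       annotate_origin=False):
    """Build a FileMerger argument list from the options shown in the help text."""
    cmd = ["FileMerger", "-in", *inputs, "-out", output]
    if annotate_origin:
        cmd.append("-annotate_file_origin")
    if append_method is not None:
        if append_method not in ("append_rows", "append_cols"):
            raise ValueError("append_method must be 'append_rows' or 'append_cols'")
        cmd += ["-append_method", append_method]
    if rt_gap is not None:
        cmd += ["-rt_concat:gap", str(rt_gap)]
    return cmd

# Hypothetical inputs: two fractions merged column-wise for downstream quantification.
cmd = filemerger_command(
    ["fraction1.consensusXML", "fraction2.consensusXML"],
    "merged.consensusXML",
    append_method="append_cols",
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a system where FileMerger is on the PATH.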

INI file documentation of this tool:

Legend: required parameters are marked with '*', advanced parameters with '(advanced)'.

+ FileMerger: Merges several MS files into one file.
    version (default: 3.2.0)
        Version of the tool that generated this parameters file.
  ++ 1: Instance '1' section for 'FileMerger'
      in* (default: [])
          Input files separated by blank.
          Tags: input file. Restrictions: *.mzData, *.mzXML, *.mzML, *.dta, *.dta2d, *.mgf, *.featureXML, *.consensusXML, *.fid, *.traML, *.fasta
      in_type
          Input file type (default: determined from file extension or content).
          Restrictions: mzData, mzXML, mzML, dta, dta2d, mgf, featureXML, consensusXML, fid, traML, fasta
      out*
          Output file.
          Tags: output file. Restrictions: *.mzML, *.featureXML, *.consensusXML, *.traML, *.fasta
      annotate_file_origin (default: false)
          Store the original filename in each feature using meta value "file_origin" (for featureXML and consensusXML only).
          Restrictions: true, false
      append_method (default: append_rows)
          (ConsensusXML-only) Append quantitative information about features row-wise or column-wise.
          - 'append_rows' is usually used when the inputs come from the same MS run (e.g. caused by manual splitting or multiple algorithms on the same file)
          - 'append_cols' when you want to combine consensusXMLs from e.g. different fractions to be summarized in ProteinQuantifier or jointly exported with MzTabExporter.
          Restrictions: append_rows, append_cols
      log (advanced)
          Name of log file (created only when specified).
      debug (default: 0) (advanced)
          Sets the debug level.
      threads (default: 1)
          Sets the number of threads allowed to be used by the TOPP tool.
      no_progress (default: false) (advanced)
          Disables progress logging to command line.
          Restrictions: true, false
      force (default: false) (advanced)
          Overrides tool-specific checks.
          Restrictions: true, false
      test (default: false) (advanced)
          Enables the test mode (needed for internal use only).
          Restrictions: true, false
    +++ rt_concat: Options for concatenating files in the retention time (RT) dimension. The RT ranges of inputs are adjusted so they don't overlap in the merged file (traML input not supported)
        gap (default: 0.0)
            The amount of gap (in seconds) to insert between the RT ranges of different input files. RT concatenation is enabled if a value > 0 is set.
        trafo_out (default: [])
            Output of retention time transformations that were applied to the input files to produce non-overlapping RT ranges. If used, one output file per input file is required.
            Tags: output file. Restrictions: *.trafoXML
    +++ raw: Options for raw data input/output (primarily for DTA files)
        rt_auto (default: false)
            Assign retention times automatically (integers starting at 1).
            Restrictions: true, false
        rt_custom (default: [])
            List of custom retention times that are assigned to the files. The number of given retention times must be equal to the number of input files.
        rt_filename (default: false)
            Try to guess the retention time of a file based on the filename. This option is useful for merging DTA files, where filenames should contain the string 'rt' directly followed by a floating point number, e.g. 'my_spectrum_rt2795.15.dta'
            Restrictions: true, false
        ms_level (default: 0)
            If 1 or higher, this number is assigned to spectra as the MS level. This option is useful for DTA files which do not contain MS level information.
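
The effect of the rt_concat gap parameter can be pictured with a small calculation. The sketch below shows one plausible scheme in which each file is shifted so that its RT range starts one gap after the end of the previous file's range. The function concat_offsets and the example ranges are hypothetical, and keeping the first file unshifted is an assumption; the offsets FileMerger actually applies (and writes to trafo_out) may differ.

```python
def concat_offsets(rt_ranges, gap):
    """Return the shift (in seconds) to add to each file's retention times.

    rt_ranges: list of (rt_min, rt_max) tuples, one per input file, in input order.
    gap: seconds to leave between consecutive RT ranges.
    """
    if gap <= 0:
        # RT concatenation is only enabled for a gap > 0.
        return [0.0] * len(rt_ranges)
    offsets = []
    next_start = None
    for rt_min, rt_max in rt_ranges:
        if next_start is None:
            next_start = rt_min  # assumption: the first file is not shifted
        offsets.append(next_start - rt_min)
        next_start += (rt_max - rt_min) + gap
    return offsets

ranges = [(0.0, 3600.0), (10.0, 3000.0), (5.0, 1200.0)]
offsets = concat_offsets(ranges, gap=60.0)
shifted = [(lo + off, hi + off) for (lo, hi), off in zip(ranges, offsets)]
print(offsets)
print(shifted)
```

With these inputs the shifted ranges follow one another with 60 seconds between the end of one file and the start of the next.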