OpenMS
A class to analyze indexedmzML files and extract the offsets of individual tags.
#include <OpenMS/FORMAT/HANDLERS/IndexedMzMLDecoder.h>
Public Types
typedef std::vector< std::pair< std::string, std::streampos > > OffsetVector
  The vector containing binary offsets.
Public Member Functions
int parseOffsets (const String &filename, std::streampos indexoffset, OffsetVector &spectra_offsets, OffsetVector &chromatograms_offsets)
  Tries to extract the offsets of all spectra and chromatograms from an indexedmzML.
std::streampos findIndexListOffset (const String &filename, int buffersize=1023)
  Tries to extract the indexList offset from an indexedmzML.
Protected Member Functions
int domParseIndexedEnd_ (const std::string &in, OffsetVector &spectra_offsets, OffsetVector &chromatograms_offsets)
  Extract data from a string containing an <indexList> tag.
A class to analyze indexedmzML files and extract the offsets of individual tags.
Specifically, this class allows one to extract the offsets of the <indexList> tag and of all <spectrum> and <chromatogram> tags, using the index found at the end of the indexedmzML XML structure.
findIndexListOffset extracts the offset of the <indexList> tag from the last bytes of the file (1023 by default, set by its buffersize parameter). This offset then allows parseOffsets to read all elements contained in the <indexList> tag and thus obtain the offsets of all spectra and chromatograms.
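The point of collecting these offsets is random access: a reader can jump straight to one element instead of parsing the whole file. The following is a minimal sketch in standard C++, assuming the stored positions are byte offsets from the start of the file; `peekAt` is a hypothetical helper and not part of OpenMS.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: read the first `n` bytes of the element starting at
// `pos`, e.g. to confirm it really begins with "<spectrum" or "<chromatogram".
// Returns an empty string if the file cannot be opened.
std::string peekAt(const std::string& filename, std::streampos pos, std::size_t n = 16)
{
  std::ifstream in(filename, std::ios::binary);
  if (!in) return "";

  in.seekg(pos);  // jump directly to the element, no parsing of earlier content
  std::string buf(n, '\0');
  in.read(&buf[0], static_cast<std::streamsize>(n));
  buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only the bytes actually read
  return buf;
}
```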
typedef std::vector< std::pair<std::string, std::streampos> > OffsetVector
The vector containing binary offsets.
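A short illustration of working with an OffsetVector. It assumes, following the mzML index schema, that the string in each pair holds the idRef of an <offset> element; the lookup helper `offsetFor` is hypothetical and not part of the OpenMS API.

```cpp
#include <ios>
#include <string>
#include <utility>
#include <vector>

// Same shape as the OffsetVector typedef above: (ID, byte offset) pairs.
typedef std::vector< std::pair<std::string, std::streampos> > OffsetVector;

// Hypothetical linear lookup of the byte offset recorded for a given ID.
// Returns std::streampos(-1), an invalid position, if the ID is not present.
std::streampos offsetFor(const OffsetVector& offsets, const std::string& id)
{
  for (const auto& entry : offsets)
  {
    if (entry.first == id) return entry.second;
  }
  return std::streampos(-1);
}
```

A linear scan keeps the sketch simple; for many lookups, copying the pairs into a std::unordered_map would be the usual choice.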
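int domParseIndexedEnd_ (const std::string & in, OffsetVector & spectra_offsets, OffsetVector & chromatograms_offsets)
protected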
Extract data from a string containing an <indexList> tag.
This function parses the <offset> tags contained in the <indexList> tag and stores their contents in the spectrum and chromatogram offset vectors.
This function expects an input string containing a root XML tag that has an <indexList> tag as one of its children, as defined by the mzML 1.1.0 index wrapper schema. Usually the root is an <indexedmzML> tag, which must contain an <indexList> tag, while the dx:mzML, indexListOffset and fileChecksum elements are optional (their presence is not checked).
The input must nevertheless be well-formed XML (e.g. no mismatched opening and closing tags). In practice, this means you will at least have to prepend an opening <indexedmzML> tag: the text starting at the <indexList> offset ends with the closing </indexedmzML> tag but lacks the opening one. A valid input therefore starts with <indexedmzML>, continues with the <indexList> element (one <index> element per indexed element type, each holding <offset> children), and ends with </indexedmzML>.
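For illustration, here is a hypothetical file tail of the kind this function receives once the opening tag has been added. The IDs and numeric offsets are invented, the <index> names "spectrum" and "chromatogram" follow the mzML 1.1.0 index schema, and `wrapIndexTail` is a hypothetical helper, not an OpenMS function.

```cpp
#include <string>

// Hypothetical tail of an indexedmzML file, as read from the <indexList>
// offset to end-of-file. IDs and offsets are illustrative only.
const std::string kIndexTail = R"(<indexList count="2">
    <index name="spectrum">
      <offset idRef="scan=1">5130</offset>
    </index>
    <index name="chromatogram">
      <offset idRef="TIC">9752</offset>
    </index>
  </indexList>
  <indexListOffset>26795</indexListOffset>
  <fileChecksum>0</fileChecksum>
</indexedmzML>
)";

// The tail already contains the closing </indexedmzML> but not the opening
// tag, so prepend it to obtain a well-formed document.
std::string wrapIndexTail(const std::string& tail)
{
  return "<indexedmzML>\n" + tail;
}
```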
in | String containing the XML with an indexedmzML parent and an indexList child tag |
spectra_offsets | Output vector containing the positions of all spectra in the file |
chromatograms_offsets | Output vector containing the positions of all chromatograms in the file |
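The extraction step can be sketched without OpenMS as shown below. The function name suggests the real implementation uses a DOM parser; this sketch swaps in std::regex, which only copes with the simple, regular layout of an <indexList> and is not a general XML parser. The name `parseIndexListSketch` and the return codes (0 for success, -1 when no <indexList> is present) are assumptions, not documented OpenMS behavior.

```cpp
#include <ios>
#include <regex>
#include <string>
#include <utility>
#include <vector>

typedef std::vector< std::pair<std::string, std::streampos> > OffsetVector;

// Regex-based stand-in for parsing an <indexList>: for each
// <index name="..."> block, collect its <offset idRef="...">N</offset> children.
// Returns -1 if the input holds no <indexList>, 0 otherwise (assumed convention).
int parseIndexListSketch(const std::string& in,
                         OffsetVector& spectra_offsets,
                         OffsetVector& chromatograms_offsets)
{
  if (in.find("<indexList") == std::string::npos) return -1;

  // "<index\s+" cannot match "<indexList" or "<indexedmzML" (no whitespace after "index").
  static const std::regex index_rx(R"re(<index\s+[^>]*?name="(\w+)"[^>]*>([\s\S]*?)</index>)re");
  static const std::regex offset_rx(R"re(<offset\s+[^>]*?idRef="([^"]*)"[^>]*>\s*(\d+)\s*</offset>)re");

  for (std::sregex_iterator it(in.begin(), in.end(), index_rx), end; it != end; ++it)
  {
    const std::string name = (*it)[1].str();
    const std::string body = (*it)[2].str();

    OffsetVector* target = nullptr;
    if (name == "spectrum") target = &spectra_offsets;
    else if (name == "chromatogram") target = &chromatograms_offsets;
    else continue;  // skip index types this sketch does not handle

    for (std::sregex_iterator o(body.begin(), body.end(), offset_rx), oend; o != oend; ++o)
    {
      const std::streamoff pos = std::stoll((*o)[2].str());
      target->emplace_back((*o)[1].str(), std::streampos(pos));
    }
  }
  return 0;
}
```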
std::streampos findIndexListOffset (const String & filename, int buffersize = 1023)
Tries to extract the indexList offset from an indexedmzML.
This function reads the last buffersize bytes of the given input file (1023 by default) and tries to read the content of the <indexListOffset> tag. The idea is that the string <indexListOffset>xxx</indexListOffset> occurs somewhere near the end of the file; this function returns the xxx part converted to a stream position.
filename | Filename of the input indexedmzML file |
buffersize | How many bytes of the input file should be searched for the tag |
FileNotFound | is thrown if the file cannot be found
ParseError | is thrown if the offset cannot be parsed
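The tail scan described above can be sketched in standard C++. This is not the OpenMS implementation: std::string replaces OpenMS's String, std::runtime_error stands in for the FileNotFound and ParseError exceptions, the name `findIndexListOffsetSketch` is hypothetical, and the regex assumes the tag carries no namespace prefix or attributes.

```cpp
#include <algorithm>
#include <fstream>
#include <ios>
#include <regex>
#include <stdexcept>
#include <string>

// Sketch of the findIndexListOffset idea: scan only the last `buffersize`
// bytes of the file for <indexListOffset>N</indexListOffset> and return N.
// std::runtime_error stands in for OpenMS's FileNotFound / ParseError.
std::streampos findIndexListOffsetSketch(const std::string& filename, int buffersize = 1023)
{
  std::ifstream f(filename, std::ios::binary);
  if (!f) throw std::runtime_error("file not found: " + filename);

  // Determine the file length, then clamp the window for short files.
  f.seekg(0, std::ios::end);
  const std::streamoff length = f.tellg();
  const std::streamoff n = std::min<std::streamoff>(buffersize, length);

  // Read the final n bytes into a string.
  f.seekg(-n, std::ios::end);
  std::string tail(static_cast<std::size_t>(n), '\0');
  f.read(&tail[0], n);

  static const std::regex rx(R"(<indexListOffset>\s*(\d+)\s*</indexListOffset>)");
  std::smatch m;
  if (!std::regex_search(tail, m, rx))
    throw std::runtime_error("could not parse <indexListOffset> in " + filename);

  return std::streampos(static_cast<std::streamoff>(std::stoll(m[1].str())));
}
```

Reading only a fixed-size window keeps the cost independent of file size, which matters for multi-gigabyte mzML files; the trade-off is that the tag must lie within that window.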
int parseOffsets (const String & filename, std::streampos indexoffset, OffsetVector & spectra_offsets, OffsetVector & chromatograms_offsets)
Tries to extract the offsets of all spectra and chromatograms from an indexedmzML.
Given the start of the <indexList> element, this function tries to read this tag from the given indexedmzML file. It stores the result in the spectrum and chromatogram offset vectors.
filename | Filename of the input indexedmzML file |
indexoffset | Byte offset in the file at which the XML tag '<indexList>' is expected to start
spectra_offsets | Output vector containing the positions of all spectra in the file |
chromatograms_offsets | Output vector containing the positions of all chromatograms in the file |
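The file-reading half of this function can be sketched as follows, assuming `indexoffset` was obtained from findIndexListOffset. The hypothetical helper `readIndexTailSketch` seeks to the offset, reads to the end of the file, checks that the text really starts with <indexList, and prepends the opening <indexedmzML> tag so the result can be handed to an XML parser. It is an illustration under these assumptions, not the OpenMS code.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

// Read everything from `indexoffset` to end-of-file and prepend the missing
// opening <indexedmzML> tag, producing a well-formed string for an XML parser.
std::string readIndexTailSketch(const std::string& filename, std::streampos indexoffset)
{
  std::ifstream f(filename, std::ios::binary);
  if (!f) throw std::runtime_error("file not found: " + filename);

  f.seekg(indexoffset);
  std::string tail((std::istreambuf_iterator<char>(f)), std::istreambuf_iterator<char>());

  // Sanity check: a correct offset points at the start of the <indexList> element.
  if (tail.compare(0, 10, "<indexList") != 0)
    throw std::runtime_error("no <indexList> at the given offset in " + filename);

  return "<indexedmzML>\n" + tail;
}
```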