OpenMS
|
2D hierarchical clustering implementation optimized for large data sets containing many small clusters i.e. dimensions of clusters << dimension of entire dataset More...
#include <OpenMS/ML/CLUSTERING/GridBasedClustering.h>
Public Types | |
typedef GridBasedCluster::Point | Point |
cluster centre, cluster bounding box, grid index More... | |
typedef GridBasedCluster::Rectangle | Rectangle |
typedef ClusteringGrid::CellIndex | CellIndex |
typedef std::multiset< MinimumDistance >::const_iterator | MultisetIterator |
typedef std::unordered_multimap< int, MultisetIterator >::const_iterator | NNIterator |
Public Types inherited from ProgressLogger | |
enum | LogType { CMD , GUI , NONE } |
Possible log types. More... | |
Public Member Functions | |
GridBasedClustering (Metric metric, const std::vector< double > &data_x, const std::vector< double > &data_y, const std::vector< int > &properties_A, const std::vector< int > &properties_B, std::vector< double > grid_spacing_x, std::vector< double > grid_spacing_y) | |
initialises all data structures More... | |
GridBasedClustering (Metric metric, const std::vector< double > &data_x, const std::vector< double > &data_y, std::vector< double > grid_spacing_x, std::vector< double > grid_spacing_y) | |
initialises all data structures More... | |
void | cluster () |
performs the hierarchical clustering (merges clusters until their dimension exceeds that of cell) More... | |
void | extendClustersY () |
extends clusters in y-direction if possible (merges clusters further in y-direction, i.e. clusters can now span multiple cells) More... | |
void | removeSmallClustersY (double threshold_y) |
removes clusters with bounding box dimension in y-direction below certain threshold More... | |
std::map< int, GridBasedCluster > | getResults () const |
returns final results (mapping of cluster indices to clusters) More... | |
Public Member Functions inherited from ProgressLogger | |
ProgressLogger () | |
Constructor. More... | |
virtual | ~ProgressLogger () |
Destructor. More... | |
ProgressLogger (const ProgressLogger &other) | |
Copy constructor. More... | |
ProgressLogger & | operator= (const ProgressLogger &other) |
Assignment Operator. More... | |
void | setLogType (LogType type) const |
Sets the progress log that should be used. The default type is NONE! More... | |
LogType | getLogType () const |
Returns the type of progress log being used. More... | |
void | setLogger (ProgressLoggerImpl *logger) |
Sets the logger to be used for progress logging. More... | |
void | startProgress (SignedSize begin, SignedSize end, const String &label) const |
Initializes the progress display. More... | |
void | setProgress (SignedSize value) const |
Sets the current progress. More... | |
void | endProgress (UInt64 bytes_processed=0) const |
void | nextProgress () const |
increment progress by 1 (according to range begin-end) More... | |
Private Member Functions | |
void | init_ (const std::vector< double > &data_x, const std::vector< double > &data_y, const std::vector< int > &properties_A, const std::vector< int > &properties_B) |
initialises all data structures More... | |
bool | mergeVeto_ (const GridBasedCluster &c1, const GridBasedCluster &c2) const |
checks if two clusters can be merged Each point in a cluster can (optionally) have two properties A and B. Properties A need to be the same, properties B need to differ in each cluster. Methods checks if this is violated in the merged cluster. More... | |
bool | findNearestNeighbour_ (const GridBasedCluster &cluster, int cluster_index) |
determines the nearest neighbour for each cluster More... | |
void | eraseMinDistance_ (const std::multiset< MinimumDistance >::const_iterator it) |
remove minimum distance object and its related data More... | |
Private Attributes | |
Metric | metric_ |
metric for measuring the distance between points in the 2D plane More... | |
ClusteringGrid | grid_ |
grid on which the position of the clusters are registered used in cluster method More... | |
std::map< int, GridBasedCluster > | clusters_ |
list of clusters maps cluster indices to clusters More... | |
std::map< int, GridBasedCluster > | clusters_final_ |
list of final clusters i.e. clusters that are no longer merged More... | |
std::multiset< MinimumDistance > | distances_ |
list of minimum distances stores the smallest of the distances in the head More... | |
std::unordered_multimap< int, std::multiset< MinimumDistance >::const_iterator > | reverse_nns_ |
reverse nearest neighbor lookup table for finding out which clusters need to be updated faster More... | |
std::unordered_map< int, std::multiset< MinimumDistance >::const_iterator > | distance_it_for_cluster_idx_ |
cluster index to distance iterator lookup table for finding out which clusters need to be updated faster More... | |
Additional Inherited Members | |
Protected Attributes inherited from ProgressLogger | |
LogType | type_ |
time_t | last_invoke_ |
ProgressLoggerImpl * | current_logger_ |
Static Protected Attributes inherited from ProgressLogger | |
static int | recursion_depth_ |
2D hierarchical clustering implementation optimized for large data sets containing many small clusters i.e. dimensions of clusters << dimension of entire dataset
The clustering problem therefore simplifies to a number of local clustering problems. Each of the local problems can be solved on a couple of adjacent cells on a larger grid. The grid spacing is determined by the expected typical cluster size in this region.
Each data point can have two additional properties A and B. In each cluster all properties A need to be the same, all properties B different.
typedef ClusteringGrid::CellIndex CellIndex |
typedef std::multiset<MinimumDistance>::const_iterator MultisetIterator |
typedef std::unordered_multimap<int, MultisetIterator>::const_iterator NNIterator |
typedef GridBasedCluster::Point Point |
cluster centre, cluster bounding box, grid index
typedef GridBasedCluster::Rectangle Rectangle |
|
inline |
initialises all data structures
metric | Metric for measuring the distance between points in the 2D plane |
data_x | x-coordinates of points to be clustered |
data_y | y-coordinates of points to be clustered |
properties_A | property A of points (same in each cluster) |
properties_B | property B of points (different in each cluster) |
grid_spacing_x | grid spacing in x-direction |
grid_spacing_y | grid spacing in y-direction |
References GridBasedClustering< Metric >::init_().
|
inline |
initialises all data structures
metric | Metric for measuring the distance between points in the 2D plane |
data_x | x-coordinates of points to be clustered |
data_y | y-coordinates of points to be clustered |
grid_spacing_x | grid spacing in x-direction |
grid_spacing_y | grid spacing in y-direction |
References GridBasedClustering< Metric >::init_().
|
inline |
performs the hierarchical clustering (merges clusters until their dimension exceeds that of cell)
References ClusteringGrid::addCluster(), OpenMS::Constants::c, GridBasedClustering< Metric >::clusters_, GridBasedClustering< Metric >::distance_it_for_cluster_idx_, GridBasedClustering< Metric >::distances_, ProgressLogger::endProgress(), DBoundingBox< D >::enlarge(), GridBasedClustering< Metric >::eraseMinDistance_(), GridBasedClustering< Metric >::findNearestNeighbour_(), GridBasedCluster::getBoundingBox(), GridBasedCluster::getCentre(), ClusteringGrid::getIndex(), GridBasedCluster::getPoints(), GridBasedCluster::getPropertiesB(), GridBasedCluster::getPropertyA(), DPosition< D, TCoordinateType >::getX(), DPosition< D, TCoordinateType >::getY(), GridBasedClustering< Metric >::grid_, DIntervalBase< D >::maxPosition(), DIntervalBase< D >::minPosition(), ClusteringGrid::removeCluster(), GridBasedClustering< Metric >::reverse_nns_, ProgressLogger::setProgress(), and ProgressLogger::startProgress().
Referenced by GridBasedClustering< Metric >::extendClustersY(), GridBasedClustering< Metric >::findNearestNeighbour_(), and GridBasedClustering< Metric >::init_().
|
inlineprivate |
remove minimum distance object and its related data
Remove the distance object behind it
from distances_ and remove all corresponding data from the auxiliary data structures reverse_nns_ and distance_it_for_cluster_idx_.
it | Iterator of distance to be removed from distances_ |
References GridBasedClustering< Metric >::distance_it_for_cluster_idx_, GridBasedClustering< Metric >::distances_, and GridBasedClustering< Metric >::reverse_nns_.
Referenced by GridBasedClustering< Metric >::cluster().
|
inline |
extends clusters in y-direction if possible (merges clusters further in y-direction, i.e. clusters can now span multiple cells)
References ClusteringGrid::addCluster(), GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::clusters_final_, ClusteringGrid::getClusters(), ClusteringGrid::getGridSpacingX(), ClusteringGrid::getGridSpacingY(), ClusteringGrid::getIndex(), GridBasedClustering< Metric >::grid_, ClusteringGrid::isNonEmptyCell(), and ClusteringGrid::removeCluster().
|
inlineprivate |
determines the nearest neighbour for each cluster
cluster | cluster for which the nearest neighbour should be found |
cluster_index | index of cluster |
References GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::clusters_, GridBasedClustering< Metric >::clusters_final_, GridBasedClustering< Metric >::distance_it_for_cluster_idx_, GridBasedClustering< Metric >::distances_, GridBasedCluster::getCentre(), ClusteringGrid::getClusters(), ClusteringGrid::getIndex(), GridBasedClustering< Metric >::grid_, ClusteringGrid::isNonEmptyCell(), GridBasedClustering< Metric >::mergeVeto_(), GridBasedClustering< Metric >::metric_, and GridBasedClustering< Metric >::reverse_nns_.
Referenced by GridBasedClustering< Metric >::cluster(), and GridBasedClustering< Metric >::init_().
|
inline |
returns final results (mapping of cluster indices to clusters)
References GridBasedClustering< Metric >::clusters_final_.
|
inlineprivate |
initialises all data structures
data_x | x-coordinates of points to be clustered |
data_y | y-coordinates of points to be clustered |
properties_A | property A of points (same in each cluster) |
properties_B | property B of points (different in each cluster) |
References ClusteringGrid::addCluster(), GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::clusters_, GridBasedClustering< Metric >::findNearestNeighbour_(), ClusteringGrid::getIndex(), GridBasedClustering< Metric >::grid_, and ClusteringGrid::removeCluster().
Referenced by GridBasedClustering< Metric >::GridBasedClustering().
|
inlineprivate |
checks if two clusters can be merged Each point in a cluster can (optionally) have two properties A and B. Properties A need to be the same, properties B need to differ in each cluster. Methods checks if this is violated in the merged cluster.
c1 | cluster 1 |
c2 | cluster 2 |
References GridBasedCluster::getPropertiesB(), and GridBasedCluster::getPropertyA().
Referenced by GridBasedClustering< Metric >::findNearestNeighbour_().
|
inline |
removes clusters with bounding box dimension in y-direction below certain threshold
threshold_y | minimal dimension of the cluster bounding box |
References GridBasedClustering< Metric >::clusters_final_, DIntervalBase< D >::maxY(), and DIntervalBase< D >::minY().
|
private |
list of clusters maps cluster indices to clusters
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::findNearestNeighbour_(), and GridBasedClustering< Metric >::init_().
|
private |
list of final clusters i.e. clusters that are no longer merged
Referenced by GridBasedClustering< Metric >::extendClustersY(), GridBasedClustering< Metric >::findNearestNeighbour_(), GridBasedClustering< Metric >::getResults(), and GridBasedClustering< Metric >::removeSmallClustersY().
|
private |
cluster index to distance iterator lookup table for finding out which clusters need to be updated faster
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::eraseMinDistance_(), and GridBasedClustering< Metric >::findNearestNeighbour_().
|
private |
list of minimum distances stores the smallest of the distances in the head
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::eraseMinDistance_(), and GridBasedClustering< Metric >::findNearestNeighbour_().
|
private |
grid on which the position of the clusters are registered used in cluster method
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::extendClustersY(), GridBasedClustering< Metric >::findNearestNeighbour_(), and GridBasedClustering< Metric >::init_().
|
private |
metric for measuring the distance between points in the 2D plane
Referenced by GridBasedClustering< Metric >::findNearestNeighbour_().
|
private |
reverse nearest neighbor lookup table for finding out which clusters need to be updated faster
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::eraseMinDistance_(), and GridBasedClustering< Metric >::findNearestNeighbour_().