FUNCTION CALLS
Calls METRIC | DISTANCE
| Class Description | Language | Constructor | ()-Operator | Default Parameters |
|---|---|---|---|---|
| Sorensen Distance | Python | f = distance.sorensen | result = f(dataA, dataB) | |
| | C++ | auto f = metric::sorensen | auto result = f(dataA, dataB) | |
| Euclidean Distance Metric | Python | f = distance.Euclidean() | result = f(dataA, dataB) | |
| | C++ | auto f = metric::Euclidean() | auto result = f(dataA, dataB) | |
| Manhattan Distance Metric | Python | f = distance.Manhatten() | result = f(dataA, dataB) | |
| | C++ | auto f = metric::Manhatten() | auto result = f(dataA, dataB) | |
| Minkowski Metric (general Lp norm, P_norm; usage sketch below) | Python | f = distance.P_norm(p=1) | result = f(dataA, dataB) | defaults: p=1 |
| | C++ | auto f = metric::P_norm(1) | auto result = f(dataA, dataB) | defaults: p=1 |
| Thresholded Euclidean Metric | Python | f = distance.Euclidean_thresholded(thres=1, factor=3) | result = f(dataA, dataB) | defaults: thres=1000, factor=3000 |
| | C++ | auto f = metric::Euclidean_thresholded(1, 3) | auto result = f(dataA, dataB) | defaults: thres=1000, factor=3000 |
| Cosine Metric | Python | f = distance.Cosine() | result = f(dataA, dataB) | |
| | C++ | auto f = metric::Cosine() | auto result = f(dataA, dataB) | |
| Chebyshev Distance Metric (maximum value distance) | Python | f = distance.Chebyshev() | result = f(dataA, dataB) | |
| | C++ | auto f = metric::Chebyshev() | auto result = f(dataA, dataB) | |
| Earth Mover's Distance Metric (EMD) | Python | f = distance.EMD(cost_mat, extra_mass_penalty) | result = f(dataA, dataB) | defaults: cost_mat={}, extra_mass_penalty=-1 |
| | C++ | auto f = metric::EMD(cost_mat, max_cost) | auto result = f(dataA, dataB) | defaults: extra_mass_penalty=-1 |
| Edit Distance Metric | Python | f = distance.Edit() | result = f("asdsd", "dffdf") | |
| | C++ | auto f = metric::Edit | auto result = f("asdsd", "dffdf") | |
| Structural Similarity Index (SSIM) | Python | f = distance.SSIM(dynamic_range=100, masking=1) | result = f(img1, img2) | defaults: dynamic_range=255, masking=2 |
| | C++ | auto f = metric::SSIM<double, std::vector>(100, 1) | auto result = f(img1, img2) | |
| Time Warp Edit Distance (TWED) | Python | f = distance.TWED(penalty=1, elastic=2) | result = f(dataA, dataB) | defaults: penalty=0, elastic=1 |
| | C++ | auto f = metric::TWED(1, 2) | auto result = f(dataA, dataB) | |
| Kohonen Distance Metric | Python | f = distance.Kohonen(train_data, w, h) | result = f(sample1, sample2) | defaults: start_learn_rate=0.8, finish_learn_rate=0.0, iterations=20 |
| | C++ | auto f = metric::kohonen_distance(train_data, w, h) | auto result = f(sample1, sample2) | defaults: start_learn_rate=0.8, finish_learn_rate=0.0, iterations=20 |
| Riemannian Distance Metric | C++ | auto rd = metric::RiemannianDistance<void, metric::Euclidean>() | auto result = rd(ds1, ds2) | defaults: metric=metric::Euclidean |
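
All distance functors above share the same call pattern: construct once, then apply to two containers. A minimal C++ sketch, assuming the library's single-header include (metric.hpp) and explicit double value types (the table omits the template arguments):

```cpp
#include <iostream>
#include <vector>

#include "metric.hpp" // assumed single-header include of the METRIC library

int main()
{
    std::vector<double> dataA = {0.0, 1.0, 2.0, 3.0};
    std::vector<double> dataB = {1.0, 3.0, 5.0, 7.0};

    // Construct each functor once, then call it like a plain function.
    // The <double> value-type arguments are an assumption; the table
    // shows the constructors without template arguments.
    metric::Euclidean<double> euclidean;
    metric::Manhatten<double> manhatten;
    metric::P_norm<double> p_norm(2); // Minkowski metric with p = 2

    std::cout << "Euclidean: " << euclidean(dataA, dataB) << "\n"
              << "Manhatten: " << manhatten(dataA, dataB) << "\n"
              << "P_norm(2): " << p_norm(dataA, dataB) << "\n";
    return 0;
}
```
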
Calls METRIC | SPACE
| Class Description | Language | Constructor | ()-Operator | Default Parameters |
|---|---|---|---|---|
| Distance matrix | Python | f = space.Matrix(data, Euclidean()) | result = f(i, j) | defaults: data={}, metric=Euclidean() |
| | C++ | auto f = metric::Matrix<std::vector>(data) | auto result = f(i, j) | constructor defaults: d = Metric() (template argument) |
| A metric search tree works like a std container for storing structured data. In principle it follows the same idea as a binary search tree or a kd-tree, but it works for arbitrary data structures; this implementation is essentially a cover tree. In addition to the distance (or similarity) between the data records, a covering distance from level to level decides how the tree grows (usage sketch below). | Python | f = space.Tree() | result = f(i, j) | |
| | C++ | auto f = metric::Tree<std::vector>() | auto result = f(i, j) | |
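
A minimal C++ sketch of both containers; the insert/nn members and the node's data field follow the library's examples and are assumptions relative to this table, as is the metric.hpp include:

```cpp
#include <iostream>
#include <vector>

#include "metric.hpp" // assumed single-header include of the METRIC library

int main()
{
    std::vector<std::vector<double>> data = {
        {0.0, 0.0}, {1.0, 1.0}, {2.0, 2.0}};

    // Pairwise distance matrix over the records (metric defaults to Euclidean).
    metric::Matrix<std::vector<double>> dm(data);
    std::cout << "d(0, 1) = " << dm(0, 1) << "\n";

    // Cover-tree based metric search tree over the same record type.
    metric::Tree<std::vector<double>> tree;
    for (const auto& rec : data)
        tree.insert(rec);

    // Nearest-neighbour lookup; nn() returns a node whose ->data
    // member holds the stored record (assumed node layout).
    auto nearest = tree.nn({0.9, 1.1});
    std::cout << "nearest: {" << nearest->data[0] << ", "
              << nearest->data[1] << "}\n";
    return 0;
}
```
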
Calls METRIC | MAPPING
| Class Description | Language | Constructor | ()-Operator | Encode | Decode | Default Parameters |
|---|---|---|---|---|---|---|
| Domain split principal components compressor (DSPCC) | Python | f = mapping.DSPCC(dataset, n_features=1) | - | f.encode(data) | result = f.decode(codes) | defaults: n_features=1, n_subbands=4, time_freq_balance=0.5, n_top_features=16 |
| | C++ | auto f = metric::DSPCC<vector, void>(dataset, 1) | - | | | defaults: n_features=1, n_subbands=4, time_freq_balance=0.5, n_top_features=16 |
| Kohonen distance clustering (KOC) is an unsupervised neural-network learning algorithm, used to solve problems in various areas, especially the clustering of complex data sets. | Python | f = mapping.KOC_factory(w, h) | koc = f(samples, num_clusters) | - | - | defaults: nodes_width=5, nodes_height=4, anomaly_sigma=1.0, start_learn_rate=0.8, finish_learn_rate=0, iterations=20, distribution_min=-1, distribution_max=1, min_cluster_size=1, metric=distance.Euclidean() |
| | C++ | auto f = metric::KOC_factory<std::vector, metric::Grid6, metric::Euclidean>(w, h) | auto koc = f(samples, num_clusters) | - | - | defaults: nodes_width=5, nodes_height=4, anomaly_sigma=1.0, start_learn_rate=0.8, finish_learn_rate=0, iterations=20, distribution_min=-1, distribution_max=1, min_cluster_size=1 |
| An autoencoder is an unsupervised artificial neural network that learns to compress and encode data efficiently, and then to reconstruct the data from the reduced encoded representation as closely as possible to the original input. It reduces data dimensionality by learning to ignore noise in the data. | Python | f = mapping.Autoencoder() | - | result = f.encode(sample) | result = f.decode(sample) | |
| | C++ | auto f = metric::Autoencoder<uint8_t, double>() | - | auto result = f.encode(sample) | auto result = f.decode(sample) | |
| dbscan (density-based spatial clustering of applications with noise) is a non-parametric data-clustering algorithm. Given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and among the most cited in the scientific literature (usage sketch below). | Python | f = mapping.dbscan | assignments, seeds, counts = f(matrix, eps, minpts) | - | - | |
| | C++ | auto f = metric::dbscan<std::vector> | auto result = f(matrix, eps, minpts) | - | - | |
| An ESN (echo state network) is a recurrent neural network with a sparsely connected hidden layer (typically about 1% connectivity). The connectivity and weights of the hidden neurons are fixed and randomly assigned. | Python | f = partial(mapping.ESN(w_size=400).train(slices, target).predict) | result = f(slices) | - | - | defaults: w_size=500, w_connections=10, w_sr=0.6, alpha=0.5, washout=1, beta=0.5 |
| | C++ | auto f = metric::ESN<std::vector, Euclidean>() | - | - | - | defaults: w_size=500, w_connections=10, w_sr=0.6, alpha=0.5, washout=1, beta=0.5 |
| affprop (affinity propagation) is a clustering algorithm based on message passing between data points. Similar to k-medoids, it looks at the (dis)similarities in the data, picks one exemplar data point for each cluster, and assigns every point in the data set to the cluster with the closest exemplar. | Python | f = partial(mapping.affprop, preference=1.0, maxiter=100) | result = f(data) | - | - | defaults: preference=0.5, maxiter=200, tol=1.0e-6, damp=0.5 |
| | C++ | auto f = metric::affprop<std::vector> | auto result = f(data) | - | - | defaults: preference=0.5, maxiter=200, tol=1.0e-6, damp=0.5 |
| kmeans clustering is a method of vector quantization that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean (cluster center or centroid), which serves as a prototype of the cluster. | Python | f = partial(mapping.kmeans, maxiter=100, distance_measure='manhatten') | result = f(data, k) | - | - | defaults: maxiter=200, distance_measure='euclidean', random_seed=-1 |
| | C++ | auto f = metric::kmeans | auto result = f(data, k) | - | - | defaults: maxiter=200, distance_measure='euclidean', random_seed=-1 |
| kmedoids is a classical partitioning clustering technique that splits a data set of n objects into k clusters, where the number k of clusters is assumed to be known a priori (i.e. the programmer must specify k before running the algorithm). | Python | f = mapping.kmedoids | result = f(data, k) | - | - | |
| | C++ | auto f = metric::kmedoids<std::vector> | auto result = f(data, k) | - | - | |
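
To illustrate the clustering call shapes, a minimal C++ sketch of dbscan on a precomputed distance matrix; the metric.hpp include and the result-tuple layout are assumptions, the latter inferred from the Python row above:

```cpp
#include <iostream>
#include <vector>

#include "metric.hpp" // assumed single-header include of the METRIC library

int main()
{
    // Two well-separated groups of points plus their distance matrix
    // (see METRIC | SPACE above).
    std::vector<std::vector<double>> data = {
        {0.0, 0.0}, {0.1, 0.1}, {0.2, 0.0},
        {5.0, 5.0}, {5.1, 5.2}};
    metric::Matrix<std::vector<double>> dm(data);

    // eps: neighbourhood radius; minpts: minimum points per dense region.
    // The tuple layout (assignments, seeds, counts) mirrors the Python
    // binding above and is an assumption for the C++ side.
    auto [assignments, seeds, counts] = metric::dbscan(dm, 1.0, 2);

    for (auto label : assignments)
        std::cout << label << " "; // expect two clusters for this data
    std::cout << "\n" << counts.size() << " clusters\n";
    return 0;
}
```
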
Calls METRIC | TRANSFORM
| Class Description | Language | Constructor | ()-Operator |
|---|---|---|---|
| A discrete wavelet transform (DWT) is any wavelet transform for which the wavelets are discretely sampled. As with other wavelet transforms, a key advantage over Fourier transforms is temporal resolution: it captures both frequency and location information (round-trip sketch below). | Python | f = partial(transform.dwt, wavelet_type=3) | result = f(a) |
| | C++ | auto f = metric::dwt<std::vector> | auto result = f(a) |
| idwt performs a single-level, one-dimensional wavelet reconstruction. | Python | f = partial(transform.idwt, wavelet_type=1, lx=3) | result = f(a, b) |
| | C++ | auto f = metric::idwt<std::vector> | auto result = f(a, b) |
| wmaxlev returns the maximum level L possible for a wavelet decomposition of a signal or image of size size_x. The maximum level is the last level for which at least one coefficient is correct. | Python | f = partial(transform.wmaxlev, wavelet_type=t) | result = f(size_x) |
| | C++ | auto f = metric::wmaxlev | auto result = f(size_x) |
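
A minimal dwt/idwt round-trip sketch in C++; the returned coefficient pair and the idwt argument order are assumptions inferred from the Python partials above:

```cpp
#include <iostream>
#include <vector>

#include "metric.hpp" // assumed single-header include of the METRIC library

int main()
{
    std::vector<double> x = {0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0};
    int wavelet_type = 4; // Daubechies family index, as in the table

    // One decomposition level: approximation and detail coefficients
    // (assumed return layout).
    auto [approx, detail] = metric::dwt(x, wavelet_type);

    // Single-level reconstruction; the argument order (a, b, wavelet_type,
    // lx = original length) mirrors the Python partials above and is an
    // assumption for the C++ side.
    auto rec = metric::idwt(approx, detail, wavelet_type,
                            static_cast<int>(x.size()));

    std::cout << "reconstructed " << rec.size() << " samples\n";
    return 0;
}
```
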
Calls METRIC | CORRELATION
| Class Description | Language | Constructor | ()-Operator | Estimate | Default Parameters |
|---|---|---|---|---|---|
| MGC (multiscale graph correlation) is a correlation coefficient for finding nonlinear dependencies in data sets; it is optimized for small data set sizes (usage sketch below). | C++ | auto f = metric::MGC<void, Euclidean, void, Manhatten>() | auto result = f(dataA, dataB) | auto result = f.estimate(dataA, dataB) | constructor defaults: metric1=Euclidean(), metric2=Euclidean(); estimate defaults: b_sample_size=250, threshold=0.05, max_iterations=1000 |
| | Python | f = correlation.MGC() | result = f(dataA, dataB) | result = f.estimate(dataA, dataB) | constructor defaults: metric1=Euclidean(), metric2=Euclidean(); estimate defaults: b_sample_size=250, threshold=0.05, max_iterations=1000 |
| Entropy calculates the metric entropy of data, which gives a measure of its intrinsic local dimensionality. | C++ | auto f = metric::Entropy<void, Manhatten>(metric, k, p, exp) | auto result = f(dataA) | auto result = f.estimate(data) | constructor defaults: metric=Euclidean(), k=7, p=25, exp=False; estimate defaults: samples_size=250, threshold=0.05, max_iterations=1000 |
| | Python | f = distance.Entropy(metric=Manhatten(), k=3, p=30) | result = f(dataA) | result = f.estimate(data) | constructor defaults: metric=Euclidean(), k=7, p=25, exp=False; estimate defaults: samples_size=250, threshold=0.05, max_iterations=1000 |
| VMixing | C++ | auto f = metric::VMixing<void, metric::Euclidean>(metric::Euclidean(), 7, 50) | auto result = f(dataA, dataB) | auto result = f.estimate(dataA, dataB) | constructor defaults: metric=metric::Euclidean(), k=3, p=25; estimate defaults: sampleSize=250, threshold=0.05, maxIterations=1000 |
| VMixing_simple | C++ | auto f = metric::VMixing_simple<void, metric::Euclidean>(metric::Euclidean(), 7) | auto result = f(dataA, dataB) | auto result = f.estimate(dataA, dataB) | constructor defaults: metric=metric::Euclidean(), k=3; estimate defaults: sampleSize=250, threshold=0.05, maxIterations=1000 |
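
A minimal C++ sketch of the MGC calls above; the metric.hpp include and the double value types on the metrics are assumptions (the table shows the metrics without template arguments):

```cpp
#include <iostream>
#include <vector>

#include "metric.hpp" // assumed single-header include of the METRIC library

int main()
{
    // Paired samples: dataB depends nonlinearly (quadratically) on dataA.
    std::vector<std::vector<double>> dataA = {{0.0}, {1.0}, {2.0}, {3.0}, {4.0}, {5.0}};
    std::vector<std::vector<double>> dataB = {{0.1}, {0.9}, {4.2}, {8.8}, {16.1}, {24.9}};

    // Record types are void as in the table; the <double> value types
    // on the metrics are an assumption.
    auto mgc = metric::MGC<void, metric::Euclidean<double>,
                           void, metric::Manhatten<double>>();

    auto coeff = mgc(dataA, dataB);        // full computation
    auto est = mgc.estimate(dataA, dataB); // subsampled estimate
    std::cout << "MGC: " << coeff << ", estimate: " << est << "\n";
    return 0;
}
```
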
Calls METRIC | UTILS
| Class Description | Language | Constructor | ()-Operator | Default Parameters |
|---|---|---|---|---|
| The goal of resistance sparsification of graphs is to find a sparse subgraph (with reweighted edges) that approximately preserves the effective resistance between every pair of nodes (usage sketch below). | C++ | auto f = metric::sparsify_effective_resistance | auto result = f(data) | defaults: ep=0.3, max_conc_const=4.0, jl_fac=4.0 |
| | Python | f = partial(utils.sparsify_effective_resistance, ep=0.1) | result = f(data) | defaults: ep=0.3, max_conc_const=4.0, jl_fac=4.0 |
| A minimum spanning tree is a subgraph consisting of the subset of edges that together connect all connected nodes while minimizing the total sum of edge weights. It is computed with Kruskal's algorithm. | C++ | auto f = metric::sparsify_spanning_tree | auto result = f(data) | defaults: minimum=true |
| | Python | f = partial(utils.sparsify_spanning_tree, minimum=False) | result = f(data) | defaults: minimum=True |
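
A minimal C++ sketch of both sparsifiers; the Blaze adjacency-matrix input type is an assumption based on the library's graph examples, since the table only gives the call shape:

```cpp
#include <iostream>

#include <blaze/Blaze.h>
#include "metric.hpp" // assumed single-header include of the METRIC library

int main()
{
    // Weighted undirected graph as a symmetric adjacency matrix; the
    // Blaze compressed-matrix input type is an assumption.
    blaze::CompressedMatrix<double> g(4, 4);
    g(0, 1) = 1.0; g(1, 0) = 1.0;
    g(1, 2) = 2.0; g(2, 1) = 2.0;
    g(2, 3) = 1.5; g(3, 2) = 1.5;
    g(0, 3) = 4.0; g(3, 0) = 4.0;

    // Minimum spanning tree via Kruskal (minimum=true by default).
    auto mst = metric::sparsify_spanning_tree(g);

    // Resistance-preserving sparsifier (ep=0.3 by default).
    auto sparse = metric::sparsify_effective_resistance(g);

    std::cout << "spanning tree edges (nonzeros): " << mst.nonZeros() << "\n"
              << "sparsifier edges (nonzeros): " << sparse.nonZeros() << "\n";
    return 0;
}
```
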