Release Date: 2015-July-17
New features:
- Improved Latent Dirichlet Allocation (LDA) Performance
- Function lda_train() is about twice as fast.
- Improved the scalability of lda_train()
  (vocabulary size x number of topics can now be up to 250 million).
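  A minimal call sketch, assuming MADlib is installed in the madlib schema and
  using the lda_train() argument order (data table, model table, output table,
  vocabulary size, topics, iterations, alpha, beta); all table names are
  placeholders:

    SELECT madlib.lda_train(
        'documents_tf',   -- input table of per-document term counts (placeholder)
        'lda_model',      -- output model table
        'lda_output',     -- output table of per-document topic assignments
        10000,            -- vocabulary size
        20,               -- number of topics
        30,               -- number of iterations
        5,                -- Dirichlet prior on per-document topic distributions (alpha)
        0.01              -- Dirichlet prior on per-topic word distributions (beta)
    );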
- New module: Matrix operations
  Added the following operations/functions for dense and sparse matrices:
  - Mathematical operations: addition, subtraction, multiplication,
    element-wise multiplication, scalar and vector multiplication.
  - Aggregation operations: max, min, sum, and mean along a specified dimension.
  - Visitor methods: extract a row or column from a matrix.
  - Representation: convert a matrix to either dense or sparse representation.
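  An illustrative sketch of the new module; the function names matrix_add() and
  matrix_mult() come from the matrix operations module, but the exact argument
  lists shown here are assumptions and should be checked against the module
  reference. Table names are placeholders:

    -- Add two matrices stored in tables 'mat_a' and 'mat_b', writing the
    -- result to 'mat_sum' (argument list is an assumption).
    SELECT madlib.matrix_add('mat_a', 'mat_b', 'mat_sum');

    -- Multiply the same two matrices, writing the result to 'mat_product'.
    SELECT madlib.matrix_mult('mat_a', 'mat_b', 'mat_product');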
- Quotation and International Character Support
- Most modules now support table and column names that are quoted and
  contain international characters, including:
  - Regression models (GLMs, linear regression, elastic net, etc.)
  - Decision trees and random forests
  - Unsupervised learning models (association rules, k-means, LDA, etc.)
  - Summary, Pearson's correlation, and PCA
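  For example, a sketch using the linear-regression trainer linregr_train();
  the quoted, non-ASCII table name is the point being illustrated, and the
  table and column names are placeholders:

    SELECT madlib.linregr_train(
        '"Verkäufe 2015"',          -- quoted source table with international characters
        'linregr_model',            -- output model table
        'umsatz',                   -- dependent variable column
        'ARRAY[1, preis, menge]'    -- independent variables expression
    );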
- Array Norms and Distances
- Generic p-norm distance
- Jaccard distance
- Cosine similarity
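  Illustrative calls; dist_pnorm(), dist_jaccard(), and cosine_similarity()
  are the assumed names of these new linear-algebra functions, and the
  argument types shown are assumptions to be checked against the array
  operations reference:

    -- Generic p-norm distance with p = 3.
    SELECT madlib.dist_pnorm(ARRAY[1, 2, 3]::float8[], ARRAY[4, 5, 6]::float8[], 3);

    -- Jaccard distance, assumed to operate on text arrays treated as sets.
    SELECT madlib.dist_jaccard(ARRAY['a', 'b']::varchar[], ARRAY['b', 'c']::varchar[]);

    -- Cosine similarity of two numeric vectors.
    SELECT madlib.cosine_similarity(ARRAY[1, 0, 1]::float8[], ARRAY[0, 1, 1]::float8[]);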
- Text Analysis:
  - Text utility for term frequency and vocabulary construction (prepares
    documents for input to LDA).
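  A sketch of preparing documents for lda_train(), assuming the term_frequency()
  utility takes an input table, a document id column, a column holding each
  document's array of words, an output table, and a flag to also build the
  vocabulary; table and column names are placeholders:

    SELECT madlib.term_frequency(
        'documents',      -- input table (placeholder name)
        'doc_id',         -- document id column
        'words',          -- column holding each document's array of words
        'documents_tf',   -- output table of per-document term counts
        TRUE              -- also construct the vocabulary table
    );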
- Miscellaneous
- Improved organization of the User and Developer Guides at doc.madlib.net/latest.
- Low-rank matrix factorization: added 32-bit integer support (MADLIB-903).
- Cross-validation: added classification support (MADLIB-908).
- Added a new clean-up function for removing MADlib temporary tables.
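  A sketch of the clean-up call; the function name cleanup_madlib_temp_tables()
  and its single schema argument are assumptions, so check the utilities
  reference for the exact name and signature:

    -- Drop MADlib temporary tables left behind in the given schema.
    SELECT madlib.cleanup_madlib_temp_tables('my_schema');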
Note:
- LDA models that are trained using MADlib v1.7.1 or earlier need to be
re-trained to be used in MADlib v1.8.
Known issues:
- Performance for decision tree with cross-validation is poor on a HAWQ
multi-node system.