
MADlib v1.8

@iyerr3 iyerr3 released this 21 Mar 22:49

Release Date: 2015-July-17

New features:

  • Improved Latent Dirichlet Allocation (LDA) Performance
    • Function lda_train() is about twice as fast.
    • Improved the scalability of the function
      (vocabulary size x number of topics can be up to 250 million).
  • New module: Matrix operations
    Added the following operations/functions for dense and sparse matrices:
    • Mathematical operations: addition, subtraction, multiplication,
      element-wise multiplication, scalar and vector multiplication.
    • Aggregation operations: apply various operations including
      max, min, sum, mean along a specified dimension.
    • Visitor methods: extract row/column from matrix.
    • Representation: convert a matrix to either dense or sparse representation.
  • Quotation and International Character Support
    • Most modules now support table and column names that are quoted and
      contain international characters, including:
      • Regression models (GLMs, linear regression, elastic net, etc.)
      • Decision trees and random forests
      • Unsupervised learning models (association rules, k-means, LDA, etc.)
      • Summary, Pearson's correlation, and PCA
  • Array Norms and Distances
    • Generic p-norm distance
    • Jaccard distance
    • Cosine similarity
  • Text Analysis
    • Text utility for term frequency and vocabulary construction (prepares
      documents for input to LDA).
  • Miscellaneous
    • Improved organization of User and Developer guide at doc.madlib.net/latest.
    • Low-rank matrix factorization: added 32-bit integer support (MADLIB-903).
    • Cross-validation: added classification support (MADLIB-908).
    • Added a new clean-up function for removing MADlib temporary tables.
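
For readers unfamiliar with the measures listed under "Array Norms and Distances" above, the following is a minimal sketch of their standard mathematical definitions in plain Python. It is not MADlib code and does not show MADlib's SQL interface; the function names p_norm_distance, jaccard_distance, and cosine_similarity are hypothetical and chosen only for this illustration.

```python
import math


def p_norm_distance(a, b, p=2.0):
    """Generic p-norm distance: (sum |a_i - b_i|^p)^(1/p), assuming p >= 1."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A intersect B| / |A union B|, treating arrays as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed here: two empty sets are at distance 0.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With p=1 the first function gives the Manhattan distance and with p=2 the Euclidean distance. How MADlib itself handles edge cases such as empty arrays or zero vectors is not stated in these notes, so the choices above are assumptions.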

Note:

  • LDA models that are trained using MADlib v1.7.1 or earlier need to be
    re-trained to be used in MADlib v1.8.

Known issues:

  • Performance for decision tree with cross-validation is poor on a HAWQ
    multi-node system.