Lda v1 #311

Open: 6 commits to be merged into base branch lda_output_fix_final

Commits on Feb 3, 2018

  1. SVM: Add minibatch as a new solver

    This work is based on the original work by
    Xiaocheng Tang <[email protected]> in madlib#75.
    
    This PR adds two main features:
    
    - A Minibatch solver that takes as input a batch of data
    - SVM code that takes advantage of the minibatch
    
    Closes madlib#229
    
    Co-authored-by: Nikhil Kak <[email protected]>
    Co-authored-by: Xiaocheng Tang <[email protected]>
    3 people committed Feb 3, 2018
    a8bbe08
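
To make the idea of a minibatch solver concrete, here is a minimal sketch of minibatch stochastic gradient descent for a linear SVM (hinge loss with an L2 penalty). It is not MADlib's implementation: the function names `minibatch_svm` and `predict`, the hyperparameters, and the in-memory list representation are all assumptions chosen for illustration.

```python
import random


def minibatch_svm(X, y, batch_size=4, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Train a linear SVM with minibatch SGD (illustrative sketch only).

    X is a list of feature lists, y a list of labels in {-1, +1}.
    Returns the weights as a list whose last entry is the bias.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # feature weights plus one bias term
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)
        # Walk over the shuffled data one batch at a time.
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad = [reg * wj for wj in w]  # gradient of the L2 penalty
            grad[-1] = 0.0                 # the bias is not regularized
            for i in batch:
                xi = X[i] + [1.0]          # append a constant for the bias
                margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
                if margin < 1.0:           # hinge loss is active for this row
                    for j, xj in enumerate(xi):
                        grad[j] -= y[i] * xj / len(batch)
            # One weight update per batch, not per row.
            w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Classify one feature list with the learned weights."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1
```

The point of the batch loop is that the gradient is averaged over several rows before each update, which is what distinguishes a minibatch solver from row-at-a-time SGD.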

Commits on Feb 7, 2018

  1. Fix lda output inconsistency bug and add install check test

    JIRA: MADLIB-1201
    
    Fixed the mismatch between the outputs of lda_train and
    lda_get_word_topic_count. Also added an install check test that
    validates that the two outputs are consistent with each other.
    See the JIRA for more details and an example.
    Jingyi Mei and Nikhil Kak authored and Jingyi Mei committed Feb 7, 2018
    a99883d
  2. LDA: Add helper function to map wordid and topicid

    JIRA: MADLIB-1160
    
    This commit adds a helper function that maps each wordid to the
    corresponding topicid assigned in the output table. Duplicate rows
    are removed from the final result.
    
    Also adds a workaround for svec on GPDB 4.3
    
    In GPDB 4.3, we cannot call madlib.svec directly on a value in text
    format. Instead, we have to call madlib.svec_from_string to convert the
    text. This commit fixes the issue so that the new helper function
    madlib.lda_get_word_topic_mapping works on both GPDB 5 and GPDB 4.
    Jingyi Mei committed Feb 7, 2018
    f066423
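
The word-to-topic mapping described in the commit above can be sketched in a few lines. This is only an illustration of the idea, not the SQL that the commit adds: it assumes each output row carries a `words` list of distinct wordids, a parallel `counts` list, and a `topic_assignment` list with one topicid per word occurrence, and the function name `word_topic_mapping` is hypothetical.

```python
def word_topic_mapping(rows):
    """Map each wordid to the topicids assigned to it, per document.

    rows is an iterable of (docid, words, counts, topic_assignment) tuples
    (an assumed layout). Returns sorted, de-duplicated
    (docid, wordid, topicid) tuples.
    """
    seen = set()  # a set drops duplicate (docid, wordid, topicid) rows
    for docid, words, counts, topic_assignment in rows:
        pos = 0
        for wordid, count in zip(words, counts):
            # The next `count` assignments belong to this wordid.
            for topicid in topic_assignment[pos:pos + count]:
                seen.add((docid, wordid, topicid))
            pos += count
    return sorted(seen)
```

For example, a document whose word 3 occurs twice with topic 1 both times contributes a single `(docid, 3, 1)` row, which mirrors the duplicate removal mentioned in the commit message.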
  3. Address LDA topicid index inconsistency issue

    JIRA: MADLIB-1160
    
    This commit fixes the topicid inconsistency between madlib.lda_train
    and madlib.lda_get_topic_desc, where the former used a 0-based index
    and the latter a 1-based index. Both now start at 0.
    Jingyi Mei committed Feb 7, 2018
    a062acb
  4. Fix LDA lda_get_topic_desc getting wrong top_k words issue

    JIRA: MADLIB-1160
    
    Previously, madlib.lda_get_topic_desc returned only the top k - 1 words
    in the result table. This commit fixes it to return the top k.
    Jingyi Mei committed Feb 7, 2018
    7569049
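
The two lda_get_topic_desc fixes above (0-based topicids and exactly top k words) can be illustrated with a small sketch. It assumes a hypothetical in-memory word-topic count matrix indexed as `counts[wordid][topicid]`; the function name `topic_desc` and its return shape are illustrative and do not reproduce the real function's output columns.

```python
def topic_desc(counts, vocab, top_k):
    """List the top_k words for each topic, using 0-based topicids.

    counts[wordid][topicid] is how often the word was assigned to the topic
    (an assumed layout). Returns (topicid, wordid, word) tuples.
    """
    n_topics = len(counts[0])
    rows = []
    for topicid in range(n_topics):  # starts at 0, like the training output
        # Rank words by descending count, breaking ties by wordid.
        ranked = sorted(range(len(vocab)),
                        key=lambda wordid: (-counts[wordid][topicid], wordid))
        # Slice exactly top_k entries; slicing top_k - 1 is the off-by-one.
        for wordid in ranked[:top_k]:
            rows.append((topicid, wordid, vocab[wordid]))
    return rows
```

Keeping the topicid range and the slice bound in one place makes both conventions easy to check against the training output.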

Commits on Feb 9, 2018

  1. e9a51fc