[#162] Add lightgbm docs to sphinx-docs/models.md
riley-harper committed Nov 22, 2024
1 parent 7dcc81d commit 6ae1f4e
Showing 6 changed files with 149 additions and 4 deletions.
51 changes: 50 additions & 1 deletion docs/_sources/models.md.txt
@@ -92,7 +92,7 @@ chosen_model = {

*Added in version 3.8.0.*

-This is an alternate, high-performance implementation of gradient boosting.
+XGBoost is an alternate, high-performance implementation of gradient boosting.
It uses [xgboost.spark.SparkXGBClassifier](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.spark.SparkXGBClassifier).
Since the XGBoost-PySpark integration which the xgboost Python package provides
is currently unstable, support for the xgboost model type is disabled in hlink
@@ -122,3 +122,52 @@ chosen_model = {
    threshold_ratio = 1.5
}
```

## lightgbm

*Added in version 3.8.0.*

LightGBM is another alternate, high-performance implementation of gradient
boosting. It uses
[synapse.ml.lightgbm.LightGBMClassifier](https://mmlspark.blob.core.windows.net/docs/1.0.8/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier).
`synapse.ml` is a library which provides various integrations with PySpark,
including integrations between the C++ LightGBM library and PySpark.

LightGBM requires some additional Scala libraries that hlink does not usually
install, so support for the lightgbm model is disabled in hlink by default.
hlink will stop with an error if you try to use this model type without
enabling support for it. To enable support for lightgbm, install hlink with the
`lightgbm` extra.

```
pip install hlink[lightgbm]
```

This installs the lightgbm package and its Python dependencies. Depending on
your machine and operating system, you may also need to install the libomp
library, which is another dependency of lightgbm. If you encounter errors when
training a lightgbm model and do not have libomp installed, try installing it.

lightgbm has an enormous number of available parameters. Many of these are
available as normal in hlink, via the [LightGBMClassifier
class](https://mmlspark.blob.core.windows.net/docs/1.0.8/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier).
Others are available through the special `passThroughArgs` parameter, which
passes additional parameters through to the C++ library. You can see a full
list of the supported parameters in the
[LightGBM parameters documentation](https://lightgbm.readthedocs.io/en/latest/Parameters.html).

```
chosen_model = {
    type = "lightgbm",
    # LightGBMClassifier supports these parameters (and many more).
    maxDepth = 5,
    learningRate = 0.5,
    # LightGBMClassifier does not directly support this parameter,
    # so we have to send it to the C++ library with passThroughArgs.
    passThroughArgs = "force_row_wise=true",
    # hlink's threshold and threshold_ratio
    threshold = 0.8,
    threshold_ratio = 1.5,
}
```
1 change: 1 addition & 0 deletions docs/index.html
@@ -136,6 +136,7 @@ <h1>Configuration API<a class="headerlink" href="#configuration-api" title="Link
<li class="toctree-l2"><a class="reference internal" href="models.html#decision-tree">decision_tree</a></li>
<li class="toctree-l2"><a class="reference internal" href="models.html#gradient-boosted-trees">gradient_boosted_trees</a></li>
<li class="toctree-l2"><a class="reference internal" href="models.html#xgboost">xgboost</a></li>
<li class="toctree-l2"><a class="reference internal" href="models.html#lightgbm">lightgbm</a></li>
</ul>
</li>
</ul>
46 changes: 45 additions & 1 deletion docs/models.html
@@ -136,7 +136,7 @@ <h2>gradient_boosted_trees<a class="headerlink" href="#gradient-boosted-trees" t
<section id="xgboost">
<h2>xgboost<a class="headerlink" href="#xgboost" title="Link to this heading"></a></h2>
<p><em>Added in version 3.8.0.</em></p>
-<p>This is an alternate, high-performance implementation of gradient boosting.
+<p>XGBoost is an alternate, high-performance implementation of gradient boosting.
It uses <a class="reference external" href="https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.spark.SparkXGBClassifier">xgboost.spark.SparkXGBClassifier</a>.
Since the XGBoost-PySpark integration which the xgboost Python package provides
is currently unstable, support for the xgboost model type is disabled in hlink
@@ -163,6 +163,49 @@ <h2>xgboost<a class="headerlink" href="#xgboost" title="Link to this heading">¶
</pre></div>
</div>
</section>
<section id="lightgbm">
<h2>lightgbm<a class="headerlink" href="#lightgbm" title="Link to this heading"></a></h2>
<p><em>Added in version 3.8.0.</em></p>
<p>LightGBM is another alternate, high-performance implementation of gradient
boosting. It uses
<a class="reference external" href="https://mmlspark.blob.core.windows.net/docs/1.0.8/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier">synapse.ml.lightgbm.LightGBMClassifier</a>.
<code class="docutils literal notranslate"><span class="pre">synapse.ml</span></code> is a library which provides various integrations with PySpark,
including integrations between the C++ LightGBM library and PySpark.</p>
<p>LightGBM requires some additional Scala libraries that hlink does not usually
install, so support for the lightgbm model is disabled in hlink by default.
hlink will stop with an error if you try to use this model type without
enabling support for it. To enable support for lightgbm, install hlink with the
<code class="docutils literal notranslate"><span class="pre">lightgbm</span></code> extra.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pip</span> <span class="n">install</span> <span class="n">hlink</span><span class="p">[</span><span class="n">lightgbm</span><span class="p">]</span>
</pre></div>
</div>
<p>This installs the lightgbm package and its Python dependencies. Depending on
your machine and operating system, you may also need to install the libomp
library, which is another dependency of lightgbm. If you encounter errors when
training a lightgbm model and do not have libomp installed, try installing it.</p>
<p>lightgbm has an enormous number of available parameters. Many of these are
available as normal in hlink, via the <a class="reference external" href="https://mmlspark.blob.core.windows.net/docs/1.0.8/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier">LightGBMClassifier
class</a>.
Others are available through the special <code class="docutils literal notranslate"><span class="pre">passThroughArgs</span></code> parameter, which
passes additional parameters through to the C++ library. You can see a full
list of the supported parameters in the
<a class="reference external" href="https://lightgbm.readthedocs.io/en/latest/Parameters.html">LightGBM parameters documentation</a>.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">chosen_model</span> <span class="o">=</span> <span class="p">{</span>
    <span class="nb">type</span> <span class="o">=</span> <span class="s2">&quot;lightgbm&quot;</span><span class="p">,</span>
    <span class="c1"># LightGBMClassifier supports these parameters (and many more).</span>
    <span class="n">maxDepth</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span>
    <span class="n">learningRate</span> <span class="o">=</span> <span class="mf">0.5</span><span class="p">,</span>
    <span class="c1"># LightGBMClassifier does not directly support this parameter,</span>
    <span class="c1"># so we have to send it to the C++ library with passThroughArgs.</span>
    <span class="n">passThroughArgs</span> <span class="o">=</span> <span class="s2">&quot;force_row_wise=true&quot;</span><span class="p">,</span>
    <span class="c1"># hlink&#39;s threshold and threshold_ratio</span>
    <span class="n">threshold</span> <span class="o">=</span> <span class="mf">0.8</span><span class="p">,</span>
    <span class="n">threshold_ratio</span> <span class="o">=</span> <span class="mf">1.5</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>
</div>
</section>
</section>


@@ -214,6 +257,7 @@ <h1 class="logo"><a href="index.html">hlink</a></h1>
<li class="toctree-l2"><a class="reference internal" href="#decision-tree">decision_tree</a></li>
<li class="toctree-l2"><a class="reference internal" href="#gradient-boosted-trees">gradient_boosted_trees</a></li>
<li class="toctree-l2"><a class="reference internal" href="#xgboost">xgboost</a></li>
<li class="toctree-l2"><a class="reference internal" href="#lightgbm">lightgbm</a></li>
</ul>
</li>
</ul>
2 changes: 1 addition & 1 deletion docs/searchindex.js

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions sphinx-docs/.metals/metals.log
@@ -0,0 +1,2 @@
2024.11.22 16:26:12 INFO tracing is disabled for protocol LSP, to enable tracing of incoming and outgoing JSON messages create an empty file at /Users/rileyh/projects/hlink/sphinx-docs/.metals/lsp.trace.json or /Users/rileyh/Library/Caches/org.scalameta.metals/lsp.trace.json
2024.11.22 16:26:13 INFO logging to file /Users/rileyh/projects/hlink/.metals/metals.log
51 changes: 50 additions & 1 deletion sphinx-docs/models.md
@@ -92,7 +92,7 @@ chosen_model = {

*Added in version 3.8.0.*

-This is an alternate, high-performance implementation of gradient boosting.
+XGBoost is an alternate, high-performance implementation of gradient boosting.
It uses [xgboost.spark.SparkXGBClassifier](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.spark.SparkXGBClassifier).
Since the XGBoost-PySpark integration which the xgboost Python package provides
is currently unstable, support for the xgboost model type is disabled in hlink
@@ -122,3 +122,52 @@ chosen_model = {
    threshold_ratio = 1.5
}
```

## lightgbm

*Added in version 3.8.0.*

LightGBM is another alternate, high-performance implementation of gradient
boosting. It uses
[synapse.ml.lightgbm.LightGBMClassifier](https://mmlspark.blob.core.windows.net/docs/1.0.8/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier).
`synapse.ml` is a library which provides various integrations with PySpark,
including integrations between the C++ LightGBM library and PySpark.

LightGBM requires some additional Scala libraries that hlink does not usually
install, so support for the lightgbm model is disabled in hlink by default.
hlink will stop with an error if you try to use this model type without
enabling support for it. To enable support for lightgbm, install hlink with the
`lightgbm` extra.

```
pip install hlink[lightgbm]
```

This installs the lightgbm package and its Python dependencies. Depending on
your machine and operating system, you may also need to install the libomp
library, which is another dependency of lightgbm. If you encounter errors when
training a lightgbm model and do not have libomp installed, try installing it.
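
As a quick check before training, the sketch below tests whether the
`synapse.ml.lightgbm` module can be located in the current Python environment.
The helper name `is_module_available` is hypothetical and is not part of
hlink. The sketch also assumes that installing the `lightgbm` extra is what
makes this module importable, and it does not detect a missing libomp.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if the import system can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (for example, no `synapse` package at all)
        # raises ModuleNotFoundError, which is a subclass of ImportError.
        return False


# Assumption: the lightgbm extra provides the synapse.ml.lightgbm module.
if is_module_available("synapse.ml.lightgbm"):
    print("synapse.ml.lightgbm found")
else:
    print("synapse.ml.lightgbm not found; try: pip install hlink[lightgbm]")
```

Note that this only checks the Python side. hlink's own check for lightgbm
support may work differently.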

lightgbm has an enormous number of available parameters. Many of these are
available as normal in hlink, via the [LightGBMClassifier
class](https://mmlspark.blob.core.windows.net/docs/1.0.8/pyspark/synapse.ml.lightgbm.html#module-synapse.ml.lightgbm.LightGBMClassifier).
Others are available through the special `passThroughArgs` parameter, which
passes additional parameters through to the C++ library. You can see a full
list of the supported parameters in the
[LightGBM parameters documentation](https://lightgbm.readthedocs.io/en/latest/Parameters.html).

```
chosen_model = {
    type = "lightgbm",
    # LightGBMClassifier supports these parameters (and many more).
    maxDepth = 5,
    learningRate = 0.5,
    # LightGBMClassifier does not directly support this parameter,
    # so we have to send it to the C++ library with passThroughArgs.
    passThroughArgs = "force_row_wise=true",
    # hlink's threshold and threshold_ratio
    threshold = 0.8,
    threshold_ratio = 1.5,
}
```
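
To make the three groups of settings in the example above concrete, here is a
minimal Python sketch that builds a `passThroughArgs` string from a dictionary
and separates hlink's own keys from the ones meant for the classifier. The
names `HLINK_KEYS`, `format_pass_through_args`, and `split_chosen_model` are
hypothetical and do not come from hlink. Joining multiple `key=value` pairs
with spaces is an assumption, so check the LightGBMClassifier documentation
before passing more than one.

```python
# Hypothetical: keys the example above attributes to hlink itself.
HLINK_KEYS = {"type", "threshold", "threshold_ratio"}


def format_pass_through_args(native_params: dict) -> str:
    """Render parameters as key=value pairs for passThroughArgs."""
    parts = []
    for key, value in native_params.items():
        if isinstance(value, bool):
            # Write booleans in lowercase, as in "force_row_wise=true".
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    # Assumption: multiple pairs are separated by single spaces.
    return " ".join(parts)


def split_chosen_model(chosen_model: dict) -> tuple[dict, dict]:
    """Split a chosen_model mapping into hlink settings and classifier parameters."""
    hlink_settings = {k: v for k, v in chosen_model.items() if k in HLINK_KEYS}
    classifier_params = {k: v for k, v in chosen_model.items() if k not in HLINK_KEYS}
    return hlink_settings, classifier_params


chosen_model = {
    "type": "lightgbm",
    "maxDepth": 5,
    "learningRate": 0.5,
    "passThroughArgs": format_pass_through_args({"force_row_wise": True}),
    "threshold": 0.8,
    "threshold_ratio": 1.5,
}

hlink_settings, classifier_params = split_chosen_model(chosen_model)
print(hlink_settings)
print(classifier_params)
```

This is only an illustration of how the keys divide. It is not how hlink
parses its configuration.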
