Commit
updated docs
dusty-nv committed Apr 16, 2024
1 parent cd42479 commit 091bb32
Showing 10 changed files with 55 additions and 36 deletions.
8 changes: 4 additions & 4 deletions _modules/nano_llm/agents/video_stream.html
@@ -98,12 +98,12 @@ Source code for nano_llm.agents.video_stream

<span class="sd"> For example, this will capture a V4L2 camera and serve it via WebRTC with H.264 encoding:</span>
<span class="sd"> </span>
<span class="sd"> .. code-block:: console</span>
<span class="sd"> .. code-block:: text</span>
<span class="sd"> </span>
<span class="sd"> python3 -m local_llm.agents.video_stream \</span>
<span class="sd"> --video-input /dev/video0 \</span>
<span class="sd"> python3 -m local_llm.agents.video_stream \ </span>
<span class="sd"> --video-input /dev/video0 \ </span>
<span class="sd"> --video-output webrtc://@:8554/output</span>
<span class="sd"> </span>
<span class="sd"> </span>
<span class="sd"> It&#39;s also used as a basic test of video streaming before using more complex agents that rely on it.</span>
<span class="sd"> &quot;&quot;&quot;</span>
<div class="viewcode-block" id="VideoStream.__init__">
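For reference alongside the CLI example in the docstring above, a minimal Python sketch of starting the same pipeline programmatically. The keyword names `video_input`/`video_output` and the blocking `run()` call are assumptions mirroring the command-line flags, not something this commit confirms.

```python
# Hypothetical sketch: assumes VideoStream accepts keyword arguments mirroring
# the --video-input/--video-output flags, and that agents expose a run() loop.
from nano_llm.agents.video_stream import VideoStream

agent = VideoStream(video_input="/dev/video0",
                    video_output="webrtc://@:8554/output")
agent.run()  # blocks while the camera is captured and served over WebRTC
```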
12 changes: 10 additions & 2 deletions _modules/nano_llm/chat/stream.html
@@ -42,13 +42,14 @@
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<p class="caption" role="heading"><span class="caption-text">Documentation:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../install.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../models.html">Models</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../chat.html">Chat</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../plugins.html">Plugins</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../agents.html">Agents</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../webserver.html">Websockets</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../webserver.html">Webserver</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../utilities.html">Utilities</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../releases.html">Release Notes</a></li>
</ul>
@@ -148,6 +149,13 @@ Source code for nano_llm.chat.stream

<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_message_delta</span><span class="p">()</span></div>


<span class="nd">@property</span>
<span class="k">def</span> <span class="nf">eos</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Returns true if End of Sequence (EOS) and generation has stopped.</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">stopped</span>

<div class="viewcode-block" id="StreamingResponse.stop">
<a class="viewcode-back" href="../../../models.html#nano_llm.StreamingResponse.stop">[docs]</a>
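A short sketch of how a caller might poll the new `eos` property. The `streaming=True` keyword, the iteration protocol, and the model name are assumptions for illustration, not taken from this commit.

```python
# Hypothetical usage of StreamingResponse.eos; generate() signature and
# streaming=True are assumed, eos/stopped come from the property added above.
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
stream = model.generate("Once upon a time,", streaming=True)

for token in stream:
    print(token, end="", flush=True)   # tokens arrive incrementally

print("\nEOS reached:", stream.eos)    # equivalent to stream.stopped
```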
11 changes: 6 additions & 5 deletions _modules/nano_llm/nano_llm.html
@@ -42,13 +42,14 @@
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<p class="caption" role="heading"><span class="caption-text">Documentation:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../install.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../models.html">Models</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../chat.html">Chat</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../plugins.html">Plugins</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../agents.html">Agents</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../webserver.html">Websockets</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../webserver.html">Webserver</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../utilities.html">Utilities</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../releases.html">Release Notes</a></li>
</ul>
@@ -116,8 +117,8 @@ Source code for nano_llm.nano_llm
<span class="sd"> api (str): the model backend API to use: &#39;auto_gptq&#39;, &#39;awq&#39;, &#39;mlc&#39;, or &#39;hf&#39;</span>
<span class="sd"> if left as None, it will attempt to be automatically determined.</span>

<span class="sd"> quant (str): for AWQ or MLC, either specify the quantization method,</span>
<span class="sd"> or the path to the quantized model (AWQ and MLC API&#39;s only)</span>
<span class="sd"> quantization (str): for AWQ or MLC, either specify the quantization method,</span>
<span class="sd"> or the path to the quantized model (AWQ and MLC API&#39;s only)</span>

<span class="sd"> vision_model (str): for VLMs, override the vision embedding model </span>
<span class="sd"> (typically `openai/clip-vit-large-patch14-336 &lt;https://huggingface.co/openai/clip-vit-large-patch14-336&gt;`_).</span>
@@ -134,7 +135,7 @@ Source code for nano_llm.nano_llm
<span class="n">model_name</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">basename</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>

<span class="k">if</span> <span class="ow">not</span> <span class="n">api</span><span class="p">:</span>
<span class="n">api</span> <span class="o">=</span> <span class="n">default_model_api</span><span class="p">(</span><span class="n">model_path</span><span class="p">,</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;quant&#39;</span><span class="p">))</span>
<span class="n">api</span> <span class="o">=</span> <span class="n">default_model_api</span><span class="p">(</span><span class="n">model_path</span><span class="p">,</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;quantization&#39;</span><span class="p">))</span>

<span class="n">kwargs</span><span class="p">[</span><span class="s1">&#39;name&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">model_name</span>
<span class="n">kwargs</span><span class="p">[</span><span class="s1">&#39;api&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">api</span>
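With the keyword renamed from `quant` to `quantization`, a loading call would look roughly like the sketch below; anything beyond the parameters documented in the docstring above (model path, `api`, `quantization`) is assumed.

```python
# Hypothetical loading sketch reflecting the renamed keyword; the values match
# the ones used in the install docs (MLC backend, q4f16_ft quantization).
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",   # local path or HuggingFace Hub repo
    api="mlc",                         # 'auto_gptq', 'awq', 'mlc', or 'hf'
    quantization="q4f16_ft",           # was 'quant' before this commit
)
```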
15 changes: 8 additions & 7 deletions _modules/nano_llm/utils/args.html
@@ -42,13 +42,14 @@
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<p class="caption" role="heading"><span class="caption-text">Documentation:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../install.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../models.html">Models</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../chat.html">Chat</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../plugins.html">Plugins</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../agents.html">Agents</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../webserver.html">Websockets</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../webserver.html">Webserver</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../utilities.html">Utilities</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../releases.html">Release Notes</a></li>
</ul>
@@ -111,7 +112,7 @@ Source code for nano_llm.utils.args
<span class="k">if</span> <span class="s1">&#39;model&#39;</span> <span class="ow">in</span> <span class="n">extras</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">&quot;--model&quot;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="c1">#required=True, </span>
<span class="n">help</span><span class="o">=</span><span class="s2">&quot;path to the model, or repository on HuggingFace Hub&quot;</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">&quot;--quant&quot;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="bp">self</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">&quot;--quantization&quot;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">help</span><span class="o">=</span><span class="s2">&quot;for MLC, the type of quantization to apply (default q4f16_ft) For AWQ, the path to the quantized weights.&quot;</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s2">&quot;--api&quot;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">str</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">choices</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;auto_gptq&#39;</span><span class="p">,</span> <span class="s1">&#39;awq&#39;</span><span class="p">,</span> <span class="s1">&#39;hf&#39;</span><span class="p">,</span> <span class="s1">&#39;mlc&#39;</span><span class="p">],</span>
<span class="n">help</span><span class="o">=</span><span class="s2">&quot;specify the API to use (otherwise inferred)&quot;</span><span class="p">)</span>
@@ -258,13 +259,13 @@ Source code for nano_llm.utils.args
<span class="nd">@staticmethod</span>
<span class="k">def</span> <span class="nf">parse_prompt_args</span><span class="p">(</span><span class="n">prompts</span><span class="p">,</span> <span class="n">chat</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Parse prompt command-line argument and return list of prompts</span>
<span class="sd"> It&#39;s assumed that the argparse argument was created like this:</span>
<span class="sd"> Parse prompt command-line argument and return list of prompts.</span>
<span class="sd"> It&#39;s assumed that the argparse argument was created like this::</span>
<span class="sd"> </span>
<span class="sd"> `parser.add_argument(&#39;--prompt&#39;, action=&#39;append&#39;, nargs=&#39;*&#39;)`</span>
<span class="sd"> parser.add_argument(&#39;--prompt&#39;, action=&#39;append&#39;, nargs=&#39;*&#39;)</span>
<span class="sd"> </span>
<span class="sd"> If the prompt text is &#39;default&#39;, then default chat prompts will</span>
<span class="sd"> be assigned if chat=True (otherwise default completion prompts)</span>
<span class="sd"> be assigned if ``chat=True`` (otherwise default completion prompts)</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">if</span> <span class="n">prompts</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
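A sketch of how the renamed `--quantization` flag and `parse_prompt_args()` fit together. The import path and the assumption that `ArgParser()` registers the model arguments by default are illustrative, not verified against this commit.

```python
# Hypothetical sketch: assumes ArgParser registers --model/--quantization/--api
# and --prompt when constructed with its defaults.
from nano_llm.utils.args import ArgParser

parser = ArgParser()
args = parser.parse_args([
    "--model", "meta-llama/Llama-2-7b-chat-hf",
    "--quantization", "q4f16_ft",
    "--prompt", "What is the capital of France?",
])

# --prompt is declared with action='append' and nargs='*', so it parses into a
# nested list; parse_prompt_args() flattens it (and substitutes default chat
# prompts when the text is 'default').
prompts = ArgParser.parse_prompt_args(args.prompt, chat=True)
print(args.quantization, prompts)
```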
16 changes: 8 additions & 8 deletions _sources/install.md.txt
@@ -1,24 +1,23 @@
# Installation

-To use the optimized API's like MLC and AWQ built with CUDA, the recommended installation method is by running the Docker container image built by [jetson-containers](https://github.com/dusty-nv/jetson-containers). First, clone that repo:
+Having a complex set of dependencies, currently the recommended installation method is by running the Docker container image built by [jetson-containers](https://github.com/dusty-nv/jetson-containers). First, clone and install that repo:

```bash
git clone https://github.com/dusty-nv/jetson-containers
cd jetson-containers
-pip3 install -r requirements.txt
+bash jetson-containers/install.sh
```

-Then you can start `nano_llm` container like this:
+Then you can start the `nano_llm` container like this:

```bash
-./run.sh $(./autotag nano_llm)
+jetson-containers run $(autotag nano_llm)
```

-This will automatically pull/run the container image compatible with your version of JetPack-L4T (e.g. `dustynv/nano_llm:r36.2.0` for JetPack 6.0 DP)
+This will automatically pull/run the container image compatible with your version of JetPack-L4T (e.g. `dustynv/nano_llm:r36.2.0` for JetPack 6.0)

### Running Models

-Once in the container, you should be able to `import nano_llm` in a Python3 interpreter, and run the various example commands shown on this page like:
+Once in the container, you should be able to `import nano_llm` in a Python3 interpreter, and run the various example commands from the docs like:

```bash
python3 -m nano_llm.chat --model meta-llama/Llama-2-7b-chat-hf --api=mlc --quantization q4f16_ft
@@ -27,7 +26,8 @@ python3 -m nano_llm.chat --model meta-llama/Llama-2-7b-chat-hf --api=mlc --quant
Or you can run the container & chat command in one go like this:

```bash
-./run.sh --env HUGGINGFACE_TOKEN=hf_abc123def \
+jetson-containers run \
+  --env HUGGINGFACE_TOKEN=hf_abc123def \
$(./autotag nano_llm) \
python3 -m nano_llm.chat --api=mlc \
--model meta-llama/Llama-2-7b-chat-hf \
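Before running the chat example above, a minimal check that the package is importable inside the container; nothing here is specific to this commit beyond the `nano_llm` package name.

```python
# Minimal in-container sanity check: confirms the package imports and shows
# where it is installed.
import nano_llm
print(nano_llm.__file__)
```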
2 changes: 2 additions & 0 deletions genindex.html
@@ -223,6 +223,8 @@ E
</li>
</ul></li>
<li><a href="models.html#nano_llm.NanoLLM.embed_tokens">embed_tokens() (NanoLLM method)</a>
</li>
<li><a href="models.html#nano_llm.StreamingResponse.eos">eos (StreamingResponse property)</a>
</li>
<li><a href="agents.html#nano_llm.agents.video_query.VideoQuery.events">events (VideoQuery attribute)</a>
</li>
16 changes: 8 additions & 8 deletions install.html
@@ -87,25 +87,25 @@

<section id="installation">
<h1>Installation<a class="headerlink" href="#installation" title="Link to this heading"></a></h1>
-<p>To use the optimized API’s like MLC and AWQ built with CUDA, the recommended installation method is by running the Docker container image built by <a class="reference external" href="https://github.com/dusty-nv/jetson-containers">jetson-containers</a>. First, clone that repo:</p>
+<p>Having a complex set of dependencies, currently the recommended installation method is by running the Docker container image built by <a class="reference external" href="https://github.com/dusty-nv/jetson-containers">jetson-containers</a>. First, clone and install that repo:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/dusty-nv/jetson-containers
<span class="nb">cd</span><span class="w"> </span>jetson-containers
pip3<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
bash<span class="w"> </span>jetson-containers/install.sh
</pre></div>
</div>
-<p>Then you can start <code class="docutils literal notranslate"><span class="pre">nano_llm</span></code> container like this:</p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre>./run.sh $(./autotag nano_llm)
+<p>Then you can start the <code class="docutils literal notranslate"><span class="pre">nano_llm</span></code> container like this:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre>jetson-containers run $(autotag nano_llm)
</pre></div>
</div>
-<p>This will automatically pull/run the container image compatible with your version of JetPack-L4T (e.g. <code class="docutils literal notranslate"><span class="pre">dustynv/nano_llm:r36.2.0</span></code> for JetPack 6.0 DP)</p>
+<p>This will automatically pull/run the container image compatible with your version of JetPack-L4T (e.g. <code class="docutils literal notranslate"><span class="pre">dustynv/nano_llm:r36.2.0</span></code> for JetPack 6.0)</p>
<section id="running-models">
<h2>Running Models<a class="headerlink" href="#running-models" title="Link to this heading"></a></h2>
-<p>Once in the container, you should be able to <code class="docutils literal notranslate"><span class="pre">import</span> <span class="pre">nano_llm</span></code> in a Python3 interpreter, and run the various example commands shown on this page like:</p>
+<p>Once in the container, you should be able to <code class="docutils literal notranslate"><span class="pre">import</span> <span class="pre">nano_llm</span></code> in a Python3 interpreter, and run the various example commands from the docs like:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python3<span class="w"> </span>-m<span class="w"> </span>nano_llm.chat<span class="w"> </span>--model<span class="w"> </span>meta-llama/Llama-2-7b-chat-hf<span class="w"> </span>--api<span class="o">=</span>mlc<span class="w"> </span>--quantization<span class="w"> </span>q4f16_ft
</pre></div>
</div>
<p>Or you can run the container &amp; chat command in one go like this:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./run.sh<span class="w"> </span>--env<span class="w"> </span><span class="nv">HUGGINGFACE_TOKEN</span><span class="o">=</span>hf_abc123def<span class="w"> </span><span class="se">\</span>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>jetson-containers<span class="w"> </span>run<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--env<span class="w"> </span><span class="nv">HUGGINGFACE_TOKEN</span><span class="o">=</span>hf_abc123def<span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="k">$(</span>./autotag<span class="w"> </span>nano_llm<span class="k">)</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span>python3<span class="w"> </span>-m<span class="w"> </span>nano_llm.chat<span class="w"> </span>--api<span class="o">=</span>mlc<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--model<span class="w"> </span>meta-llama/Llama-2-7b-chat-hf<span class="w"> </span><span class="se">\</span>