Commit

Merge pull request #15 from lfnovo/transform_ui
Transform UI
lfnovo authored Nov 1, 2024
2 parents 53883aa + 69477b8 commit 34c3b64
Showing 35 changed files with 805 additions and 721 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
prompts/patterns/user/
notebooks/
data/
.uploads/
2 changes: 2 additions & 0 deletions README.md
@@ -6,6 +6,8 @@ In a world dominated by Artificial Intelligence, having the ability to think

Open Notebook empowers you to manage your research, generate AI-assisted notes, and interact with your content—on your terms.

Learn more about our project at [https://www.open-notebook.ai](https://www.open-notebook.ai)

## ⚙️ Setting Up

Go to the [Setup Guide](docs/SETUP.md) to learn how to set up the tool in detail.
39 changes: 1 addition & 38 deletions docs/SETUP.md
@@ -41,44 +41,7 @@ volumes:
notebook_data:
```
or with the environment variables:
```yaml
version: '3'

services:
surrealdb:
image: surrealdb/surrealdb:v2
ports:
- "8000:8000"
volumes:
- surreal_data:/mydata
command: start --log trace --user root --pass root rocksdb:/mydata/mydatabase.db
pull_policy: always
user: root

open_notebook:
image: lfnovo/open_notebook:latest
ports:
- "8080:8502"
environment:
- OPENAI_API_KEY=API_KEY
      - SURREAL_ADDRESS=surrealdb
- SURREAL_PORT=8000
- SURREAL_USER=root
- SURREAL_PASS=root
- SURREAL_NAMESPACE=open_notebook
- SURREAL_DATABASE=staging
depends_on:
- surrealdb
pull_policy: always
volumes:
- notebook_data:/app/data

volumes:
surreal_data:
notebook_data:
```
Take a look at the [Open Notebook Boilerplate](https://github.com/lfnovo/open-notebook-boilerplate) repo for an example of how to set it up for maximum feature usability.
### 📦 Installing from Source
32 changes: 18 additions & 14 deletions docs/TRANSFORMATIONS.md
@@ -24,42 +24,47 @@ For example, you could start by summarizing a text, then use that summary to gen

### Setting Up Transformations

Take a look at the [Open Notebook Boilerplate](https://github.com/lfnovo/open-notebook-boilerplate) repo for an example of how to set it up for maximum feature usability.

To set up your own Transformations, you'll define them in the `transformations.yaml` file. Below is an example setup:

```yaml
source_insights:
- name: "Summarize"
insight_type: "Content Summary"
description: "Summarize the content"
transformations:
- patterns/makeitdense
- patterns/summarize
patterns:
- patterns/default/makeitdense
- patterns/default/summarize
- name: "Key Insights"
insight_type: "Key Insights"
description: "Extracts a list of the Key Insights of the content"
transformations:
- patterns/keyinsights
patterns:
- patterns/default/keyinsights
- name: "Make it Dense"
insight_type: "Dense Representation"
description: "Create a dense representation of the content"
transformations:
- patterns/makeitdense
patterns:
- patterns/default/makeitdense
- name: "Analyze Paper"
insight_type: "Paper Analysis"
description: "Analyze the paper and provide a quick summary"
transformations:
- patterns/analyze_paper
patterns:
- patterns/default/analyze_paper
- name: "Reflection"
insight_type: "Reflection Questions"
description: "Generates a list of insightful questions to provoke reflection"
transformations:
- patterns/reflection_questions
patterns:
- patterns/default/reflection_questions
```
Once you've defined your transformation, make sure to add the corresponding prompts to the `prompts/patterns` folder. Here's an example of a transformation prompt:
You can mount this file into the Docker image to replace its default version.
Once you've defined your transformation, make sure to add the corresponding prompts to the `prompts/patterns/user` folder. Here's an example of a transformation prompt:

```jinja
{% include 'patterns/common_text.jinja' %}
{% include 'patterns/user/common_text.jinja' %}
# IDENTITY and PURPOSE
@@ -95,7 +100,6 @@ You extract deep, thought-provoking, and meaningful reflections from text conten
- Any item that doesn't follow the `patterns/` format will be interpreted as a command (refer to `command.jinja` for clarity).
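
The pattern-versus-command rule above can be sketched as a small standalone helper, mirroring the dispatch in `multipattern.py` (a hypothetical function, not part of the repository):

```python
def classify_item(item: str) -> tuple[str, bool]:
    """Return the pattern to run and whether the item is a free-form command."""
    if item.startswith("patterns/"):
        # A direct reference to a prompt file under prompts/.
        return item, False
    # Anything else is interpreted as a command and routed through
    # the generic command pattern.
    return "patterns/default/command", True


print(classify_item("patterns/default/summarize"))   # direct pattern reference
print(classify_item("Summarize this in one tweet"))  # free-form command
```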



### Call for Contributions

Have an idea for an amazing Transformation? We'd love to see your creativity! Please submit a pull request with your favorite transformations to help expand our library. Whether it's summarization, content analysis, or something entirely unique, your contributions will help us all get more out of our research!
26 changes: 14 additions & 12 deletions open_notebook/config.py
@@ -36,18 +36,20 @@
os.makedirs(PODCASTS_FOLDER, exist_ok=True)


DEFAULT_MODELS = DefaultModels.load()

if DEFAULT_MODELS.default_embedding_model:
EMBEDDING_MODEL = get_model(
DEFAULT_MODELS.default_embedding_model, model_type="embedding"
def load_default_models():
default_models = DefaultModels.load()
embedding_model = (
get_model(default_models.default_embedding_model, model_type="embedding")
if default_models.default_embedding_model
else None
)
else:
EMBEDDING_MODEL = None

if DEFAULT_MODELS.default_speech_to_text_model:
SPEECH_TO_TEXT_MODEL = get_model(
DEFAULT_MODELS.default_speech_to_text_model, model_type="speech_to_text"
speech_to_text_model = (
get_model(
default_models.default_speech_to_text_model, model_type="speech_to_text"
)
if default_models.default_speech_to_text_model
else None
)
else:
SPEECH_TO_TEXT_MODEL = None

return default_models, embedding_model, speech_to_text_model
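
The refactor above replaces import-time globals with a function that resolves models on demand, so settings changed at runtime are picked up on the next call. A minimal sketch of the idea (the `Defaults` class and `load_models` name are illustrative, not the project's API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Defaults:
    default_embedding_model: Optional[str] = None


def load_models(defaults: Defaults):
    # Resolve the model only when asked, and only if configured,
    # instead of once at module import.
    embedding = (
        f"model:{defaults.default_embedding_model}"
        if defaults.default_embedding_model
        else None
    )
    return defaults, embedding


_, emb = load_models(Defaults(default_embedding_model="text-embedding-3-small"))
print(emb)                          # model:text-embedding-3-small
print(load_models(Defaults())[1])   # None
```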
9 changes: 6 additions & 3 deletions open_notebook/domain/base.py
@@ -68,24 +68,27 @@ def get_embedding_content(self) -> Optional[str]:
return None

def save(self) -> None:
from open_notebook.config import EMBEDDING_MODEL
from open_notebook.config import load_default_models

DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()

try:
logger.debug(f"Validating {self.__class__.__name__}")
self.model_validate(self.model_dump(), strict=True)
data = self._prepare_save_data()
data["updated"] = datetime.now().isoformat()
data["updated"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

if self.needs_embedding():
embedding_content = self.get_embedding_content()
if embedding_content:
data["embedding"] = EMBEDDING_MODEL.embed(embedding_content)

if self.id is None:
data["created"] = datetime.now().isoformat()
data["created"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
logger.debug("Creating new record")
repo_result = repo_create(self.__class__.table_name, data)
else:
data["created"] = self.created.strftime("%Y-%m-%d %H:%M:%S")
logger.debug(f"Updating record with id {self.id}")
repo_result = repo_update(self.id, data)
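
The diff also swaps `isoformat()` for a space-separated `strftime` format on the `created`/`updated` fields; the difference is easy to see in isolation:

```python
from datetime import datetime

ts = datetime(2024, 11, 1, 12, 30, 5)
print(ts.isoformat())                    # 2024-11-01T12:30:05
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-11-01 12:30:05
```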

8 changes: 7 additions & 1 deletion open_notebook/domain/notebook.py
@@ -5,7 +5,7 @@
from loguru import logger
from pydantic import BaseModel, Field, field_validator

from open_notebook.config import EMBEDDING_MODEL
from open_notebook.config import load_default_models
from open_notebook.database.repository import (
repo_create,
repo_query,
@@ -140,6 +140,8 @@ def save_chunks(self, text: str) -> None:
raise DatabaseOperationError(e)

def vectorize(self) -> None:
DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()

try:
if not self.full_text:
return
@@ -189,6 +191,8 @@ def search(cls, query: str) -> List[Dict[str, Any]]:
raise DatabaseOperationError("Failed to search sources")

def add_insight(self, insight_type: str, content: str) -> Any:
DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()

if not insight_type or not content:
raise InvalidInputError("Insight type and content must be provided")
try:
@@ -209,6 +213,8 @@ def add_insight(self, insight_type: str, content: str) -> Any:

# todo: move this to content processing pipeline as a major graph
def generate_toc_and_title(self) -> "Source":
DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()

try:
config = RunnableConfig(configurable=dict(thread_id=self.id))
result = toc_graph.invoke({"content": self.full_text}, config=config)
10 changes: 6 additions & 4 deletions open_notebook/graphs/chat.py
@@ -9,10 +9,12 @@
from langgraph.graph.message import add_messages
from typing_extensions import TypedDict

from open_notebook.config import DEFAULT_MODELS, LANGGRAPH_CHECKPOINT_FILE
from open_notebook.config import LANGGRAPH_CHECKPOINT_FILE, load_default_models
from open_notebook.domain.notebook import Notebook
from open_notebook.graphs.utils import run_pattern

DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()


class ThreadState(TypedDict):
messages: Annotated[list, add_messages]
@@ -22,12 +24,12 @@ class ThreadState(TypedDict):


def call_model_with_messages(state: ThreadState, config: RunnableConfig) -> dict:
model_name = config.get("configurable", {}).get(
"model_name", DEFAULT_MODELS.default_chat_model
model_id = config.get("configurable", {}).get(
"model_id", DEFAULT_MODELS.default_chat_model
)
ai_message = run_pattern(
"chat",
model_name,
model_id,
messages=state["messages"],
state=state,
)
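
The `model_id` lookup above follows a common LangGraph idiom: a per-invocation value in the runnable config wins, otherwise the configured default is used. As a standalone sketch (model names are placeholders):

```python
def resolve_model_id(config: dict, default_model: str) -> str:
    # A per-invocation override in config["configurable"] wins;
    # otherwise fall back to the default.
    return config.get("configurable", {}).get("model_id", default_model)


print(resolve_model_id({"configurable": {"model_id": "gpt-4o-mini"}}, "default-chat"))
print(resolve_model_id({}, "default-chat"))
```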
4 changes: 3 additions & 1 deletion open_notebook/graphs/content_processing/audio.py
@@ -4,7 +4,7 @@
from loguru import logger
from pydub import AudioSegment

from open_notebook.config import SPEECH_TO_TEXT_MODEL
from open_notebook.config import load_default_models
from open_notebook.graphs.content_processing.state import SourceState

# future: parallelize the transcription process
@@ -72,6 +72,8 @@ def split_audio(input_file, segment_length_minutes=15, output_prefix=None):


def extract_audio(data: SourceState):
DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()

input_audio_path = data.get("file_path")
audio_files = []

11 changes: 6 additions & 5 deletions open_notebook/graphs/doc_query.py
@@ -1,14 +1,15 @@
import os

from langchain_core.runnables import (
RunnableConfig,
)
from langgraph.graph import END, START, StateGraph
from typing_extensions import TypedDict

from open_notebook.config import load_default_models
from open_notebook.domain.notebook import Note, Notebook, Source
from open_notebook.graphs.utils import run_pattern

DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()


class DocQueryState(TypedDict):
doc_id: str
@@ -19,10 +20,10 @@ class DocQueryState(TypedDict):


def call_model(state: dict, config: RunnableConfig) -> dict:
model_name = config.get("configurable", {}).get(
"model_name", os.environ.get("RETRIEVAL_MODEL")
model_id = config.get("configurable", {}).get(
"model_id", DEFAULT_MODELS.default_transformation_model
)
return {"answer": run_pattern("doc_query", model_name, state)}
return {"answer": run_pattern("doc_query", model_id, state)}


# todo: there is probably a better way to do this and avoid repetition
22 changes: 12 additions & 10 deletions open_notebook/graphs/multipattern.py
@@ -7,48 +7,50 @@
from langgraph.graph import END, START, StateGraph
from typing_extensions import Annotated, TypedDict

from open_notebook.config import DEFAULT_MODELS
from open_notebook.config import load_default_models
from open_notebook.graphs.utils import run_pattern

DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()


class PatternChainState(TypedDict):
content_stack: Annotated[Sequence[str], operator.add]
transformations: List[str]
patterns: List[str]
output: str


def call_model(state: dict, config: RunnableConfig) -> dict:
model_name = config.get("configurable", {}).get(
"model_name", DEFAULT_MODELS.default_transformation_model
model_id = config.get("configurable", {}).get(
"model_id", DEFAULT_MODELS.default_transformation_model
)
transformations = state["transformations"]
current_transformation = transformations.pop(0)
patterns = state["patterns"]
current_transformation = patterns.pop(0)
if current_transformation.startswith("patterns/"):
input_args = {"input_text": state["content_stack"][-1]}
else:
input_args = {
"input_text": state["content_stack"][-1],
"command": current_transformation,
}
current_transformation = "patterns/custom"
current_transformation = "patterns/default/command"

transformation_result = run_pattern(
pattern_name=current_transformation,
model_name=model_name,
model_id=model_id,
state=input_args,
)
return {
"content_stack": [transformation_result.content],
"output": transformation_result.content,
"transformations": state["transformations"],
"patterns": state["patterns"],
}


def transform_condition(state: PatternChainState) -> Literal["agent", END]: # type: ignore
"""
Checks whether there are more chunks to process.
"""
if len(state["transformations"]) > 0:
if len(state["patterns"]) > 0:
return "agent"
return END
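
The loop formed by `call_model` and `transform_condition` pops one pattern per step and feeds each result into the next; stripped of the graph machinery, it reduces to something like this (an illustrative reduction, not the project's code):

```python
def run_chain(content: str, patterns: list[str], apply) -> str:
    # Mirror the graph: pop the next pattern, transform the top of
    # the content stack, and repeat until no patterns remain.
    stack = [content]
    while patterns:
        pattern = patterns.pop(0)
        stack.append(apply(pattern, stack[-1]))
    return stack[-1]


chained = run_chain(
    "raw text",
    ["patterns/default/makeitdense", "patterns/default/summarize"],
    lambda p, text: f"{p}({text})",
)
print(chained)  # patterns/default/summarize(patterns/default/makeitdense(raw text))
```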

10 changes: 6 additions & 4 deletions open_notebook/graphs/pattern.py
@@ -4,9 +4,11 @@
from langgraph.graph import END, START, StateGraph
from typing_extensions import TypedDict

from open_notebook.config import DEFAULT_MODELS
from open_notebook.config import load_default_models
from open_notebook.graphs.utils import run_pattern

DEFAULT_MODELS, EMBEDDING_MODEL, SPEECH_TO_TEXT_MODEL = load_default_models()


class PatternState(TypedDict):
input_text: str
@@ -15,13 +17,13 @@ class PatternState(TypedDict):


def call_model(state: dict, config: RunnableConfig) -> dict:
model_name = config.get("configurable", {}).get(
"model_name", DEFAULT_MODELS.default_transformation_model
model_id = config.get("configurable", {}).get(
"model_id", DEFAULT_MODELS.default_transformation_model
)
return {
"output": run_pattern(
pattern_name=state["pattern"],
model_name=model_name,
model_id=model_id,
state=state,
)
}
