* repo structure
* adding unit tests
* andrew repo rebase
* repo structure
* adding unit tests
* andrew repo rebase
* Dev (#3)
* cutting costs
* cutting costs
* added docstrings and typehints
* Create utils.py
* Delete poetry.lock
* eval first run
* formatting
* eval2
* small fixes
* add .env.sample for people to use as a template
* spanish and german flores200 outputs
* repo restructure
* deepl scores
* updated readme
* added ruff for PR formatting
* changed model to gpt-4o
* adding wmt script and the batch articles
* nllb benchmarks + gradio app start
* adding updated prompts - single chunk only
* finished google translate, working on gradio
* adding short sample text
* french, korean, japanes, and mandarin scores
* updated promppts, comet_eval script and japanese agent translation
* updated prompts
* added random generation to gradio app
* adding minor changes
* moved gradio app functions to utils file
* gradio app works with regeneration and model anonymity
* subj benchmark run ready
* adding the batch en
* adding wsj source text
* final changes?
* updated readme
* fixed example script, updated readme

---------

Co-authored-by: John Santerre <[email protected]>
Co-authored-by: Kevin Solorio <[email protected]>
Co-authored-by: Nedelina <[email protected]>

* cleanup

---------

Co-authored-by: John Santerre <[email protected]>
Co-authored-by: Kevin Solorio <[email protected]>
Co-authored-by: Nedelina <[email protected]>
1 parent 59631fb · commit 42ec493 · Showing 78 changed files with 20,851 additions and 475 deletions.
.env.sample
@@ -0,0 +1 @@
OPENAI_API_KEY=""
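The scripts in this commit call `load_dotenv()` to pull `OPENAI_API_KEY` from a `.env` file shaped like the sample above. A minimal sketch of what that loading does, using a hand-rolled parser so it runs without the python-dotenv dependency (the helper name `load_env_file` is illustrative, not part of the repo):

```python
import os
import tempfile


def load_env_file(path: str) -> None:
    """Parse KEY="value" lines and export them, mimicking dotenv's basic behavior."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks and comments
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')


# Demo: write a file shaped like .env.sample, then load it.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write('OPENAI_API_KEY="sk-example"\n')
    env_path = f.name

load_env_file(env_path)
print(os.environ["OPENAI_API_KEY"])  # → sk-example
```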
.gitignore
@@ -1 +1,8 @@
cache_dir
.env
.venv
__pycache__
poetry.lock
floresp-v2.0-rc.3
*cache
wmt
.pre-commit-config.yaml
@@ -0,0 +1,23 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
        exclude: tests
      - id: end-of-file-fixer
      - id: check-merge-conflict
      - id: check-case-conflict
      - id: check-json
      - id: check-toml
        exclude: tests/fixtures/invalid_lock/poetry\.lock
      - id: check-yaml
      - id: pretty-format-json
        args: [--autofix, --no-ensure-ascii, --no-sort-keys]
      - id: check-ast
      - id: debug-statements
      - id: check-docstring-first
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.5
    hooks:
      - id: ruff
      - id: ruff-format
LICENSE
@@ -0,0 +1,9 @@
MIT License

Copyright (c) 2024 Andrew Ng

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
README.md
@@ -1,20 +1,46 @@
-# Agentic translation using reflection workflow
+# Translation Agent: Agentic translation using reflection workflow

-Usage:
+Translation Agent is a Python-based project that leverages an agentic workflow for machine translation tasks. The repository contains code that utilizes the power of Reflection to enhance the translation process and improve the quality of the generated translations.

-Download spaCy (natural language processing package)'s English model:
+## Features
+
+- Agentic Workflow: Translation Agent employs an agentic workflow, which allows for a more intelligent and context-aware approach to machine translation. By incorporating Reflection, the system can analyze and understand the source text more effectively, resulting in more accurate and fluent translations.
+- Reflection-based Translation: The core of Translation Agent lies in its use of Reflection, a technique that enables the system to introspect and reason about its own translation process. By reflecting on the intermediate steps and considering the context and meaning of the source text, the system can make informed decisions and generate translations that better capture the intended meaning.
+- Language Support: Translation Agent supports a wide range of languages, making it a versatile tool for translating text across different linguistic boundaries. Whether you need to translate between commonly spoken languages or handle less-resourced language pairs, Translation Agent has you covered.
+- Customizable Models: The repository provides a flexible framework that allows you to customize and fine-tune the translation models according to your specific requirements. You can experiment with different architectures, training data, and hyperparameters to optimize the translation quality for your use case.
+- Easy Integration: Translation Agent is designed to be easily integrated into existing projects and workflows. With a simple and intuitive API, you can seamlessly incorporate machine translation capabilities into your applications, websites, or data pipelines.
+
+## Getting Started
+
+To get started with Translation Agent, follow these steps:
+
+### Installation:
+
 ```bash
-python -m spacy download en_core_web_sm
+git clone https://github.com/andrewyng/translation-agent.git
 ```

-Use as follows:
+- The Poetry package manager is required (and recommended)
+- A .env file with an OPENAI_API_KEY is required to run the workflow. See the .env.sample file as an example.
+
+Once you are in the repository directory:

-```python
-import translation_agent as ta
+```bash
+poetry install
 ```

-source_lang, target_lang = "English", "Spanish"
+### Usage:

-translation = ta.translate(source_lang, target_lang, source_text)
+```python
+import translation_agent as ta
+
+source_lang, target_lang, country = "English", "Spanish", "Mexico"
+
+translation = ta.translate(source_lang, target_lang, source_text, country)
+```
+
+See examples/example_script.py for an example script to try out.
+
+## License
+
+Translation Agent is released under the **MIT License**. You are free to use, modify, and distribute the code for both commercial and non-commercial purposes.
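The new README describes a translate → reflect → improve loop. The control flow can be sketched in a self-contained way with a stubbed model call (`call_model` and the prompts are illustrative placeholders, not the repository's actual prompts or API; in the real project each step would be an LLM request):

```python
def call_model(prompt: str) -> str:
    """Stub standing in for an LLM call; returns canned responses keyed on the prompt."""
    if prompt.startswith("Translate"):
        return "Hola mundo"                   # step 1: initial draft
    if prompt.startswith("Critique"):
        return "Prefer a greeting register."  # step 2: reflection on the draft
    return "¡Hola, mundo!"                    # step 3: improved translation


def translate_with_reflection(source_lang: str, target_lang: str, source_text: str) -> str:
    # Step 1: produce an initial translation.
    draft = call_model(f"Translate {source_lang} to {target_lang}: {source_text}")
    # Step 2: ask the model to reflect on and critique its own draft.
    critique = call_model(f"Critique this {target_lang} translation: {draft}")
    # Step 3: revise the draft using the critique.
    improved = call_model(
        f"Improve the translation.\nDraft: {draft}\nSuggestions: {critique}"
    )
    return improved


result = translate_with_reflection("English", "Spanish", "Hello, world")
print(result)  # → ¡Hola, mundo!
```

The point of the sketch is the chaining: the second prompt sees the first response, and the third sees both, which is what "Reflection" adds over a single translation request.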
Google Translate comparison script
@@ -0,0 +1,81 @@
import json

import translation_agent as ta
from dotenv import load_dotenv
from google.cloud import translate


load_dotenv()


def translate_text(
    text: str = "YOUR_TEXT_TO_TRANSLATE",
    project_id: str = "YOUR_PROJECT_ID",
    source_lang: str = "en-US",
    target_lang: str = "es",
) -> str:
    """Translate text with the Google Cloud Translation API."""
    client = translate.TranslationServiceClient()

    location = "global"

    parent = f"projects/{project_id}/locations/{location}"

    # Detail on supported types can be found here:
    # https://cloud.google.com/translate/docs/supported-formats
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",  # mime types: text/plain, text/html
            "source_language_code": source_lang,
            "target_language_code": target_lang,
        }
    )

    # Return the translation of the single input text
    return response.translations[0].translated_text


if __name__ == "__main__":
    source_lang, target_lang, country = "English", "Chinese", "China"
    with open("./sample-texts/google_en_spa_flores_sample.json") as f:
        data = json.load(f)

    translations_agents = []
    translations_google = []
    for entry in data:
        print(f"Source text:\n\n{entry['source_txt']}\n------------\n")
        translation_google = translate_text(
            text=entry["source_txt"],
            project_id="santerre",
            source_lang="en-US",
            target_lang="zh",
        )

        translation_agents = ta.translate(
            source_lang=source_lang,
            target_lang=target_lang,
            source_text=entry["source_txt"],
            country=country,
        )

        new_dict_google = {
            "source_txt": entry["source_txt"],
            "translation": translation_google,
        }

        new_dict_agents = {
            "source_txt": entry["source_txt"],
            "translation": translation_agents,
        }

        translations_google.append(new_dict_google)
        translations_agents.append(new_dict_agents)

    # Handles named f_google/f_agents rather than `ta`, which would shadow the module.
    with open("google_en_man_flores_sample.json", "w") as f_google:
        json.dump(translations_google, f_google)

    with open("gpt4_en_man_flores_sample.json", "w") as f_agents:
        json.dump(translations_agents, f_agents)
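The comparison script above pairs each source text with two translations and dumps each system's list to its own JSON file. That collect-and-dump pattern can be shown runnable with stub translators in place of the Google Cloud and agent calls (the stub functions and file names here are illustrative, not the repo's):

```python
import json
import os
import tempfile


def translate_google(text: str) -> str:
    return f"[google] {text}"  # stand-in for the Translation API call


def translate_agent(text: str) -> str:
    return f"[agent] {text}"   # stand-in for ta.translate(...)


data = [{"source_txt": "Hello"}, {"source_txt": "Goodbye"}]

translations_google, translations_agents = [], []
for entry in data:
    src = entry["source_txt"]
    translations_google.append({"source_txt": src, "translation": translate_google(src)})
    translations_agents.append({"source_txt": src, "translation": translate_agent(src)})

# Dump each system's outputs to its own file, as the script does.
out_dir = tempfile.mkdtemp()
for name, rows in [
    ("google_sample.json", translations_google),
    ("agents_sample.json", translations_agents),
]:
    with open(os.path.join(out_dir, name), "w") as f:
        json.dump(rows, f)

with open(os.path.join(out_dir, "google_sample.json")) as f:
    print(json.load(f)[0]["translation"])  # → [google] Hello
```

Keeping the two systems' outputs in separate files with identical `source_txt` keys is what lets the downstream COMET evaluation and the arena app line translations up by index.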
Gradio app (Translation Arena)
@@ -0,0 +1,175 @@
import random

import gradio as gr
from dotenv import load_dotenv
from utils import change_language
from utils import css
from utils import gen_random
from utils import get_model_description_md
from utils import google_language_dict
from utils import gpt4_language_dict
from utils import models
from utils import regen
from utils import write_answer


load_dotenv()


with gr.Blocks(
    title="Translation Arena",
    theme=gr.themes.Soft(secondary_hue=gr.themes.colors.sky),
    css=css,
) as demo:
    num_sides = 2
    states = [gr.State() for _ in range(num_sides)]
    chatbots = [None] * num_sides
    intro_num = gen_random()
    idx_random = gr.State(value=intro_num, render=True)
    model_a = random.choice(models)  # randomly decide which model is shown as "Model A"
    txtbox1_model = gr.State(value=model_a, render=True)
    print(idx_random.value)
    gr.Markdown(
        "# Translation Model Arena\n\nCompare and evaluate the translations of multiple models."
    )
    with gr.Row():
        with gr.Column(scale=1):
            with gr.Accordion("Language", open=True, elem_id="language_box") as lang_row:
                lang = gr.Dropdown(
                    label="Choose Preferred Language",
                    choices=["Spanish", "Bulgarian", "Chinese"],
                    value="Spanish",
                    interactive=True,
                )
            with gr.Tabs() as tabs:
                with gr.Tab("Text Arena", id=0):
                    with gr.Tab("⚔️ Arena (battle)", id=0):
                        with gr.Group(elem_id="share-region-annoy"):
                            with gr.Column(scale=4):
                                chosen_lang = lang.value
                                print(chosen_lang)
                                source_txtbox = gr.Textbox(
                                    label="Source Text",
                                    value=gpt4_language_dict[chosen_lang][idx_random.value]["source_txt"],
                                    show_copy_button=True,
                                )

                        with gr.Group(elem_id="share-region-annoy"):
                            with gr.Accordion("🔍 Expand to see the models", open=False):
                                all_models = [
                                    {"name": "gpt-4-turbo"},
                                    {"name": "gpt-4-turbo (w/ Agents)"},
                                    {"name": "Google Translate"},
                                    {"name": "DeepL"},
                                    {"name": "NLLB-200 (3.3B)"},
                                ]
                                # Only two models are compared in the battle; this list
                                # shadows the `models` imported from utils.
                                models = ["gpt-4-turbo", "google-translate"]
                                model_description_md = get_model_description_md(models)
                                gr.Markdown(
                                    model_description_md,
                                    elem_id="model_description_markdown",
                                )
                            with gr.Row():
                                with gr.Column():
                                    label = "Model A"
                                    print(txtbox1_model.value)
                                    # Model A's hidden identity decides which translation
                                    # lands in which textbox, keeping the pairing anonymous.
                                    match txtbox1_model.value:
                                        case "gpt-4-turbo":
                                            init_value_a = gpt4_language_dict[chosen_lang][idx_random.value]["translation"]
                                            init_value_b = google_language_dict[chosen_lang][idx_random.value]["translation"]
                                        case "google-translate":
                                            init_value_a = google_language_dict[chosen_lang][idx_random.value]["translation"]
                                            init_value_b = gpt4_language_dict[chosen_lang][idx_random.value]["translation"]
                                        case _:
                                            pass

                                    txtbox1 = gr.Textbox(
                                        label=label,
                                        elem_id="chatbot",
                                        value=init_value_a,
                                        show_copy_button=True,
                                    )

                                with gr.Column():
                                    label = "Model B"
                                    txtbox2 = gr.Textbox(
                                        label=label,
                                        elem_id="chatbot",
                                        value=init_value_b,
                                        show_copy_button=True,
                                    )
                            lang.change(
                                fn=change_language,
                                inputs=[txtbox1, txtbox2, source_txtbox, lang],
                                outputs=[txtbox1, txtbox2, source_txtbox],
                            )
                            with gr.Row() as button_row:
                                a_better = gr.Button(value="👈 A is better", interactive=True)
                                a_better.click(fn=write_answer, inputs=[a_better, txtbox1_model])

                                b_better = gr.Button(value="👉 B is better", interactive=True)
                                b_better.click(fn=write_answer, inputs=[b_better, txtbox1_model])

                                tie_btn = gr.Button(value="🤝 Tie", visible=True, interactive=True)
                                tie_btn.click(fn=write_answer, inputs=[tie_btn, txtbox1_model])

                                bothbad_btn = gr.Button(
                                    value="👎 Both are bad", visible=True, interactive=True
                                )
                                bothbad_btn.click(fn=write_answer, inputs=[bothbad_btn, txtbox1_model])

                            with gr.Row():
                                regenerate_btn = gr.Button(value="Regenerate", visible=True, interactive=True)

                            regenerate_btn.click(
                                fn=regen,
                                inputs=[lang, txtbox1_model],
                                outputs=[txtbox1, txtbox2, source_txtbox, txtbox1_model],
                            )

if __name__ == "__main__":
    demo.queue(default_concurrency_limit=10)
    demo.launch(share=True)
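The arena above shows two translations under anonymous "Model A"/"Model B" labels and maps each vote back to the hidden model. That assignment-and-vote bookkeeping can be sketched without Gradio (names and structure here are illustrative, not the app's actual `utils` helpers):

```python
import random

MODELS = ["gpt-4-turbo", "google-translate"]


def assign_sides(rng: random.Random) -> dict:
    """Randomly pick which model appears as 'A'; the other becomes 'B'."""
    model_a = rng.choice(MODELS)
    model_b = next(m for m in MODELS if m != model_a)
    return {"A": model_a, "B": model_b}


def record_vote(sides: dict, button: str) -> str:
    """Map an anonymous button press back to the model it credits."""
    if button == "👈 A is better":
        return sides["A"]
    if button == "👉 B is better":
        return sides["B"]
    return "tie/both-bad"  # ties and double-downvotes credit no single winner


rng = random.Random(0)  # seeded for a reproducible demo
sides = assign_sides(rng)
winner = record_vote(sides, "👈 A is better")
print(sides, winner)
```

Because only the side assignment is stored (the app keeps it in `txtbox1_model` state), the vote handler never needs to know which textbox the user actually read, which is what keeps the comparison blind.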
News sample texts (JSON)
@@ -0,0 +1 @@
[{"text": "Paid ChatGPT users can now upload files directly from Google Drive and Microsoft OneDrive, interact with tables and charts using natural language, and customize charts for presentations. When users upload or import a data file, ChatGPT can now write and execute Python code to analyze or visualize that data on users\u2019 behalf. These features may make it easier for those with limited coding skills to conduct in-depth analyses and let experts save time on routine data tasks."},
{"text": "Reddit\u2019s vast forums will be used to power ChatGPT and other AI products. The collaboration will give Reddit new AI-powered features for its users and moderators, while OpenAI will advertise on Reddit. (Full terms were undisclosed.) OpenAI now has deals with global newspapers, software forums, and a wide variety of other publishers, giving it special access to timely and high-quality training material."},
{"text": "ZeroGPU is accessible through Hugging Face\u2019s Spaces platform, which already hosts over 300,000 AI demos. The shared Nvidia A100s can be used concurrently by multiple users or applications; unutilized capacity will be made available to others. HuggingFace\u2019s goal is to counter tech giants and closed models\u2019 centralization by making state-of-the-art AI technologies more accessible."},
{"text": "Chameleon can natively process both text and images together, allowing it to perform a wide range of mixed-modal tasks with impressive results. Meta\u2019s researchers say the key is Chameleon\u2019s fully token-based architecture (representing images as well as texts as tokens) and training on datasets that combine text with images. Chameleon outperforms many leading and specialized models (including GPT-4V and Gemini Pro) when answering questions about images, describing pictures, writing relevant text, and creating images from text prompts.\u00a0"},
{"text": "Google\u2019s AI-assisted, browser-based integrated development environment (IDE) offers now-familiar features like code completion, debugging tools, and a chat-assisted sidebar, all powered by Gemini. Whenever IDX modifies snippets or suggests new code, it also links back to the original source and its associated license, ensuring proper attribution. Although Google is entering a competitive market, IDX aims to attract developers by showcasing Gemini\u2019s AI advancements and integrating with the company\u2019s cloud services."},
{"text": "The tool aims to solve new users\u2019 \u201cblank page problem\u201d by providing a starting point for testing and iteration, incorporating best practices like chain of thought and separating data from instructions. Users can access the prompt generator directly on the Console or analyze the underlying prompt and architecture using a Google Colab notebook. The generator addresses a common challenge for AI users: efficiently crafting effective (and often larger and more complex) prompts that yield high-quality results."},
{"text": "ElevenLabs Reader: AI Audio is the billion-dollar AI voice cloning startup\u2019s first consumer app. The free app can read web pages, PDFs, and other documents aloud using a selection of 11 AI-generated voices. The app marks ElevenLabs\u2019 expansion into the broader AI voice market beyond its current focus on entertainment and media production."},
{"text": "Microsoft reportedly asked hundreds of its China-based employees working on cloud computing and AI to consider relocating to other countries. One source said Microsoft offered 700 to 800 Chinese engineers the opportunity to transfer to the U.S., Ireland, Australia, or New Zealand. The move comes as the U.S. government tightens restrictions on China\u2019s access to advanced technology, citing concerns over potential military applications and cybersecurity threats."},
{"text": "Abu Dhabi\u2019s Technology Innovation Institute released Falcon 2, a family of large language models that includes Falcon 2 11B and Falcon 2 11B VLM. The latter is the institute\u2019s first multimodal model, capable of converting visual inputs into textual outputs. Both models are Apache 2.0 open-source, multilingual, and perform on par with Gemma 7B and better than Llama 3 8B according to benchmarks and HuggingFace leaderboards."}]