add 5 files for deep research function #29

Open
genglongling wants to merge 8 commits into HazyResearch:main from genglongling:main

Conversation

@genglongling

@genglongling genglongling commented Mar 21, 2025

Please check the following files:

  1. minions_deepreearch.ipynb
  2. minions/minions_deepresearch.py
  3. minions/minion_deepresearch.py
  4. utils/serp_util.py
  5. utils/firecrawl_util.py
  6. README

Collaborator

@danbider danbider left a comment


great work here!

like I asked in other PRs, can you explain the logic of your deep research model here? what are its advantages and disadvantages? what do you use local versus remote LLMs for?

@@ -0,0 +1,455 @@
{
Collaborator


can you explain the goal of this notebook? does it show important features of the system?

Author

@genglongling genglongling Mar 24, 2025


The deep-research version of Minion/Minions extends the original Minion/Minions workflow. Here is the full workflow:

  1. [NEW] Real-time search: pass the query to SerpAPI to get the top-k links (k=5 by default), use firecrawl.dev to pull the text from each of these links, and concatenate the search results with the 'context' variable (this keeps the changes to the original program minimal).
  2. Decompose: pass the summaries back to the remote model.
  3. Execute: pass the text to the local model to summarize.
  4. Aggregate: the remote model then decides if it has enough information to answer the query, or if it needs more information, in which case it sends up another retrieval query (e.g. "show me how to use Anthropic's MCP protocol") and passes that query to SerpAPI again.
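The search-and-augment step (step 1 above) can be sketched as follows. The helper names `serp_search` and `firecrawl_scrape` stand in for the real utilities in utils/serp_util.py and utils/firecrawl_util.py (their actual signatures may differ), and both are stubbed here so the sketch runs without API keys:

```python
def serp_search(query: str, k: int = 5) -> list[str]:
    """Return the top-k result URLs for `query` (via SerpAPI in the real code)."""
    # Stubbed: the real utility calls the SerpAPI endpoint with an API key.
    return [f"https://example.com/result/{i}" for i in range(k)]

def firecrawl_scrape(url: str) -> str:
    """Return the page text for `url` (via firecrawl.dev in the real code)."""
    # Stubbed: the real utility fetches and cleans the page content.
    return f"text pulled from {url}"

def augment_context(query: str, context: str, max_urls: int = 5) -> str:
    """Concatenate scraped search results onto the existing context string."""
    urls = serp_search(query, k=max_urls)
    pages = [firecrawl_scrape(u) for u in urls]
    # Returning a single string means the rest of the Minion/Minions
    # pipeline can consume the augmented context unchanged.
    return context + "\n\n" + "\n\n".join(pages)
```

Because the output is just a longer 'context' string, steps 2-4 run exactly as in the original protocol.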

Author

The main advantages of MinionDeepResearch/MinionsDeepResearch over the original Minion/Minions are:

  1. Real-time information: it draws on up-to-date search results rather than only the pre-trained LLM's knowledge.
  2. Lower computation cost (anticipated): fewer remote-model calls reduce the budget, but more experiments are needed.
  3. Higher accuracy (anticipated): Minions achieves 97.9% of the accuracy of remote-only solutions while costing just 17.5% as much; MinionsDeepResearch should further improve accuracy, but more experiments are needed as well.

Author

@genglongling genglongling Mar 24, 2025


MinionsDeepResearch.ipynb provides:

  1. an interface for the deep-research version of Minion/Minions, adding a "max_urls" argument compared to the previous version:

         output = protocol(
             task=task,
             doc_metadata=doc_metadata,
             context=[context],
             max_urls=5,    # adjust as needed for testing
             max_rounds=5,  # adjust as needed for testing
         )

  2. two use cases: with and without context information.
  3. after running, users can check the 'correct' output for the search step, the local model, and the remote model.
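The aggregate step (step 4 of the workflow) loops until the remote model is satisfied or max_rounds is exhausted. A minimal sketch of that control flow, using hypothetical helpers `remote_decide` and `local_summarize` rather than the actual classes in minions/minions_deepresearch.py:

```python
def remote_decide(query: str, summaries: list[str]):
    """Remote model: return (answer, follow_up_query).

    Stubbed: answers once any summary is available; otherwise asks for
    another retrieval round on the same query.
    """
    return ("final answer", None) if summaries else (None, query)

def local_summarize(text: str) -> str:
    """Local model: summarize a chunk of context (stubbed)."""
    return f"summary of: {text[:30]}"

def run_protocol(task: str, context: str, max_rounds: int = 5):
    """Alternate local summarization and remote decisions, up to max_rounds."""
    summaries: list[str] = []
    for _ in range(max_rounds):
        answer, follow_up = remote_decide(task, summaries)
        if answer is not None:
            return answer
        # In the real flow the follow-up query goes back through SerpAPI +
        # firecrawl.dev to augment the context; here we only summarize the
        # existing context locally.
        summaries.append(local_summarize(context))
    return None  # budget exhausted without a confident answer
```

The max_rounds parameter caps remote-model calls, which is where the anticipated cost savings come from.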
