add 5 files for deep research function #29
genglongling wants to merge 8 commits into HazyResearch:main from
danbider left a comment
great work here!
like I asked in other PRs, can you explain the logic of your deep research model here? what are its advantages and disadvantages? what do you use local versus remote LLMs for?
@@ -0,0 +1,455 @@
{
can you explain the goal of this notebook? does it show important features of the system?
The deep research version of Minion/Minions extends the original Minion/Minions workflow. The full workflow is:
- [NEW] Real-time search: pass the query to serpAPI to get the top-k links (k=5 by default), use firecrawl.dev to pull the text from each link, and concatenate the search results with the 'context' variable. Reusing 'context' keeps changes to the existing program minimal.
- Decompose: pass the summaries back to the remote model.
- Execute: pass the text to the local model to summarize.
- Aggregate: the remote model then decides whether it has enough information to answer the query or needs more, in which case it sends another retrieval query (for example, "show me how to use anthropic's MCP protocol") and that query is passed to serpAPI.
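The loop described above can be sketched roughly as follows. This is an illustration, not the code in this PR: `deep_research`, `Decision`, and the four callables are hypothetical names, and the callables stand in for serpAPI, firecrawl.dev, the local model, and the remote model so that no real API signature is assumed. Appending each round's pages to the running context is also an assumption about how the rounds accumulate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    """What the remote model returns after reading the summaries (hypothetical)."""
    answer: Optional[str] = None      # set when there is enough information
    next_query: Optional[str] = None  # set when another retrieval round is needed


def deep_research(
    task: str,
    context: str,
    search: Callable[[str, int], List[str]],        # (query, k) -> URLs; stand-in for serpAPI
    scrape: Callable[[str], str],                   # URL -> page text; stand-in for firecrawl.dev
    summarize_local: Callable[[str, str], str],     # (task, text) -> summary; local model
    decide_remote: Callable[[str, List[str]], Decision],  # (task, summaries) -> Decision
    max_urls: int = 5,
    max_rounds: int = 5,
) -> Optional[str]:
    query = task
    summaries: List[str] = []
    for _ in range(max_rounds):
        # Real-time search: fetch the top-k pages and append them to the context.
        pages = [scrape(url) for url in search(query, max_urls)]
        context = "\n\n".join([context, *pages]).strip()
        # Execute: the local model summarizes the enlarged context.
        summaries.append(summarize_local(task, context))
        # Decompose/Aggregate: the remote model reads the summaries and either
        # answers or asks for another retrieval query.
        decision = decide_remote(task, summaries)
        if decision.answer is not None:
            return decision.answer
        if decision.next_query is None:
            break
        query = decision.next_query
    return None  # no answer within max_rounds
```

Passing the services in as callables keeps the sketch runnable with simple stubs and leaves the actual client calls to the notebook.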
The main advantages of MinionDeepResearch/MinionsDeepResearch over the original Minion/Minions are:
- Real-time information: live search results give the models a more comprehensive source than the pre-trained LLM's knowledge alone.
- Lower computation cost (anticipated): fewer remote model calls should reduce spend, but more experiments are needed to confirm this.
- Higher accuracy (anticipated): Minions reach 97.9% of the accuracy of remote-only solutions while costing just 17.5% as much. MinionsDeepResearch is expected to improve accuracy further, but this also needs more experiments.
MinionsDeepResearch.ipynb provides:
- an interface for the deep research version of Minion/Minions, which adds "max_urls" compared to the previous version:
  output = protocol(
      task=task,
      doc_metadata=doc_metadata,
      context=[context],
      max_urls=5,  # you can adjust the number of URLs as needed for testing
      max_rounds=5,  # you can adjust rounds as needed for testing
  )
- two use cases: with and without context information.
- after running, users can check that the search, local model, and remote model outputs are correct.
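As a rough illustration of the two use cases (with and without context), the helper below builds the single-item list passed as `context=[context]`. `build_context` is a hypothetical name and the blank-line separator is an assumption; the notebook may merge the search results differently.

```python
from typing import List


def build_context(user_context: str, search_results: List[str]) -> List[str]:
    """Merge optional user context with scraped page texts into one context string."""
    # Drop empty pieces so the no-context case does not start with blank lines.
    parts = [p.strip() for p in [user_context, *search_results] if p and p.strip()]
    return ["\n\n".join(parts)]


# With context: the user's text comes first, then the search results.
with_ctx = build_context("my notes", ["page one", "page two"])
# Without context: only the search results are used.
without_ctx = build_context("", ["page one", "page two"])
```

Either result can then be passed as the `context` argument in the call shown above.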
Please check the following files: