upload data and code
iseesaw committed Jul 12, 2024
1 parent 942fde6 commit 959bb10
Showing 15 changed files with 4,184 additions and 0 deletions.
Binary file added assert/multi-agent.jpg
802 changes: 802 additions & 0 deletions data/gpt-3.5/test_seen.json


802 changes: 802 additions & 0 deletions data/gpt-3.5/test_unseen.json


762 changes: 762 additions & 0 deletions data/gpt-4/test_seen.json


790 changes: 790 additions & 0 deletions data/gpt-4/test_unseen.json


85 changes: 85 additions & 0 deletions readme.md
@@ -1 +1,86 @@
# LLM4BioHypoGen

This repository houses the datasets and code used in our COLM 2024 paper titled "Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation."

> An earlier version of this work was accepted at the NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following and is available as a preprint: [Large Language Models are Zero Shot Hypothesis Proposers](https://arxiv.org/abs/2311.05965).

## Data

The dataset includes both "seen" and "unseen" splits, generated by GPT-3.5-turbo and GPT-4. Each subset of the data is organized under the `data` directory as follows:

```bash
- data
  - gpt-3.5
    - test_seen.json
    - test_unseen.json
  - gpt-4
    - test_seen.json
    - test_unseen.json
```

Each element in these files comprises background and hypothesis sentences extracted from the literature using GPT-3.5-turbo or GPT-4.
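
For a quick look at a split, here is a minimal sketch using only the standard library; the exact record layout is not assumed here, so inspect the printed example to see the available fields:

```python
import json

# Peek at the structure of one split without assuming its field names.
with open("data/gpt-4/test_seen.json", encoding="utf-8") as f:
    data = json.load(f)

# Handle either a JSON array or an id-keyed object.
example = data[0] if isinstance(data, list) else next(iter(data.values()))
print(type(data), len(data))
print(example)
```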

## Code

The provided code implements our novel multi-agent framework. The framework simulates a collaborative scientific environment in which each agent is assigned a specific role, and the agents work together in an iterative, mutually reinforcing manner to develop hypotheses that are both innovative and anchored in current scientific understanding. Our approach aims to emulate the dynamics of scientific inquiry, thereby fostering hypotheses that are scientifically valid and forward-looking. The framework comprises five key components: four automated agents and an optional human interaction loop, as shown below:

![](./assert/multi-agent.jpg)

### Requirements

To install the necessary libraries, run the following command:

```bash
pip install langchain srsly
```
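
The agents call OpenAI chat models through `langchain`'s `ChatOpenAI`, which reads your credentials from the `OPENAI_API_KEY` environment variable. One way to provide it from Python (a shell `export` works equally well):

```python
import os

# Set your OpenAI API key before running the scripts; never commit real keys.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder
```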

### Running the Code

- To run the multi-agent framework without additional tools, execute:

```bash
cd src
python run_multi_agent_wo_tool.py
```

- To run the multi-agent framework with integrated tools, use:

```bash
cd src
python run_multi_agent_wo_tool.py
```

Note: the second command above reuses the tool-free script name. If your checkout provides a separate tool-enabled entry point (for example, `run_multi_agent_with_tool.py`, mirroring `src/agents/dialogue_agent_with_tool.py`), substitute that file name in the second command.



## Citation

- COLM 2024 version

```tex
@inproceedings{
qi2024large,
title={Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation},
author={Biqing Qi and Kaiyan Zhang and Kai Tian and Haoxiang Li and Zhang-Ren Chen and Sihang Zeng and Ermo Hua and Hu Jinfang and Bowen Zhou},
booktitle={First Conference on Language Modeling},
year={2024},
url={https://openreview.net/forum?id=q36rpGlG9X}
}
```

- NeurIPS 2023 Workshop version

```tex
@inproceedings{
qi2023large,
title={Large Language Models are Zero Shot Hypothesis Proposers},
author={Biqing Qi and Kaiyan Zhang and Haoxiang Li and Kai Tian and Sihang Zeng and Zhang-Ren Chen and Bowen Zhou},
booktitle={NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following},
year={2023},
url={https://openreview.net/forum?id=EAuteBjTMw}
}
```



54 changes: 54 additions & 0 deletions src/agents/dialogue_agent.py
@@ -0,0 +1,54 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date : 2023-09-11 17:35:59
# @Author : Kaiyan Zhang ([email protected])
# @Link : https://github.com/iseesaw

from typing import List, Dict, Callable
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import (
AIMessage,
HumanMessage,
SystemMessage,
BaseMessage,
)

class DialogueAgent:
    def __init__(
        self,
        name: str,
        system_message: SystemMessage,
        model: ChatOpenAI,
    ) -> None:
        self.name = name
        self.system_message = system_message
        self.model = model
        self.prefix = f"{self.name}: "
        self.reset()

    def reset(self):
        self.message_history = ["Here is the conversation so far."]

    def send(self) -> str:
        """
        Applies the chat model to the message history
        and returns the message string
        """
        message = self.model(
            [
                self.system_message,
                HumanMessage(content="\n".join(self.message_history + [self.prefix])),
            ]
        )
        return message.content

    def receive(self, name: str, message: str) -> None:
        """
        Concatenates {message} spoken by {name} into message history
        """
        self.message_history.append(f"{name}: {message}")

63 changes: 63 additions & 0 deletions src/agents/dialogue_agent_with_tool.py
@@ -0,0 +1,63 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date : 2023-09-11 17:37:00
# @Author : Kaiyan Zhang ([email protected])
# @Link : https://github.com/iseesaw

from typing import List, Dict, Callable
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import (
AIMessage,
HumanMessage,
SystemMessage,
BaseMessage,
)
from langchain.agents import Tool
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.agents import load_tools

from src.agents.dialogue_agent import DialogueAgent

class DialogueAgentWithTools(DialogueAgent):
    def __init__(
        self,
        name: str,
        system_message: SystemMessage,
        model: ChatOpenAI,
        # tool_names: List[str],
        tools,
        **tool_kwargs,
    ) -> None:
        super().__init__(name, system_message, model)
        self.tools = tools  # load_tools(tool_names, **tool_kwargs)

    def send(self) -> str:
        """
        Applies the chat model to the message history
        and returns the message string
        """
        agent_chain = initialize_agent(
            self.tools,
            self.model,
            agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
            verbose=True,
            memory=ConversationBufferMemory(
                memory_key="chat_history", return_messages=True
            ),
            handle_parsing_errors=True,
        )
        message = AIMessage(
            content=agent_chain.run(
                input="\n".join(
                    [self.system_message.content] + self.message_history + [self.prefix]
                )
            )
        )

        return message.content
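
# Illustrative usage sketch (an assumption added for documentation, not part of
# the original commit): any callable can be wrapped as a langchain Tool and
# handed to the tool-enabled agent.
if __name__ == "__main__":
    def literature_lookup(query: str) -> str:
        """Hypothetical stand-in for a real literature-search backend."""
        return f"No locally cached results for: {query}"

    search_tool = Tool(
        name="literature_search",
        func=literature_lookup,
        description="Searches the biomedical literature for a keyword combination.",
    )
    model = ChatOpenAI(temperature=0)  # requires OPENAI_API_KEY in the environment
    engineer = DialogueAgentWithTools(
        name="Engineer",
        system_message=SystemMessage(content="You are the Engineer."),
        model=model,
        tools=[search_tool],
    )
    print(engineer.send())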

63 changes: 63 additions & 0 deletions src/environment.py
@@ -0,0 +1,63 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date : 2023-09-11 17:36:31
# @Author : Kaiyan Zhang ([email protected])
# @Link : https://github.com/iseesaw

from typing import List, Dict, Callable
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import (
AIMessage,
HumanMessage,
SystemMessage,
BaseMessage,
)

from src.agents.dialogue_agent import DialogueAgent
from src.agents.dialogue_agent_with_tool import DialogueAgentWithTools

class DialogueSimulator:
    def __init__(
        self,
        agents: List[DialogueAgent],
        selection_function: Callable[[int, List[DialogueAgent]], int],
    ) -> None:
        self.agents = agents
        self._step = 0
        self.select_next_speaker = selection_function

    def reset(self):
        for agent in self.agents:
            agent.reset()

    def inject(self, name: str, message: str):
        """
        Initiates the conversation with a {message} from {name}
        """
        for agent in self.agents:
            agent.receive(name, message)

        # increment time
        # self._step += 1

    def step(self) -> tuple[str, str]:
        # 1. choose the next speaker
        speaker_idx = self.select_next_speaker(self._step, self.agents)
        speaker = self.agents[speaker_idx]

        # 2. next speaker sends message
        message = speaker.send()

        # 3. everyone receives message
        for receiver in self.agents:
            receiver.receive(speaker.name, message)

        # 4. increment time
        self._step += 1

        return speaker.name, message
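
# Illustrative usage sketch (an assumption, not shipped in this commit): wire
# four role agents into the simulator with a simple round-robin speaker order.
if __name__ == "__main__":
    def round_robin(step: int, agents: List[DialogueAgent]) -> int:
        return step % len(agents)

    model = ChatOpenAI(temperature=0)  # requires OPENAI_API_KEY in the environment
    agents = [
        DialogueAgent(role, SystemMessage(content=f"You are the {role}."), model)
        for role in ["Analyst", "Engineer", "Scientist", "Critic"]
    ]
    simulator = DialogueSimulator(agents=agents, selection_function=round_robin)
    simulator.reset()
    simulator.inject("Moderator", "Research background: <insert background here>")
    for _ in range(4):
        speaker, message = simulator.step()
        print(f"{speaker}: {message}\n")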

75 changes: 75 additions & 0 deletions src/prompts.py
@@ -0,0 +1,75 @@
prompt_analyst = """
You are the Analyst. Depending on the phase of the iteration, your role may slightly differ:
- **Initial Phase**: Analyze the provided research background to distill its core components into pivotal keywords or topics. This will set the stage for the Engineer's search efforts.
- **Feedback Phase**: Based on feedback from the Critic, you might need to re-analyze the research background or provide additional insights to refine the search direction.
In either case, ensure clarity and relevance in your analysis. Conclude by listing the identified keywords or topics or by providing revised insights.
"""

prompt_engineer = """
You are the Engineer. Your task revolves around searching based on the received keywords or insights, and this can involve multiple iterations:
- Plan your search strategies by crafting logical keyword combinations.
- Conduct systematic searches for each combination, meticulously gathering data and results.
- Refine your searches iteratively based on initial findings and any new insights from the Analyst.
Your output should be comprehensive and organized. For each keyword combination:
- **Title of Source**: Provide the title of the paper, article, or material you've found.
- **Abstract/Summary**: A brief summary or the abstract of the source.
- **Key Findings**: Highlight pivotal points or findings from the source that are relevant to the research background.
- **Implications**: If any, mention the implications or significance of the findings.
- **Relevant Quotes/Excerpts**: Extract direct quotes or sections that are particularly insightful.
Group your findings into individual "clues" based on themes or topics that emerge. This structure will provide the Scientist with detailed and organized data, enabling them to craft a robust hypothesis.
Conclude by presenting the structured "clues" for each keyword combination.
"""

prompt_scientist = """
You are the Scientist. Your task is to craft a hypothesis based on the Engineer's findings and the initial research background:
- Derive a potential hypothesis that bridges the existing literature with new insights.
- Ensure the hypothesis is both innovative and scientifically grounded.
Clearly state the proposed hypothesis, preparing it for evaluation by the Critic.
"""

prompt_critic = """
You are the Critic, responsible for evaluating the collaborative endeavor. Scrutinize the Scientist's hypothesis in light of the `Research Background`. Gauge its novelty, coherence, and scientific validity. Should the hypothesis necessitate refinement:
- Clearly articulate feedback, specifying areas needing improvement.
- Instruct the Analyst to either re-evaluate the `Research Background` or offer new insights to reshape the Engineer's subsequent search iteration.
When the hypothesis aligns with expectations and meets the desired standards, present and approve it using the structured format:
```
Final Answer:
(1) [First Point or Aspect of the Hypothesis]
(2) [Second Point or Aspect of the Hypothesis]
(3) [Third Point or Aspect of the Hypothesis]
...
```
"""

prompt_env = """
Topic Prompt for All Agents:
You are part of a collaborative multi-agent system designed to propose a hypothesis based on a given research background. Each of you has a specific role:
- **Analyst**: Analyzes the research background, distills its essence, and provides pivotal keywords or topics for further exploration.
- **Engineer**: Uses the keywords to plan and conduct systematic searches, meticulously gathering and organizing findings into detailed and structured "clues".
- **Scientist**: Crafts a potential hypothesis based on the organized findings and the original research background.
- **Critic**: Evaluates the hypothesis for its novelty, coherence, and scientific validity, providing feedback for refinement if necessary.
Your collaboration is iterative. Based on feedback from the Critic, the process can loop back to the Analyst for refined insights, leading to new searches by the Engineer, and a refined hypothesis by the Scientist.
Stay focused on your individual roles, collaborate effectively, and aim to derive a well-informed, novel hypothesis based on the research background provided.
Research Background:
{background}
Objective:
Using the research background and collaborative insights, the goal is to construct the most logical and scientifically robust hypothesis. Let's collaborate effectively to achieve this.
"""