hollr

An R package for chat completion and text annotation with both local LLMs and OpenAI models. Key features include:

  • Versatile Model Access: Interact with either local LLMs (via Python/reticulate) or OpenAI models through a single function, `hollr()`.

  • Multiple Annotator Support: Facilitate text annotation workflows with support for multiple annotators, including ensembling and majority voting methods.

  • Batch and Parallel Processing: Process multiple inputs in batches with local LLMs, or in parallel across multiple cores with OpenAI models.

  • Consistent Output: Ensure uniform data frame outputs across model types.

Installation

Get the development version from GitHub with:

remotes::install_github("jaytimm/hollr")

Usage

A quick example

Some PubMed data

pmids <- puremoe::search_pubmed('("political ideology"[TiAb])',
                                 use_pub_years = FALSE) |>
  puremoe::get_records(endpoint = 'pubmed_abstracts',
                       cores = 3,
                       sleep = 1)
| pmid | year | articletitle | ab |
|---|---|---|---|
| 39374517 | 2024 | Racial Minorities Face Discrimination From Across the Political Spectrum When Seeking to Form Ties on Social Media: Evidence From a Field Experiment. | We conducted a preregistered field experiment examining racial discrimination in tie formation on social media. We randomly assigned research accounts … |
| 39340096 | 2024 | Messaging to Reduce Booster Hesitancy among the Fully Vaccinated. | Vaccine hesitancy was a serious problem in the United States throughout the COVID-19 pandemic, due in part to the reduction … |
| 39320049 | 2024 | Rural reticence to inform physicians of cannabis use. | Over 75% of Americans have legal access to medical cannabis, though physical access is not uniform and can be difficult … |
| 39222956 | 2024 | The prototypical UK blood donor, homophily and blood donation: Blood donors are like you, not me. | Homophily represents the extent to which people feel others are like them and encourages the uptake of activities they feel … |
| 39194099 | 2024 | The impact of conspiracy theories and vaccine knowledge on vaccination intention: a longitudinal study. | In this study, we analyzed associations between vaccination knowledge, vaccination intention, political ideology, and belief in conspiracy theories before and … |
| 39148747 | 2024 | Formative reasons for state-to-state influences on firearm acquisition in the U.S. | Firearm-related crimes and self-inflicted harms pose a significant threat to the safety and well-being of Americans. Investigation of firearm prevalence … |

A quick prompt

## For the PubMed abstract provided below, provide a
## single sentence summary of the research findings
## in 30 words. Ensure that the summary is concise,
## starts with "Study results demonstrate," and
## highlights the key outcomes. Also, identify the
## country or countries where the study was
## conducted.
## 
## Expected Output:
## {
## "country": "Country or countries where the study
## was conducted.",
## "summary": "Study results demonstrate ...
## (summary of the research findings in 30 words)."
## }
## 
## Abstract:
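
The next section refers to this prompt as `p1`, which is not defined in the text shown here. A minimal sketch of how it might be stored, assuming a length-one character vector; the line breaks and the `toy_abstracts` vector are illustrative and not taken from the package:

```r
# Assumption: the prompt printed above is held in a single string named p1.
p1 <- paste(
  "For the PubMed abstract provided below, provide a single sentence summary",
  "of the research findings in 30 words. Ensure that the summary is concise,",
  "starts with \"Study results demonstrate,\" and highlights the key outcomes.",
  "Also, identify the country or countries where the study was conducted.",
  "",
  "Expected Output:",
  "{",
  "\"country\": \"Country or countries where the study was conducted.\",",
  "\"summary\": \"Study results demonstrate ... (summary of the research findings in 30 words).\"",
  "}",
  "",
  "Abstract:",
  sep = "\n"
)

# paste() recycles the single prompt across a vector of abstracts,
# producing one complete prompt per abstract.
toy_abstracts <- c("First abstract text.", "Second abstract text.")
toy_prompts <- paste(p1, toy_abstracts, sep = "\n\n")
length(toy_prompts)
```

Because `paste()` is vectorized, the same pattern scales from two toy abstracts to a full column of records without a loop.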

Cloud-based LLMs

prompt <- paste(p1, pmids$abstract, sep = '\n\n')

Single core & single annotator

class_task1 <- hollr::hollr(
  model = 'gpt-4o-mini',
  id = pmids$pmid[1:6],
  user_message = prompt[1:6], 
  cores = 1, 
  annotators = 1, 
  max_attempts = 7,
  force_json = TRUE,
  flatten_json = TRUE
  )
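
How `hollr()` uses `max_attempts` internally is not documented here. Assuming it caps the number of retries when a response fails or cannot be parsed, the general pattern can be sketched in base R. `with_retries()` and `flaky_call()` are hypothetical names for illustration, not part of the package:

```r
# Hypothetical helper: call `fn` until it succeeds or `max_attempts` is reached.
with_retries <- function(fn, max_attempts = 7) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(fn(), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(value = result, attempts = attempt))
    }
  }
  # Every attempt failed: return a missing value rather than raising an error.
  list(value = NA_character_, attempts = max_attempts)
}

# Toy stand-in for a model call that fails twice before returning JSON text.
calls <- 0
flaky_call <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("malformed response")
  '{"country": "United States", "summary": "Study results demonstrate ..."}'
}

out <- with_retries(flaky_call, max_attempts = 7)
out$attempts
```

Returning a missing value after the final attempt, instead of stopping, keeps one bad response from aborting a whole batch of annotations.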

Output

| id | country | summary |
|---|---|---|
| 39374517 | United States | Study results demonstrate racial discrimination in social media tie formation, with individuals less likely to reciprocate ties with Black accounts compared to White ones, regardless of political orientation. |
| 39340096 | United States | Study results demonstrate that providing scientific explanations about mRNA booster safety and effectiveness significantly improved willingness to get boosted, trust in scientists, and perceptions across political ideology groups. |
| 39320049 | United States | Study results demonstrate that rural Pennsylvanians are less likely to disclose marijuana use to healthcare providers due to stigma, potentially impacting their health outcomes and care quality. |
| 39222956 | United Kingdom | Study results demonstrate current blood donors and MSM show higher homophily with the prototypical UK donor, while ethnic minorities and recipients exhibit lower homophily, influencing donation likelihood. |
| 39194099 | Brazil | Study results demonstrate that increased belief in vaccine conspiracy theories correlates with decreased vaccination intention and knowledge, highlighting the need for targeted health education in Brazil. |
| 39148747 | United States | Study results demonstrate that U.S. states’ firearm acquisition patterns co-evolve with crime rates and laws, indicating that stricter laws and lower homicides can reduce inter-state acquisition influences. |

Parallel processing & multiple annotators

class_task2 <- hollr::hollr(
  model = 'gpt-4o-mini',
  id = pmids$pmid[1:10],
  user_message = prompt[1:10], 
  cores = 7, 
  annotators = 3, 
  max_attempts = 7,
  force_json = TRUE,
  flatten_json = TRUE
  )
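
With three annotators, each id can receive up to three answers. The sketch below shows one way to reduce them to a consensus label by majority vote in base R. It is not the package's own implementation; the long-format `votes` table, its column names, and `majority_vote()` are assumptions made for illustration:

```r
# Toy long-format results: one row per id and annotator (hypothetical columns).
votes <- data.frame(
  id = rep(c("39374517", "39194099"), each = 3),
  annotator = rep(1:3, times = 2),
  country = c("United States", "United States", "USA",
              "Brazil", "Brazil", "Brazil"),
  stringsAsFactors = FALSE
)

# Return the most frequent label; ties resolve to the first label in sort order.
majority_vote <- function(labels) {
  counts <- table(labels)
  names(counts)[which.max(counts)]
}

# One consensus label per id.
consensus <- aggregate(country ~ id, data = votes, FUN = majority_vote)
consensus
```

Exact string matching treats "USA" and "United States" as different labels, so normalizing free-text answers before voting is usually worthwhile.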

Local LLMs

Conda environment

# Create and activate a new conda environment with Python 3.9
conda create -n llm_base python=3.9 -y
conda activate llm_base

# Update all packages in the environment
conda update --all -y

# Install required packages with conda
conda install nmslib pandas numpy spacy -c conda-forge -y
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Install additional packages with pip
pip install transformers packaging ninja flash-attn --no-build-isolation accelerate protobuf auto-gptq \
"git+https://github.com/PanQiWei/[email protected]" optimum tiktoken sentencepiece

Reticulate

# Set environment variables and use conda environment
Sys.setenv(RETICULATE_PYTHON = file.path(miniconda_path, "envs", env_name, "bin/python"))
reticulate::use_condaenv(condaenv = env_name, conda = file.path(miniconda_path, "bin/conda"))
# Model identifier passed to hollr() in the batch example
llm <- 'meta-llama/Meta-Llama-3.1-8B-Instruct'
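
The snippet above uses `miniconda_path` and `env_name` without defining them. A minimal sketch of one way to set them, assuming Miniconda is installed under `~/miniconda3` and the environment is the `llm_base` one created in the conda step; both values are assumptions to adjust for the local machine:

```r
# Assumptions: Miniconda lives in ~/miniconda3 and the environment is `llm_base`.
miniconda_path <- path.expand("~/miniconda3")
env_name <- "llm_base"

python_bin <- file.path(miniconda_path, "envs", env_name, "bin", "python")
conda_bin <- file.path(miniconda_path, "bin", "conda")

# Only point reticulate at the environment if it actually exists on disk.
if (file.exists(python_bin) && requireNamespace("reticulate", quietly = TRUE)) {
  Sys.setenv(RETICULATE_PYTHON = python_bin)
  reticulate::use_condaenv(condaenv = env_name, conda = conda_bin)
} else {
  message("Skipping reticulate setup; check that reticulate is installed ",
          "and that this path exists: ", python_bin)
}
```

Checking for the interpreter first gives a readable message instead of a reticulate error when the path is wrong.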

Batch processing

batch_seq <- hollr::hollr(
  model = llm,
  id = pmids$pmid[1:10],
  user_message = prompt[1:10],
  annotators = 3,
  #max_attempts = 7,
  force_json = FALSE,
  flatten_json = FALSE,
  max_new_tokens = 75,
  batch_size = 5
  )
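
With ten prompts and `batch_size = 5`, the inputs would presumably be processed in two groups of five. How the package forms its batches is not shown here; the sketch below only illustrates the chunking idea in base R, and `make_batches()` is a hypothetical helper rather than a hollr function:

```r
# Hypothetical helper: split a vector into consecutive chunks of `batch_size`.
make_batches <- function(x, batch_size) {
  split(x, ceiling(seq_along(x) / batch_size))
}

toy_prompts <- paste("prompt", 1:10)
batches <- make_batches(toy_prompts, batch_size = 5)

# Number of prompts in each batch.
lengths(batches)
```

When the input length is not a multiple of the batch size, the final chunk is simply shorter.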
