Official code for the paper "A picture is worth a thousand words? Investigating the Impact of Image Aids in AR on Memory Recall for Everyday Tasks".
Memory retrieval using a multimodal LLM with the HoloLens 2.
This system, developed for deployment on a HoloLens 2 device:
- Captures images from the user's field of view and stores their LLM-generated descriptions in a vector database;
- Retrieves the images that best match the user's question and generates textual answers complementing the retrieved images.
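The sketch below is a minimal illustration of this encode/retrieve loop, assuming hypothetical helper callables `describe_image` (multimodal LLM captioning) and `embed_text` (text embeddings) and a toy in-memory store in place of the vector database actually used in this repository.

```python
# Minimal sketch of the encode/retrieve flow; all names here are illustrative,
# not the identifiers used in this repository.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MemoryStore:
    """Toy in-memory vector store: one normalized embedding per captured image."""
    embeddings: list = field(default_factory=list)
    image_paths: list = field(default_factory=list)

    def add(self, image_path: str, embedding) -> None:
        v = np.asarray(embedding, dtype=float)
        self.embeddings.append(v / np.linalg.norm(v))
        self.image_paths.append(image_path)

    def query(self, embedding, k: int = 3) -> list:
        q = np.asarray(embedding, dtype=float)
        q = q / np.linalg.norm(q)
        scores = [float(q @ e) for e in self.embeddings]
        best = np.argsort(scores)[::-1][:k]
        return [self.image_paths[i] for i in best]


def encode(store: MemoryStore, image_path: str, describe_image, embed_text) -> None:
    """Encoding: caption the captured frame with the LLM, embed the caption, store it."""
    caption = describe_image(image_path)          # multimodal LLM call (assumed)
    store.add(image_path, embed_text(caption))    # text-embedding call (assumed)


def retrieve(store: MemoryStore, question: str, embed_text, k: int = 3) -> list:
    """Retrieval: embed the user's question and return the best-matching images."""
    return store.query(embed_text(question), k=k)
```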
- Client-related code is written in C#;
- Server-related code is written in Python.
- In particular, this project makes use of:
  - the LISA repository, for integrating the LISA model;
  - Hugging Face's **Transformers** library (with bitsandbytes) for 4-bit model quantization.
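As an illustration of the 4-bit loading path, a model can be quantized at load time with Transformers' `BitsAndBytesConfig`; the model id below is a placeholder, and the exact configuration used for LISA in this repository may differ.

```python
# Sketch of 4-bit loading with Transformers + bitsandbytes (placeholder model id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "some-org/some-model"  # placeholder, not the actual checkpoint used here

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```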
- Clone the repository; to run the server, you will only need the files in the server folder.
```bash
git clone https://github.com/bess-cater/ARMemory.git && cd ARMemory/server
```
- Build the Docker image from the Dockerfile:
```bash
docker build -t memory_image .
```
- And run it:
```bash
docker run -i -t -d --name memory --ipc=host --gpus all -v "$(pwd):/memory" -p 9999:9999 memory_image
```
- Create and activate the conda environment:
```bash
conda env create -f environment.yml && conda activate memory
```
Additional steps:
- For the Image and Image+Text modes, you may want to use a segmentation model. We use LISA: clone its repository and place the corresponding folder inside the server folder.
- You may want to change the port; in that case, make sure to assign the same port in the server- and client-related files. A minimal sketch of the server-side port binding follows below.
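For reference, the sketch below shows a server binding the mapped port; the actual networking code in myserver.py may be organized differently, and the variable names here are assumptions.

```python
# Illustrative only: the port must match docker's -p mapping and the Unity client setting.
import socket

HOST = "0.0.0.0"   # listen on all interfaces inside the container
PORT = 9999        # change here, in `docker run -p`, and in the client scripts together

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen()
    conn, addr = srv.accept()
    print("HoloLens client connected from", addr)
```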
This section describes the process of capturing pictures from the HoloLens built-in camera and processing them with the chosen LLM.
- Before starting the server, make sure to provide your OpenAI key if you are using the GPT-4o model and/or an OpenAI model for vector embeddings; it is required here (an illustrative example of supplying and using the key is shown after these steps).
- Start the server:
```bash
python -m myserver --save <name of a person whose viewpoint is recorded> --scene <place which is recorded>
```
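One way the key could be supplied and used is sketched below; the embedding model name and the function names are assumptions, not necessarily what myserver.py actually does.

```python
# Sketch of supplying the OpenAI key and calling the embedding / GPT-4o endpoints.
import os

from openai import OpenAI

# export OPENAI_API_KEY=... before starting the server (or place the key where the repo expects it)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def embed_text(text: str) -> list:
    """Vector embedding for image descriptions / user questions (model name assumed)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding


def answer_question(question: str, context: str) -> str:
    """Textual answer generation with GPT-4o, conditioned on retrieved descriptions."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using the retrieved memory context."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```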
To enable the HoloLens 2 to capture images and send them to the server, you need to build the HoloLens application from the Unity project.
- Open the encoding scene.
- Make sure that the encoding script is attached to the scene and that your IP and port are set.
- Before building the application, make sure your app has access to the WebCam (Edit - Project Settings - Capabilities - WebCam).
- Build and deploy the solution.
- Run it on the HoloLens 2 device (grant access to the WebCam).
In this stage, answers to the users' questions are retrieved with the help of the LLM.
- Run send_message.py:
```bash
python -m send_message --save <name of a person whose viewpoint is recorded> --scene <place which is recorded> --condition <condition>
```
The first two arguments must be the same as in the Encoding stage; for the condition, you may choose image, text, or image_text.
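For example (the argument values below are illustrative):
```bash
python -m send_message --save alice --scene kitchen --condition image_text
```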
- Create a new project in Unity and make sure you have configured it with the Mixed Reality Toolkit. We used version 2.8.3.0.
- For this project, we especially rely on Microsoft Azure Cognitive Services; the package must be downloaded from here and imported into the Unity project.
- Open the retrieval scene. If everything is set up correctly, you will see the following layout:
- To use the services, provide your Speech SDK credentials.
- Do not forget to set your IP and port!
- Before the build, make sure that the InternetClient, Microphone, and SpatialPerception capabilities in the Publishing settings are enabled.
- Build and deploy the solution; run it on the HoloLens 2.