This project aims to provide a reference implementation of an Intelligent Personal Assistant (IPA) framework.
You will need:
- A C/C++ compiler supporting at least C++17
- CMake version 3.22
- A local copy of libcurl (https://curl.se/download.html) so that CMake's find_package can locate it on Windows systems
- A C++ IDE of your choice
- An OpenAI developer key
git clone git@github.com:w3c/voiceinteraction.git
The source code is in the folder source/w3cipa:
- w3cipaframework: contains the interfaces as described at Intelligent Personal Assistant Interfaces and Architecture and Potential for Standardization Version.
- w3cipareferenceimplementation: contains the actual reference implementation of some common components using these interfaces
- w3cipachatgptipaprovider: an IPAProvider implementation for ChatGPT
- w3cipademo: contains a demo using the reference implementation
Most C++ IDEs support CMake as a build system. In the IDE of your choice, open the file w3cipa/CMakeLists.txt.
On Windows systems:
cd source/w3cipa
mkdir build
cd build
cmake .. -DCURL_INCLUDE_DIR=<Your path to the CURL include directory> -DCURL_LIBRARY=<Your path to the CURL library directory>
make && make install
On Linux-based or any other systems:
cd source/w3cipa
mkdir build
cd build
cmake ..
make && make install
The current demo interacts with ChatGPT. This walkthrough also points out which components from Architecture and Potential for Standardization Version are used.
As a first step, you need to provide a valid developer key to communicate with ChatGPT.
Open the file voiceinteraction/source/w3cipa/w3cipachatgptipaprovider/ChatGPTIPAProvider.json and replace OPENAI-DEVELOPER-KEY with your actual key.
Take care not to commit the file while it contains your key.
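A small check can help avoid committing a real key by accident. The following is only a sketch and not part of the repository: the helper names `containsPlaceholder` and `readFile` are hypothetical, and the only assumption about the configuration file is that an untouched copy contains the literal text OPENAI-DEVELOPER-KEY.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Placeholder text as named in this README.
const std::string kPlaceholder = "OPENAI-DEVELOPER-KEY";

// Hypothetical helper: returns true if the configuration text still
// contains the placeholder, i.e. no real key has been inserted yet.
bool containsPlaceholder(const std::string& configText) {
    return configText.find(kPlaceholder) != std::string::npos;
}

// Hypothetical helper: reads a whole file into a string.
// Returns an empty string if the file cannot be opened.
std::string readFile(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}
```

Calling `containsPlaceholder(readFile(path))` before a commit would return false once a real key has replaced the placeholder, which is the situation in which the file should not be committed.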
The main program starts with creating all the needed components per layer as described in Intelligent Personal Assistant Interfaces.
All components are created as shared instances, as they can potentially be re-used in the employed processing chain.
On the client side, we mainly need the correct modality components (text via console for now), a modality manager modalityManager to handle all known modalities, and a component to select which input to forward to the IPA. In this case, we simply select the first one that reaches us via inputListener.
The modalityManager and inputListener are part of the IPA Client.
ModalityComponents are either Capture or Presentation components, or both, as in the case of the console.
std::shared_ptr<client::ModalityManager> modalityManager =
    std::make_shared<client::ModalityManager>();
std::shared_ptr<::reference::client::ConsoleTextModalityComponent> console =
    std::make_shared<::reference::client::ConsoleTextModalityComponent>();
modalityManager->addModalityComponent(console);
std::shared_ptr<::reference::client::TakeFirstInputModalityComponentListener> inputListener =
    std::make_shared<::reference::client::TakeFirstInputModalityComponentListener>();
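To illustrate the idea of selecting the first input that arrives, here is a minimal sketch. It does not use the framework's classes: `Input`, `InputListener`, `TakeFirstInputListener` and `SimpleModalityManager` are hypothetical stand-ins, and the actual interfaces in w3cipaframework may look different.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical input record: which modality produced it and its content.
struct Input {
    std::string modality;
    std::string text;
};

// Hypothetical listener interface notified about captured input.
class InputListener {
public:
    virtual ~InputListener() = default;
    virtual void onInput(const Input& input) = 0;
};

// Keeps the first input it is notified about and ignores later ones.
class TakeFirstInputListener : public InputListener {
public:
    void onInput(const Input& input) override {
        if (!selected_) {
            selected_ = input;
        }
    }
    const std::optional<Input>& selected() const { return selected_; }

private:
    std::optional<Input> selected_;
};

// Hypothetical manager that forwards captured input to one listener.
class SimpleModalityManager {
public:
    void setListener(std::shared_ptr<InputListener> listener) {
        listener_ = std::move(listener);
    }
    void deliver(const Input& input) {
        if (listener_) {
            listener_->onInput(input);
        }
    }

private:
    std::shared_ptr<InputListener> listener_;
};
```

In this sketch, delivering two inputs in a row leaves only the first one in `selected()`; a different selection policy could be swapped in by providing another `InputListener`.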
Here, we create the Dialog Manager and the ipaService as an implementation of the IPA Service.
The dialog manager mainly separates good calls from erroneous ones and forwards the reply accordingly.
std::shared_ptr<::reference::dialog::ReferenceIPAService> ipaService =
    std::make_shared<::reference::dialog::ReferenceIPAService>();
std::shared_ptr<::reference::dialog::ReferenceIPADialogManager> ipaDialogManager =
    std::make_shared<::reference::dialog::ReferenceIPADialogManager>();
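The separation of good calls from erroneous ones could be pictured as follows. This is a sketch under assumptions, not the code of ReferenceIPADialogManager: the types `ProviderReply` and `SimpleDialogManager`, as well as the wording of the fallback message, are hypothetical.

```cpp
#include <string>

// Hypothetical reply as it might arrive from an IPA provider.
struct ProviderReply {
    bool failed;          // true if the provider reported an error
    std::string content;  // answer text or error description
};

// Hypothetical, simplified dialog manager: passes successful replies
// through unchanged and turns failures into a generic user message.
class SimpleDialogManager {
public:
    std::string process(const ProviderReply& reply) const {
        if (reply.failed) {
            return "Sorry, the request could not be processed.";
        }
        return reply.content;
    }
};
```

A real dialog manager would likely log the error details and may keep dialog state; the sketch only shows the branching between the two cases.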
Here, we create an instance of an IPAProvider to communicate with ChatGPT. This instance, chatGPT, is added to the list of known IPA providers in the registry, which is an implementation of a Provider Registry.
The providerSelectionStrategy is used by the registry to select those IPA providers that are suited to handle the current request. In this case, we select all those that have a matching modality, i.e. text, and the correct language. A chained filter is used to select the best provider.
The providerSelectionService, as an implementation of the Provider Selection Service, acts as the main component to be approached from components in the Dialog Layer. It makes use of the registry to obtain a list of IPA Providers that are suited to handle an actual request, forwards this request to them and waits until all responses have been received.
// Create a chained filter for selecting the best provider
std::shared_ptr<::external::ipa::ProviderSelectionStrategyList>
    providerSelectionStrategy =
        std::make_shared<::external::ipa::ProviderSelectionStrategyList>();
std::shared_ptr<::external::ipa::LanguageMatchingProviderSelectionStrategy>
    languageProviderSelectionStrategy = std::make_shared<
        ::external::ipa::LanguageMatchingProviderSelectionStrategy>();
providerSelectionStrategy->addStrategy(languageProviderSelectionStrategy);
std::shared_ptr<::external::ipa::ModalityMatchingProviderSelectionStrategy>
    modalityProviderSelectionStrategy =
        std::make_shared<::external::ipa::ModalityMatchingProviderSelectionStrategy>();
providerSelectionStrategy->addStrategy(modalityProviderSelectionStrategy);
// Create main components in the external layer
std::shared_ptr<ProviderRegistry> registry =
    std::make_shared<ProviderRegistry>(providerSelectionStrategy);
std::shared_ptr<IPAProvider> chatGPT =
    std::make_shared<::reference::external::ipa::chatgpt::ChatGPTIPAProvider>();
registry->addIPAProvider(chatGPT);
std::shared_ptr<external::ProviderSelectionService> providerSelectionService =
    std::make_shared<external::ProviderSelectionService>(registry);
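The chained filter can be thought of as a list of strategies that are applied one after another, each narrowing the candidates left by the previous one. The sketch below illustrates this with hypothetical types (`ProviderInfo`, `Request`, `SelectionStrategy`, `LanguageStrategy`, `ModalityStrategy`, `StrategyList`); whether ProviderSelectionStrategyList applies its strategies in exactly this way is an assumption.

```cpp
#include <algorithm>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified description of a provider and a request.
struct ProviderInfo {
    std::string name;
    std::vector<std::string> languages;
    std::vector<std::string> modalities;
};

struct Request {
    std::string language;
    std::string modality;
};

// A strategy narrows a list of candidate providers for a request.
class SelectionStrategy {
public:
    virtual ~SelectionStrategy() = default;
    virtual std::vector<ProviderInfo> filter(
        const std::vector<ProviderInfo>& candidates,
        const Request& request) const = 0;
};

static bool contains(const std::vector<std::string>& values,
                     const std::string& wanted) {
    return std::find(values.begin(), values.end(), wanted) != values.end();
}

// Keeps providers that support the requested language.
class LanguageStrategy : public SelectionStrategy {
public:
    std::vector<ProviderInfo> filter(
        const std::vector<ProviderInfo>& candidates,
        const Request& request) const override {
        std::vector<ProviderInfo> out;
        std::copy_if(candidates.begin(), candidates.end(),
                     std::back_inserter(out), [&](const ProviderInfo& p) {
                         return contains(p.languages, request.language);
                     });
        return out;
    }
};

// Keeps providers that support the requested modality.
class ModalityStrategy : public SelectionStrategy {
public:
    std::vector<ProviderInfo> filter(
        const std::vector<ProviderInfo>& candidates,
        const Request& request) const override {
        std::vector<ProviderInfo> out;
        std::copy_if(candidates.begin(), candidates.end(),
                     std::back_inserter(out), [&](const ProviderInfo& p) {
                         return contains(p.modalities, request.modality);
                     });
        return out;
    }
};

// Chains strategies: each one filters the output of the previous one.
class StrategyList : public SelectionStrategy {
public:
    void addStrategy(std::shared_ptr<SelectionStrategy> strategy) {
        strategies_.push_back(std::move(strategy));
    }
    std::vector<ProviderInfo> filter(
        const std::vector<ProviderInfo>& candidates,
        const Request& request) const override {
        std::vector<ProviderInfo> current = candidates;
        for (const auto& strategy : strategies_) {
            current = strategy->filter(current, request);
        }
        return current;
    }

private:
    std::vector<std::shared_ptr<SelectionStrategy>> strategies_;
};
```

Because `StrategyList` itself implements `SelectionStrategy`, a chain can be passed wherever a single strategy is expected, which mirrors how providerSelectionStrategy is handed to the registry above.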
Following Intelligent Personal Assistant Interfaces, we then tie the needed components together.
modalityManager >> inputListener >> ipaService >> providerSelectionService
    >> ipaDialogManager >> ipaService >> modalityManager;
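The chaining syntax can be realized by overloading operator>> for shared pointers so that it registers the right-hand component as a listener of the left-hand one and returns the right-hand component. The following sketch shows the principle with a hypothetical `Stage` class; it is not the framework's implementation, and it only handles a linear chain, whereas the chain above visits ipaService and modalityManager twice and therefore needs components that distinguish requests from responses.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stage in a processing chain.
class Stage {
public:
    explicit Stage(std::string name) : name_(std::move(name)) {}

    void addListener(std::shared_ptr<Stage> next) {
        listeners_.push_back(std::move(next));
    }

    // Records its own name in the trace and passes the data on.
    void process(std::vector<std::string>& trace) {
        trace.push_back(name_);
        for (const auto& listener : listeners_) {
            listener->process(trace);
        }
    }

private:
    std::string name_;
    std::vector<std::shared_ptr<Stage>> listeners_;
};

// Connects left to right and returns right, so that
// a >> b >> c registers b with a and then c with b.
std::shared_ptr<Stage> operator>>(const std::shared_ptr<Stage>& left,
                                  const std::shared_ptr<Stage>& right) {
    left->addListener(right);
    return right;
}
```

Since operator>> is left-associative, `a >> b >> c` is evaluated as `(a >> b) >> c`, which is what makes the left-to-right reading of the chain match the order in which data flows.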
The following shows which components from the diagram above are available and how this chain maps to it.
Finally, we need to start capturing input and start processing in the IPA:
modalityManager->startInput();
inputListener->IPADataProcessor::processIPAData();
When running the program w3cipademo, we may see the following on the screen:
User: What is the voice interaction community group?
System: The Voice Interaction Community Group (VoiceIG) is a group under the
World Wide Web Consortium (W3C) that focuses on promoting and enabling the use
of voice technology on the web. This community group aims to facilitate
discussions, share best practices, and collaborate on standards and guidelines
related to voice interactions on the web. The group is open to anyone interested
in voice technology, including developers, designers, researchers, and other
stakeholders in the industry.
This code is distributed under the Software and Document license - 2023 version.
While the W3C IPA Framework in the folder include is a dependency-free implementation, the reference implementation in the folder source makes use of the following third-party software:
- libCURL
  - https://curl.se/libcurl/
  - MIT/X like license
- log4cplus
  - https://github.com/log4cplus/log4cplus
  - Two clause BSD license
- nlohmann JSON
  - https://github.com/nlohmann/json
  - MIT license
- OpenSSL
  - https://www.openssl.org/
  - Apache License 2
- stduuid
  - https://github.com/mariusbancila/stduuid/
  - MIT License
A list of open issues can be displayed via Open Issues for Reference Implementation.