A virtual assistant that helps users improve their conversation skills through voice practice sessions.
Munio is a virtual assistant that acts as an English teacher, helping users improve their conversational skills through voice practice sessions. The application relies mainly on generative AI, which it uses to analyze audio, generate phrases, and provide feedback; in this project, Gemini AI fulfills these requirements.
The application offers two modes: sessions and conversations.
In session mode, users answer random phrases generated from the context and level they choose. They receive feedback after each lesson, and an overall assessment at the end of the session helps them understand how to improve.
- Phrase generation using Gemini AI, based on context and level requested by the user;
- Audio analysis using Gemini AI, providing feedback about the user's speaking and pronunciation;
- Audio upload using Google Cloud Storage;
- Session overall feedback using Gemini AI.
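The session flow above can be sketched in TypeScript, matching the project's NestJS stack. This is a minimal, hypothetical sketch: the names `SessionRequest`, `PhraseResult`, `GenerateFn`, `buildPhrasePrompt`, `parsePhrases`, `generatePhrases`, and `buildSessionFeedbackPrompt`, the prompt wording, and the assumption that the model replies with a JSON array are all illustrative, not Munio's actual code. `GenerateFn` stands in for whatever wrapper around the Gemini SDK the application uses.

```typescript
// Hypothetical types: the README does not document Munio's data model or prompts.
type Level = "beginner" | "intermediate" | "advanced";

interface SessionRequest {
  context: string; // e.g. "ordering coffee"
  level: Level;
}

// Per-phrase result; `feedback` is assumed to come from the audio-analysis step.
interface PhraseResult {
  phrase: string;
  feedback: string;
}

// Stand-in for a Gemini call: takes a prompt, resolves to the model's text reply.
type GenerateFn = (prompt: string) => Promise<string>;

// Builds a prompt asking for `count` phrases matching the user's context and level.
function buildPhrasePrompt(req: SessionRequest, count: number): string {
  return [
    "You are an English teacher.",
    `Generate ${count} short phrases for a ${req.level} learner`,
    `in this context: "${req.context}".`,
    "Reply with a JSON array of strings only.",
  ].join(" ");
}

// Validates the model reply instead of trusting it blindly.
function parsePhrases(reply: string): string[] {
  const data: unknown = JSON.parse(reply);
  if (!Array.isArray(data) || !data.every((p) => typeof p === "string")) {
    throw new Error("Model reply is not a JSON array of strings");
  }
  return data;
}

// Generates the phrases for one session using the injected model call.
async function generatePhrases(
  req: SessionRequest,
  count: number,
  generate: GenerateFn,
): Promise<string[]> {
  return parsePhrases(await generate(buildPhrasePrompt(req, count)));
}

// Builds the end-of-session prompt from every phrase and its feedback.
function buildSessionFeedbackPrompt(results: PhraseResult[]): string {
  const lines = results.map((r, i) => `${i + 1}. "${r.phrase}": ${r.feedback}`);
  return `Summarize this learner's session and explain how to improve:\n${lines.join("\n")}`;
}
```

Injecting the model call as a function keeps prompt building and reply validation testable without network access; in a real service the stub would be replaced by an actual Gemini client call.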
In conversation mode, users hold a realistic dialogue with an AI voice (using Text to Speech), generated from the context and level they choose. At the end of the conversation, they receive general feedback with suggestions for improving their conversation skills.
- Realistic dialogue generated by Gemini AI, based on context and level requested by the user;
- Audio analysis using Gemini AI, providing feedback about the user's speaking and pronunciation;
- Audio upload using Google Cloud Storage;
- Text to Speech using Google Cloud Text to Speech;
- Conversation overall feedback using Gemini AI.
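A conversation needs the running dialogue so that each AI line follows from what was said before. The class below is a hypothetical sketch of that bookkeeping: `Conversation`, `Turn`, and both prompt formats are assumptions, and it assumes user turns arrive as text transcribed by the audio-analysis step. The model's reply to `nextLinePrompt()` would be the text sent to Text to Speech.

```typescript
// Hypothetical types: the README does not specify how Munio stores a dialogue.
type Speaker = "assistant" | "user";

interface Turn {
  speaker: Speaker;
  text: string; // user turns: transcript assumed to come from audio analysis
}

class Conversation {
  private readonly turns: Turn[] = [];
  readonly context: string;
  readonly level: string;

  constructor(context: string, level: string) {
    this.context = context;
    this.level = level;
  }

  addTurn(speaker: Speaker, text: string): void {
    this.turns.push({ speaker, text });
  }

  get length(): number {
    return this.turns.length;
  }

  // Prompt for the assistant's next line, including the whole transcript so far.
  nextLinePrompt(): string {
    const transcript = this.turns.map((t) => `${t.speaker}: ${t.text}`).join("\n");
    return [
      `Role-play this scenario with a ${this.level} English learner: ${this.context}.`,
      "Conversation so far:",
      transcript || "(no turns yet; open the conversation)",
      "Reply with the assistant's next line only.",
    ].join("\n");
  }

  // Prompt for the overall feedback, built only from what the learner said.
  feedbackPrompt(): string {
    const userLines = this.turns
      .filter((t) => t.speaker === "user")
      .map((t) => `- ${t.text}`);
    return `Suggest how this learner can improve their conversation skills:\n${userLines.join("\n")}`;
  }
}
```

Keeping the full transcript in each prompt is the simplest way to give a stateless model call the dialogue history; long conversations would eventually need truncation or summarization.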
To make interactions feel more natural, we use WebSockets (Socket.io) for real-time communication.
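Socket.io v4 lets a server declare its events as TypeScript maps (for example `new Server<ClientToServerEvents, ServerToClientEvents>()`), so client and server agree on event names and payloads at compile time. The README does not document Munio's events, so the names `phrase:answer` and `phrase:feedback` below are hypothetical. `TypedChannel` is a tiny in-memory stand-in for a socket so the sketch runs without the library; for simplicity it maps event names to payload types, whereas Socket.io's maps use function signatures.

```typescript
// Hypothetical event contracts (event name -> payload type).
type ClientToServerEvents = {
  "phrase:answer": { sessionId: string; audioUrl: string };
};

type ServerToClientEvents = {
  "phrase:feedback": { sessionId: string; feedback: string };
};

// Minimal in-memory typed channel standing in for a Socket.io socket (not the real library).
class TypedChannel<Events> {
  private readonly handlers = new Map<keyof Events, Array<(payload: any) => void>>();

  on<E extends keyof Events>(event: E, handler: (payload: Events[E]) => void): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  // The compiler checks that `payload` matches the type declared for `event`.
  emit<E extends keyof Events>(event: E, payload: Events[E]): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

const toServer = new TypedChannel<ClientToServerEvents>();
const toClient = new TypedChannel<ServerToClientEvents>();
const received: string[] = [];

// "Server": answers each recorded phrase with feedback (the Gemini analysis is stubbed out).
toServer.on("phrase:answer", ({ sessionId, audioUrl }) => {
  toClient.emit("phrase:feedback", { sessionId, feedback: `analyzed ${audioUrl}` });
});

// "Client": collects feedback as it arrives.
toClient.on("phrase:feedback", ({ feedback }) => {
  received.push(feedback);
});
```

Emitting `toServer.emit("phrase:answer", { sessionId: "s1", audioUrl: "audio-1.webm" })` pushes `"analyzed audio-1.webm"` into `received`, while a misspelled event name or a missing payload field fails to compile.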
This is the public version of the back-end application; the front-end is available here: web-client.
# install dependencies
npm ci
# start the local services with Docker Compose
docker compose up
# run database migrations
npm run migration:up
# production
npm run migration:up:prod
# development
npm run start
# watch mode
npm run start:dev
# production mode
npm run start:prod
- Frontend interface: web-client
- Phrase Generator: Google Gemini Flash
- Audio recording analysis: Google Gemini Flash
- Session analysis: Google Gemini Flash
- Storage: Google Cloud Storage
- TTS: Google Cloud Text to Speech
- Infrastructure: Google Cloud Platform (Cloud SQL and App Engine)
- WebSockets: Socket.io
- Authentication: Passport
- Validations: Zod
- Query Builder: Knex
- Database: MySQL
- Framework: NestJS
- Author - Gabriel Sena
- Website - https://munio.cloud