For now, I'm using a Raspberry Pi 3B+ combined with a RASPIAUDIO Ultra +. Its onboard mic and speaker are convenient for my experiments.
To trigger the conversation, a "wake-up-word" technology is necessary. There are many such technologies in the open-source world. Here is the list of the technologies I have experimented with:
Voice recordings are currently transcribed into text by the Google Cloud Speech-To-Text API.
The mozilla/DeepSpeech project, combined with the Common Voice dataset, is being considered as a replacement.
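As a rough illustration of the speech-to-text step, here is a minimal Python sketch built on the `google-cloud-speech` client library. It is not the actual Milobella code. The function names `transcribe` and `join_transcripts`, the 16 kHz mono LINEAR16 input format, and the `en-US` language code are all assumptions made for the example. Credentials are expected to come from the usual `GOOGLE_APPLICATION_CREDENTIALS` environment variable.

```python
from types import SimpleNamespace


def join_transcripts(response):
    """Concatenate the top alternative of every result in a recognize() response."""
    parts = []
    for result in response.results:
        if result.alternatives:
            parts.append(result.alternatives[0].transcript.strip())
    return " ".join(p for p in parts if p)


def transcribe(pcm_bytes, language_code="en-US", sample_rate_hertz=16000):
    """Send raw 16-bit mono PCM audio to Google Cloud Speech-to-Text.

    The audio format and language are assumptions for this sketch.
    """
    # Imported lazily so join_transcripts() stays usable without the library.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(content=pcm_bytes)
    response = client.recognize(config=config, audio=audio)
    return join_transcripts(response)


if __name__ == "__main__":
    # Offline demo with a hand-built stand-in for the API response.
    fake = SimpleNamespace(results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="turn on ")]),
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="the light")]),
    ])
    print(join_transcripts(fake))
```

Keeping the response handling in its own small function means that swapping in another engine later (DeepSpeech, for instance) would only require replacing `transcribe`.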
Milobella's text answers are converted into synthesized speech by the Google Cloud Text-To-Speech API.
No replacement has been considered for now.
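The text-to-speech step can be sketched in a similar way with the `google-cloud-texttospeech` client library. Again, this is only an illustration under assumptions, not the project's actual code. The function names `synthesize` and `wav_duration_seconds` and the `en-US` language code are hypothetical. The sketch requests LINEAR16 output, which the API returns with a WAV header, so the standard `wave` module should be able to read it.

```python
import io
import wave


def synthesize(text, language_code="en-US"):
    """Ask Google Cloud Text-to-Speech for a WAV rendering of `text`.

    The language code is an assumption made for this sketch.
    """
    # Imported lazily so the helper below works without the library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16
        ),
    )
    return response.audio_content  # WAV bytes, ready to save or play


def wav_duration_seconds(wav_bytes):
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

On the Raspberry Pi, the returned bytes could be written to a file and played through the onboard speaker. The duration helper gives a simple way to know how long to wait before listening again.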