Commit ca0c0c8

yinhew, Yulin Li, and Copilot authored
[Talking Avatar] support setting photo avatar scene (zoom, position, rotation) from API (#2968)
* [TalkingAvatar] Add sample code for TTS talking avatar real-time API
* sample codes for batch avatar synthesis
* Address repository check failure
* update
* [Avatar] Update real time avatar sample code to support multi-lingual
* [avatar] update real time avatar chat sample to receive GPT response streamingly
* [Live Avatar] update chat sample to make some refinements
* [TTS Avatar] Update real-time sample to support 1. non-continuous recognition mode 2. a button to stop speaking 3. user can type query without speech
* [TTS Avatar] Update real time avatar sample to support auto-reconnect
* Don't reset message history when re-connecting
* [talking avatar] update real time sample to support using cached local video for idle status, to help save customer cost
* Update chat.html and README.md
* Update batch avatar sample to use mp4 as default format, to avoid defaultly showing slow speed with vp9
* A minor refinement
* Some refinement
* Some bug fixing
* Refine the reponse receiving logic for AOAI streaming mode, to make it more robust
* [Talking Avatar] update real-time sample code to log result id (turn id) for ease of debugging
* [Talking Avatar] Update avatar live chat sample, to upgrade AOAI API version from 2023-03-15-preview to 2023-12-01-preview
* [Talking Avatar][Live Chat] Update AOAI API to be long term support version 2023-06-01-preview
* [Talking Avatar] Add real time avatar sample code for server/client hybrid web app, with server code written in python
* Some refinements
* Add README.md
* Fix repo check failure: files that are neither marked as binary nor text, please extend .gitattributes
* [Python][TTS Avatar] Add chat sample
* [Python][TTS Avatar] Add chat sample - continue
* Support multiple clients management
* Update README.md
* [Python][TTS Avatar] Support customized ICE server
* [Talking Avatar][Python] Support stop speaking
* Tolerat speech sdk to unsupport sending message with connection
* [Python][TTS Avatar] Send local SDP as post body instead of header, to avoid header size over limit
* [python][avatar] update requirements.txt to add the missing dependencies
* [python][avatar] update real-time sample to make auto-connection more smoothy
* [Python][Avatar] Fix some small bugs
* [python][avatar] Support AAD authorization on private endpoint
* [Java][Android][Avatar] Add Android sample code for real time avatar
* Code refinement
* More refinement
* More refinement
* Update README.md
* [Java][Android][Avatar] Remove AddStream method, which is not available with Unified Plan SDP semantics, and use AddTrack per suggestion
* [Python][Avatar][Live] Get speaking status from WebRTC event, and remove the checkSpeakingStatus API from backend code
* [Java][Android][Live Avatar] Update the sample to demonstrate switching audio output device to loud speaker
* [Python][Avatar][Live] Switch from REST API to SDK for calling AOAI
* [Python][Avatar][Live] Trigger barging at first recognizing event which is earlier
* [Python][Avatar][Live] Enable continuous conversation by default
* [Python][Avatar][Live] Disable multi-lingual by default for better latency
* [Python][Avatar][Live] Configure shorter segmentation silence timeout for quicker SR
* [Live Avatar][Python, CSharp] Add logging for latency
* [TTS Avatar][Live][Python, CSharp, JS] Fix a bug to correctly clean up audio player
* [TTS Avatar][Live][JavaScript] Output display text with a slower rate, to follow the avatar speaking progress
* Make the display text / speech alignment able for on/off
* [TTS Avatar][Live][CSharp] Output display text with a slower rate, to follow the avatar speaking progress
* Create an auto-deploy file
* Unlink the containerApp yinhew-avatar-app from this repo
* Delete unnecessary file
* [talking avatar][python] Update real time sample to add option to connect with server through WebSocket, and do STT on server side
* [TTS Avatar][Live][js] update sample code for support of setting of background image and remote TURN server URL
* [talking avatar][live][python] make sure host can still start up without AOAI resource
* [Talking Avatar][Live] update sample code to close WS connection in-time, when user closes/refreshes web page, or auto-reconnection is applied
* Some refinement to connection object
* Update csharp sample as well
* [Talking Avatar][Live][Python] Check ICE token fetching success
* [Talking Avatar][Live][Python] Add VAD for interruption with lower delay
* [TTS Avatar][JS] Continue speaking unfinished sentences after reconnection
* [TTS Avatar][Python] Continue speaking unfinished sentences after reconnection
* [TTS Avatar][Python] Trigger reconnection for websockets disconnection
* [TTS Avatar][JS] Trigger reconnection for websockets disconnection
* [TTS Avatar][JS, Python] Refine the auto-reconnect logic to avoid infinite reconnection
* [Talking Avatar][JS, Python, CSharp] When reconnecting, remove data channel onmessage callback to avoid duplicatedly triggering reconnect
* [Talking Avatar][Python] refine the re-connect logic to detect the disconnection earlier
* [Talking Avatar][Live] update sample code to make sure avatar can be loaded on iOS Safari
* Update samples/csharp/web/avatar/wwwroot/js/basic.js Co-authored-by: Copilot <[email protected]>
* Update samples/csharp/web/avatar/wwwroot/js/chat.js Co-authored-by: Copilot <[email protected]>
* Update samples/csharp/web/avatar/wwwroot/js/chat.js Co-authored-by: Copilot <[email protected]>
* Update samples/js/browser/avatar/js/chat.js Co-authored-by: Copilot <[email protected]>
* Update samples/python/web/avatar/static/js/basic.js Co-authored-by: Copilot <[email protected]>
* Update samples/python/web/avatar/static/js/chat.js Co-authored-by: Copilot <[email protected]>
* Revert the unpurposed change
* [Talking Avatar] Collect ICE candidates on page loading, to reduce avatar load latency
* Fix a typo
* [Python][Avatar] Add recommendation of deployment through Azure Container Apps
* Fix lint-python pipeline break due to flake8 package installation failure
* Bump up SDK version to 1.45
* [talking avatar] start playing only after data is loaded, to handle not-playing issue on some browsers
* [Talking Avatar] Use different routes for prebuilt avatar and custom avatar
* suppress lint error E501 line too long
* [Batch Avatar] update batch avatar sample code to support photo avatar
* [Talking Avatar] Update real-time API samples to support photo avatar
* [Talking Avatar] Update real-time API samples to support photo avatar - CSharp
* [Talking Avatar] Update real-time API samples to support photo avatar - node.js
* [Talking Avatar] Update real-time API samples to support photo avatar - readme
* Add voice live avatar sample code of node.js
* Address repository check failures
* Address repository check error
* Address lint python check errors
* Address lint check failure
* Fix a docker build failure
* Add readme for Voice Live Avatar sample of node.js
* A minor update
* [Talking Avatar] support setting photo avatar scene (zoom, position, rotation) from API

---------

Co-authored-by: Yulin Li <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 3a0162c commit ca0c0c8

9 files changed: +376 -1 lines changed

samples/csharp/web/avatar/Controllers/AvatarController.cs

Lines changed: 56 additions & 0 deletions
@@ -295,6 +295,15 @@ public async Task<IActionResult> ConnectAvatar()
                                 {
                                     url = backgroundImageUrl
                                 }
+                            },
+                            scene = new
+                            {
+                                zoom = 1.0,
+                                positionX = 0.0,
+                                positionY = 0.0,
+                                rotationX = 0.0,
+                                rotationY = 0.0,
+                                rotationZ = 0.0
                             }
                         }
                     }
@@ -392,6 +401,53 @@ public async Task<IActionResult> StopSpeaking()
             }
         }

+        [HttpPost("api/updateScene")]
+        public async Task<IActionResult> UpdateScene()
+        {
+            try
+            {
+                var clientIdHeader = Request.Headers["ClientId"];
+                if (!Guid.TryParse(clientIdHeader, out Guid clientId))
+                {
+                    return BadRequest("Invalid ClientId");
+                }
+
+                string jsonData;
+                using (var reader = new StreamReader(Request.Body, Encoding.UTF8))
+                {
+                    jsonData = await reader.ReadToEndAsync();
+                }
+
+                var sceneRequest = JsonConvert.DeserializeObject<JObject>(jsonData);
+                var sceneConfig = new
+                {
+                    avatarScene = new
+                    {
+                        zoom = sceneRequest?["zoom"],
+                        positionX = sceneRequest?["positionX"],
+                        positionY = sceneRequest?["positionY"],
+                        rotationX = sceneRequest?["rotationX"],
+                        rotationY = sceneRequest?["rotationY"],
+                        rotationZ = sceneRequest?["rotationZ"]
+                    }
+                };
+
+                var clientContext = _clientService.GetClientContext(clientId);
+                var connection = clientContext.SpeechSynthesizerConnection as Connection;
+                if (connection != null)
+                {
+                    await connection.SendMessageAsync("synthesis.control", JsonConvert.SerializeObject(sceneConfig));
+                    return Ok("Scene updated.");
+                }
+
+                return BadRequest("Connection not available.");
+            }
+            catch (Exception ex)
+            {
+                return StatusCode(StatusCodes.Status500InternalServerError, $"Error: {ex.Message}");
+            }
+        }
+
         [HttpPost("api/chat")]
         public async Task<IActionResult> Chat()
         {

samples/csharp/web/avatar/Views/Home/basic.cshtml

Lines changed: 22 additions & 0 deletions
@@ -68,6 +68,28 @@
 </div>
 <br />

+<div id="photoAvatarSceneConfig" style="background-color: white; width: 300px; padding: 10px;" hidden="hidden">
+    <label style="font-size: medium;" for="sliderZoom">Zoom:</label>
+    <input type="range" id="sliderZoom" min="70" max="100" step="1" value="100" oninput="window.updatePhotoAvatarScene()">
+    <span id="valueZoom">100%</span><br/>
+    <label style="font-size: medium;" for="sliderPositionX">Position X:</label>
+    <input type="range" id="sliderPositionX" min="-50" max="50" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valuePositionX">0%</span><br/>
+    <label style="font-size: medium;" for="sliderPositionY">Position Y:</label>
+    <input type="range" id="sliderPositionY" min="-15" max="50" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valuePositionY">0%</span><br/>
+    <label style="font-size: medium;" for="sliderRotationX">Rotation X:</label>
+    <input type="range" id="sliderRotationX" min="-30" max="30" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valueRotationX">0 deg</span><br/>
+    <label style="font-size: medium;" for="sliderRotationY">Rotation Y:</label>
+    <input type="range" id="sliderRotationY" min="-30" max="30" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valueRotationY">0 deg</span><br/>
+    <label style="font-size: medium;" for="sliderRotationZ">Rotation Z:</label>
+    <input type="range" id="sliderRotationZ" min="-30" max="30" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valueRotationZ">0 deg</span><br/>
+</div>
+<br/>
+
 <h2 style="background-color: white; width: 300px;">Logs</h2>
 <div id="logging" style="background-color: white;"></div>
 </body>
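
The sliders added above express zoom and position in percent and rotation in degrees, and each `<span>` shows the current value with its unit. As a rough sketch only (the `SCENE_SLIDERS` table and the `formatSceneLabel` helper are hypothetical names, not part of this commit), the ranges and label formats from the markup could be collected in one place, which might also make it easier to refresh the labels whenever the sliders are reset:

```javascript
// Hypothetical table mirroring the min/max/value attributes of the sliders in the markup.
const SCENE_SLIDERS = {
    zoom: { min: 70, max: 100, initial: 100, unit: '%' },
    positionX: { min: -50, max: 50, initial: 0, unit: '%' },
    positionY: { min: -15, max: 50, initial: 0, unit: '%' },
    rotationX: { min: -30, max: 30, initial: 0, unit: ' deg' },
    rotationY: { min: -30, max: 30, initial: 0, unit: ' deg' },
    rotationZ: { min: -30, max: 30, initial: 0, unit: ' deg' }
}

// Formats a slider value the way the labels in the markup are written,
// for example "85%" or "-12 deg". Accepts numbers or numeric strings.
function formatSceneLabel(name, value) {
    const slider = SCENE_SLIDERS[name]
    if (!slider) {
        throw new Error(`Unknown scene slider: ${name}`)
    }
    return Number(value).toFixed() + slider.unit
}

console.log(formatSceneLabel('zoom', SCENE_SLIDERS.zoom.initial))
console.log(formatSceneLabel('rotationZ', -12))
```

Keeping the units next to the ranges is a design choice for the sketch; the sample itself hard-codes them in the page and in `basic.js`.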

samples/csharp/web/avatar/wwwroot/js/basic.js

Lines changed: 63 additions & 0 deletions
@@ -119,6 +119,10 @@ function preparePeerConnection() {
             document.getElementById('stopSession').disabled = false
             document.getElementById('speak').disabled = false
             document.getElementById('configuration').hidden = true
+            if (document.getElementById('photoAvatar').checked) {
+                document.getElementById('photoAvatarSceneConfig').hidden = false
+                window.resetPhotoAvatarScene()
+            }
         }

         if (peerConnection.iceConnectionState === 'disconnected' || peerConnection.iceConnectionState === 'failed') {
@@ -127,6 +131,9 @@ function preparePeerConnection() {
             document.getElementById('stopSession').disabled = true
             document.getElementById('startSession').disabled = false
             document.getElementById('configuration').hidden = false
+            if (document.getElementById('photoAvatar').checked) {
+                document.getElementById('photoAvatarSceneConfig').hidden = true
+            }
         }
     }

@@ -209,6 +216,9 @@ function connectToAvatarService(peerConnection) {
         } else {
             document.getElementById('startSession').disabled = false;
             document.getElementById('configuration').hidden = false;
+            if (document.getElementById('photoAvatar').checked) {
+                document.getElementById('photoAvatarSceneConfig').hidden = true
+            }
             throw new Error(`Failed connecting to the Avatar service: ${response.status} ${response.statusText}`)
         }
     })
@@ -369,6 +379,59 @@ window.updataTransparentBackground = () => {
     }
 }

+window.updatePhotoAvatarScene = () => {
+    const zoom = parseFloat(document.getElementById('sliderZoom').value)
+    const positionX = parseFloat(document.getElementById('sliderPositionX').value)
+    const positionY = parseFloat(document.getElementById('sliderPositionY').value)
+    const rotationX = parseFloat(document.getElementById('sliderRotationX').value)
+    const rotationY = parseFloat(document.getElementById('sliderRotationY').value)
+    const rotationZ = parseFloat(document.getElementById('sliderRotationZ').value)
+
+    // Update the displayed values
+    document.getElementById('valueZoom').textContent = zoom.toFixed() + '%'
+    document.getElementById('valuePositionX').textContent = positionX.toFixed() + '%'
+    document.getElementById('valuePositionY').textContent = positionY.toFixed() + '%'
+    document.getElementById('valueRotationX').textContent = rotationX.toFixed() + ' deg'
+    document.getElementById('valueRotationY').textContent = rotationY.toFixed() + ' deg'
+    document.getElementById('valueRotationZ').textContent = rotationZ.toFixed() + ' deg'
+    const sceneRequest = {
+        zoom: zoom / 100,
+        positionX: positionX / 100,
+        positionY: positionY / 100,
+        rotationX: rotationX * Math.PI / 180,
+        rotationY: rotationY * Math.PI / 180,
+        rotationZ: rotationZ * Math.PI / 180
+    }
+
+    fetch('/api/updateScene', {
+        method: 'POST',
+        headers: {
+            'ClientId': clientId,
+            'Content-Type': 'application/json'
+        },
+        body: JSON.stringify(sceneRequest)
+    })
+    .then(response => {
+        if (response.ok) {
+            console.log(`[${new Date().toISOString()}] Scene updated successfully.`)
+        } else {
+            console.error(`[${new Date().toISOString()}] Failed to update scene. ${response.status} ${response.statusText}`)
+        }
+    })
+    .catch(error => {
+        console.error(`[${new Date().toISOString()}] Error updating scene: ${error}`)
+    })
+}
+
+window.resetPhotoAvatarScene = () => {
+    document.getElementById('sliderZoom').value = 100.0
+    document.getElementById('sliderPositionX').value = 0.0
+    document.getElementById('sliderPositionY').value = 0.0
+    document.getElementById('sliderRotationX').value = 0.0
+    document.getElementById('sliderRotationY').value = 0.0
+    document.getElementById('sliderRotationZ').value = 0.0
+}
+
 window.onbeforeunload = () => {
     navigator.sendBeacon('/api/releaseClient', JSON.stringify({ clientId: clientId }))
 }
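
`updatePhotoAvatarScene` in the diff above converts the slider values before posting them: zoom and position go from percent to fractions, and rotations go from degrees to radians. A minimal sketch of that conversion as a standalone function, assuming a hypothetical helper name `toSceneRequest` and a plain object of slider values (neither is part of the commit), might look like this:

```javascript
// Hypothetical helper, not part of the commit: converts slider values
// (percent for zoom/position, degrees for rotation) into the units that
// the sample posts to /api/updateScene.
function toSceneRequest(sliders) {
    const degToRad = (deg) => deg * Math.PI / 180
    return {
        zoom: sliders.zoom / 100,
        positionX: sliders.positionX / 100,
        positionY: sliders.positionY / 100,
        rotationX: degToRad(sliders.rotationX),
        rotationY: degToRad(sliders.rotationY),
        rotationZ: degToRad(sliders.rotationZ)
    }
}

// Example: 85% zoom, shifted 20% on X in the negative direction, rotated 30 degrees around X.
const exampleScene = toSceneRequest({
    zoom: 85, positionX: -20, positionY: 10,
    rotationX: 30, rotationY: 0, rotationZ: -15
})
console.log(JSON.stringify(exampleScene))
```

Separating the conversion from the DOM reads and the `fetch` call would make the unit handling testable without a browser, though the sample keeps everything in one function.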

samples/js/node/web/avatar/basic.ejs

Lines changed: 22 additions & 0 deletions
@@ -65,6 +65,28 @@
 </div>
 <br/>

+<div id="photoAvatarSceneConfig" style="background-color: white; width: 300px; padding: 10px;" hidden="hidden">
+    <label style="font-size: medium;" for="sliderZoom">Zoom:</label>
+    <input type="range" id="sliderZoom" min="70" max="100" step="1" value="100" oninput="window.updatePhotoAvatarScene()">
+    <span id="valueZoom">100%</span><br/>
+    <label style="font-size: medium;" for="sliderPositionX">Position X:</label>
+    <input type="range" id="sliderPositionX" min="-50" max="50" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valuePositionX">0%</span><br/>
+    <label style="font-size: medium;" for="sliderPositionY">Position Y:</label>
+    <input type="range" id="sliderPositionY" min="-15" max="50" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valuePositionY">0%</span><br/>
+    <label style="font-size: medium;" for="sliderRotationX">Rotation X:</label>
+    <input type="range" id="sliderRotationX" min="-30" max="30" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valueRotationX">0 deg</span><br/>
+    <label style="font-size: medium;" for="sliderRotationY">Rotation Y:</label>
+    <input type="range" id="sliderRotationY" min="-30" max="30" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valueRotationY">0 deg</span><br/>
+    <label style="font-size: medium;" for="sliderRotationZ">Rotation Z:</label>
+    <input type="range" id="sliderRotationZ" min="-30" max="30" step="1" value="0" oninput="window.updatePhotoAvatarScene()">
+    <span id="valueRotationZ">0 deg</span><br/>
+</div>
+<br/>
+
 <h2 style="background-color: white; width: 300px;">Logs</h2>
 <div id="logging" style="background-color: white;"></div>
 </body>

samples/js/node/web/avatar/server.js

Lines changed: 36 additions & 1 deletion
@@ -239,6 +239,14 @@ app.post('/api/connectAvatar', async (req, res) => {
                             background: {
                                 color: transparent_background ? '#00FF00FF' : background_color,
                                 image: { url: background_image_url }
+                            },
+                            scene: {
+                                zoom: 1.0,
+                                positionX: 0.0,
+                                positionY: 0.0,
+                                rotationX: 0.0,
+                                rotationY: 0.0,
+                                rotationZ: 0.0
                             }
                         }
                     }
@@ -436,6 +444,34 @@ app.post('/api/stopSpeaking', async (req, res) => {
     res.status(200).send('Speaking stopped.')
 })

+// The API route to update the avatar scene
+app.post('/api/updateScene', async (req, res) => {
+    try {
+        const client_id = req.headers['clientid']
+        const scene_request = req.body
+        const scene_config = {
+            avatarScene: {
+                zoom: scene_request.zoom,
+                positionX: scene_request.positionX,
+                positionY: scene_request.positionY,
+                rotationX: scene_request.rotationX,
+                rotationY: scene_request.rotationY,
+                rotationZ: scene_request.rotationZ
+            }
+        }
+        const client_context = client_contexts[client_id]
+        const avatar_connection = client_context.speech_synthesizer_connection
+        if (avatar_connection) {
+            await avatar_connection.sendMessageAsync('synthesis.control', JSON.stringify(scene_config))
+            res.status(200).send('Scene updated.')
+        } else {
+            res.status(400).send('Connection not available.')
+        }
+    } catch (error) {
+        res.status(500).send(`Error: ${error.message}`)
+    }
+})
+
 // The API route for chat
 // It receives the user query and return the chat response.
 // It returns response in stream, which yields the chat response in chunks.
@@ -894,7 +930,6 @@ async function stopSpeakingInternal(client_id, skipClearingSpokenTextQueue) {
     const avatar_connection = client_context.speech_synthesizer_connection
     if (avatar_connection) {
         await avatar_connection.sendMessageAsync('synthesis.control', '{"action":"stop"}')
-        avatar_connection.close()
     }

 }
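
The `/api/updateScene` route above forwards whatever the client posts straight into the `avatarScene` payload of the `synthesis.control` message. As a hedged sketch, a server could validate and clamp the fields first. The helper name `buildSceneConfig` and the `SCENE_RANGES` table are hypothetical, and the limits below simply mirror the slider ranges in the sample UI; whether the avatar service enforces the same limits is an assumption this commit does not confirm.

```javascript
// Hypothetical limits derived from the sample UI sliders:
// zoom 70..100 %, positionX -50..50 %, positionY -15..50 %, rotation -30..30 degrees.
const MAX_ROTATION_RAD = 30 * Math.PI / 180
const SCENE_RANGES = {
    zoom: [0.7, 1.0],
    positionX: [-0.5, 0.5],
    positionY: [-0.15, 0.5],
    rotationX: [-MAX_ROTATION_RAD, MAX_ROTATION_RAD],
    rotationY: [-MAX_ROTATION_RAD, MAX_ROTATION_RAD],
    rotationZ: [-MAX_ROTATION_RAD, MAX_ROTATION_RAD]
}

// Builds the { avatarScene: ... } payload, rejecting non-numeric fields
// and clamping each value into its assumed range.
function buildSceneConfig(sceneRequest) {
    const avatarScene = {}
    for (const [name, [min, max]] of Object.entries(SCENE_RANGES)) {
        const value = sceneRequest ? sceneRequest[name] : undefined
        if (typeof value !== 'number' || !Number.isFinite(value)) {
            throw new TypeError(`Scene field "${name}" must be a finite number`)
        }
        avatarScene[name] = Math.min(max, Math.max(min, value))
    }
    return { avatarScene }
}

console.log(JSON.stringify(buildSceneConfig({
    zoom: 1.5, positionX: 0, positionY: 0, rotationX: 0, rotationY: 0, rotationZ: 0
})))
```

In a route handler, a `TypeError` from such a helper could reasonably map to a 400 response rather than the generic 500 used in the sample's `catch` block.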

samples/js/node/web/avatar/static/js/basic.js

Lines changed: 63 additions & 0 deletions
@@ -119,6 +119,10 @@ function preparePeerConnection() {
             document.getElementById('stopSession').disabled = false
             document.getElementById('speak').disabled = false
             document.getElementById('configuration').hidden = true
+            if (document.getElementById('photoAvatar').checked) {
+                document.getElementById('photoAvatarSceneConfig').hidden = false
+                window.resetPhotoAvatarScene()
+            }
         }

         if (peerConnection.iceConnectionState === 'disconnected' || peerConnection.iceConnectionState === 'failed') {
@@ -127,6 +131,9 @@ function preparePeerConnection() {
             document.getElementById('stopSession').disabled = true
             document.getElementById('startSession').disabled = false
             document.getElementById('configuration').hidden = false
+            if (document.getElementById('photoAvatar').checked) {
+                document.getElementById('photoAvatarSceneConfig').hidden = true
+            }
         }
     }

@@ -209,6 +216,9 @@ function connectToAvatarService(peerConnection) {
         } else {
             document.getElementById('startSession').disabled = false;
             document.getElementById('configuration').hidden = false;
+            if (document.getElementById('photoAvatar').checked) {
+                document.getElementById('photoAvatarSceneConfig').hidden = true
+            }
             throw new Error(`Failed connecting to the Avatar service: ${response.status} ${response.statusText}`)
         }
     })
@@ -369,6 +379,59 @@ window.updataTransparentBackground = () => {
     }
 }

+window.updatePhotoAvatarScene = () => {
+    const zoom = parseFloat(document.getElementById('sliderZoom').value)
+    const positionX = parseFloat(document.getElementById('sliderPositionX').value)
+    const positionY = parseFloat(document.getElementById('sliderPositionY').value)
+    const rotationX = parseFloat(document.getElementById('sliderRotationX').value)
+    const rotationY = parseFloat(document.getElementById('sliderRotationY').value)
+    const rotationZ = parseFloat(document.getElementById('sliderRotationZ').value)
+
+    // Update the displayed values
+    document.getElementById('valueZoom').textContent = zoom.toFixed() + '%'
+    document.getElementById('valuePositionX').textContent = positionX.toFixed() + '%'
+    document.getElementById('valuePositionY').textContent = positionY.toFixed() + '%'
+    document.getElementById('valueRotationX').textContent = rotationX.toFixed() + ' deg'
+    document.getElementById('valueRotationY').textContent = rotationY.toFixed() + ' deg'
+    document.getElementById('valueRotationZ').textContent = rotationZ.toFixed() + ' deg'
+    const sceneRequest = {
+        zoom: zoom / 100,
+        positionX: positionX / 100,
+        positionY: positionY / 100,
+        rotationX: rotationX * Math.PI / 180,
+        rotationY: rotationY * Math.PI / 180,
+        rotationZ: rotationZ * Math.PI / 180
+    }
+
+    fetch('/api/updateScene', {
+        method: 'POST',
+        headers: {
+            'ClientId': clientId,
+            'Content-Type': 'application/json'
+        },
+        body: JSON.stringify(sceneRequest)
+    })
+    .then(response => {
+        if (response.ok) {
+            console.log(`[${new Date().toISOString()}] Scene updated successfully.`)
+        } else {
+            console.error(`[${new Date().toISOString()}] Failed to update scene. ${response.status} ${response.statusText}`)
+        }
+    })
+    .catch(error => {
+        console.error(`[${new Date().toISOString()}] Error updating scene: ${error}`)
+    })
+}
+
+window.resetPhotoAvatarScene = () => {
+    document.getElementById('sliderZoom').value = 100.0
+    document.getElementById('sliderPositionX').value = 0.0
+    document.getElementById('sliderPositionY').value = 0.0
+    document.getElementById('sliderRotationX').value = 0.0
+    document.getElementById('sliderRotationY').value = 0.0
+    document.getElementById('sliderRotationZ').value = 0.0
+}
+
 window.onbeforeunload = () => {
     navigator.sendBeacon('/api/releaseClient', JSON.stringify({ clientId: clientId }))
 }
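
Because the sliders call `window.updatePhotoAvatarScene()` from `oninput`, dragging a slider issues one `POST /api/updateScene` per step. If that request rate turns out to matter, one option is to debounce the handler. The sketch below is an illustration only: the `debounce` helper, its `timers` parameter, and the 100 ms delay in the usage comment are assumptions and not part of this commit.

```javascript
// Hypothetical debounce helper: delays fn until delayMs have passed without
// another call, and runs it with the arguments of the most recent call.
// The timers parameter exists only so the timer functions can be swapped out in tests.
function debounce(fn, delayMs, timers = {
    set: (callback, ms) => setTimeout(callback, ms),
    clear: (handle) => clearTimeout(handle)
}) {
    let handle = null
    return (...args) => {
        if (handle !== null) {
            timers.clear(handle) // drop the pending call; a newer one replaces it
        }
        handle = timers.set(() => {
            handle = null
            fn(...args)
        }, delayMs)
    }
}

// Possible usage in the browser (assumed, not in the sample):
// const sendSceneDebounced = debounce(window.updatePhotoAvatarScene, 100)
// slider.addEventListener('input', sendSceneDebounced)
```

A debounce delays the label update as well if the whole handler is wrapped, so in practice only the `fetch` part might be debounced while the labels keep updating on every input event.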
