Sequential Execution of translator.predict() in Multithreaded Environment #444

Open
lin-xiaosheng opened this issue May 6, 2024 · 0 comments

@lin-xiaosheng

Problem Description

When using the seamless_communication library, and specifically its translator.predict() method, I've encapsulated the core inference logic into reusable interfaces. However, even with this wrapping, multiple requests are not processed concurrently; they execute sequentially, which significantly impacts system throughput and response time.

Relevant Code Snippet

from io import BytesIO
import logging

import torch
import torchaudio
from flask import jsonify, request, send_file

# app, translator, device, dtype, LANGUAGE_NAME_TO_CODE and
# process_audio_stream are defined elsewhere in the application.

@app.route('/translate/s2st', methods=['POST'])
def translate_s2st():
    data = request.form
    audio_data = request.files.get('audio')
    if audio_data is None:
        return jsonify({"error": "No audio data provided"}), 400

    source_language = data.get('source_language')
    target_language = data.get('target_language')

    if not (source_language and target_language):
        return jsonify({"error": "Missing required parameters"}), 400

    try:
        # Decode the uploaded audio and move it to the inference device.
        waveform, _ = process_audio_stream(audio_data.stream)
        audio_tensor = waveform.to(device=device, dtype=dtype)

        source_language_code = LANGUAGE_NAME_TO_CODE[source_language]
        target_language_code = LANGUAGE_NAME_TO_CODE[target_language]

        out_texts, out_audios = translator.predict(
            input=audio_tensor,
            task_str="S2ST",
            src_lang=source_language_code,
            tgt_lang=target_language_code,
        )
        out_text = str(out_texts[0])  # currently unused; only the audio is returned

        # Encode the translated speech to MP3 in memory.
        audio_buffer = BytesIO()
        torchaudio.save(
            audio_buffer,
            out_audios.audio_wavs[0][0].to(torch.float32).cpu(),
            out_audios.sample_rate,
            format="mp3",
        )
        audio_buffer.seek(0)

        return send_file(
            audio_buffer,
            mimetype="audio/mpeg",
            as_attachment=True,
            download_name="translated_audio.mp3",
        )

    except Exception as e:
        logging.error(f"Translation error: {e}")
        return jsonify({"error": str(e)}), 500

Result: All tasks complete sequentially rather than concurrently.
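
For reference, this is roughly how I exercise the endpoint to observe the behavior (a minimal sketch; the URL, sample file, and language names are placeholders for my local setup):

import concurrent.futures

import requests

# Fire several requests at once; with true concurrency these should overlap,
# but in practice the server handles them strictly one at a time.
def send_request(_):
    with open("sample.wav", "rb") as f:  # placeholder audio file
        return requests.post(
            "http://localhost:5000/translate/s2st",  # placeholder URL
            files={"audio": f},
            data={"source_language": "English", "target_language": "French"},
        )

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(send_request, range(4)))

print([r.status_code for r in responses])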

Expected Behavior

I would expect to utilize the GPU's parallel processing capabilities by running multiple translations concurrently.

Actual Results

Only one request is processed at a time, leading to increased overall execution time.

Can you please suggest the modifications or configuration needed in my code so that translator.predict() can handle concurrent translations effectively?
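
For example, is something along these lines the right direction: a small pool of independent translator instances, with one checked out per request? (Just a sketch; load_translator() is a hypothetical helper standing in for however the model and vocoder are actually constructed.)

import queue

# Hypothetical: build N independent Translator instances up front.
# load_translator() stands in for the actual model/vocoder construction.
NUM_INSTANCES = 2
translator_pool = queue.Queue()
for _ in range(NUM_INSTANCES):
    translator_pool.put(load_translator())

def predict_with_pool(audio_tensor, src_lang, tgt_lang):
    t = translator_pool.get()  # blocks until an instance is free
    try:
        return t.predict(
            input=audio_tensor,
            task_str="S2ST",
            src_lang=src_lang,
            tgt_lang=tgt_lang,
        )
    finally:
        translator_pool.put(t)  # always return the instance to the pool

Or, if a single model instance on one GPU is inherently serial, is request batching or one worker process per GPU the recommended pattern instead?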

If you need any further information or clarification, feel free to reply. Looking forward to your assistance in optimizing our application performance!

Thanks!
