Sequential Execution of translator.predict() in Multithreaded Environment #444

Open
lin-xiaosheng opened this issue May 6, 2024 · 0 comments

@lin-xiaosheng

Problem Description

When using the seamless_communication library, and specifically its translator.predict() method, I've encapsulated the core inference logic into reusable interfaces. However, even with this wrapping, multiple requests are not processed concurrently; they execute sequentially, which significantly impacts system throughput and response time.

Relevant Code Snippet

from io import BytesIO
import logging

import torch
import torchaudio
from flask import jsonify, request, send_file

# app, translator, device, dtype, LANGUAGE_NAME_TO_CODE and
# process_audio_stream are defined elsewhere in the application.

@app.route('/translate/s2st', methods=['POST'])
def translate_s2st():
    data = request.form
    audio_data = request.files.get('audio')
    if audio_data is None:
        return jsonify({"error": "No audio data provided"}), 400

    source_language = data.get('source_language')
    target_language = data.get('target_language')

    if not (source_language and target_language):
        return jsonify({"error": "Missing required parameters"}), 400

    try:
        # Decode the uploaded audio and move it to the inference device.
        waveform, _ = process_audio_stream(audio_data.stream)
        audio_tensor = waveform.to(device=device, dtype=dtype)

        source_language_code = LANGUAGE_NAME_TO_CODE[source_language]
        target_language_code = LANGUAGE_NAME_TO_CODE[target_language]

        out_texts, out_audios = translator.predict(
            input=audio_tensor,
            task_str="S2ST",
            src_lang=source_language_code,
            tgt_lang=target_language_code,
        )
        out_text = str(out_texts[0])  # currently unused; only the audio is returned

        # Encode the translated speech to MP3 in memory.
        audio_buffer = BytesIO()
        torchaudio.save(
            audio_buffer,
            out_audios.audio_wavs[0][0].to(torch.float32).cpu(),
            out_audios.sample_rate,
            format="mp3",
        )
        audio_buffer.seek(0)

        return send_file(
            audio_buffer,
            mimetype="audio/mpeg",
            as_attachment=True,
            download_name="translated_audio.mp3",
        )

    except Exception as e:
        logging.error(f"Translation error: {e}")
        return jsonify({"error": str(e)}), 500

Result: All tasks complete sequentially rather than concurrently.
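
For reference, this is roughly how I exercise the endpoint to observe the behavior (a minimal sketch; the URL, sample file, and language names are placeholders for my local setup):

import concurrent.futures

import requests

# Fire several requests at once; with true concurrency these should overlap,
# but in practice the server handles them strictly one at a time.
def send_request(_):
    with open("sample.wav", "rb") as f:  # placeholder audio file
        return requests.post(
            "http://localhost:5000/translate/s2st",  # placeholder URL
            files={"audio": f},
            data={"source_language": "English", "target_language": "French"},
        )

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(send_request, range(4)))

print([r.status_code for r in responses])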

Expected Behavior

I would expect to utilize the GPU's parallel processing capabilities by running multiple translations concurrently.

Actual Results

Only one request is processed at a time, leading to increased overall execution time.

Can you please suggest the modifications or configuration needed in my code so that translator.predict() can handle concurrent translations effectively?
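
For example, is something along these lines the right direction: a small pool of independent translator instances, with one checked out per request? (Just a sketch; load_translator() is a hypothetical helper standing in for however the model and vocoder are actually constructed.)

import queue

# Hypothetical: build N independent Translator instances up front.
# load_translator() stands in for the actual model/vocoder construction.
NUM_INSTANCES = 2
translator_pool = queue.Queue()
for _ in range(NUM_INSTANCES):
    translator_pool.put(load_translator())

def predict_with_pool(audio_tensor, src_lang, tgt_lang):
    t = translator_pool.get()  # blocks until an instance is free
    try:
        return t.predict(
            input=audio_tensor,
            task_str="S2ST",
            src_lang=src_lang,
            tgt_lang=tgt_lang,
        )
    finally:
        translator_pool.put(t)  # always return the instance to the pool

Or, if a single model instance on one GPU is inherently serial, is request batching or one worker process per GPU the recommended pattern instead?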

If you need any further information or clarification, feel free to reply. Looking forward to your assistance in optimizing our application performance!

Thanks!
