Memory is not released after offline audio recognition completes #1808

ryancurry-mz · 2024-06-12T05:53:34Z

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

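I'd appreciate it if a maintainer could take a look when they have time. Thanks a lot.

I made some simple modifications to the code in /runtime/python/http/server.py; the full code is posted below.
Problem: after recognizing several offline audio files, memory is not released, and eventually the machine runs out of memory.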

Code sample

import argparse
import logging
import os
import uuid
import gc

import aiofiles
import ffmpeg
import uvicorn
from fastapi import FastAPI, File, UploadFile
from modelscope.utils.logger import get_logger

from funasr import AutoModel
from itn.chinese.inverse_normalizer import InverseNormalizer

logger = get_logger(log_level=logging.INFO)
logger.setLevel(logging.INFO)

parser = argparse.ArgumentParser()
parser.add_argument(
    "--host", type=str, default="0.0.0.0", required=False, help="host ip, localhost, 0.0.0.0"
)
parser.add_argument("--port", type=int, default=8000, required=False, help="server port")
parser.add_argument(
    "--asr_model",
    type=str,
    # default="paraformer-zh",
    default="/soft/FunASR/model/speech_paraformer-large-vad-punc-spk_asr_nat-zh-cn",
    help="asr model from https://github.com/alibaba-damo-academy/FunASR?tab=readme-ov-file#model-zoo",
)
parser.add_argument("--asr_model_revision", type=str, default="v2.0.4", help="")
parser.add_argument(
    "--vad_model",
    type=str,
    # default="fsmn-vad",
    default="/soft/FunASR/model/speech_fsmn_vad_zh-cn-16k-common-pytorch/",
    help="vad model from https://github.com/alibaba-damo-academy/FunASR?tab=readme-ov-file#model-zoo",
)
parser.add_argument("--vad_model_revision", type=str, default="v2.0.4", help="")
parser.add_argument(
    "--punc_model",
    type=str,
    # default="ct-punc-c",
    default="/soft/FunASR/model/punc_ct-transformer_cn-en-common-vocab471067-large/",
    help="model from https://github.com/alibaba-damo-academy/FunASR?tab=readme-ov-file#model-zoo",
)
parser.add_argument("--punc_model_revision", type=str, default="v2.0.4", help="")

# Speaker recognition / diarization
parser.add_argument("--spk_model_revision", type=str, default="v2.0.4", help="")
parser.add_argument(
    "--spk_model",
    type=str,
    # default="cam++",
    default="/soft/FunASR/model/speech_campplus_sv_zh-cn_16k-common/",
    help="model from https://github.com/alibaba-damo-academy/FunASR?tab=readme-ov-file#model-zoo",
)

parser.add_argument("--ngpu", type=int, default=0, help="0 for cpu, 1 for gpu")
parser.add_argument("--device", type=str, default="cpu", help="cuda, cpu")
parser.add_argument("--ncpu", type=int, default=4, help="cpu cores")
parser.add_argument(
    "--hotword_path",
    type=str,
    default="hotwords.txt",
    help="hot word txt path, only the hot word model works",
)
parser.add_argument("--certfile", type=str, default=None, required=False, help="certfile for ssl")
parser.add_argument("--keyfile", type=str, default=None, required=False, help="keyfile for ssl")
parser.add_argument("--temp_dir", type=str, default="temp_dir/", required=False, help="temp dir")
args = parser.parse_args()
logger.info("-----------  Configuration Arguments -----------")
for arg, value in vars(args).items():
    logger.info("%s: %s" % (arg, value))
logger.info("------------------------------------------------")

os.makedirs(args.temp_dir, exist_ok=True)

logger.info("model loading")
# load funasr model
model = AutoModel(
    model=args.asr_model,
    model_revision=args.asr_model_revision,
    vad_model=args.vad_model,
    vad_model_revision=args.vad_model_revision,
    punc_model=args.punc_model,
    punc_model_revision=args.punc_model_revision,
    spk_model=args.spk_model,
    spk_model_revision=args.spk_model_revision,
    ngpu=args.ngpu,
    ncpu=args.ncpu,
    device=args.device,
    disable_pbar=True,
    disable_log=True,
)
logger.info("loaded models!")

app = FastAPI(title="FunASR")

param_dict = {"sentence_timestamp": False, "batch_size_s": 50}
if args.hotword_path is not None and os.path.exists(args.hotword_path):
    with open(args.hotword_path, "r", encoding="utf-8") as f:
        lines = f.readlines()
        lines = [line.strip() for line in lines]
    hotword = " ".join(lines)
    logger.info(f"Hotwords: {hotword}")
    param_dict["hotword"] = hotword


@app.post("/recognition")
async def api_recognition(audio: UploadFile = File(..., description="audio file")):
    suffix = audio.filename.split(".")[-1]
    audio_path = f"{args.temp_dir}/{str(uuid.uuid1())}.{suffix}"
    async with aiofiles.open(audio_path, "wb") as out_file:
        content = await audio.read()
        await out_file.write(content)
    try:
        audio_bytes, _ = (
            ffmpeg.input(audio_path, threads=0)
            .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=16000)
            .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
        )
    except Exception as e:
        logger.error(f"Failed to read the audio file: {e}")
        return {"msg": "Failed to read the audio file", "code": 1}
    rec_results = model.generate(input=audio_bytes, is_final=True, **param_dict)
    logger.info(f"Recognition result rec_results: {rec_results}")
    # Empty result
    if len(rec_results) == 0:
        return {"text": "", "sentences": [], "code": 0}
    elif len(rec_results) == 1:
        # Parse the recognition result
        rec_result = rec_results[0]
        # text = rec_result["text"]
        # Inverse text normalization
        invnormalizer = InverseNormalizer(cache_dir="/soft/FunASR/model/fst_itn_zh/")
        text = invnormalizer.normalize(rec_result["text"])
        sentences = []
        for sentence in rec_result["sentence_info"]:
            # Speaker, text, and timestamps for each sentence
            sentences.append(
                {"spk": sentence["spk"],
                 "text": invnormalizer.normalize(sentence["text"]),
                 "start": sentence["start"],
                 "end": sentence["end"]}
            )
        ret = {"text": text, "sentences": sentences, "code": 0}
        logger.info(f"Recognition result: {ret}")
        # Force garbage collection
        gc.collect()
        return ret
    else:
        logger.info(f"Recognition result: {rec_results}")
        return {"msg": "Unknown error", "code": -1}


if __name__ == "__main__":
    uvicorn.run(
        app, host=args.host, port=args.port, ssl_keyfile=args.keyfile, ssl_certfile=args.certfile
    )

Expected behavior

After each recognition completes, memory should be released rather than growing continuously.

Environment

OS (e.g., Linux): centos7.9
FunASR Version (e.g., 1.0.0): 1.0.27
ModelScope Version (e.g., 1.11.0): 1.14.0
PyTorch Version (e.g., 2.0.0): 2.3.0
How you installed funasr (pip, source): pip
Python version:3.9.19
GPU (e.g., V100M32)
CUDA/cuDNN version (e.g., cuda11.7):
Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
Any other relevant information:

Additional context

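I'm using the CPU version on a virtual machine with 4 cores and 8 GB of RAM. Starting the service with the server.py above takes 20-30 minutes, and recognizing a 1-minute two-speaker offline recording takes 3 minutes. I'm not sure whether this is because the machine is underpowered.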


yeyupiaoling · 2024-07-02T11:27:26Z

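@ryancurry-mz Hello, I ran a test on my side using the project's original code. The test audio was 1 minute 42 seconds long; inference took 1.5 seconds on GPU and 7.6 seconds on CPU. Those inference times are normal, so the slowness is probably a problem with your machine.

Starting server.py takes less than 10 seconds on my side.

I also repeated inference 100 times, and memory usage did not change on either GPU or CPU. I did not see the continuous growth you describe, so you should rule out the effect of your other code.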
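
The thread does not say how the 100-run test was measured. One way to run such a check is sketched below, using only the standard library. `measure_growth` and `looks_like_leak` are hypothetical helper names, and `fn` stands in for whatever performs one recognition request. Note the limitation: `tracemalloc` sees only allocations made through Python's allocator, so memory held by native code (for example tensors allocated by PyTorch) would not show up here and would need to be checked via the process's resident set size instead.

```python
import gc
import tracemalloc


def measure_growth(fn, repeats=100, warmup=5):
    """Call fn() repeatedly and return traced Python-heap usage (bytes) after each call."""
    tracemalloc.start()
    try:
        for _ in range(warmup):  # let caches and lazy initialisation settle first
            fn()
        samples = []
        for _ in range(repeats):
            fn()
            gc.collect()  # drop unreachable reference cycles before sampling
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
        return samples
    finally:
        tracemalloc.stop()


def looks_like_leak(samples, tolerance_bytes=64 * 1024):
    """Heuristic: the last sample exceeds the first by more than the tolerance."""
    return samples[-1] - samples[0] > tolerance_bytes
```

A flat series of samples suggests the Python side is stable; a series that keeps rising after warm-up is worth bisecting by disabling one added component at a time.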

yeyupiaoling · 2024-07-02T11:30:45Z

@ryancurry-mz Did you make additional changes to the code? Please check whether any of the code you added is causing this.

ryancurry-mz · 2024-07-03T01:09:07Z

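> @ryancurry-mz Did you make additional changes to the code? Please check whether any of the code you added is causing this.

I changed a few places on top of the original server.py, adding speaker recognition and inverse text normalization. I'm not sure whether that is the cause; I'll check again. Thanks for the reply!

# Speaker recognition / diarization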
parser.add_argument("--spk_model_revision", type=str, default="v2.0.4", help="")
parser.add_argument(
    "--spk_model",
    type=str,
    # default="cam++",
    default="/soft/FunASR/model/speech_campplus_sv_zh-cn_16k-common/",
    help="model from https://github.com/alibaba-damo-academy/FunASR?tab=readme-ov-file#model-zoo",
)

# Inverse text normalization
invnormalizer = InverseNormalizer(cache_dir="/soft/FunASR/model/fst_itn_zh/")
text = invnormalizer.normalize(rec_result["text"])
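
In the posted handler the normalizer is constructed inside the request function, so a new instance is built on every request. Whether that contributes to the memory growth is not established in this thread, but building such an object once and reusing it is a cheap thing to rule out. The sketch below shows the pattern with `functools.lru_cache`; `ExpensiveNormalizer` is a hypothetical stand-in for `InverseNormalizer`, and `get_normalizer` and `handle_request` are illustrative names only.

```python
from functools import lru_cache


class ExpensiveNormalizer:
    """Hypothetical stand-in for an object that is costly to construct."""

    instances = 0  # counts constructions so reuse can be observed

    def __init__(self, cache_dir):
        ExpensiveNormalizer.instances += 1
        self.cache_dir = cache_dir

    def normalize(self, text):
        return text.strip()


@lru_cache(maxsize=None)
def get_normalizer(cache_dir):
    # Built on first use; later calls with the same cache_dir return the same instance.
    return ExpensiveNormalizer(cache_dir)


def handle_request(text):
    normalizer = get_normalizer("/soft/FunASR/model/fst_itn_zh/")
    return normalizer.normalize(text)
```

Constructing the object at module level next to the model load would have the same effect; the cached factory just defers the cost until the first request.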

secslim · 2024-08-13T03:17:13Z

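> @ryancurry-mz Hello, I ran a test on my side using the project's original code. The test audio was 1 minute 42 seconds long; inference took 1.5 seconds on GPU and 7.6 seconds on CPU. Those inference times are normal, so the slowness is probably a problem with your machine.
>
> Starting server.py takes less than 10 seconds on my side.
>
> I also repeated inference 100 times, and memory usage did not change on either GPU or CPU. I did not see the continuous growth you describe, so you should rule out the effect of your other code.

Hello, I'm using the same code (FunASR/runtime/python/http/server.py), except that I removed the step where ffmpeg reads the audio file into a byte stream and pass the audio file path directly:

async with aiofiles.open(audio_path, "wb") as out_file:
    content = await audio.read()
    await out_file.write(content)
rec_results = model.generate(audio_path, is_final=True, **param_dict)

At service startup, GPU memory usage is 1682 MiB. After a request with a 10-minute audio file, usage peaks at 7252 MiB and sits at 5186 MiB once inference finishes. On a second request with the same file, usage peaks at 7458 MiB and sits at 5252 MiB afterwards. GPU memory keeps increasing and is never fully released. How did you run your test?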

yeyupiaoling · 2024-08-14T14:11:07Z

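@secslim At startup, before any inference has run, GPU memory usage is relatively low. It is normal for it to increase after inference. Does yours keep climbing? The developer above saw memory keep climbing until it ran out. It is probably caused by the cam++ model.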

secslim · 2024-08-14T14:28:15Z

Thanks for your reply. My GPU memory does not keep increasing: it is 2674 MiB before inference and 5254 MiB after, and then stays at 5254 MiB.
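
A plateau like this is consistent with how PyTorch manages GPU memory: its caching allocator keeps freed blocks reserved for reuse, so the figure reported by the driver stays high after inference even when few tensors are still alive. The sketch below separates the two numbers using PyTorch's own counters. `cuda_memory_mib` and `release_cached_blocks` are hypothetical helper names, and the `try`/`except` import is there only so the snippet still loads on a machine without PyTorch.

```python
try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None


def cuda_memory_mib(device=0):
    """Return allocated/reserved/peak CUDA memory in MiB, or None if CUDA is unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return {
        # memory occupied by live tensors
        "allocated": torch.cuda.memory_allocated(device) / mib,
        # memory held by the caching allocator, including freed blocks kept for reuse
        "reserved": torch.cuda.memory_reserved(device) / mib,
        # highest "allocated" value seen so far
        "peak_allocated": torch.cuda.max_memory_allocated(device) / mib,
    }


def release_cached_blocks():
    """Return unused cached blocks to the driver; live tensors are unaffected."""
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
```

If "allocated" returns to roughly the same value after every request while "reserved" stays high, the memory is cached rather than leaked. A steadily rising "allocated" value across requests would point to tensors being kept alive somewhere.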


zeminroot · 2024-09-29T09:16:57Z

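> @secslim At startup, before any inference has run, GPU memory usage is relatively low. It is normal for it to increase after inference. Does yours keep climbing? The developer above saw memory keep climbing until it ran out. It is probably caused by the cam++ model.

Hello, I have the same problem. After enabling the speaker diarization model cam++, memory is not released and keeps increasing after each audio recognition.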

LauraGPT · 2024-09-29T09:18:36Z

> Hello, I have the same problem. After enabling the speaker diarization model cam++, memory is not released and keeps increasing after each audio recognition.

Is your funasr version 1.1.8?

zeminroot · 2024-09-29T09:19:51Z

> Is your funasr version 1.1.8?

Yes, funasr 1.1.8.

zeminroot · 2024-09-29T09:23:21Z

> Is your funasr version 1.1.8?

Yes, version 1.1.8. Do you have a solution?

LauraGPT · 2024-09-30T17:39:30Z

Then try commenting out the cam++ model first and see whether the behavior still occurs.
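
Rather than commenting code in and out, the comparison can be made switchable. The sketch below assembles the keyword arguments for `AutoModel` and includes `spk_model` only when one is given. `build_model_kwargs` is a hypothetical helper, and the model names are the short defaults that appear commented out in the original post. The `AutoModel` call itself is left as a comment because it requires funasr and the model files.

```python
def build_model_kwargs(asr_model, vad_model, punc_model, spk_model=None, device="cpu"):
    """Assemble keyword arguments for AutoModel, adding spk_model only when given."""
    kwargs = {
        "model": asr_model,
        "vad_model": vad_model,
        "punc_model": punc_model,
        "device": device,
    }
    if spk_model:  # omit the key entirely to run without the speaker model
        kwargs["spk_model"] = spk_model
    return kwargs


# Two configurations to compare under the same request load.
with_spk = build_model_kwargs("paraformer-zh", "fsmn-vad", "ct-punc-c", spk_model="cam++")
without_spk = build_model_kwargs("paraformer-zh", "fsmn-vad", "ct-punc-c")
# model = AutoModel(**without_spk)  # requires funasr; shown only for context
```

With the speaker model disabled, the results will likely no longer carry a `spk` field, so the handler in the original post, which reads `sentence["spk"]`, would presumably need a guard before the two runs can be compared.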

ryancurry-mz added the bug label Jun 12, 2024

LauraGPT assigned yeyupiaoling Jun 14, 2024

yeyupiaoling closed this as completed Sep 19, 2024
