Does frigate support INTEL-GPU-SRIOV? #14840

Open
fay000fay opened this issue Nov 7, 2024 · 14 comments
Labels
enhancement New feature or request

Comments

@fay000fay

fay000fay commented Nov 7, 2024

I am using an N100 CPU, running an LXC container on PVE 8.2.7, and deploying Frigate 0.14.1 in Docker. I have enabled the CPU's SR-IOV function and successfully created 4 virtual GPUs. However, Frigate then reports an error, while with SR-IOV disabled it can access the GPU normally. This is very strange; could it be that Frigate is not compatible with SR-IOV?

ERROR : Unable to poll intel GPU stats: Failed to detect engines! (No such file or directory)
2024-11-06 13:30:54.245952251 (Kernel 4.16 or newer is required for i915 PMU support.)

Here is my LXC configuration file:

arch: amd64
cores: 2
features: fuse=1,mknod=1,nesting=1
hostname: frigate
memory: 4096
mp0: /mnt,mp=/mnt
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.100.3,hwaddr=BC:24:11:FE:EE:3A,ip=192.168.100.171/24,type=veth
onboot: 0
ostype: alpine
rootfs: local:101/vm-101-disk-0.raw,size=10G
swap: 512
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
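The config above binds only card0 and renderD128 (DRM major 226, minors 0 and 128). With SR-IOV enabled, the virtual functions usually appear on the host as additional card*/renderD* nodes, so it can help to check which node belongs to which PCI function before deciding what to bind. Below is a minimal sketch, assuming the standard Linux sysfs layout (/sys/class/drm/<node>/device is a symlink to the PCI device and /sys/class/drm/<node>/dev holds "major:minor"). The function name is hypothetical, and the printed lines only mirror the format of the config above.

```python
import os
from pathlib import Path


def describe_drm_nodes(sysfs_drm: str = "/sys/class/drm") -> dict[str, tuple[str, str]]:
    """Map each DRM node (card0, renderD128, ...) to (PCI address, "major:minor").

    Assumes the usual sysfs layout: <node>/device is a symlink to the PCI
    device and <node>/dev contains the character device numbers.
    """
    nodes: dict[str, tuple[str, str]] = {}
    root = Path(sysfs_drm)
    if not root.is_dir():
        return nodes
    for entry in sorted(root.iterdir()):
        name = entry.name
        # Keep card*/renderD* nodes; skip connectors such as card0-HDMI-A-1
        if "-" in name or not name.startswith(("card", "renderD")):
            continue
        device_link = entry / "device"
        dev_file = entry / "dev"
        if not device_link.exists() or not dev_file.is_file():
            continue
        # The resolved symlink ends in the PCI address, e.g. 0000:00:02.1
        pci_address = Path(os.path.realpath(device_link)).name
        nodes[name] = (pci_address, dev_file.read_text().strip())
    return nodes


if __name__ == "__main__":
    # Print LXC lines in the same style as the config above, one pair per node
    for node, (pci, majmin) in describe_drm_nodes().items():
        print(f"# {node} is backed by PCI function {pci}")
        print(f"lxc.mount.entry: /dev/dri/{node} dev/dri/{node} none bind,optional,create=file")
        print(f"lxc.cgroup2.devices.allow: c {majmin} rwm")
```

Run on the PVE host, the output can be compared against the existing lxc.mount.entry and lxc.cgroup2.devices.allow lines to see whether the node Frigate is expected to use is actually bound into the container.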

Here is my docker-compose.yaml file:

version: "3.9"
services:
  frigate:
    container_name: frigate
    privileged: true # this may not be necessary for all setups
    restart: unless-stopped
    image: ghcr.io/blakeblackshear/frigate:stable
    shm_size: "256mb"
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /mnt/frigate/config:/config
      - /mnt/frigate/video:/media/frigate
      - type: tmpfs
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    ports:
      - "8971:8971"
      - "1984:1984"
      - "9150:5000"
      - "8554:8554"
      - "8555:8555/tcp"

Here is my Frigate log:

2024-11-06 13:30:37.630971506 [INFO] Preparing Frigate...
2024-11-06 13:30:37.654593663 [INFO] Starting Frigate...
2024-11-06 13:30:39.299050768 [2024-11-06 13:30:39] frigate.app INFO : Starting Frigate (0.14.1-f4f3cfa)
2024-11-06 13:30:39.299353049 [2024-11-06 13:30:39] frigate.util.config INFO : Checking if frigate config needs migration...
2024-11-06 13:30:39.339558101 [2024-11-06 13:30:39] frigate.util.config INFO : frigate config does not need migration...
2024-11-06 13:30:46.753884993 [2024-11-06 13:30:46] peewee_migrate.logs INFO : Starting migrations
2024-11-06 13:30:46.754131689 [2024-11-06 13:30:46] peewee_migrate.logs INFO : There is nothing to migrate
2024-11-06 13:30:46.763016435 [2024-11-06 13:30:46] frigate.app INFO : Recording process started: 345
2024-11-06 13:30:46.767461716 [2024-11-06 13:30:46] frigate.app INFO : Recording process started: 347
2024-11-06 13:30:46.769435421 [2024-11-06 13:30:46] frigate.app INFO : go2rtc process pid: 101
2024-11-06 13:30:46.790650155 [2024-11-06 13:30:46] detector.ov INFO : Starting detection process: 374
2024-11-06 13:30:46.792243381 [2024-11-06 13:30:46] frigate.app INFO : Output process started: 376
2024-11-06 13:30:46.836122486 [2024-11-06 13:30:46] frigate.app INFO : Camera processor started for laojiaoutdoor: 394
2024-11-06 13:30:46.844010821 [2024-11-06 13:30:46] frigate.app INFO : Capture process started for laojiaoutdoor: 395
2024-11-06 13:30:54.245948418 [2024-11-06 13:30:54] frigate.util.services ERROR : Unable to poll intel GPU stats: Failed to detect engines! (No such file or directory)
2024-11-06 13:30:54.245952251 (Kernel 4.16 or newer is required for i915 PMU support.)

2024-11-06 13:30:54.245954102 timeout: the monitored command dumped core

[Screenshot: WeChat screenshot 20241107091122]

fay000fay added the enhancement (New feature or request) label on Nov 7, 2024
fay000fay changed the title from "does frigate support INTEL-GPU-SRIOV?" to "Does frigate support INTEL-GPU-SRIOV?" on Nov 7, 2024
@NickM-27
Collaborator

NickM-27 commented Nov 7, 2024

Your logs don't show any camera errors, so it looks like it is working.

@fay000fay
Author

> Your logs don't show any camera errors so looks like it is working

There is one error (below), and the GPU is not working:
2024-11-06 13:30:54.245948418 [2024-11-06 13:30:54] frigate.util.services ERROR : Unable to poll intel GPU stats: Failed to detect engines! (No such file or directory)
2024-11-06 13:30:54.245952251 (Kernel 4.16 or newer is required for i915 PMU support.)

@NickM-27
Collaborator

NickM-27 commented Nov 7, 2024

Right, there's nothing wrong; it just means the GPU stats don't work. Not sure how we'd fix that exactly.

@guoqiang881245

I ran into the same problem: "ERROR: Unable to poll intel GPU stats: Failed to detect engines! (No such file or directory)". It does not seem to work with SR-IOV on this CPU. I could not find a solution after two days, so I finally gave up and stopped using the SR-IOV driver.

@guoqiang881245

The camera works properly, but the detector cannot use the GPU. It can fall back to the CPU, but that is very CPU-intensive.

@fay000fay
Author

> The camera is working properly, the detector cannot be used, and even if it can use the CPU, it is very expensive

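Yes, I have also given up for now. I still have SR-IOV enabled, though: assigning a virtual iGPU to the fnOS (飞牛NAS) video app for decoding works. After installing an Intel driver in fnOS, it used the virtual GPU successfully. It's just that Frigate, whether installed in LXC, in a VM, or as a Home Assistant add-on, cannot use the GPU whenever the GPU is a virtual one; after switching to the physical GPU, it works.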

@guoqiang881245

> Right, there's nothing wrong just means stats don't work. Not sure how we'd fix that exactly

The camera is not the problem; the object detector cannot use the GPU.

@anishp55

I think this is straightforward. You are running inside an LXC container, and the GPU can be shared without SR-IOV. I believe Docker only supports SR-IOV for the network stack and NVIDIA GPUs. What is the end goal? Also, it looks like the LXC is running unprivileged; maybe try it as privileged first to see if that helps.

@fay000fay
Author

> I think this is straight forward. You are running inside of an LXC container, the GPU can be shared without SRIVO. I believe docker only supports srvio for network stack and nvidia gpu. What is the end goal? Also it looks like the LXC is running unprivileged, maybe try it as privileged first to see if that helps.

Sorry, I did misunderstand the relationship between SR-IOV and LXC earlier. My goal is to allow the LXC container to use the integrated GPU, while also enabling Docker containers to utilize the GPU via SR-IOV.

The issue I’m facing is that if I disable SR-IOV, the LXC container can access the GPU without problems (to be precise, I can see the GPU working in Frigate). However, once I enable SR-IOV, Frigate shows the following error: “Unable to poll Intel GPU stats: Failed to detect engines! (No such file or directory).”

@anishp55

Can you share how you're doing this? I have an N100 system that I can try it on as well.

@fay000fay
Author

> Can you share how you're doing this? I have a n100 system that I can try it on as well

I followed this tutorial (https://v2rayssr.com/pve.html) to enable SR-IOV. The steps to create the LXC container and install Frigate are just the official methods and code, nothing special. You can check the .conf file and Frigate configuration I mentioned earlier.

I’ve installed Intel GPU Tools on my PVE host. When SR-IOV is enabled, the intel_gpu_top command shows the message: “Kernel 4.16 or newer is required for i915 PMU support.” However, if I use intel_gpu_top -d sriov, it does display the GPU activity window, and I can see that card0 is working, which is quite strange.

So, what I can’t figure out is: if the GPU is actually working, why does Frigate show the error “Unable to poll Intel GPU stats: Failed to detect engines! (No such file or directory)”, and why does it also show “Kernel 4.16 or newer is required for i915 PMU support”?

@paulcoates

> Right, there's nothing wrong just means stats don't work. Not sure how we'd fix that exactly

I run a similar setup to the original post (SR-IOV in a Proxmox LXC) and have the same issue. I think we could get the stats if the hwaccel device could be passed as a parameter to intel_gpu_top.

At the moment, frigate/util/services.py#L294-L303 looks like it calls the equivalent of intel_gpu_top -J -o - -s 1. When I run this in my container, I get an error:

root@frigate-2:~# intel_gpu_top -J -o - -s 1
Failed to detect engines! (No such file or directory)
(Kernel 4.16 or newer is required for i915 PMU support.)

However, if I specify the device with intel_gpu_top -J -o - -s 1 -d drm:/dev/dri/card0 the stats stream by in JSON as expected.
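Following this suggestion, here is a minimal sketch (not Frigate's actual code) of a stats poller that accepts an optional device selector and forwards it to intel_gpu_top via -d, plus a tolerant parser for the first JSON sample. The function names are hypothetical, and the "engines"/"busy" field names are assumed from intel_gpu_top's JSON output; they may differ between intel_gpu_top versions.

```python
import json
from typing import Optional


def build_intel_gpu_top_command(device: Optional[str] = None) -> list[str]:
    """One-shot JSON stats command, optionally pinned to one GPU.

    `device` is passed verbatim to intel_gpu_top's -d option, e.g.
    "drm:/dev/dri/card0" as tested in this thread. None keeps the default
    device selection, which is what fails under SR-IOV here.
    """
    cmd = ["intel_gpu_top", "-J", "-o", "-", "-s", "1"]
    if device:
        cmd += ["-d", device]
    return cmd


def first_sample(output: str) -> dict:
    """Decode the first JSON object in intel_gpu_top -J output.

    Samples may or may not be wrapped in a JSON array depending on the
    intel_gpu_top version, so decode from the first '{' and ignore the rest.
    """
    start = output.find("{")
    if start == -1:
        raise ValueError("no JSON sample in intel_gpu_top output")
    sample, _ = json.JSONDecoder().raw_decode(output[start:])
    return sample


def engine_busy(sample: dict) -> dict[str, float]:
    """Busy % per engine class ("Render/3D", "Video", ...).

    Instance suffixes such as "/0" and "/1" are folded together,
    keeping the busiest instance.
    """
    busy: dict[str, float] = {}
    for name, stats in sample.get("engines", {}).items():
        head, _, tail = name.rpartition("/")
        engine_class = head if head and tail.isdigit() else name
        value = float(stats.get("busy", 0.0))
        busy[engine_class] = max(busy.get(engine_class, 0.0), value)
    return busy
```

A caller would still need to run the command with a timeout, since intel_gpu_top keeps sampling until it is stopped, and then pass whatever it printed to first_sample. Making the device string a user setting, rather than guessing it, keeps the default behaviour unchanged for non-SR-IOV systems.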

@anishp55

anishp55 commented Dec 2, 2024

>> Can you share how you're doing this? I have a n100 system that I can try it on as well

> I followed this tutorial (https://v2rayssr.com/pve.html) to enable SR-IOV, and the steps to create the LXC container and install Frigate are just the official methods and code, nothing special. You can check the .conf file and Frigate configuration I mentioned earlier.

> I’ve installed Intel GPU Tools in my PVE. When SR-IOV is enabled, the intel_gpu_top command shows the message: “Kernel 4.16 or newer is required for i915 PMU support.” However, if I use intel_gpu_top -d sriov, it does display the GPU activity window, and I can see that card0 is working, which is quite strange.

> So, what I can’t figure out is: if the GPU is actually working, why does Frigate show the error “Unable to poll Intel GPU stats: Failed to detect engines! (No such file or directory)”, and why does it also show “Kernel 4.16 or newer is required for i915 PMU support”?

I tried this driver on my system; unfortunately, it did not create the new devices, so I am not able to go any further. I might need to find a different firmware so I can enable more options.

@fay000fay
Author

> I tried this driver in my system, unfortunately it did not create the new devices so I am not able to go any further. I might need to find a different firmware so I can enable some more options

Thank you for your quick response. I look forward to hearing good news soon. Your reply means a lot to me.
