Consolidate video.py and capture.py for local hardware acceleration #570

Open

abrichr opened this issue Feb 29, 2024 · 5 comments · May be fixed by #585
Labels: enhancement (New feature or request)

Comments

@abrichr
Contributor

abrichr commented Feb 29, 2024

Feature request

capture/_macos.py uses AVFoundation, and capture/_windows.py uses screen_recorder_sdk, which uses the Media Foundation API. These are likely to be more performant than the mss library used in record.py and video.py, but capture currently does not support extracting time-aligned screenshots (while video does):

(openadapt-py3.10) abrichr@MacBook-Pro-4 OpenAdapt % ffprobe captures/2024-02-19-10-43-33.mov  
ffprobe version 6.1.1 Copyright (c) 2007-2023 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.1.0.2.5)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.1.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libharfbuzz --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopenvino --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7fd88a704b40] moov atom not found
captures/2024-02-19-10-43-33.mov: Invalid data found when processing input

This issue will be complete once we have modified these files to support saving video files recorded via openadapt.capture from which time-aligned screenshots can be extracted. That is, we need to modify openadapt.capture._macos.Capture and openadapt.capture._windows.Capture to supply screenshots in memory instead of writing to a file, e.g. by replacing self.session.addOutput_(self.file_output).
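
For context, once capture produces a well-formed video, extracting a time-aligned screenshot should be possible with something like the following (path and timestamp are illustrative):

ffmpeg -ss 12.3 -i captures/recording.mp4 -frames:v 1 screenshot.png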

Motivation

Local hardware acceleration -> maximum performance

@abrichr added the enhancement label on Feb 29, 2024
@abrichr
Contributor Author

abrichr commented Feb 29, 2024

Via ChatGPT:

To replace self.session.addOutput_(self.file_output) with a mechanism that calls a callback with a screenshot in your macOS capture implementation, you would typically use AVCaptureVideoDataOutput instead of AVCaptureMovieFileOutput. AVCaptureVideoDataOutput allows you to receive video frames as they are captured, which you can then process in a callback method.

Here’s a conceptual outline of how to set this up:

  1. Use AVCaptureVideoDataOutput: This class provides a way to capture video frames as they are produced by the capture session.

  2. Set up a Delegate for Frame Capture: Implement a delegate that conforms to the AVCaptureVideoDataOutputSampleBufferDelegate protocol. This delegate will receive callbacks with the video frames.

  3. Implement the Callback Method: The delegate's callback method receives a CMSampleBufferRef that contains the frame data. You can then convert this sample buffer into a format suitable for your needs (e.g., a screenshot).

Step-by-Step Implementation

First, modify your Capture class to include an AVCaptureVideoDataOutput and set up the delegate:

from Foundation import NSObject, NSLog
import AVFoundation as AVF
from Quartz import CGMainDisplayID

# dispatch_queue_create comes from PyObjC's libdispatch bindings
# (pyobjc-framework-libdispatch), not AVFoundation
from dispatch import dispatch_queue_create

class SampleBufferDelegate(NSObject):
    def captureOutput_didOutputSampleBuffer_fromConnection_(self, captureOutput, sampleBuffer, connection):
        # This method is called with a CMSampleBufferRef `sampleBuffer`
        # You can convert this to a screenshot here and call your desired callback
        NSLog("Received a frame")
        # Conversion to screenshot and callback call goes here

class Capture:
    def __init__(self):
        # Initialize as before...
        self.videoDataOutput = None
        self.videoDataOutputQueue = None
        self.sampleBufferDelegate = None

    def start(self, audio: bool = False, camera: bool = False):
        # Setup as before...
        
        # Setup video data output
        self.videoDataOutput = AVF.AVCaptureVideoDataOutput.alloc().init()
        self.videoDataOutputQueue = dispatch_queue_create(b"videoDataOutputQueue", None)
        self.sampleBufferDelegate = SampleBufferDelegate.alloc().init()
        
        self.videoDataOutput.setSampleBufferDelegate_queue_(self.sampleBufferDelegate, self.videoDataOutputQueue)
        
        if self.session.canAddOutput_(self.videoDataOutput):
            self.session.addOutput_(self.videoDataOutput)

Notes:

  • Conversion of CMSampleBufferRef: Converting a CMSampleBufferRef to a more usable format (like an image or numpy array) inside the captureOutput_didOutputSampleBuffer_fromConnection_ method requires additional steps: extracting the image buffer (CVImageBufferRef) and then converting it to the desired format. This can be complex and may involve additional libraries (like OpenCV) for image manipulation; a rough sketch follows below.

  • Performance Considerations: Processing video frames in real-time can be CPU-intensive. Ensure your callback and conversion logic is optimized for performance. Depending on your requirements, you might want to drop frames or reduce the resolution to maintain performance.

  • Threading: The callback method captureOutput_didOutputSampleBuffer_fromConnection_ is called on the dispatch queue specified when setting the sample buffer delegate. Make sure any UI updates or intensive processing tasks are dispatched back to the main thread or handled appropriately to avoid blocking the capture queue.

This approach allows you to intercept video frames as they are captured, enabling you to process and use them as screenshots within your application.
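
As a rough sketch of the conversion step in the first note (untested; this assumes PyObjC's Quartz and CoreMedia bindings, a videoSettings of 32BGRA, and that CVPixelBufferGetBaseAddress returns a buffer-like object):

import numpy as np
from CoreMedia import CMSampleBufferGetImageBuffer
from Quartz import (
    CVPixelBufferGetBaseAddress,
    CVPixelBufferGetBytesPerRow,
    CVPixelBufferGetHeight,
    CVPixelBufferGetWidth,
    CVPixelBufferLockBaseAddress,
    CVPixelBufferUnlockBaseAddress,
    kCVPixelBufferLockFlagReadOnly,
)

def sample_buffer_to_ndarray(sample_buffer):
    """Copy the pixels out of a CMSampleBufferRef into a numpy array (BGRA)."""
    pixel_buffer = CMSampleBufferGetImageBuffer(sample_buffer)
    CVPixelBufferLockBaseAddress(pixel_buffer, kCVPixelBufferLockFlagReadOnly)
    try:
        width = CVPixelBufferGetWidth(pixel_buffer)
        height = CVPixelBufferGetHeight(pixel_buffer)
        bytes_per_row = CVPixelBufferGetBytesPerRow(pixel_buffer)
        base = CVPixelBufferGetBaseAddress(pixel_buffer)
        data = base.as_buffer(bytes_per_row * height)
        # Rows may be padded, so reshape by bytes_per_row and crop to width
        frame = np.frombuffer(data, dtype=np.uint8).reshape(
            height, bytes_per_row // 4, 4
        )[:, :width, :]
        return frame.copy()  # copy so the buffer can be unlocked safely
    finally:
        CVPixelBufferUnlockBaseAddress(pixel_buffer, kCVPixelBufferLockFlagReadOnly)

On the performance note, AVCaptureVideoDataOutput also exposes alwaysDiscardsLateVideoFrames, which drops frames that arrive while the delegate callback is still busy instead of queueing them.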

@abrichr
Contributor Author

abrichr commented Feb 29, 2024

@0dm thoughts? 🙏 😄

@0dm
Collaborator

0dm commented Feb 29, 2024

@0dm thoughts? 🙏 😄

This could work. I will look into implementing this sometime this week.

@abrichr
Contributor Author

abrichr commented Feb 29, 2024

Regarding this:

Performance Considerations: Processing video frames in real-time can be CPU-intensive. Ensure your callback and conversion logic is optimized for performance. Depending on your requirements, you might want to drop frames or reduce the resolution to maintain performance.

See max_cpu_percent and related for an attempt to implement this: https://github.com/OpenAdaptAI/OpenAdapt/pull/569/files#diff-57d8577d1fb5faaf576a6f5663741c83e672378c13c91a1db036fb7a3f05e067R559

@0dm linked a pull request on Mar 6, 2024 that will close this issue
@abrichr
Contributor Author

abrichr commented Mar 23, 2024

@Cody-DV for a Windows approach, see:

https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/capture/_windows.py

https://github.com/Andrey1994/screen_recorder_sdk/blob/31417c8af136a7b8b44702e69fa0bb6ebb5c2b13/python/screen_recorder_sdk/screen_recorder.py

https://chat.openai.com/share/19cc37a0-750f-451a-95cf-acad27efb7b6

import cv2
import numpy as np
import time

from screen_recorder_sdk import screen_recorder

def capture_frames_in_memory(duration, fps):
    """
    Captures frames for a given duration and fps, and stores the video in memory.
    
    :param duration: Duration to capture video for in seconds
    :type duration: int
    :param fps: Frames per second
    :type fps: int
    """
    frame_interval = 1.0 / fps
    num_frames = int(duration * fps)

    # Initialize video capture parameters
    params = screen_recorder.RecorderParams()
    screen_recorder.init_resources(params)

    # Prepare the first screenshot to determine resolution
    image = screen_recorder.get_screenshot()
    frame = np.array(image)
    height, width, layers = frame.shape
    size = (width, height)

    # Initialize a video writer using OpenCV. FourCC is a 4-byte code that
    # selects the codec (see fourcc.org); 'mp4v' is compatible with MP4 files.
    # Note: a GStreamer pipeline string only works when OpenCV is built with
    # GStreamer support and cv2.CAP_GSTREAMER is passed, so write to a plain
    # file here instead.
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    video_writer = cv2.VideoWriter("video.mp4", fourcc, fps, size)

    start_time = time.time()
    for _ in range(num_frames):
        image = screen_recorder.get_screenshot()
        frame = np.array(image)
        video_writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
        # Note: sleeping a fixed interval ignores capture/encode time, so the
        # effective fps will be somewhat lower than requested
        time.sleep(frame_interval)

    video_writer.release()
    screen_recorder.free_resources()

    elapsed_time = time.time() - start_time
    print(f"Capturing completed in {elapsed_time:.2f} seconds.")

# Example usage
if __name__ == "__main__":
    duration = 5  # seconds
    fps = 10
    capture_frames_in_memory(duration, fps)

We can replace the cv2 writer with what we have in https://github.com/OpenAdaptAI/OpenAdapt/blob/main/openadapt/video.py
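
For reference, a minimal PyAV sketch of writing frames with pts derived from capture timestamps, so that extracted screenshots stay time-aligned (illustrative only; write_frames and its parameters are hypothetical, not video.py's actual API):

from fractions import Fraction

import av


def write_frames(frames, timestamps, path):
    """Encode RGB ndarray frames, stamping each with its capture time as pts."""
    container = av.open(path, mode="w")
    stream = container.add_stream("h264", rate=30)
    # h264 + yuv420p require even frame dimensions
    stream.height, stream.width = frames[0].shape[0], frames[0].shape[1]
    stream.pix_fmt = "yuv420p"
    time_base = Fraction(1, 1000)  # millisecond resolution
    stream.codec_context.time_base = time_base
    start = timestamps[0]
    for image, timestamp in zip(frames, timestamps):
        frame = av.VideoFrame.from_ndarray(image, format="rgb24")
        frame.pts = int(round((timestamp - start) / time_base))
        for packet in stream.encode(frame):
            container.mux(packet)
    for packet in stream.encode():  # flush the encoder
        container.mux(packet)
    container.close()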
