Loading audio tensors fails: ValueError: all input arrays must have the same shape #1875

chrisammon3000 · 2024-03-06T20:13:45Z

Initial Checks

I have read and followed the docs and still think this is a bug

Description

I have created subclips of a video in .mp4 using ffmpeg (through moviepy):

# moviepy.video.io.ffmpeg_tools.ffmpeg_extract_subclip

def ffmpeg_extract_subclip(filename, t1, t2, targetname=None):
    """ Makes a new video file playing video file ``filename`` between
        the times ``t1`` and ``t2``. """
    name, ext = os.path.splitext(filename)
    if not targetname:
        T1, T2 = [int(1000*t) for t in [t1, t2]]
        targetname = "%sSUB%d_%d.%s" % (name, T1, T2, ext)
    
    cmd = [get_setting("FFMPEG_BINARY"),"-y",
           "-ss", "%0.2f"%t1,
           "-i", filename,
           "-t", "%0.2f"%(t2-t1),
           "-map", "0", "-vcodec", "copy", "-acodec", "copy", targetname]

    subprocess_call(cmd)

Output:

The subclip path is passed to VideoUrl:

subclip = VideoUrl("<subclip_path>")

Trying to load the tensors fails:

tensors = subclip.load()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 tensors = subclip.load()

File ~/Projects/chrisammon3000/experiments/docarray/docarray-test/.venv/lib/python3.11/site-packages/docarray/typing/url/video_url.py:96, in VideoUrl.load(self, **kwargs)
     33 """
     34 Load the data from the url into a `NamedTuple` of
     35 [`VideoNdArray`][docarray.typing.VideoNdArray],
   (...)
     93     [`NdArray`][docarray.typing.NdArray] of the key frame indices.
     94 """
     95 buffer = self.load_bytes(**kwargs)
---> 96 return buffer.load()

File ~/Projects/chrisammon3000/experiments/docarray/docarray-test/.venv/lib/python3.11/site-packages/docarray/typing/bytes/video_bytes.py:86, in VideoBytes.load(self, **kwargs)
     84     audio = parse_obj_as(AudioNdArray, np.array(audio_frames))
     85 else:
---> 86     audio = parse_obj_as(AudioNdArray, np.stack(audio_frames))
     88 video = parse_obj_as(VideoNdArray, np.stack(video_frames))
     89 indices = parse_obj_as(NdArray, keyframe_indices)

File ~/Projects/chrisammon3000/experiments/docarray/docarray-test/.venv/lib/python3.11/site-packages/numpy/core/shape_base.py:449, in stack(arrays, axis, out, dtype, casting)
    447 shapes = {arr.shape for arr in arrays}
    448 if len(shapes) != 1:
--> 449     raise ValueError('all input arrays must have the same shape')
    451 result_ndim = arrays[0].ndim + 1
    452 axis = normalize_axis_index(axis, result_ndim)

ValueError: all input arrays must have the same shape

Stepping through the code shows that the first audio frame has only 16 samples:

The second and all subsequent frames have 1024 samples:

So this results in arrays with different shapes for the audio.
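The mismatch can be reproduced without any video file. The sketch below assumes each decoded audio frame becomes a 1-D array whose length is its sample count (the real arrays may also carry a channel dimension). `frames` is a made-up stand-in for `audio_frames`, not data from the actual subclip.

```python
import numpy as np

# Hypothetical stand-in for audio_frames: one short first frame followed by
# full-size frames, mirroring the 16 vs. 1024 samples described above.
frames = [np.zeros(16), np.zeros(1024), np.zeros(1024)]

try:
    np.stack(frames)  # np.stack requires every input to have the same shape
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```
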

What I've tried:

I have tried adjusting the ffmpeg options, such as converting to AAC and specifying the audio channels. This does fix the problem, but creating the subclips takes about 10 times longer.
Using a preprocessing step to pad the arrays before reading them into DocArray would require reading and writing each subclip again.

If there is a way to handle the shape mismatch inside DocArray that would be great because it would let me create the subclips and model them as quickly as possible. It would need to be added to this block:

docarray/docarray/typing/bytes/video_bytes.py

Lines 83 to 86 in f71a5e6

if len(audio_frames) == 0:
    audio = parse_obj_as(AudioNdArray, np.array(audio_frames))
else:
    audio = parse_obj_as(AudioNdArray, np.stack(audio_frames))
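One possible way to handle the mismatch in that block is to zero-pad the shorter frames before stacking. The helper below is only a sketch and is not DocArray's code. The name `pad_and_stack` is hypothetical, and it assumes all frames have the same number of dimensions and differ only along the last (sample) axis.

```python
import numpy as np


def pad_and_stack(frames):
    """Zero-pad arrays along their last axis to a common length, then stack.

    Hypothetical helper: assumes every array has the same ndim and the same
    leading dimensions, so only the last axis can differ.
    """
    if len(frames) == 0:
        # Same result as the existing empty-list branch: an empty array.
        return np.array(frames)

    arrays = [np.asarray(f) for f in frames]
    target = max(a.shape[-1] for a in arrays)

    padded = []
    for a in arrays:
        # Pad only at the end of the last axis; leave other axes untouched.
        pad_width = [(0, 0)] * (a.ndim - 1) + [(0, target - a.shape[-1])]
        padded.append(np.pad(a, pad_width, mode="constant"))
    return np.stack(padded)


stacked = pad_and_stack([np.ones(16), np.ones(1024), np.ones(1024)])
print(stacked.shape)
```

Appending zeros at the end of a short frame is a design choice. Whether silence belongs before or after the real samples depends on how the short frame was produced, so this may need adjusting.
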

Example Code

import os
from pathlib import Path
import numpy as np
from docarray.typing import VideoUrl
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip


def generate_subclips(parent_path, video_id, video_uri, video_duration, duration=60):
    subclips_path = Path(parent_path) / "subclips"
    subclips_path.mkdir(exist_ok=True)

    start_times = np.arange(0, video_duration, duration)
    end_times = np.append(start_times[1:], video_duration)
    clip_times = list(zip(start_times, end_times))

    for start_time, end_time in clip_times:
        # filename should have start_end seconds as part of the name
        output_file_path = subclips_path / f"{video_id}__{start_time}_{end_time}.{video_uri.suffix[1:]}"
        ffmpeg_extract_subclip(video_uri, start_time, end_time, targetname=output_file_path)

# Example usage
# parent_path = 'path/to/parent/directory'
# video_id = 'example_video_id'
# video_uri = Path('path/to/video.mp4')
# video_duration = 1200  # for example, 20 minutes
# generate_subclips(parent_path, video_id, video_uri, video_duration, duration=60)

def sort_key(path):
    """Sorts by the start time in the subclip file name
    For example: Fu7YkoRWKB8_Y__0_60.mp4 will sort by `0`
    """
    # Extract the integer after "__" from the filename
    return int(path.stem.split('__')[1].split('_')[0])

subclips_dir = Path(os.getcwd()).parent / "subclips"

# create subclips
generate_subclips(subclips_dir, <video_id>, <video_uri>, <video_duration>, duration=60)
subclips_paths = sorted(subclips_dir.iterdir(), key=sort_key)
video_urls = [VideoUrl(f"{str(subclip)}") for subclip in subclips_paths]

# load tensors
# the first subclip might work...
subclip0 = VideoUrl(str(subclips_paths[0]))
subclip0_tensors = subclip0.load()

# but the second and other subclips throw the shape mismatch error
subclip1 = VideoUrl(str(subclips_paths[1]))
subclip1_tensors = subclip1.load()

Python, DocArray & OS Version

0.40.0

Affected Components

chrisammon3000 · 2024-03-06T20:18:21Z

If this is something I could contribute please let me know.

JoanFM · 2024-03-06T20:58:36Z

@chrisammon3000 ,

If you have an idea of how this could be contributed, it would definitely be great.

chrisammon3000 · 2024-04-03T20:55:05Z

@JoanFM Created a pull request for this (#1880). The fix applies to audio from video only, since sometimes tools like FFMPEG downsample blank frames when they create subclips, which is what I was doing when I ran into this error. Please let me know if I should resolve the failed check for signed commits in the PR.

JoanFM · 2024-04-04T12:31:10Z

Hey @chrisammon3000 ,

Nice to see the contribution.

I already added some comments there

chrisammon3000 mentioned this issue Apr 3, 2024

fix: pad audio arrays to same shape if sample rates differ #1880

Open
