Figure out SPX (Speech) Audio format #2

Closed
Tracked by #13
LazyDuchess opened this issue Oct 22, 2022 · 12 comments
Labels
help wanted Extra attention is needed

Comments

@LazyDuchess
Owner

I have no experience figuring out audio formats, so this is something I will need help with.
In the TSData/Res/Sound folder there are Voice1, Voice2, etc. packages. Inside of them there are audio files that begin with a SPX1 magic number - these are the audio files left to figure out.

@LazyDuchess added the "help wanted" label on Oct 22, 2022
@ammaraskar
Collaborator

Aah, these look to be Speex (https://www.speex.org/) files with a custom header instead of the usual Ogg container? I haven't tried decoding them yet, but judging from the symbol names, the game uses the official Speex library: https://www.speex.org/docs/api/speex-api-reference/globals.html

Notice symbols like speex_wb_mode, speex_bits_init, etc.

@LazyDuchess
Owner Author

Yup, I believe they're Speex with a custom header

@berylliumquestion mentioned this issue on Jan 24, 2023 (5 tasks)
@berylliumquestion
Contributor

How's the progress on this? I'm trying to figure out what to do next

@LazyDuchess
Owner Author

Haven't touched this yet, I believe there is Speex source code floating around if you want to take a look, but it's all C/C++ as far as I know. Also just a really obscure format nowadays.

@actioninja

actioninja commented Jul 5, 2023

Been picking at this one. I have a start but it's still pretty gnarly. Getting a rough idea of what the header looks like but basically every field I'm like "maybe this?"
It looks like it's roughly:
4 byte magic number "1XPS", read as BE,
1 byte flag,
4 bytes padding(?),
4 bytes: either the speex mode if flag is 0, or unknown if flag is 1,
2 bytes unknown

Unfortunately the main implementations are all some cpp nonsense, so all the calls are behind vtables whose location I haven't worked out. Still getting a handle on how Ghidra works, and a plugin to resolve RTTI wasn't working right. IDA was choking on it as well, so I'm giving Binary Ninja a shot.

The actual file seems to have the header followed by some kind of regular data, potentially padding, then something that seems to be the actual speex payload. Once I find where the custom implementation falls away and it's just calling the libspeex decode, it should be fairly easy from there.

@LazyDuchess
Owner Author

> Been picking at this one. I have a start but it's still pretty gnarly. Getting a rough idea of what the header looks like but basically every field I'm like "maybe this?" It looks like it's roughly:
> 4 byte magic number "1XPS", read as BE,
> 1 byte flag,
> 4 bytes padding(?),
> 4 bytes either speex mode when flag is 0 or unknown when flag is 1,
> 2 bytes unknown

Hey! Thanks for checking this out, happy to see some progress.

I should probably link to this somewhere, maybe in the readme: there's a Mac build of the Bon Voyage executable with debug symbols, which might help as it reveals function and class names: Dropbox Link

@actioninja

so turns out these aren't speex frames, they're some kind of further encoded audio frames that do some kind of nonsense before actually calling the speex frame decode. Fun.

Seems like it might just be 1 byte frame size followed by the speex frame? not sure.

@actioninja

actioninja commented Jul 8, 2023

Tentatively saying I think I've got it, working on writing a tool to decode spx1 files to wav now. If that works, then this is correct, and the true test of it actually being accurate will be re-encoding.

header:
4 bytes: magic number (SPX1 in little endian)
1 byte: always 1
4 bytes: size of the unencoded data; not actually used for decoding, it seems to be some kind of reference number similar to other s2 datatypes
4 bytes: speex mode, read as a signed type
2 bytes: largest speex frame; helps prevent reallocations when decoding because the same buffer is reused

payload:
arbitrary number of frames, each:
1 byte: frame size in bytes
(number of bytes specified by the first byte): speex frame, can be directly decoded with libspeex
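
A minimal Python sketch of reading the 15-byte header as laid out above. The names (`Spx1Header`, `parse_spx1_header`, the field names) are made up for illustration, and two things are assumptions rather than facts from this thread: the integer fields are treated as little-endian, and both `SPX1` and `1XPS` are accepted as the on-disk magic because the byte order of the magic is described both ways here.

```python
import struct
from typing import NamedTuple

# Assumption: all integer fields are little-endian (not confirmed in the thread).
# 4s = magic, B = 1-byte flag, I = unencoded size, i = signed speex mode, H = 2-byte field
HEADER_FORMAT = "<4sBIiH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 1 + 4 + 4 + 2 = 15 bytes


class Spx1Header(NamedTuple):
    """Hypothetical container for the header fields described above."""
    magic: bytes
    flag: int
    decoded_size: int
    speex_mode: int
    frame_size_field: int


def parse_spx1_header(data: bytes) -> Spx1Header:
    """Parse the fixed-size header from the start of an SPX1 file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for an SPX1 header")
    header = Spx1Header(*struct.unpack_from(HEADER_FORMAT, data, 0))
    # Accept either byte order for the magic, since the thread is ambiguous.
    if header.magic not in (b"SPX1", b"1XPS"):
        raise ValueError(f"unexpected magic: {header.magic!r}")
    return header
```

The frames start right after the header, at byte offset `HEADER_SIZE`.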

@LazyDuchess
Owner Author

awesome, should be straightforward to turn into unity audioclips if the wav conversion works

@lingeringwillx

lingeringwillx commented Jun 21, 2024

The format suggested by @actioninja is roughly correct:

4 bytes: magic header (SPX1)
1 byte: number of channels (always 1, mono)
4 bytes: decoded size
4 bytes: speex mode (always 2, ultra-wideband mode, sampling rate 32 kHz)
2 bytes: samples per frame/decoded frame size (640 samples, or 1280 bytes)

loop until the end of the file:
1 byte: encoded frame size
encoded speex frame

You would call speex_decode_int on each encoded frame to decode the file; the decoded frame size is always 640 samples/1280 bytes. This example on the Speex website shows a similar approach to encoding and decoding.
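
The frame loop described above can be sketched in Python as follows. This only splits the payload into encoded frames; the actual decoding (`speex_decode_int` in libspeex, or one of the C# libraries) is left out. `iter_speex_frames` is a hypothetical helper name, and the 15-byte header size is taken from the field list above.

```python
from typing import Iterator

HEADER_SIZE = 15  # 4 + 1 + 4 + 4 + 2 bytes, per the field list above


def iter_speex_frames(data: bytes, offset: int = HEADER_SIZE) -> Iterator[bytes]:
    """Yield each encoded Speex frame from an SPX1 file's bytes.

    Each frame is stored as a 1-byte length followed by that many bytes.
    """
    pos = offset
    while pos < len(data):
        frame_size = data[pos]  # 1 byte: encoded frame size
        pos += 1
        frame = data[pos:pos + frame_size]
        if len(frame) < frame_size:
            raise ValueError("truncated frame at end of file")
        yield frame
        pos += frame_size
```

Each yielded frame would then be handed to the Speex decoder, which should produce 640 samples (1280 bytes) per frame.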

The total decoded file size actually comes out to be a little larger than the decoded size written in the header. This is likely because zeros were appended to the end of the file before encoding so that the last frame would have the same size as the other frames. To work around this you could just allocate your array/buffer to the decoded size from the header + 1280 bytes, so that you won't need to resize the array later.
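
A small Python sketch of that buffer workaround, with the decoder stubbed out: `decoded_frames` stands in for whatever your Speex binding returns per frame. `assemble_pcm` is a hypothetical name, and trimming the result back to the header's decoded size is my own assumption about how to drop the padding, not something confirmed in this thread.

```python
from typing import Iterable

FRAME_BYTES = 1280  # 640 samples * 2 bytes per decoded frame


def assemble_pcm(decoded_frames: Iterable[bytes], decoded_size: int) -> bytes:
    """Collect decoded frames into one preallocated buffer.

    The buffer is one frame larger than the header's decoded size so the
    padded final frame fits without resizing.
    """
    buffer = bytearray(decoded_size + FRAME_BYTES)
    pos = 0
    for frame in decoded_frames:
        end = pos + len(frame)
        if end > len(buffer):
            raise ValueError("decoded data exceeds the expected size")
        buffer[pos:end] = frame
        pos = end
    # Assumption: anything past decoded_size is encoder padding, so drop it.
    return bytes(buffer[:min(pos, decoded_size)])
```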

I've managed to decode the files using this format.
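
For the WAV conversion mentioned earlier in the thread, here is a rough Python sketch using the standard `wave` module. It assumes the decoded data is 16-bit little-endian PCM (640 samples = 1280 bytes per frame implies 2 bytes per sample), mono, at 32 kHz, per the header description above. `pcm_to_wav_bytes` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 32000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container and return the file's bytes."""
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)   # mono, per the header's channel count
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return out.getvalue()
```

Writing the returned bytes to a `.wav` file gives something any audio player can open, which is a quick way to check that the decode is right.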

I found two C# libraries that can decode Speex:
NSpeex: pure C# library; it appears to be used as a dependency in one popular library.
SpeexSharp: C# bindings to the original Speex C library.

@LazyDuchess
Owner Author

That works great! Might implement NAudio as it's convenient for playing MP3s as well. Thank you!

@LazyDuchess
Owner Author

Implemented, works like a charm!
