

@rom1v
Collaborator

@rom1v rom1v commented Feb 22, 2025

The existing audio sources were:

  • output (default): forwards the whole audio output, and disables playback on the device (mapped to REMOTE_SUBMIX).
  • playback: captures the audio playback (Android apps can opt-out, so the whole output is not necessarily captured).
  • mic: captures the microphone (mapped to MIC).

This PR adds:

  • mic-unprocessed: captures the microphone unprocessed (raw) sound (mapped to UNPROCESSED).
  • mic-camcorder: captures the microphone tuned for video recording, with the same orientation as the camera if available (mapped to CAMCORDER).
  • mic-voice-recognition: captures the microphone tuned for voice recognition (mapped to VOICE_RECOGNITION).
  • mic-voice-communication: captures the microphone tuned for voice communications (it will for instance take advantage of echo cancellation or automatic gain control if available) (mapped to VOICE_COMMUNICATION).
  • voice-call: captures voice call (mapped to VOICE_CALL).
  • voice-call-uplink: captures voice call uplink only (mapped to VOICE_UPLINK).
  • voice-call-downlink: captures voice call downlink only (mapped to VOICE_DOWNLINK).
  • voice-performance: captures audio meant to be processed for live performance (karaoke), includes both the microphone and the device playback (mapped to VOICE_PERFORMANCE).

Discontinuities

The existing audio sources always produce a continuous audio stream. A major issue is that some new audio sources (like the "voice call" source) do not produce packets on silence (they only capture during a voice call).

The audio regulator (the component responsible for maintaining a constant latency) assumed that the input audio stream was continuous. In this PR, it now detects discontinuities based on the input PTS (and adjusts its behavior accordingly). This only works correctly if the input PTS are "correct".
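A minimal sketch of what PTS-based discontinuity detection can look like, written in Java for illustration only (it is not the regulator code from this PR). The class name, the 48kHz rate and the 100ms tolerance are assumptions:

```java
/** Hypothetical helper: flags a gap when a packet starts later than the previous one ended. */
final class DiscontinuityDetector {
    private static final int SAMPLE_RATE = 48000;      // assumed capture rate
    private static final long TOLERANCE_US = 100_000;  // assumed threshold (100 ms), arbitrary

    private long expectedNextPtsUs = -1; // -1 until the first packet is seen

    /**
     * @param ptsUs   capture timestamp of the packet, in microseconds
     * @param samples number of samples in the packet
     * @return true if the packet starts after a gap larger than the tolerance
     */
    boolean onPacket(long ptsUs, int samples) {
        boolean gap = expectedNextPtsUs >= 0 && ptsUs - expectedNextPtsUs > TOLERANCE_US;
        // The next packet is expected right after this one ends.
        expectedNextPtsUs = ptsUs + samples * 1_000_000L / SAMPLE_RATE;
        return gap;
    }
}
```

If the PTS are derived only from the number of samples, the difference computed here is always zero and a hole is never reported, which is why the detection depends on the input PTS being "correct".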

Another major problem is that, even if the capture timestamps are correct, some encoders (OPUS) rewrite the PTS based on the number of samples (ignoring the input PTS). As a consequence, when encoding in OPUS, the timings are broken: they represent a continuous audio stream where the silences are removed. This breaks the discontinuity detection in the audio regulator (we could work around the problem by relying on the current recv date, since the real time playback itself does not depend on PTS). But the most important problem is that it breaks recording timings. For example:

scrcpy --audio-source=voice-call --record=file.mp4

If the voice call does not start immediately, the audio will not be played at the correct date.

With the AAC encoder, it works (the encoder on the device does not rewrite the PTS based only on the number of samples):

scrcpy --audio-source=voice-call --record=file.mp4 --audio-codec=aac

This PR is in draft due to this unsolved issue.


Aims to fix #5670 and #5412.

@rom1v rom1v changed the base branch from master to dev February 22, 2025 12:00
@Victor239

Can there also be an option to capture no sound? When using multiple virtual display windows and playing audio, it currently plays on all windows, with no way to disable it except through the OS sound settings.

@rom1v
Collaborator Author

rom1v commented Feb 25, 2025

Can there also be an option to capture no sound?

https://github.com/Genymobile/scrcpy/blob/master/doc/audio.md#no-audio

@rom1v
Collaborator Author

rom1v commented Mar 2, 2025

Another major problem is that, even if the capture timestamps are correct, some encoders (OPUS) rewrite the PTS based on the number of samples (ignoring the input PTS). As a consequence, when encoding in OPUS, the timings are broken: they represent a continuous audio stream where the silences are removed. This breaks the discontinuity detection in the audio regulator (we could work around the problem by relying on the current recv date, since the real time playback itself does not depend on PTS). But the most important problem is that it breaks recording timings.

This PR is in draft due to this unsolved issue.

Should be fixed by commit Fix PTS produced by the default OPUS encoder on this PR (the SHA1 will change on rebase, but currently it's 63d848f).

Please review/test/check.

@LaptopDev

LaptopDev commented Mar 3, 2025

ref So because VOICE_UPLINK restricts 3rd party apps, microphone source cannot be passed from computer to phone during calls?

@yNEX

yNEX commented Mar 5, 2025

I tested the changes from this PR using a private fork and built the project by using the GitHub Action. For my testing scenario, I received a WhatsApp call from a second phone. I tried both the --audio-source=voice-call-downlink option and voice-call-uplink and in both cases, the audio was transferred regardless of which phone was muted.

Additionally, with the regular --audio-source=playback option, the audio is no longer played back on the device. Is it possible to extend this behavior to voice calls as well?

I am using a Pixel 8 Pro (Android 15) and the Windows Client

@rom1v
Collaborator Author

rom1v commented Mar 5, 2025

I tried both the --audio-source=voice-call-downlink option and voice-call-uplink and in both cases, the audio was transferred

👍 Thank you for the test.

Fixed:

diff --git a/server/src/main/java/com/genymobile/scrcpy/audio/AudioSource.java b/server/src/main/java/com/genymobile/scrcpy/audio/AudioSource.java
index 6689611ad..d16b5e387 100644
--- a/server/src/main/java/com/genymobile/scrcpy/audio/AudioSource.java
+++ b/server/src/main/java/com/genymobile/scrcpy/audio/AudioSource.java
@@ -13,8 +13,8 @@ public enum AudioSource {
     MIC_VOICE_RECOGNITION("mic-voice-recognition", MediaRecorder.AudioSource.VOICE_RECOGNITION),
     MIC_VOICE_COMMUNICATION("mic-voice-communication", MediaRecorder.AudioSource.VOICE_COMMUNICATION),
     VOICE_CALL("voice-call", MediaRecorder.AudioSource.VOICE_CALL),
-    VOICE_CALL_UPLINK("voice-call-uplink", MediaRecorder.AudioSource.VOICE_CALL),
-    VOICE_CALL_DOWNLINK("voice-call-downlink", MediaRecorder.AudioSource.VOICE_CALL),
+    VOICE_CALL_UPLINK("voice-call-uplink", MediaRecorder.AudioSource.VOICE_UPLINK),
+    VOICE_CALL_DOWNLINK("voice-call-downlink", MediaRecorder.AudioSource.VOICE_DOWNLINK),
     VOICE_PERFORMANCE("voice-performance", MediaRecorder.AudioSource.VOICE_PERFORMANCE);
 
     private final String name;

Additionally, with the regular --audio-source=playback option, the audio is no longer played back on the device. Is it possible to extend this behavior to voice calls as well?

The playback audio source uses a specific API, where we can request to duplicate audio or not (--audio-dup). For the others, we have no control (Android determines the behavior).

@yNEX

yNEX commented Mar 5, 2025

Thanks for the quick response! 👌🏼

My idea was to use scrcpy to transfer both game audio and voice chat from Call of Duty Mobile to my PC for streaming with OBS. While everything works fine for the most part, I’m encountering an issue with voice call audio. When headphones are connected directly to the phone, the game sound and voice chat are bundled together. However, since I’m using the headphones on my PC, the audio streams remain separated on the phone.

Do you have any suggestions for this use case? Unfortunately, a capture card isn’t an option as it reduces the refresh rate from 120 Hz to 60 Hz. If it’s more convenient, we could discuss this privately to avoid cluttering the PR comment section.

@rom1v
Collaborator Author

rom1v commented Mar 5, 2025

When headphones are connected directly to the phone, the game sound and voice chat are bundled together. However, since I’m using the headphones on my PC, the audio streams remain separated on the phone.

See #4084 #4087. Scrcpy has no control over this behavior.

@davidsmith91

@rom1v Hello, I have a phone in country A with a local sim card. I have a pc in country B. I can already use scrcpy from PC-B to phone-A passing through PC-A adb server.

What I need to do is: be able to make phone calls.

From what I'm reading here, I should be able to receive the audio of the call using voice-call-downlink.
But any idea on how to send my voice into the call (as "uplink"?). I think I could just do it calling PC-A from PC-B (or streaming the microphone in some low latency way) and putting the speakers on (and the phone would capture my voice from the pc). Yeah not the best thing but it can work, few days ago I did a call with just someone putting a phone in front of another phone and using Signal + normal phone call.

Anyway, how can I use the voice-call-downlink? I'm on a macbook, I use homebrew to install scrcpy.
I don't know if I understood correctly, but you said that the phone cannot use the mic while the audio is forwarded?

So, I couldn't hear what the person in the call is saying through voice-call-downlink and at the same moment make the phone record from the mic what the PC-A is outputting from his speakers?

Thanks very much.

@rom1v
Collaborator Author

rom1v commented Mar 7, 2025

But any idea on how to send my voice into the call

See discussions in #3880.

@davidsmith91

But any idea on how to send my voice into the call

See discussions in #3880.

@rom1v what about getting the voice downlink while using phone microphone? Is it possible? If you don't get what I mean please just read again my question above, you answered the part about sending audio to the microphone, but not the other part. Thanks

@rom1v
Collaborator Author

rom1v commented Mar 7, 2025

what about getting the voice downlink while using phone microphone? Is it possible?

I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.

@rom1v
Collaborator Author

rom1v commented Mar 7, 2025

Should be fixed by commit Fix PTS produced by the default OPUS encoder

In fact, there are still several problems.

Firstly, the resulting audio stream is broken in VLC and Firefox (it works "fine" with warnings in mpv).

Secondly, the "fixed" PTS is not correct, because we push blocks of 960 samples, but the opus encoder outputs blocks of 1024 samples, so after 960 samples, it waits for the next 960 samples before producing an output packet… so fixing the PTS on the output side adds noise in the timestamps.
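To illustrate the timestamp noise, here is a small simulation (Java, illustration only). It assumes a simplified encoder model that buffers input and emits one packet each time a full output frame is available; the class and method names are hypothetical, and the 960/1024 figures are the ones mentioned above:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of an encoder that waits for a full output frame before emitting. */
final class FrameMismatchSimulation {
    /** Returns, for each output packet, the index of the input block whose arrival triggered it. */
    static List<Integer> outputTriggers(int inputBlocks, int inputSize, int outputSize) {
        List<Integer> triggers = new ArrayList<>();
        int buffered = 0;
        for (int i = 0; i < inputBlocks; i++) {
            buffered += inputSize;
            // Emit as many full output frames as the buffer currently holds.
            while (buffered >= outputSize) {
                buffered -= outputSize;
                triggers.add(i);
            }
        }
        return triggers;
    }

    public static void main(String[] args) {
        // 960-sample input blocks arrive every 20 ms at 48 kHz.
        List<Integer> triggers = outputTriggers(33, 960, 1024);
        for (int k = 1; k < triggers.size(); k++) {
            int deltaMs = (triggers.get(k) - triggers.get(k - 1)) * 20;
            System.out.println("packet " + k + ": +" + deltaMs + " ms since previous output");
        }
    }
}
```

Under this model, each output packet carries 1024 / 48000 ≈ 21.33ms of audio, but packets come out 20ms apart, with a 40ms step once every 16 input blocks. A PTS taken when a packet is output therefore does not match the content of that packet.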

I don't know how to record a correct file while still compensating for clock drift (so that the video and audio remain synchronized) or handling "missing" silence packets.

@davidsmith91

what about getting the voice downlink while using phone microphone? Is it possible?

I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.

Okay okay, but how can I use the voice downlink source? Can you ship it to homebrew so I can install from there? Or even as a release here on github, or a pre-release. Or am I forced to build from source? I need it on macos arm64

@yNEX

yNEX commented Mar 8, 2025

what about getting the voice downlink while using phone microphone? Is it possible?

I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.

Okay okay, but how can I use the voice downlink source? Can you ship it to homebrew so I can install from there? Or even as a release here on github, or a pre-release. Or am I forced to build from source? I need it on macos arm64

I actually tested both voice-downlink and voice-uplink with a WhatsApp call. However, I noticed that both audio sources seemed to include both up- and downlink audio. This could be due to WhatsApp potentially using the Android API differently than other applications (this is just my hypothesis - perhaps someone can confirm?). You'll need to test different scenarios as this feature is still under development.

I've created a fork that includes these changes. It's publicly available on my GitHub and built with the GitHub Action provided by this repo :)

@davidsmith91

what about getting the voice downlink while using phone microphone? Is it possible?

I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.

Okay okay, but how can I use the voice downlink source? Can you ship it to homebrew so I can install from there? Or even as a release here on github, or a pre-release. Or am I forced to build from source? I need it on macos arm64

I actually tested both voice-downlink and voice-uplink with a WhatsApp call. However, I noticed that both audio sources seemed to include both up- and downlink audio. This could be due to WhatsApp potentially using the Android API differently than other applications (this is just my hypothesis - perhaps someone can confirm?). You'll need to test different scenarios as this feature is still under development.

I've created a fork that includes these changes. It's publicly available on my GitHub and built with the GitHub Action provided by this repo :)

I would need to use it for normal phone calls using Android 14 or 15 phone.
At least, I hope I can get the voice-downlink. Then I have to find a way to make my voice go into the uplink.
Maybe some bluetooth thing is the only solution, but I haven't found anything worth pursuing

@yNEX

yNEX commented Mar 8, 2025

what about getting the voice downlink while using phone microphone? Is it possible?

I don't know, I just expose the audio sources from the Android API. How the Android implementation behaves with and without mic is to be tested by device, I have no control over that.

Okay okay, but how can I use the voice downlink source? Can you ship it to homebrew so I can install from there? Or even as a release here on github, or a pre-release. Or am I forced to build from source? I need it on macos arm64

I actually tested both voice-downlink and voice-uplink with a WhatsApp call. However, I noticed that both audio sources seemed to include both up- and downlink audio. This could be due to WhatsApp potentially using the Android API differently than other applications (this is just my hypothesis - perhaps someone can confirm?). You'll need to test different scenarios as this feature is still under development.
I've created a fork that includes these changes. It's publicly available on my GitHub and built with the GitHub Action provided by this repo :)

I would need to use it for normal phone calls using Android 14 or 15 phone. At least, I hope I can get the voice-downlink. Then I have to find a way to make my voice go into the uplink. Maybe some bluetooth thing is the only solution, but I haven't found anything worth pursuing

Maybe Bluetooth over TCP/IP is a solution for you

@davidsmith91

Maybe Bluetooth over TCP/IP is a solution for you

@yNEX any idea/example?

@yNEX

yNEX commented Mar 10, 2025

Maybe Bluetooth over TCP/IP is a solution for you

@yNEX any idea/example?

I couldn't find any direct solution online for this. You might have to do some research. I don't know if USB over IP solutions like VirtualHere could help you. This could make a USB Bluetooth Adapter available over the Internet like it is locally attached to another PC. Just an idea, don't know if it works

@rom1v
Collaborator Author

rom1v commented Mar 16, 2025

Here is a more detailed explanation of the problems it causes. Any insights are appreciated.

Context

On the device, audio is captured using AudioRecord.

Audio "duration" can mean two different things:

  1. the number of samples divided by the frame rate
  2. the difference between two system clock instants

For example, if audio is captured at 48kHz, in theory, we should get 48000 samples per second (by definition). And for each block read, we retrieve the PTS (presentation timestamp) from the system clock (possibly via AudioRecord.getTimestamp()). However, these values may not be entirely accurate or precise.

The difference between accuracy and precision is illustrated by this image:

[image: F_2_11]

Concretely, in this case, imprecision means that every read of n samples does not correspond exactly to a PTS difference of n / samplerate. For example, when reading blocks of 960 samples (20ms), the PTS difference between each block might be 20.124, 20.789, 19.756, 21.112, 19.024… but on average, it's 20ms.

Inaccuracy, on the other hand, is related to clock drift: we cannot expect the audio clock to be absolutely exact. So in practice, 48kHz might mean that 48000 samples are produced every 1.003 seconds on average for example, or equivalently, that ~47856 samples are produced every second (not exactly 48000).
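As a worked example of what such a drift implies over time, here is a small sketch (Java, illustration only; the class and method names are hypothetical, and the 1.003 figure is just the example value above, not a measurement):

```java
/** Hypothetical worked example: how far a drifting audio clock diverges from the system clock. */
final class ClockDrift {
    /** Nominal duration (seconds) of the samples produced during realSeconds of wall-clock time. */
    static double nominalSeconds(double realSeconds, double realSecondsPerNominalSecond) {
        return realSeconds / realSecondsPerNominalSecond;
    }

    public static void main(String[] args) {
        double drift = 1.003; // 48000 samples take 1.003 s on average (example value)
        System.out.printf("samples per real second: %.0f%n", 48000 / drift);
        double lag = 3600 - nominalSeconds(3600, drift);
        System.out.printf("after one hour, the audio is %.2f s short%n", lag);
    }
}
```

With this example value, one hour of capture yields roughly 10.8 seconds less audio than the nominal rate would suggest, so even a small drift becomes very audible if it is not compensated.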

In addition to these issues, a temporary lag might cause a device to produce only 47000 samples during a given second instead of 48000. And (related to this PR), some audio sources do not produce any sample when disabled (i.e. 0 samples per second for a given period of time).

For real-time playback, these variations are compensated by the audio regulator, which does not use PTS at all. The problem is for recording.

Recording

To record, the scrcpy client directly muxes the packets encoded on the device into an MP4 or MKV container (it does not reencode). For each media packet, two pieces of information are needed: the encoded data and the PTS.

But as we've seen before, the number of samples and the PTS do not exactly match. That's the core of the issue.

I don't know if containers (MKV and MP4) require them to match (I have read somewhere that they do).

We could make them match by adjusting the PTS (btw this is what the OPUS encoder does out of our control):

PTS      20.453       40.754       59.943       80.519         1535.121     1554.989     1575.241
           +------------+------------+------------+     ...        +------------+------------+
           | 960 samples|            |            |     ...        |            |            |
           +------------+------------+------------+     ...        +------------+------------+
fixed    20.000       40.000       60.000       80.000         1540.000     1560.000     1580.000
 PTS

But after some time, the audio will be significantly out-of-sync with the video.
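To make the divergence concrete, here is a small sketch (Java, illustration only; the class and method names are hypothetical) that recomputes the PTS from the sample count alone, as in the diagram above, and compares it with the capture PTS taken from the same diagram:

```java
/** Hypothetical illustration: PTS derived from the sample count vs. PTS from the system clock. */
final class SampleCountPts {
    static final int SAMPLE_RATE = 48000;
    static final int BLOCK_SAMPLES = 960; // 20 ms at 48 kHz

    /** PTS (in microseconds) derived only from a number of samples. */
    static long ptsFromSampleCount(long samples) {
        return samples * 1_000_000L / SAMPLE_RATE;
    }

    public static void main(String[] args) {
        // Block indices and capture PTS (microseconds) taken from the diagram above.
        long[] blockIndex = {1, 2, 3, 4, 77};
        long[] capturePtsUs = {20_453, 40_754, 59_943, 80_519, 1_535_121};
        for (int i = 0; i < blockIndex.length; i++) {
            long fixed = ptsFromSampleCount(blockIndex[i] * BLOCK_SAMPLES);
            System.out.println("block " + blockIndex[i]
                    + ": capture=" + capturePtsUs[i] + "us"
                    + " fixed=" + fixed + "us"
                    + " offset=" + (fixed - capturePtsUs[i]) + "us");
        }
    }
}
```

The per-block jitter averages out, but the offset caused by clock drift keeps growing with the number of blocks, and a period without any samples is not represented at all in the recomputed PTS.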

Ideally, for the scrcpy use case, I would like to write data and the original PTS as is (this is what the scrcpy client currently does), and the player would play them correctly with compensation. For small differences, it seems that it's more or less the case (not sure the behavior is absolutely correct though), but if there are "holes" (when recording voice when voice is disabled), it just does not work correctly, with any player.

So what should we do?

  1. store the original PTS not matching the data exactly (with fixes for OPUS/FLAC to get the PTS from the system clock)
    • clock drift is probably not compensated exactly after a long playback
    • it breaks when there are "holes" (in all players)
  2. recompute the generated PTS to match the number of samples (on the device)
    • audio and video will be out-of-sync after some time
  3. try to compensate holes and clock drift BEFORE encoding (and make PTS/samples match):
    • not trivial (especially with the AudioRecord API) since it must still work in real time (it must estimate when to insert silence samples into the encoder)
    • this generates silence data to send from the device even when no packets are produced (not a big deal though)
    • double-compensation (server + client) instead of end-to-end compensation (sub-optimal and may degrade quality twice)…
  4. decode and reencode on the client after clock drift and silence compensation
    • quality loss due to reencoding
    • record resampled data (not the original data)
    • more CPU usage
  5. other ideas?
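For option 3, the arithmetic part is simple; a minimal sketch (Java, illustration only, with hypothetical names and an assumed 48kHz rate and 960-sample block size) of how many silence samples a hole represents:

```java
/** Hypothetical helper for option 3: size of the silence needed to fill a hole before encoding. */
final class SilenceFiller {
    static final int SAMPLE_RATE = 48000;  // assumed capture rate
    static final int BLOCK_SAMPLES = 960;  // assumed block size (20 ms)

    /**
     * Number of silence samples to insert between the end of the previous block and the
     * start of the next one, rounded down to whole blocks so that sub-block jitter is ignored.
     */
    static long missingSamples(long previousEndPtsUs, long nextPtsUs) {
        long holeUs = nextPtsUs - previousEndPtsUs;
        if (holeUs <= 0) {
            return 0; // overlapping or contiguous blocks: nothing to fill
        }
        long samples = holeUs * SAMPLE_RATE / 1_000_000L;
        return samples / BLOCK_SAMPLES * BLOCK_SAMPLES;
    }
}
```

This only computes the size of a hole once the next block is known. The non-trivial part mentioned above remains: in real time, the silence must be inserted while no packet is arriving, so the decision has to be based on an estimate rather than on the next PTS.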

@davidsmith91

davidsmith91 commented Mar 17, 2025

look, I cannot understand what you're saying, but thank you for your effort !
i just need a quick fix to call remotely from another country B with a phone that is in country A
so I need a way to get and send audio for the call

imagine you're from germany, you have a phone in germany, you go to australia, you need to use the phone that you left in germany with a specific german number, how would you do it?

@yume-chan
Contributor

I tested on my Redmi K70 Pro running Android 15, and all of VOICE_UPLINK, VOICE_DOWNLINK and VOICE_CALL keep producing samples even without active phone calls. The recorded files also just work.

I think we need to find out why most devices only produce about one quarter samples compare to "real time", and what those samples are (maybe they have a different format, but incorrectly interpreted?)

@rom1v
Collaborator Author

rom1v commented Mar 19, 2025

I tested on my Redmi K70 Pro running Android 15, and all of VOICE_UPLINK, VOICE_DOWNLINK and VOICE_CALL keep producing samples even without active phone calls.

👍 (the device I tested was Pixel 8)

most devices only produce about one quarter samples compare to "real time"

What do you mean, "one quarter samples"?

@yume-chan
Contributor

What do you mean, "one quarter samples"?

I logged near

int r = recorder.read(outDirectBuffer, AudioConfig.MAX_READ_SIZE);

On Mi 11 running Android 13, each read call returns 960 samples (20ms), but takes around 80ms of real time, so I think those 960 samples might actually last 80ms, not 20ms.

@rom1v
Collaborator Author

rom1v commented Mar 19, 2025

On Mi 11 running Android 13, each read call returns 960 samples (20ms), but takes around 80ms of real time, so I think those 960 samples might actually last 80ms, not 20ms.

Oh, ok. In fact, it's real time, but an 80ms block is produced every 80ms (I had noticed 40ms on my Pixel 8). So if you read several times, you should get the remaining samples immediately.

For example:

  • at t=80: 960 samples
  • at t=82: 960 samples
  • at t=84: 960 samples
  • at t=86: 960 samples
  • at t=160: 960 samples
  • at t=162: 960 samples
  • at t=164: 960 samples
  • at t=166: 960 samples
  • at t=240: 960 samples

Is it the case on your device?
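One way to check this from logged read times is to compute the average throughput over several reads rather than timing a single read. A minimal sketch (Java, illustration only; the class and method names are hypothetical, and the timestamps are the example values above):

```java
/** Hypothetical check: average capture throughput computed from read-completion times. */
final class BurstThroughput {
    /** Average samples per second, given read-completion times (ms) and samples per read. */
    static double samplesPerSecond(long[] readTimesMs, int samplesPerRead) {
        // The first read only marks the start of the measurement window.
        long elapsedMs = readTimesMs[readTimesMs.length - 1] - readTimesMs[0];
        long samples = (long) (readTimesMs.length - 1) * samplesPerRead;
        return samples * 1000.0 / elapsedMs;
    }

    public static void main(String[] args) {
        long[] times = {80, 82, 84, 86, 160, 162, 164, 166, 240};
        System.out.println(samplesPerSecond(times, 960) + " samples/s");
    }
}
```

If the reads come in bursts like this, the average stays at the nominal rate even though one read out of four blocks for about 80ms.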

rom1v added 2 commits March 29, 2025 14:54
Store the target audio source integer (one of the constants from
android.media.MediaRecorder.AudioSource) in the AudioSource enum (or -1
if not relevant).

This will simplify adding new audio sources.

PR #5870
Expose more audio sources from MediaRecorder.AudioSource.

Refs <https://developer.android.com/reference/android/media/MediaRecorder.AudioSource>

Fixes #5412
Fixes #5670
PR #5870
@huynhtanloc2612

Hi @rom1v
I am trying the latest scrcpy release (v3.2) to see whether voice call audio can be forwarded to my Windows laptop speaker. However, the voice call audio is only output from the phone speaker, both with the default scrcpy settings and with --audio-source=voice-call.
I tested on Galaxy S22 ultra (Android 14)
Do I need to add any other audio options to forward voice call audio to my laptop?

@rom1v
Collaborator Author

rom1v commented Apr 1, 2025

Do I need to add any other audio options to forward voice call audio to my laptop?

Voice call audio is captured on your Android device, transmitted to your laptop, and played on your laptop.

Not sure if this is what you're asking, but this is only one-way: the microphone of your laptop is not captured to be forwarded to your Android device.

@huynhtanloc2612

I expect that I can use the speaker and microphone of the laptop to answer a call (it should be two-way). I tried some tests but I could not hear the voice call audio come from laptop speaker.

@rom1v
Collaborator Author

rom1v commented Apr 1, 2025

I expect that I can use the speaker and microphone of the laptop to answer a call (it should be two-way).

It is not. What has been added by this PR is the possibility to capture many audio sources from the Android device, including voice calls.

Forwarding the computer microphone to the Android device is not implemented. See #3880.

I tried some tests but I could not hear the voice call audio come from laptop speaker.

You mean from the laptop microphone I guess.

@huynhtanloc2612

You mean from the laptop microphone I guess.

No, I meant the voice call audio had been captured from Android device was expected to be heard from laptop speaker.
Sorry if I misunderstood the purpose of this PR and the feature.

@rom1v
Collaborator Author

rom1v commented Apr 1, 2025

I meant the voice call audio had been captured from Android device was expected to be heard from laptop speaker.

Yes, it should (if you specify --audio-source=voice-call).

Maybe what you mean is that you want to capture multiple audio sources at once (at least device audio output + voice calls). This is not supported.

@huynhtanloc2612

huynhtanloc2612 commented Apr 1, 2025

Yes, it should (if you specify --audio-source=voice-call).

I had tried this option but it did not work in my test (no audio from laptop speaker)
scrcpy v3.2/Galaxy s22 ultra Android 14

@rom1v
Collaborator Author

rom1v commented Apr 1, 2025

You're testing while passing a phone call, right?

What is the full output in the console? (run scrcpy --audio-source=voice-call -Vverbose)

@huynhtanloc2612

You're testing while passing a phone call, right?

YES

What is the full output in the console? (run scrcpy --audio-source=voice-call -Vverbose)

Below is the full output.

scrcpy 3.2 <https://github.com/Genymobile/scrcpy>
INFO: ADB device found:
INFO:     --> (tcpip)  192.168.1.234:5555              device  SM_S908E
DEBUG: Device serial: 192.168.1.234:5555
DEBUG: Using server (portable): C:\Users\HuynhTanLoc\Downloads\scrcpy-win64-v3.2\scrcpy-win64-v3.2\scrcpy-server
C:\Users\HuynhTanLoc\Downloads\scrcpy-win64-v3.2\scrcpy-wi... file pushed, 0 skipped. 57.8 MB/s (90888 bytes in 0.001s)
[server] INFO: Device: [samsung] samsung SM-S908E (Android 14)
DEBUG: Server connected
DEBUG: Starting controller thread
DEBUG: Starting receiver thread
[server] DEBUG: Using video encoder: 'c2.qti.avc.encoder'
[server] DEBUG: Using audio encoder: 'c2.android.opus.encoder'
DEBUG: Using icon (portable): C:\Users\HuynhTanLoc\Downloads\scrcpy-win64-v3.2\scrcpy-win64-v3.2\icon.png
INFO[server] DEBUG: Display: using DisplayManager API
: Renderer: direct3d
DEBUG: Trilinear filtering disabled (not an OpenGL renderer)
DEBUG: Demuxer 'video': starting thread
DEBUG: Demuxer 'audio': starting thread
INFO: Texture: 1080x2312
[server] VERBOSE: DisplaySizeMonitor: onDisplayConfigurationChanged(148)
VERBOSE: input: touch [id=mouse] hover-move position=355,1268 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1266 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1263 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1261 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1258 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1256 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1253 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1251 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1246 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1241 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=353,1236 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2812.800049 cur=2880 compensation=-412 (underflow=0)
VERBOSE: input: touch [id=mouse] hover-move position=355,1231 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=358,1216 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=360,1208 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=370,1196 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,1186 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=378,1178 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=380,1176 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=380,1168 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=378,1133 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,1108 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,1073 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,1038 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,1008 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,978 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,948 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,918 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=385,880 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=393,815 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=400,775 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=408,743 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=420,703 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=428,665 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=441,625 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=446,585 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=451,545 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=461,507 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=466,487 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=468,462 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=468,445 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=468,412 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=461,390 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=456,352 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=456,337 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=448,320 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=448,310 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=448,302 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=448,290 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=448,280 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=451,267 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=453,260 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=456,247 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=463,230 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=466,220 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=466,205 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=466,192 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=468,185 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=468,170 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=471,157 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=471,142 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=478,120 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,110 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,102 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,97 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,95 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,90 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,87 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,82 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,77 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=481,72 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=478,70 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=476,67 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=476,65 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=471,62 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=461,60 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=453,57 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=446,52 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=436,50 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2687.713135 cur=2279 compensation=0 (underflow=0)
VERBOSE: input: touch [id=mouse] hover-move position=428,47 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=423,42 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=420,42 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=415,37 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=413,35 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=413,32 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=408,32 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=403,27 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=400,25 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=395,20 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=390,15 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=385,10 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=375,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=373,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=370,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2707.746094 cur=2759 compensation=-307 (underflow=0)
VERBOSE: input: touch [id=mouse] hover-move position=448,10 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=446,12 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2636.633789 cur=2681 compensation=-236 (underflow=0)
VERBOSE: input: touch [id=mouse] hover-move position=441,17 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=441,25 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=441,50 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2585.670166 cur=2621 compensation=-185 (underflow=0)
VERBOSE: input: touch [id=mouse] hover-move position=428,37 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=423,25 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=418,20 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=415,12 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=415,5 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=415,2 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=415,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2540.655029 cur=2573 compensation=-140 (underflow=0)
VERBOSE: [Audio] Buffering: target=2400 avg=2497.887695 cur=2538 compensation=-97 (underflow=0)
VERBOSE: [Audio] Buffering: target=2400 avg=2471.492188 cur=2513 compensation=-71 (underflow=0)
VERBOSE: input: touch [id=mouse] hover-move position=413,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=433,5 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=453,7 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=476,7 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=503,12 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=526,17 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=531,17 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=536,17 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=541,17 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=543,17 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=546,17 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=548,15 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=551,15 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=558,12 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=558,10 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=561,10 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=566,7 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=568,7 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=573,2 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=576,2 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=578,2 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=611,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2459.705322 cur=2015 compensation=0 (underflow=0)
VERBOSE: input: touch [id=mouse] hover-move position=984,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=992,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=999,2 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=999,7 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2446.473633 cur=2015 compensation=0 (underflow=0)
VERBOSE: input: touch [id=mouse] hover-move position=1002,7 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=1004,7 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=1004,2 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=1004,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: input: touch [id=mouse] hover-move position=1004,0 pressure=1.000000 action_button=000000 buttons=000000
VERBOSE: [Audio] Buffering: target=2400 avg=2454.006348 cur=2495 compensation=0 (underflow=0)
VERBOSE: [Audio] Buffering: target=2400 avg=2452.037109 cur=2975 compensation=0 (underflow=0)
VERBOSE: [Audio] Buffering: target=2400 avg=2439.832520 cur=2495 compensation=0 (underflow=0)
VERBOSE: [Audio] Buffering: target=2400 avg=2431.423828 cur=2495 compensation=0 (underflow=0)
DEBUG: User requested to quit
DEBUG: quit...
DEBUG: Controller stopped
DEBUG: Demuxer 'video': end of frames
DEBUG: Demuxer 'audio': end of frames
DEBUG: Receiver stopped
[server] DEBUG: Controller stopped
[server] DEBUG: Device message sender stopped
[server] DEBUG: Audio encoder stopped
[server] DEBUG: Screen streaming stopped
DEBUG: Server disconnected
DEBUG: Server terminated

@rom1v
Collaborator Author

rom1v commented Apr 1, 2025

It appears that an audio stream is correctly transmitted to the client. Maybe that's a stream full of silence (for some reason on your device).

If you mirror the device audio output (i.e. without --audio-source=voice-call), you correctly get the device output sound on your computer, right?

@huynhtanloc2612

> It appears that an audio stream is correctly transmitted to the client. Maybe that's a stream full of silence (for some reason on your device).

The voice audio is only heard via the phone speaker.

> If you mirror the device audio output (i.e. without --audio-source=voice-call), you correctly get the device output sound on your computer, right?

Yes, I can get the device output sound from the laptop speaker.

@davidsmith91

@rom1v I tested every type of voice call source, and they all send both uplink and downlink (without differentiation). Also, you should add a virtual microphone, as someone did with some custom code. Then, if you can mix the voice downlink/uplink with the phone audio in general and have the microphone feature, the phone would be 100% usable remotely to make calls, etc.

bartsaintgermain pushed a commit to bartsaintgermain/scrcpy that referenced this pull request Apr 11, 2025
Only enable them if SC_AUDIO_REGULATOR_DEBUG is set, as they may spam
the output.

PR Genymobile#5870 <Genymobile#5870>
bartsaintgermain pushed a commit to bartsaintgermain/scrcpy that referenced this pull request Apr 11, 2025
Report the number of silence samples inserted due to underflow every
second, along with the other metrics.

PR Genymobile#5870 <Genymobile#5870>
bartsaintgermain pushed a commit to bartsaintgermain/scrcpy that referenced this pull request Apr 11, 2025
The default OPUS and FLAC encoders on Android rewrite the input PTS so
that they exactly match the number of samples.

As a consequence:
 - audio clock drift is not compensated
 - implicit silences (without packets) are ignored

To work around this behavior, generate new PTS based on the current time
(after encoding) and the packet duration.

PR Genymobile#5870 <Genymobile#5870>
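
The workaround described in the commit message above (new PTS derived from the current time after encoding and from the packet duration) could be sketched roughly as follows. This is only an illustration under stated assumptions: `PtsRegenerator` and `nextPts` are hypothetical names, not scrcpy's actual code, and the clamp that keeps timestamps monotonic is an added assumption rather than something the commit message states.

```java
// Hypothetical helper (not scrcpy's actual class): regenerates packet PTS
// from the time at which the encoder delivered the packet.
final class PtsRegenerator {
    private final int sampleRate;
    private boolean hasPrevious;
    private long previousEndUs;

    PtsRegenerator(int sampleRate) {
        if (sampleRate <= 0) {
            throw new IllegalArgumentException("sampleRate must be positive");
        }
        this.sampleRate = sampleRate;
    }

    /**
     * @param nowUs       current time in microseconds, read after encoding
     * @param sampleCount number of audio samples (per channel) in the packet
     * @return the regenerated PTS in microseconds
     */
    long nextPts(long nowUs, int sampleCount) {
        long durationUs = sampleCount * 1_000_000L / sampleRate;

        // The packet was just produced, so assume its audio ends "now"
        // and therefore started one packet duration earlier.
        long ptsUs = nowUs - durationUs;

        // Assumption: never start before the end of the previous packet,
        // so that PTS stay strictly ordered even if packets arrive in bursts.
        if (hasPrevious && ptsUs < previousEndUs) {
            ptsUs = previousEndUs;
        }

        previousEndUs = ptsUs + durationUs;
        hasPrevious = true;
        return ptsUs;
    }
}
```

In a real pipeline, `nowUs` could come from `System.nanoTime() / 1000`. Because the PTS follow the clock rather than the sample count, a silence without packets shows up as a gap between consecutive timestamps instead of being collapsed.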
bartsaintgermain pushed a commit to bartsaintgermain/scrcpy that referenced this pull request Apr 11, 2025
The audio regulator assumed a continuous audio stream. But some audio
sources (like the "voice call" audio source) do not produce any packets
on silence, breaking this assumption.

Use PTS to detect such discontinuities.

PR Genymobile#5870 <Genymobile#5870>
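
As a rough illustration of PTS-based discontinuity detection as described in the commit message above, the sketch below compares each packet's PTS with the expected end of the previous packet. It is written in Java only for consistency with the other sketches in this thread and does not mirror the actual regulator code; `DiscontinuityDetector`, its method names, and the tolerance parameter are all hypothetical.

```java
// Hypothetical sketch: flags a discontinuity when a packet starts noticeably
// later than the previous packet ended, according to the input PTS.
final class DiscontinuityDetector {
    private final int sampleRate;
    private final long toleranceUs;
    private boolean hasPrevious;
    private long expectedPtsUs;

    /**
     * @param sampleRate  samples per second of the audio stream
     * @param toleranceUs largest gap (in microseconds) still treated as
     *                    continuous; the right value is an assumption
     */
    DiscontinuityDetector(int sampleRate, long toleranceUs) {
        if (sampleRate <= 0) {
            throw new IllegalArgumentException("sampleRate must be positive");
        }
        this.sampleRate = sampleRate;
        this.toleranceUs = toleranceUs;
    }

    /**
     * @param ptsUs       PTS of the incoming packet, in microseconds
     * @param sampleCount number of samples (per channel) in the packet
     * @return true if the packet follows a gap larger than the tolerance
     */
    boolean isDiscontinuity(long ptsUs, int sampleCount) {
        boolean gap = hasPrevious && ptsUs - expectedPtsUs > toleranceUs;

        // The next packet of a continuous stream should start where this
        // one ends.
        expectedPtsUs = ptsUs + sampleCount * 1_000_000L / sampleRate;
        hasPrevious = true;
        return gap;
    }
}
```

This only works if the input PTS reflect capture time. If an encoder rewrites them from the sample count, the computed gap is always zero and no discontinuity is ever reported.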
bartsaintgermain pushed a commit to bartsaintgermain/scrcpy that referenced this pull request Apr 11, 2025
Store the target audio source integer (one of the constants from
android.media.MediaRecorder.AudioSource) in the AudioSource enum (or -1
if not relevant).

This will simplify adding new audio sources.

PR Genymobile#5870 <Genymobile#5870>
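
A minimal sketch of the idea in the commit message above, an enum that carries the target audio source integer, might look like the following. The option names and their mapping come from the PR description. The enum name `AudioSourceSketch` and its accessors are hypothetical, and the integer literals are the values of the `android.media.MediaRecorder.AudioSource` constants, written out so that the example compiles without the Android SDK.

```java
// Hypothetical sketch (not scrcpy's actual enum): each audio source stores
// the matching MediaRecorder.AudioSource integer, or -1 if not relevant.
enum AudioSourceSketch {
    OUTPUT("output", 8),                                   // REMOTE_SUBMIX
    PLAYBACK("playback", -1),                              // no direct source
    MIC("mic", 1),                                         // MIC
    MIC_UNPROCESSED("mic-unprocessed", 9),                 // UNPROCESSED
    MIC_CAMCORDER("mic-camcorder", 5),                     // CAMCORDER
    MIC_VOICE_RECOGNITION("mic-voice-recognition", 6),     // VOICE_RECOGNITION
    MIC_VOICE_COMMUNICATION("mic-voice-communication", 7), // VOICE_COMMUNICATION
    VOICE_CALL("voice-call", 4),                           // VOICE_CALL
    VOICE_CALL_UPLINK("voice-call-uplink", 2),             // VOICE_UPLINK
    VOICE_CALL_DOWNLINK("voice-call-downlink", 3),         // VOICE_DOWNLINK
    VOICE_PERFORMANCE("voice-performance", 10);            // VOICE_PERFORMANCE

    private final String optionName;
    private final int directAudioSource;

    AudioSourceSketch(String optionName, int directAudioSource) {
        this.optionName = optionName;
        this.directAudioSource = directAudioSource;
    }

    /** Returns the MediaRecorder.AudioSource value, or -1 if not relevant. */
    int getDirectAudioSource() {
        return directAudioSource;
    }

    /** Returns the source matching the command-line name, or null. */
    static AudioSourceSketch findByName(String name) {
        for (AudioSourceSketch source : values()) {
            if (source.optionName.equals(name)) {
                return source;
            }
        }
        return null;
    }
}
```

With this layout, adding a new source is a single enum entry, and the capture code can pass `getDirectAudioSource()` straight to the recorder whenever the value is not -1.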
bartsaintgermain pushed a commit to bartsaintgermain/scrcpy that referenced this pull request Apr 11, 2025
@live-call-audio-inject

Is it possible to have an app inject audio into a phone call, or transfer the call to a voice bucket? It would be perfect for telemarketers, for being put on hold with muzak and told how your call is SO important, etc. Or to turn it around on organizations: "This call will be recorded for quality assurance (etc)" or "No consent is given to record this, and copyright is claimed on any voice performance, with a license fee of $10,000 that you agree to and bind your employer to by continuing with this call". Time to fight back against big companies.

@live-call-audio-inject

I'd be happy to pay good coin for such an implementation.

@rom1v
Collaborator Author

rom1v commented Nov 8, 2025

Is it possible to have an app inject audio

Not possible via an Android app AFAIK. Via scrcpy, see #6439.

@live-call-audio-inject

live-call-audio-inject commented Nov 8, 2025 via email

@davidsmith91

Hi there. Is it possible on GrapheneOS? What exactly is the blocker, and which functions are needed?

On Saturday, November 8th, 2025 at 8:07 AM, Romain Vimont wrote:

> > Is it possible to have an app inject audio
>
> Not possible via an Android app AFAIK. Via scrcpy, see #6439.

I think it should be added to the OS codebase. Probably the best you can do is ask GrapheneOS on GitHub to add this feature.

Development

Successfully merging this pull request may close these issues.

[Feature request] option to choose unprocessed microphone output of phone (and other processing options)

9 participants