Performance notes
Here are some thoughts related to noVNC performance including areas that have the most potential for improvement.
The Python version of websockify is able to record the session stream to a file as a Javascript array. The tests/vnc_playback.html page can load and play back the recording and report timing data. It has a real-time mode that plays back the recording using timestamps in the data. It also has a performance mode that replays the stream as fast as possible. This is very useful for performance profiling/tuning and is one of the primary methods I use to determine whether I am making forward progress.
There is also a file, tests/vnc_perf.html, which is similar (and uses the same playback driver from include/playback.js) but loads a file (currently hard-coded to data/multi.js) that contains multiple recordings, one for each encoding supported by noVNC. It reports on each encoding separately and also gives the total time for the full run. I find this useful as an overall benchmark for noVNC.
Here are some recordings that are not checked into the main branch due to their size. To use them, create a "tests/data" directory and download the recordings to that directory. Then load tests/vnc_playback.html?data=FILE.
- simple.js (161 KB): short recording of terminal scrolling. Encodings: copyrect and hextile.
- demo1.js (3.0 MB): recording for online demo 1. Encodings: hextile only.
- bug_raw.js (4.1 MB): simple test case for a WebKit canvas render bug. Encodings: raw only.
- multi.js (4.2 MB): multiple recordings in one file, used by tests/vnc_perf.html (not compatible with tests/vnc_playback.html). Encodings: raw, copyrect, rre, hextile, tight_png.
Here is a screenshot from profiling several iterations of tests/vnc_perf.html using the Chrome profiler.
As you can see from the screenshot, very little of the run-time is actually spent running Javascript itself. The results from Firebug profiling of Firefox 4 beta are similar. About 5% is spent in garbage collection, and almost 73% is spent doing something else, meaning just over 1/5th of the time is actually spent running Javascript code. No noVNC routine ranks particularly high in the list. When I first started noVNC the profile was very different; it took a lot of benchmarking and trial and error (which is fun across multiple browsers) to reach this point.
This is good news, in that the noVNC code is making good use of Javascript and avoiding Javascript operations that are inefficient. However, it is bad news in that the bottlenecks have been pushed into harder-to-measure places.
One place to start with optimization is to determine what is happening in the (program) line item. My belief is that the bulk of that line item is the browser processing certain canvas rendering functions; I explain more about that below. Breaking down the (program) line item will probably require profiling the browser process itself. WebKit is easier to build than Chrome (but is basically the same engine), so building WebKit with profile/debug info is probably the direction to go in this area.
Of the normal RFB encodings, noVNC supports "raw", "copyrect", "RRE" and "hextile". The "hextile" encoding is much more bandwidth efficient than "raw" or "RRE", but it is still not nearly as efficient as the zlib-based encodings: "zlib", "ZRLE", and "tight".
I briefly explored implementing the "tight" encoding in noVNC but tight uses zlib compression and decompressing zlib directly in Javascript is relatively slow so I gave up on that route.
However, there might be a way to trick the browser into doing zlib decompression in native code instead of Javascript. Tobias Schneider came up with this ingenious hack and presented it at the 2010 jsconf.eu conference (slides). According to Tobias, the code performs well but he removed it from his Gordon project because mobile Safari and Firefox limit the width of canvas elements to 5000px so it is only possible to decompress 5 KB of data.
Until somebody has an opportunity to explore the method Tobias came up with further, the plain tight encoding is not going to be efficient in noVNC. However, the fact that PNG images contain zlib compressed data is a good segue into the next topic...
This past summer I worked with a Google Summer of Code student (who was working on adding support for the tight encoding to QEMU/KVM) to design a variant of the tight protocol. Instead of using zlib for the basic compression section of the encoding, the pixel data is encoded as a full PNG image. PNG uses zlib compression internally for pixel data, but the advantage for noVNC is that the drawImage method can be used to render PNG images directly to the canvas without any decoding on the Javascript side.
Here is a brief description of the tightPNG encoding. Here is a blog article from the student/developer who implemented tightPNG in QEMU.
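To make the drawImage point concrete, here is a minimal sketch (not noVNC's actual decoder) of rendering one PNG-encoded rectangle; the pngBytes array and the rectangle position are hypothetical inputs. All of the zlib/PNG work happens in the browser's native image decoder rather than in Javascript:

```javascript
// Hypothetical sketch: render a PNG-encoded rectangle from a tightPNG
// frame buffer update. "pngBytes" is assumed to be an array of byte
// values pulled from the RFB stream; x/y is the rectangle position.
function renderPngRect(ctx, pngBytes, x, y) {
    // Re-encode the raw bytes as a base64 data URI so the browser's
    // native PNG (and therefore zlib) decoder does the heavy lifting.
    var binary = "";
    for (var i = 0; i < pngBytes.length; i++) {
        binary += String.fromCharCode(pngBytes[i]);
    }
    var img = new Image();
    img.onload = function () {
        // No Javascript-side pixel decoding: drawImage blits the
        // decoded PNG straight onto the canvas.
        ctx.drawImage(img, x, y);
    };
    img.src = "data:image/png;base64," + window.btoa(binary);
}
```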
If you have a VNC server that supports tightPNG, the encoding is MUCH faster than hextile. It is also more bandwidth efficient (comparable to standard tight encoding). The tightPNG encoding is in QEMU 0.13 (although QEMU needs to be compiled with --enable-vnc-png to turn it on), and I also have a personal port of the encoding to libvncserver. The libvncserver port works, but palette mode (non-true color) is not yet ported from the QEMU implementation.
Adding tightPNG encoding to other common VNC servers will probably provide the most performance bang for the buck in the long run. It will help noVNC performance significantly on all browsers and on all operating systems.
My belief is that the time allocated to (program) in the Chrome profile is time that the browser spends doing work on behalf of the current page/tab after the page has yielded control. I previously thought that (program) was garbage collection time, but in recent Chrome releases that has been broken out into its own line item, as you can see. In the case of noVNC, I now think canvas rendering probably accounts for the bulk of the time in (program).
My understanding of how canvas generally works is that when you run a canvas rendering method (such as fillRect or putImageData), the data is not actually rendered until the current Javascript thread yields control. Rather, when you run the method, the parameters and any array data they reference (in the case of putImageData) are copied to a canvas buffer. Once the Javascript thread yields control, the actual rendering of that data to the canvas occurs. I discovered a bug in WebKit canvas rendering a few months ago that seems to confirm this understanding.
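As a small sketch of that buffering model (assuming a hypothetical canvas element with id "screen"; sizes and loop counts are arbitrary), the putImageData calls below return quickly because, per the model described above, they mostly copy data into the canvas buffer, while the actual rasterization happens after the script yields:

```javascript
// Sketch: canvas calls return quickly because they mostly buffer work;
// the real rendering happens after the script yields control.
var ctx = document.getElementById("screen").getContext("2d");
var img = ctx.createImageData(256, 256);

var start = Date.now();
for (var i = 0; i < 100; i++) {
    ctx.putImageData(img, 0, 0);   // copies data into the canvas buffer
}
console.log("queueing took " + (Date.now() - start) + "ms");

setTimeout(function () {
    // By the time this runs, the browser has had a chance to actually
    // rasterize the queued updates. That rendering time shows up under
    // "(program)" rather than in any Javascript function.
    console.log("yielded; rendering happened before this callback ran");
}, 0);
```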
It is likely that canvas hardware acceleration will help quite a bit with noVNC performance. However, I have not had an opportunity to test noVNC in a browser that has canvas hardware acceleration: my development is primarily on Linux, and neither Chrome nor Firefox has hardware acceleration enabled on Linux. Still, I suspect that canvas acceleration may help a lot.
Another option to explore is coalescing canvas operations. For example, with hextile encoding the update region is broken down into 16x16 tiles, and noVNC currently calls putImageData once for every tile. An alternate approach would be to call putImageData less often, covering multiple tiles at a time. I did an experiment that called putImageData once for each full frame buffer update (FBU) hextile rectangle, but the performance was quite a bit worse. There is probably an optimal trade-off somewhere between calling putImageData for a single 16x16 tile at a time and calling it for the whole FBU hextile rectangle.
Finding the right trade-off for putImageData granularity is definitely an area to explore. The right point is likely to be different depending on the browser, the browser version, and whether canvas hardware acceleration is enabled.
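As a rough illustration of what coalescing might look like (this is a sketch, not noVNC's hextile decoder; the row-at-a-time grouping and the tile pixel format are arbitrary choices), decoded 16x16 tiles could be copied into one wider ImageData per tile row so that putImageData is called once per row instead of once per tile:

```javascript
// Sketch: coalesce a row of decoded 16x16 hextile tiles into a single
// ImageData so that one putImageData call covers the whole row.
function drawTileRow(ctx, tiles, destX, destY, tileSize) {
    var row = ctx.createImageData(tiles.length * tileSize, tileSize);
    for (var t = 0; t < tiles.length; t++) {
        var tile = tiles[t];                // flat RGB array for one tile
        for (var y = 0; y < tileSize; y++) {
            for (var x = 0; x < tileSize; x++) {
                var src = (y * tileSize + x) * 3;
                var dst = (y * row.width + t * tileSize + x) * 4;
                row.data[dst]     = tile[src];      // R
                row.data[dst + 1] = tile[src + 1];  // G
                row.data[dst + 2] = tile[src + 2];  // B
                row.data[dst + 3] = 255;            // A (opaque)
            }
        }
    }
    ctx.putImageData(row, destX, destY);    // one canvas call per tile row
}
```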
The ArrayBuffer typed arrays proposal could help noVNC but only if reads and writes are faster than normal Javascript arrays and especially if ArrayBuffers are used to replace the Canvas ImageData array type.
This summer I discovered that the ImageData "array buffers" returned by calls to getImageData (or createImageData) are actually special byte arrays and not standard Javascript arrays (i.e. you can only read/write values between 0 and 255). I was hopeful that I might be able to use ImageData arrays for RFB protocol decoding instead of Javascript arrays. However, after doing some performance tests I found that ImageData arrays are somewhat slower for reading and about 50% slower for writing than normal Javascript arrays. :-(
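A rough sketch of the kind of micro-benchmark involved (the sizes and pass counts are arbitrary, and the actual numbers vary widely between browsers and versions):

```javascript
// Sketch: compare write throughput of an ImageData byte array against a
// plain Javascript array of the same length.
var ctx = document.createElement("canvas").getContext("2d");
var imgData = ctx.createImageData(256, 256).data;   // byte-only array
var plain = new Array(imgData.length);

function timeWrites(arr, label) {
    var start = Date.now();
    for (var pass = 0; pass < 100; pass++) {
        for (var i = 0; i < arr.length; i++) {
            arr[i] = i & 255;
        }
    }
    console.log(label + ": " + (Date.now() - start) + "ms");
}

timeWrites(plain, "plain Javascript array");
timeWrites(imgData, "ImageData array");
```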
I would expect that ArrayBuffer typed arrays should be faster than normal Javascript arrays for both reading and writing, but that is not necessarily a given. Also, I have not seen an indication of whether ArrayBuffers will replace ImageData for Canvas (or at least can be used in their place in an efficient way).
One thing that accounts for some latency in noVNC is the fact that the current WebSockets protocol only allows for UTF-8 encoded data. There is discussion of adding a binary mode to future revisions of the spec, but when this will happen in the protocol and in the API (which is a separate spec) is uncertain. For now, noVNC connects via websockify to enable connecting to VNC servers. websockify is a generic WebSockets to TCP socket proxy. It encodes/decodes all traffic to/from the WebSockets client using base64 to enable binary data to be sent in a UTF-8 compatible way.
The base64 encode and decode process in Javascript adds some latency, although you can see from the Chrome profile image that not much time is spent in base64 decode. If you are familiar with the poorly publicised window.atob and window.btoa routines, you might wonder why noVNC implements base64 encode/decode directly in Javascript. It turns out these native routines are slightly slower than the Javascript versions for the way noVNC uses them. One of the problems with the native routines is that they encode and decode to and from Javascript strings, whereas noVNC needs to decode to and from arrays of numeric values. The overhead of doing this conversion seems to be where most of the penalty of the native routines comes from.
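For illustration, here is a sketch of the atob-based path and the conversion it forces (decodeWithAtob is a hypothetical helper, not noVNC code; the sample string is just the base64-encoded RFB version handshake):

```javascript
// Sketch: decoding base64 with the native window.atob still requires a
// string-to-numeric-array conversion pass, which appears to be where
// much of the native routines' overhead comes from.
function decodeWithAtob(b64) {
    var str = window.atob(b64);        // native decode, but to a string
    var arr = new Array(str.length);
    for (var i = 0; i < str.length; i++) {
        arr[i] = str.charCodeAt(i);    // extra per-byte conversion pass
    }
    return arr;
}

// Usage: the RFB byte stream arrives base64 encoded from websockify.
var bytes = decodeWithAtob("UkZCIDAwMy4wMDgK");  // "RFB 003.008\n"
console.log(bytes[0]);                           // 82 ('R')
```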
Perhaps when ArrayBuffers are incorporated, there will be native browser routines that can do base64 encode/decode with ArrayBuffers. One can hope.
I have explored many ways of making the base64 decode more efficient (encode is far less important since the client sends very little data), including using large pre-generated lookup tables instead of bit shifting (since Javascript is purportedly slow at bit operations), but none of the approaches have yielded enough gain to be worth the extra complication (or even rise much above statistical noise).
Hopefully, at some point in the near future it will be possible to send binary data directly via WebSockets so the base64 encoding requirement can be dropped. I have seen some discussion that ArrayBuffers would be used in the WebSockets API so that the client can interpret WebSockets messages however it wants to.
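If and when that happens, the receive path might look something like the following hypothetical sketch (it assumes an ArrayBuffer-capable WebSockets implementation and an example URL, neither of which noVNC can rely on today):

```javascript
// Hypothetical sketch: receiving binary RFB data over WebSockets as
// ArrayBuffers, with no base64 step in between.
var ws = new WebSocket("ws://example.com:5900/");
ws.binaryType = "arraybuffer";           // ask for ArrayBuffer messages
ws.onmessage = function (e) {
    var bytes = new Uint8Array(e.data);  // raw RFB bytes, ready to parse
    console.log("received " + bytes.length + " bytes");
};
```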
One area where noVNC is currently deficient is its FBU (frame buffer update) request algorithm. I have not spent a great deal of time optimizing and tuning the rate at which FBU requests are sent. Ideally this should be automatically adaptive, based on actual network performance and the rate of change of the VNC server desktop; failing that, having better knobs to tweak the settings manually would be a start. This is one area that might have some low-hanging fruit.
The check_rate and fbu_req_rate configuration variables can adjust the frequency with which mouse movements and FBU requests are sent. This part of noVNC is messy because more simplistic algorithms have not worked well for me. But there is a lot of room for cleaning up the code and implementing a more adaptive algorithm.
In particular the deficiency in the FBU request algorithm can be seen when playing videos which leads to the next topic...
Video playback performance in noVNC is particularly poor at the moment, but this is an artifact of the FBU request algorithm more than a unique performance problem in noVNC rendering. This is one of the areas where adaptive FBU requests would help a great deal.
In the RFB/VNC protocol, FBUs (frame buffer updates) are only sent in response to a client FBU request. This is how the client avoids being overwhelmed by too much traffic from the server. When a client sends an FBU request with the incremental flag set, the server sends back an FBU containing rectangles for all the areas of the framebuffer that have changed since the last FBU was sent to that client.
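For reference, the FramebufferUpdateRequest message itself is tiny: message type 3, an incremental flag, and big-endian 16-bit x, y, width, and height fields. Here is a sketch of building one as an array of byte values (fbUpdateRequest is just an illustrative helper name, not noVNC's API):

```javascript
// Sketch: build an RFB FramebufferUpdateRequest (message type 3).
// All multi-byte fields are big-endian per the RFB spec.
function fbUpdateRequest(incremental, x, y, width, height) {
    return [
        3,                               // message-type: FramebufferUpdateRequest
        incremental ? 1 : 0,             // 0 = full update, 1 = only changed areas
        (x >> 8) & 0xff, x & 0xff,       // x-position
        (y >> 8) & 0xff, y & 0xff,       // y-position
        (width >> 8) & 0xff, width & 0xff,
        (height >> 8) & 0xff, height & 0xff
    ];
}

// e.g. ask for only the changed parts of a 1024x768 framebuffer
var msg = fbUpdateRequest(true, 0, 0, 1024, 768);
```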
Currently the noVNC FBU request algorithm works like this (a rough code sketch follows the list):
- When the user/client is idle (no keyboard input, mouse movement, or mouse clicks), an FBU request is sent every conf.fbu_req_rate milliseconds. Right now it is set to 1413 ms (about 1.5 seconds). In other words, if you are just sitting there watching a video, the frame rate will be less than 1 frame per second.
- If the user moves the mouse, the mouse movement data is accumulated in a buffer. Every conf.check_rate milliseconds a timer fires and checks the mouse buffer. If there is anything in the buffer, it is sent with an FBU request appended to it (so the server will send back an FBU with the updated mouse cursor). In other words, mouse pointer movements are updated at about 5 frames per second.
- If the user presses a key or clicks the mouse, any accumulated mouse buffer data plus the keypress or mouse click is sent immediately, along with an appended FBU request.
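Here is the rough code sketch promised above, condensing that scheduling into a single timer (the check_rate value, sendMsg, and the framebuffer dimensions are illustrative stand-ins, and fbUpdateRequest is the helper sketched earlier, not noVNC's real internals):

```javascript
// Illustrative sketch of the FBU request scheduling described above.
var conf = { check_rate: 200, fbu_req_rate: 1413 };  // ms
var fbWidth = 1024, fbHeight = 768;
var mouseBuf = [];
var lastFbuReq = 0;

function sendMsg(bytes) { /* stub: push the bytes out over the WebSocket */ }

setInterval(function () {
    var now = Date.now();
    if (mouseBuf.length > 0) {
        // Flush accumulated mouse movement with an FBU request appended,
        // so the server replies with an updated cursor/screen.
        sendMsg(mouseBuf.concat(fbUpdateRequest(true, 0, 0, fbWidth, fbHeight)));
        mouseBuf = [];
        lastFbuReq = now;
    } else if (now - lastFbuReq >= conf.fbu_req_rate) {
        // Idle: only poll for screen changes at the (slow) fbu_req_rate.
        sendMsg(fbUpdateRequest(true, 0, 0, fbWidth, fbHeight));
        lastFbuReq = now;
    }
}, conf.check_rate);

function onKeyOrClick(eventBytes) {
    // Key presses and mouse clicks are sent immediately, together with
    // any buffered mouse movement and an appended FBU request.
    sendMsg(mouseBuf.concat(eventBytes, fbUpdateRequest(true, 0, 0, fbWidth, fbHeight)));
    mouseBuf = [];
    lastFbuReq = Date.now();
}
```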
The above algorithm is tuned to maximize interactivity while minimizing browser and network load. noVNC is currently tuned for virtual machine administration because that was the itch it was originally designed to scratch, and it works quite well for that sort of situation. However, for any activity where the server updates the display asynchronously from user interaction (playing a video or a first person shooter, for example), the frame rate will be low.
If you are watching a video via noVNC and press the shift key rapidly while doing so the effective frame rate will increase significantly. This is because noVNC sends an FBU request every time you press or release the shift key.