Convert utf8->latin1 before decoding JSON-RPC payloads (#353)
* Convert utf8->latin1 before decoding JSON-RPC payloads

This (hopefully) fixes #287.

I'm not completely sure why this is occurring, but non-ascii characters
were seemingly double-encoded, at least on my system (Ubuntu 22.04 via
WSL, VS Code desktop on Windows). I've confirmed with `:io.getopts()`
right before `IO.read/2` is called that the encoding is set to
`:latin1`, but the result after `JsonRpc.decode/1` was that text would
be utf8 double-encoded -- that is, it was what you would expect from:

    latin1_data
    |> :unicode.characters_to_binary(:latin1, :utf8)
    |> :unicode.characters_to_binary(:latin1, :utf8)

This would result in `Document.Line` text containing more bytes than it
should, which causes text edits to fail, leaving behind random extra
bytes.
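
To make the byte inflation concrete, here is a minimal sketch of the double-encoding described above, using a single `é` as the sample character. The variable names are illustrative only and are not taken from the Lexical codebase.

```elixir
# "é" (U+00E9) as a single latin1 byte. This sample value is an assumption
# chosen for illustration.
latin1_data = <<0xE9>>

# One latin1 -> utf8 pass gives the correct 2-byte utf8 encoding.
once = :unicode.characters_to_binary(latin1_data, :latin1, :utf8)

# A second pass treats those 2 utf8 bytes as 2 latin1 characters and
# encodes each of them again, producing 4 bytes.
twice = :unicode.characters_to_binary(once, :latin1, :utf8)

IO.inspect(byte_size(once), label: "encoded once")
IO.inspect(byte_size(twice), label: "encoded twice")

# Converting utf8 -> latin1 undoes exactly one round of encoding.
restored = :unicode.characters_to_binary(twice, :utf8, :latin1)
IO.inspect(restored == once, label: "one round undone")
```

The extra two bytes per character are the kind of surplus that would shift byte offsets within a line.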

I don't love this "fix" because it feels very much like a band-aid, not
addressing whatever the root issue is, but I'd also rather things work
in documents containing multi-byte characters.

* Update comment explaining utf8->latin1
zachallaun authored and scohen committed Sep 5, 2023
1 parent be6edd6 commit 5d3cbdf
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions apps/server/lib/lexical/server/transport/std_io.ex
@@ -86,7 +86,7 @@ defmodule Lexical.Server.Transport.StdIO do

     with {:ok, content_length} <-
            header_value(headers, "content-length", &String.to_integer/1),
-         {:ok, data} <- read(device, content_length),
+         {:ok, data} <- read_body(device, content_length),
          {:ok, message} <- JsonRpc.decode(data) do
       callback.(message)
     end
@@ -112,10 +112,16 @@ defmodule Lexical.Server.Transport.StdIO do
     end
   end

-  defp read(device, amount) do
+  defp read_body(device, amount) do
     case IO.read(device, amount) do
-      data when is_binary(data) or is_list(data) -> {:ok, data}
-      other -> other
+      data when is_binary(data) or is_list(data) ->
+        # Ensure that incoming data is latin1 to prevent double-encoding to utf8 later
+        # See https://github.com/lexical-lsp/lexical/issues/287 for context.
+        data = :unicode.characters_to_binary(data, :utf8, :latin1)
+        {:ok, data}
+
+      other ->
+        other
     end
   end

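One thing worth noting about the conversion above: `:unicode.characters_to_binary/3` returns an `{:error, _, _}` or `{:incomplete, _, _}` tuple rather than a binary when the input is not valid utf8 or contains a character that latin1 cannot represent. The following is a rough sketch of how those cases could be surfaced. It is not part of this commit; the module name, the function name `to_latin1/1`, and the error atoms are all hypothetical.

```elixir
defmodule ReadBodySketch do
  # Hypothetical helper for illustration only; not from the Lexical codebase.
  # Converts utf8 input to latin1 and reports conversion failures instead of
  # wrapping them in {:ok, _}.
  def to_latin1(data) when is_binary(data) or is_list(data) do
    case :unicode.characters_to_binary(data, :utf8, :latin1) do
      converted when is_binary(converted) ->
        {:ok, converted}

      # Invalid utf8, or a code point above 255 that latin1 cannot hold.
      {:error, _converted, _rest} ->
        {:error, :invalid_encoding}

      # Input ended in the middle of a multi-byte utf8 sequence.
      {:incomplete, _converted, _rest} ->
        {:error, :incomplete_encoding}
    end
  end
end

IO.inspect(ReadBodySketch.to_latin1("é"))
IO.inspect(ReadBodySketch.to_latin1("€"))
IO.inspect(ReadBodySketch.to_latin1(<<0xC3>>))
```

Whether failing the read is preferable to passing the original bytes through unchanged depends on what clients actually send, so treat this as one possible design rather than a recommendation.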