[Data] SessionFileHandler uses cp1252 encoding on windows resulting in logging errors #59967

@Famok

What happened + What you expected to happen

tldr:

  1. ray.data logs characters (e.g. check marks) that cp1252 cannot encode but UTF-8 can.
  2. SessionFileHandler relies on the locale to determine the encoding, which resolves to cp1252 in my case (Windows 11).
  3. The logger then fails on these characters: UnicodeEncodeError: 'charmap' codec can't encode characters in position 58-59: character maps to <undefined>

Suggestion: use UTF-8 as the default encoding on Windows.
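The encoding failure can be checked in isolation, without Ray or logging. This is a minimal sketch: the message text is copied from the traceback in this report, and `can_encode` is a hypothetical helper introduced only for illustration.

```python
# The message Ray Data tried to log, taken from the traceback in this report.
# "\u2714\ufe0f" is the check mark followed by its emoji variation selector,
# which is why the error mentions two character positions.
message = "\u2714\ufe0f  Dataset dataset_3_0 execution finished in 0.05 seconds"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(message, "cp1252"))  # False
print(can_encode(message, "utf-8"))   # True
```

The check is platform independent, so it should behave the same on Linux or macOS even though the default file encoding there is usually UTF-8.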

More Info:
Running a very simple example that uses ray.data (see the reproduction script below) results in an encoding error:

--- Logging error ---
Traceback (most recent call last):
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\streaming_executor.py", line 790, in get_next
    bundle = state.get_output_blocking(output_split_idx)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\streaming_executor_state.py", line 456, in get_output_blocking
    raise StopIteration()
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<my env>\Lib\logging\__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
  File "<my env>\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 58-59: character maps to <undefined>
Call stack:
  File "<my env>\Lib\threading.py", line 1032, in _bootstrap
    self._bootstrap_inner()
  File "<my env>\Lib\threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "<my env>\Lib\threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "<my env>\Lib\site-packages\ray\data\_internal\util.py", line 1018, in _run_filling_worker
    for idx, item in enumerate(base_iterator):
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\interfaces\executor.py", line 34, in __next__
    return self.get_next()
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\legacy_compat.py", line 76, in get_next
    bundle = self._base_iterator.get_next(output_split_idx)
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\streaming_executor.py", line 812, in get_next
    self._executor.shutdown(
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\streaming_executor.py", line 304, in shutdown
    logger.info(desc)
  File "<my env>\Lib\logging\__init__.py", line 1539, in info
    self._log(INFO, msg, args, **kwargs)
  File "<my env>\Lib\logging\__init__.py", line 1684, in _log
    self.handle(record)
  File "<my env>\Lib\logging\__init__.py", line 1700, in handle
    self.callHandlers(record)
  File "<my env>\Lib\logging\__init__.py", line 1762, in callHandlers
    hdlr.handle(record)
  File "<my env>\Lib\logging\__init__.py", line 1028, in handle
    self.emit(record)
  File "<my env>\Lib\site-packages\ray\data\_internal\logging.py", line 126, in emit
    self._handler.emit(record)
Message: '✔️  Dataset dataset_3_0 execution finished in 0.05 seconds'
Arguments: ()

The error occurs because SessionFileHandler(logging.Handler) writes its log file with an encoding that cannot represent the characters ray.data wants to log.

I've narrowed it down to this line in ray/data/_internal/logging.py:
self._handler = logging.FileHandler(self._path)

Changing it to the following seems to do the trick:
self._handler = logging.FileHandler(self._path, encoding='utf-8')
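The effect of passing an explicit encoding can be tried outside of Ray with the standard library alone. The sketch below is not Ray's code: `write_log_line`, the logger name `encoding_demo`, and the file name `demo.log` are made up for illustration. Only `logging.FileHandler(path, encoding="utf-8")` mirrors the change proposed above.

```python
import logging
import os
import tempfile


def write_log_line(path: str, message: str) -> None:
    """Write one INFO record to `path` through a UTF-8 FileHandler."""
    handler = logging.FileHandler(path, encoding="utf-8")
    logger = logging.getLogger("encoding_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo record out of the root logger
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        logger.removeHandler(handler)
        handler.close()  # release the file so the temp directory can be removed


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "demo.log")
    write_log_line(log_path, "\u2714\ufe0f  Dataset execution finished")
    # Read the file back with the same encoding to confirm a clean round trip.
    with open(log_path, encoding="utf-8") as f:
        logged = f.read()

# ascii() keeps the output printable even on a cp1252 console.
print(ascii(logged))
```

With the explicit encoding, the handler no longer depends on the locale, so the same log file should be produced on Windows and on other platforms.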

Without an explicit encoding, logging.FileHandler seems to fall back to the locale's encoding, which is cp1252 on my machine:

import locale
print(locale.getlocale())
# ('de_DE', 'cp1252')
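To see which encoding Python would pick for a file opened without an explicit `encoding`, the following sketch compares the locale's preferred encoding with what `open()` actually uses. The file name `probe.txt` is arbitrary, and the printed values depend on the machine; on the setup described here they would presumably both be cp1252.

```python
import locale
import os
import tempfile

# Encoding Python reports as the locale's preferred one.
preferred = locale.getpreferredencoding(False)

# Encoding that open() actually picks when none is passed.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "probe.txt"), "w") as f:
        default_encoding = f.encoding

print(preferred, default_encoding)
```

Python's UTF-8 mode (the `PYTHONUTF8=1` environment variable or `-X utf8`) changes this default to UTF-8, so it may serve as a user-side workaround, though that has not been verified against Ray here.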

My operating system is:
OS Name Microsoft Windows 11 Enterprise
Version 10.0.26100 Build 26100
Other OS Description Not Available
OS Manufacturer Microsoft Corporation

Versions / Dependencies

conda list
Name                    Version                   Build  Channel
absl-py                   2.3.1                    pypi_0    pypi
aiohappyeyeballs          2.6.1                    pypi_0    pypi
aiohttp                   3.13.3                   pypi_0    pypi
aiohttp-cors              0.8.1                    pypi_0    pypi
aiosignal                 1.4.0                    pypi_0    pypi
amqp                      5.3.1                    pypi_0    pypi
annotated-doc             0.0.4                    pypi_0    pypi
annotated-types           0.7.0                    pypi_0    pypi
anyio                     4.12.1                   pypi_0    pypi
asttokens                 3.0.1                    pypi_0    pypi
attrs                     25.4.0                   pypi_0    pypi
billiard                  4.2.4                    pypi_0    pypi
bzip2                     1.0.8                h0ad9c76_8    conda-forge
celery                    5.6.2                    pypi_0    pypi
certifi                   2026.1.4                 pypi_0    pypi
cffi                      2.0.0                    pypi_0    pypi
charset-normalizer        3.4.4                    pypi_0    pypi
click                     8.3.1                    pypi_0    pypi
click-didyoumean          0.3.1                    pypi_0    pypi
click-plugins             1.1.1.2                  pypi_0    pypi
click-repl                0.3.0                    pypi_0    pypi
cloudpickle               3.1.2                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
colorful                  0.5.8                    pypi_0    pypi
cryptography              46.0.3                   pypi_0    pypi
cupy-cuda12x              13.6.0                   pypi_0    pypi
decorator                 5.2.1                    pypi_0    pypi
distlib                   0.4.0                    pypi_0    pypi
dm-tree                   0.1.9                    pypi_0    pypi
executing                 2.2.1                    pypi_0    pypi
farama-notifications      0.0.4                    pypi_0    pypi
fastapi                   0.128.0                  pypi_0    pypi
fastrlock                 0.8.3                    pypi_0    pypi
filelock                  3.20.2                   pypi_0    pypi
frozenlist                1.8.0                    pypi_0    pypi
fsspec                    2025.12.0                pypi_0    pypi
google-api-core           2.28.1                   pypi_0    pypi
google-auth               2.47.0                   pypi_0    pypi
googleapis-common-protos  1.72.0                   pypi_0    pypi
grpcio                    1.76.0                   pypi_0    pypi
gymnasium                 1.1.1                    pypi_0    pypi
h11                       0.16.0                   pypi_0    pypi
httptools                 0.7.1                    pypi_0    pypi
idna                      3.11                     pypi_0    pypi
importlib-metadata        8.7.1                    pypi_0    pypi
ipython                   9.9.0                    pypi_0    pypi
ipython-pygments-lexers   1.1.1                    pypi_0    pypi
jedi                      0.19.2                   pypi_0    pypi
jsonschema                4.25.1                   pypi_0    pypi
jsonschema-specifications 2025.9.1                 pypi_0    pypi
kombu                     5.6.2                    pypi_0    pypi
libexpat                  2.7.3                hac47afa_0    conda-forge
libffi                    3.5.2                h52bdfb6_0    conda-forge
liblzma                   5.8.1                h2466b09_2    conda-forge
libsqlite                 3.51.1               hf5d6505_1    conda-forge
libzlib                   1.3.1                h2466b09_2    conda-forge
lz4                       4.4.5                    pypi_0    pypi
matplotlib-inline         0.2.1                    pypi_0    pypi
msgpack                   1.1.2                    pypi_0    pypi
multidict                 6.7.0                    pypi_0    pypi
numpy                     2.4.0                    pypi_0    pypi
opencensus                0.11.4                   pypi_0    pypi
opencensus-context        0.1.3                    pypi_0    pypi
openssl                   3.6.0                h725018a_0    conda-forge
opentelemetry-api         1.39.1                   pypi_0    pypi
opentelemetry-exporter-prometheus 0.60b1                   pypi_0    pypi
opentelemetry-proto       1.39.1                   pypi_0    pypi
opentelemetry-sdk         1.39.1                   pypi_0    pypi
opentelemetry-semantic-conventions 0.60b1                   pypi_0    pypi
ormsgpack                 1.7.0                    pypi_0    pypi
packaging                 25.0                     pypi_0    pypi
pandas                    2.3.3                    pypi_0    pypi
parso                     0.8.5                    pypi_0    pypi
pip                       25.3               pyh8b19718_0    conda-forge
platformdirs              4.5.1                    pypi_0    pypi
prometheus-client         0.23.1                   pypi_0    pypi
prompt-toolkit            3.0.52                   pypi_0    pypi
propcache                 0.4.1                    pypi_0    pypi
proto-plus                1.27.0                   pypi_0    pypi
protobuf                  6.33.2                   pypi_0    pypi
pure-eval                 0.2.3                    pypi_0    pypi
py-spy                    0.4.1                    pypi_0    pypi
pyarrow                   22.0.0                   pypi_0    pypi
pyasn1                    0.6.1                    pypi_0    pypi
pyasn1-modules            0.4.2                    pypi_0    pypi
pycparser                 2.23                     pypi_0    pypi
pydantic                  2.12.5                   pypi_0    pypi
pydantic-core             2.41.5                   pypi_0    pypi
pygments                  2.19.2                   pypi_0    pypi
pyopenssl                 25.3.0                   pypi_0    pypi
python                    3.12.12         h0159041_1_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python-dotenv             1.2.1                    pypi_0    pypi
pytz                      2025.2                   pypi_0    pypi
pyyaml                    6.0.3                    pypi_0    pypi
ray                       2.53.0                   pypi_0    pypi
referencing               0.37.0                   pypi_0    pypi
requests                  2.32.5                   pypi_0    pypi
rpds-py                   0.30.0                   pypi_0    pypi
rsa                       4.9.1                    pypi_0    pypi
scipy                     1.16.3                   pypi_0    pypi
setuptools                80.9.0             pyhff2d567_0    conda-forge
six                       1.17.0                   pypi_0    pypi
smart-open                7.5.0                    pypi_0    pypi
stack-data                0.6.3                    pypi_0    pypi
starlette                 0.50.0                   pypi_0    pypi
tensorboardx              2.6.4                    pypi_0    pypi
tk                        8.6.13               h2c6b04d_3    conda-forge
traitlets                 5.14.3                   pypi_0    pypi
typing-extensions         4.15.0                   pypi_0    pypi
typing-inspection         0.4.2                    pypi_0    pypi
tzdata                    2025.3                   pypi_0    pypi
tzlocal                   5.3.1                    pypi_0    pypi
ucrt                      10.0.26100.0         h57928b3_0    conda-forge
urllib3                   2.6.2                    pypi_0    pypi
uvicorn                   0.40.0                   pypi_0    pypi
vc                        14.3                h2df5915_10    defaults
vc14_runtime              14.44.35208         h818238b_34    conda-forge
vcomp14                   14.44.35208         h818238b_34    conda-forge
vine                      5.1.0                    pypi_0    pypi
virtualenv                20.35.4                  pypi_0    pypi
watchfiles                1.1.1                    pypi_0    pypi
wcwidth                   0.2.14                   pypi_0    pypi
websockets                15.0.1                   pypi_0    pypi
wheel                     0.45.1             pyhd8ed1ab_1    conda-forge
wrapt                     2.0.1                    pypi_0    pypi
yarl                      1.22.0                   pypi_0    pypi
zipp                      3.23.0                   pypi_0    pypi

Reproduction script

import ray
# Create dataset from synthetic data.
ds = ray.data.range(1000)
# Create dataset from in-memory data.
ds = ray.data.from_items(
    [{"col1": i, "col2": i * 2} for i in range(1000)]
)
ds.take(1)

Issue Severity

None

Labels: bug, community-backlog, data, observability, stability, triage, usability, windows