Description
What happened + What you expected to happen
tldr:
- ray.data logs characters (e.g. the checkmark '✔️') that cannot be encoded in legacy Windows code pages such as cp1252; they need a Unicode encoding such as UTF-8
- SessionFileHandler does not pass an encoding when it opens its log file, so Python falls back to the locale's encoding, which is cp1252 in my case (Windows 11)
- The logger then fails on these characters: UnicodeEncodeError: 'charmap' codec can't encode characters in position 58-59: character maps to <undefined>
Suggestion: use UTF-8 as the default encoding on Windows
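The root cause can be checked without Ray. '✔️' is two code points, U+2714 (HEAVY CHECK MARK) followed by U+FE0F (VARIATION SELECTOR-16), and neither exists in cp1252. A minimal sketch, assuming these are the two characters the error reports at positions 58-59 (the log formatter presumably adds a prefix before the message); the helper name can_encode is hypothetical:

```python
# Assumption: the two failing characters are U+2714 + U+FE0F, which render as '✔️'.
CHECKMARK = "\u2714\ufe0f"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)  # strict error handling, the default for log files too
    except UnicodeEncodeError:
        return False
    return True


for enc in ("cp1252", "utf-8", "utf-16"):
    print(f"{enc}: {can_encode(CHECKMARK, enc)}")
```

cp1252 reports False, while both UTF-8 and UTF-16 report True, so any Unicode encoding would work; UTF-8 is simply the usual choice for log files.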
More Info:
Running a very simple example that uses ray.data (see the reproduction script below) results in an encoding error:
--- Logging error ---
Traceback (most recent call last):
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\streaming_executor.py", line 790, in get_next
    bundle = state.get_output_blocking(output_split_idx)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\streaming_executor_state.py", line 456, in get_output_blocking
    raise StopIteration()
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<my env>\Lib\logging\__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
  File "<my env>\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 58-59: character maps to <undefined>
Call stack:
  File "<my env>\Lib\threading.py", line 1032, in _bootstrap
    self._bootstrap_inner()
  File "<my env>\Lib\threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "<my env>\Lib\threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "<my env>\Lib\site-packages\ray\data\_internal\util.py", line 1018, in _run_filling_worker
    for idx, item in enumerate(base_iterator):
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\interfaces\executor.py", line 34, in __next__
    return self.get_next()
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\legacy_compat.py", line 76, in get_next
    bundle = self._base_iterator.get_next(output_split_idx)
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\streaming_executor.py", line 812, in get_next
    self._executor.shutdown(
  File "<my env>\Lib\site-packages\ray\data\_internal\execution\streaming_executor.py", line 304, in shutdown
    logger.info(desc)
  File "<my env>\Lib\logging\__init__.py", line 1539, in info
    self._log(INFO, msg, args, **kwargs)
  File "<my env>\Lib\logging\__init__.py", line 1684, in _log
    self.handle(record)
  File "<my env>\Lib\logging\__init__.py", line 1700, in handle
    self.callHandlers(record)
  File "<my env>\Lib\logging\__init__.py", line 1762, in callHandlers
    hdlr.handle(record)
  File "<my env>\Lib\logging\__init__.py", line 1028, in handle
    self.emit(record)
  File "<my env>\Lib\site-packages\ray\data\_internal\logging.py", line 126, in emit
    self._handler.emit(record)
Message: '✔️ Dataset dataset_3_0 execution finished in 0.05 seconds'
Arguments: ()
The error occurs because the file handler inside SessionFileHandler(logging.Handler) writes its log file with an encoding that cannot represent the characters ray.data logs.
I've narrowed it down to this line in ray/data/_internal/logging.py:
self._handler = logging.FileHandler(self._path)
Changing it to:
self._handler = logging.FileHandler(self._path, encoding='utf-8')
This appears to fix the problem.
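To confirm that the encoding argument alone makes the difference, here is a Ray-free sketch that logs the same message through logging.FileHandler once with an explicit cp1252 encoding (simulating the Windows default seen here) and once with UTF-8. CheckedFileHandler and log_once are hypothetical demo helpers; overriding handleError only records the failure instead of printing "--- Logging error ---".

```python
import logging
import os
import tempfile

# Message copied from the "Message:" line of the logging error above.
MESSAGE = "\u2714\ufe0f Dataset dataset_3_0 execution finished in 0.05 seconds"


class CheckedFileHandler(logging.FileHandler):
    """Hypothetical demo handler: records emit() failures instead of printing them."""

    had_error = False

    def handleError(self, record):
        # StreamHandler.emit() calls this when stream.write() raises,
        # e.g. with the UnicodeEncodeError from the report.
        self.had_error = True


def log_once(path: str, encoding: str) -> bool:
    """Log MESSAGE to `path` using the given file encoding; return True on success."""
    logger = logging.getLogger(f"encoding_demo.{encoding}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo away from the root logger's handlers
    handler = CheckedFileHandler(path, encoding=encoding)
    logger.addHandler(handler)
    try:
        logger.info(MESSAGE)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return not handler.had_error


with tempfile.TemporaryDirectory() as tmp:
    cp1252_ok = log_once(os.path.join(tmp, "cp1252.log"), "cp1252")
    utf8_path = os.path.join(tmp, "utf8.log")
    utf8_ok = log_once(utf8_path, "utf-8")
    with open(utf8_path, encoding="utf-8") as f:
        utf8_text = f.read()

print("cp1252 succeeded:", cp1252_ok)  # False: '✔️' cannot be encoded in cp1252
print("utf-8 succeeded:", utf8_ok)     # True
print("utf-8 file contains:", utf8_text.rstrip("\n"))
```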
Without an explicit encoding, logging.FileHandler opens its file with the locale's preferred encoding, which is cp1252 here. For example, on my machine:
import locale
print(locale.getlocale())
('de_DE', 'cp1252')
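A quick way to see which encoding FileHandler will actually use on a given machine is to open a throwaway handler and inspect its stream. This is a minimal sketch; default_file_handler_encoding is a hypothetical helper, and it relies on FileHandler opening its stream immediately (the default delay=False).

```python
import codecs
import locale
import logging
import os
import sys
import tempfile


def default_file_handler_encoding() -> str:
    """Return the encoding logging.FileHandler picks when no `encoding` is passed."""
    with tempfile.TemporaryDirectory() as tmp:
        handler = logging.FileHandler(os.path.join(tmp, "probe.log"))  # encoding=None
        try:
            return handler.stream.encoding
        finally:
            handler.close()  # close before the directory is removed (needed on Windows)


# Normalize names so that e.g. 'UTF-8' and 'utf8' compare equal.
fh_encoding = codecs.lookup(default_file_handler_encoding()).name
locale_encoding = codecs.lookup(locale.getpreferredencoding(False)).name

print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
print("locale preferred encoding:", locale_encoding)
print("FileHandler default encoding:", fh_encoding)  # the traceback implies cp1252 here
```

If this reports cp1252, enabling Python's UTF-8 mode (PEP 540), either by setting the environment variable PYTHONUTF8=1 or by running the script with `python -X utf8`, should make open(), and therefore FileHandler, default to UTF-8 without patching Ray. That is a process-wide workaround, though, not a fix for SessionFileHandler itself.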
My operating system is:
OS Name: Microsoft Windows 11 Enterprise
Version: 10.0.26100 Build 26100
Other OS Description: Not Available
OS Manufacturer: Microsoft Corporation
Versions / Dependencies
conda list
Name Version Build Channel
absl-py 2.3.1 pypi_0 pypi
aiohappyeyeballs 2.6.1 pypi_0 pypi
aiohttp 3.13.3 pypi_0 pypi
aiohttp-cors 0.8.1 pypi_0 pypi
aiosignal 1.4.0 pypi_0 pypi
amqp 5.3.1 pypi_0 pypi
annotated-doc 0.0.4 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.12.1 pypi_0 pypi
asttokens 3.0.1 pypi_0 pypi
attrs 25.4.0 pypi_0 pypi
billiard 4.2.4 pypi_0 pypi
bzip2 1.0.8 h0ad9c76_8 conda-forge
celery 5.6.2 pypi_0 pypi
certifi 2026.1.4 pypi_0 pypi
cffi 2.0.0 pypi_0 pypi
charset-normalizer 3.4.4 pypi_0 pypi
click 8.3.1 pypi_0 pypi
click-didyoumean 0.3.1 pypi_0 pypi
click-plugins 1.1.1.2 pypi_0 pypi
click-repl 0.3.0 pypi_0 pypi
cloudpickle 3.1.2 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
colorful 0.5.8 pypi_0 pypi
cryptography 46.0.3 pypi_0 pypi
cupy-cuda12x 13.6.0 pypi_0 pypi
decorator 5.2.1 pypi_0 pypi
distlib 0.4.0 pypi_0 pypi
dm-tree 0.1.9 pypi_0 pypi
executing 2.2.1 pypi_0 pypi
farama-notifications 0.0.4 pypi_0 pypi
fastapi 0.128.0 pypi_0 pypi
fastrlock 0.8.3 pypi_0 pypi
filelock 3.20.2 pypi_0 pypi
frozenlist 1.8.0 pypi_0 pypi
fsspec 2025.12.0 pypi_0 pypi
google-api-core 2.28.1 pypi_0 pypi
google-auth 2.47.0 pypi_0 pypi
googleapis-common-protos 1.72.0 pypi_0 pypi
grpcio 1.76.0 pypi_0 pypi
gymnasium 1.1.1 pypi_0 pypi
h11 0.16.0 pypi_0 pypi
httptools 0.7.1 pypi_0 pypi
idna 3.11 pypi_0 pypi
importlib-metadata 8.7.1 pypi_0 pypi
ipython 9.9.0 pypi_0 pypi
ipython-pygments-lexers 1.1.1 pypi_0 pypi
jedi 0.19.2 pypi_0 pypi
jsonschema 4.25.1 pypi_0 pypi
jsonschema-specifications 2025.9.1 pypi_0 pypi
kombu 5.6.2 pypi_0 pypi
libexpat 2.7.3 hac47afa_0 conda-forge
libffi 3.5.2 h52bdfb6_0 conda-forge
liblzma 5.8.1 h2466b09_2 conda-forge
libsqlite 3.51.1 hf5d6505_1 conda-forge
libzlib 1.3.1 h2466b09_2 conda-forge
lz4 4.4.5 pypi_0 pypi
matplotlib-inline 0.2.1 pypi_0 pypi
msgpack 1.1.2 pypi_0 pypi
multidict 6.7.0 pypi_0 pypi
numpy 2.4.0 pypi_0 pypi
opencensus 0.11.4 pypi_0 pypi
opencensus-context 0.1.3 pypi_0 pypi
openssl 3.6.0 h725018a_0 conda-forge
opentelemetry-api 1.39.1 pypi_0 pypi
opentelemetry-exporter-prometheus 0.60b1 pypi_0 pypi
opentelemetry-proto 1.39.1 pypi_0 pypi
opentelemetry-sdk 1.39.1 pypi_0 pypi
opentelemetry-semantic-conventions 0.60b1 pypi_0 pypi
ormsgpack 1.7.0 pypi_0 pypi
packaging 25.0 pypi_0 pypi
pandas 2.3.3 pypi_0 pypi
parso 0.8.5 pypi_0 pypi
pip 25.3 pyh8b19718_0 conda-forge
platformdirs 4.5.1 pypi_0 pypi
prometheus-client 0.23.1 pypi_0 pypi
prompt-toolkit 3.0.52 pypi_0 pypi
propcache 0.4.1 pypi_0 pypi
proto-plus 1.27.0 pypi_0 pypi
protobuf 6.33.2 pypi_0 pypi
pure-eval 0.2.3 pypi_0 pypi
py-spy 0.4.1 pypi_0 pypi
pyarrow 22.0.0 pypi_0 pypi
pyasn1 0.6.1 pypi_0 pypi
pyasn1-modules 0.4.2 pypi_0 pypi
pycparser 2.23 pypi_0 pypi
pydantic 2.12.5 pypi_0 pypi
pydantic-core 2.41.5 pypi_0 pypi
pygments 2.19.2 pypi_0 pypi
pyopenssl 25.3.0 pypi_0 pypi
python 3.12.12 h0159041_1_cpython conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.2.1 pypi_0 pypi
pytz 2025.2 pypi_0 pypi
pyyaml 6.0.3 pypi_0 pypi
ray 2.53.0 pypi_0 pypi
referencing 0.37.0 pypi_0 pypi
requests 2.32.5 pypi_0 pypi
rpds-py 0.30.0 pypi_0 pypi
rsa 4.9.1 pypi_0 pypi
scipy 1.16.3 pypi_0 pypi
setuptools 80.9.0 pyhff2d567_0 conda-forge
six 1.17.0 pypi_0 pypi
smart-open 7.5.0 pypi_0 pypi
stack-data 0.6.3 pypi_0 pypi
starlette 0.50.0 pypi_0 pypi
tensorboardx 2.6.4 pypi_0 pypi
tk 8.6.13 h2c6b04d_3 conda-forge
traitlets 5.14.3 pypi_0 pypi
typing-extensions 4.15.0 pypi_0 pypi
typing-inspection 0.4.2 pypi_0 pypi
tzdata 2025.3 pypi_0 pypi
tzlocal 5.3.1 pypi_0 pypi
ucrt 10.0.26100.0 h57928b3_0 conda-forge
urllib3 2.6.2 pypi_0 pypi
uvicorn 0.40.0 pypi_0 pypi
vc 14.3 h2df5915_10 defaults
vc14_runtime 14.44.35208 h818238b_34 conda-forge
vcomp14 14.44.35208 h818238b_34 conda-forge
vine 5.1.0 pypi_0 pypi
virtualenv 20.35.4 pypi_0 pypi
watchfiles 1.1.1 pypi_0 pypi
wcwidth 0.2.14 pypi_0 pypi
websockets 15.0.1 pypi_0 pypi
wheel 0.45.1 pyhd8ed1ab_1 conda-forge
wrapt 2.0.1 pypi_0 pypi
yarl 1.22.0 pypi_0 pypi
zipp 3.23.0 pypi_0 pypi
Reproduction script
import ray
# Create dataset from synthetic data.
ds = ray.data.range(1000)
# Create dataset from in-memory data.
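ds = ray.data.from_items(
    [{"col1": i, "col2": i * 2} for i in range(1000)]
)
# Executing the dataset emits the "✔️ Dataset ... execution finished" log message that fails to encode.
ds.take(1)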
Issue Severity
None