
Java image amazon/aws-lambda-java:21 should contain package glibc-langpack-en for full Unicode support #234

Open
@jensrutschmann

Description


Running Java applications on the current base image amazon/aws-lambda-java:21.2025.02.24.10-x86_64 produces exceptions when reading from or writing to files with non-ASCII characters in their filenames.

This is because the JVM initializes the read-only system property sun.jnu.encoding (the charset it uses to encode file names) with the value ANSI_X3.4-1968, which is glibc's name for plain 7-bit US-ASCII. The Java I/O code then cannot map the Unicode characters to this encoding when constructing the File / Path objects.
The underlying problem seems to be that the glibc locale data for the English locale is missing from the base image, so glibc falls back to the C locale (codeset ANSI_X3.4-1968) even though LANG is set to en_US.UTF-8. Installing the package glibc-langpack-en fixes the problem.
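The failure can be reproduced without touching the file system. The following minimal sketch (a hypothetical EncodabilityCheck class, not part of the original reproduction) asks a CharsetEncoder whether the example file names are representable in ANSI_X3.4-1968, which Java resolves to US-ASCII, and in UTF-8. The non-ASCII names are written as \u escapes so the source compiles regardless of the platform encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodabilityCheck {

    // True if every character of the name can be encoded in the given charset.
    // A fresh encoder is created per call because CharsetEncoder is stateful.
    static boolean canEncode(Charset charset, String name) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // "ANSI_X3.4-1968" is the codeset name glibc reports for the C locale;
        // Java treats it as an alias of US-ASCII.
        Charset ascii = Charset.forName("ANSI_X3.4-1968");
        String[] names = {
            "Germ\u00e4n \u00dcml\u00e4\u00fcts.txt", // "Germän Ümläüts.txt"
            "\u96a8\u6a5f\u6587\u5b57.txt",           // "隨機文字.txt"
            "plain.txt"
        };
        for (String name : names) {
            System.out.printf("%s: %s=%b, UTF-8=%b%n", name, ascii.name(),
                canEncode(ascii, name), canEncode(StandardCharsets.UTF_8, name));
        }
    }
}
```

Both non-ASCII names should be reported as not encodable in US-ASCII but encodable in UTF-8, which is the same condition UnixPath.encode trips over in the stack trace.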

This issue is NOT reproducible with the images for other Java versions:

  • 17: amazon/aws-lambda-java:17.2025.02.24.09-x86_64
  • 11: amazon/aws-lambda-java:11.2025.02.24.09-x86_64

Would you therefore consider adding this package back to the default Java 21 image as well?

Potentially related issues:

Steps to reproduce

Save this Java program as a file called DebugEnv.java:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;


public class DebugEnv {

    public static void main(String[] args) throws IOException {
        System.out.println("Environment Variables:");
        System.getenv().forEach((k, v) -> System.out.println(k + ": " + v));
        System.out.println();

        System.out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));

        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
        Files.write(tmpDir.resolve("Germän Ümläüts.txt"), "does not matter".getBytes(StandardCharsets.UTF_8));
        Files.write(tmpDir.resolve("隨機文字.txt"), "does not matter".getBytes(StandardCharsets.UTF_8));
    }

}

Then run this docker command from the same directory:

docker run --rm --platform=linux/amd64 -it --entrypoint=java -v ${PWD}/DebugEnv.java:/var/task/DebugEnv.java amazon/aws-lambda-java:21 DebugEnv.java

Output:

Environment Variables:
HOME: /root
LAMBDA_RUNTIME_DIR: /var/runtime
LAMBDA_TASK_ROOT: /var/task
PATH: /var/lang/bin:/usr/local/bin:/usr/bin/:/bin:/opt/bin
TZ: :/etc/localtime
LD_LIBRARY_PATH: /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
TERM: xterm
LANG: en_US.UTF-8
HOSTNAME: 54ff4216deb1

sun.jnu.encoding: ANSI_X3.4-1968
Exception in thread "main" java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: t?st.txt
	at java.base/sun.nio.fs.UnixPath.encode(Unknown Source)
	at java.base/sun.nio.fs.UnixPath.<init>(Unknown Source)
	at java.base/sun.nio.fs.UnixFileSystem.getPath(Unknown Source)
	at java.base/java.nio.file.Path.resolve(Unknown Source)
	at DebugEnv.main(DebugEnv.java:18)

As you can see, the LANG environment variable is properly set to en_US.UTF-8, yet the JVM has initialized the sun.jnu.encoding system property with ANSI_X3.4-1968. Consequently, the file write operations fail with an exception.
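Because LANG looks correct, this mismatch is easy to miss in a deployed function. As a sketch only, a handler could log it at cold start with a hypothetical JnuEncodingGuard helper like the one below. It only reads the property: sun.jnu.encoding is a JDK implementation property rather than a standard Java SE one, and as noted above it is read-only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JnuEncodingGuard {

    // Hypothetical helper: true if the named charset resolves to UTF-8.
    static boolean isUtf8(String charsetName) {
        if (charsetName == null) {
            return false;
        }
        try {
            return Charset.isSupported(charsetName)
                && Charset.forName(charsetName).equals(StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException (a subclass) is thrown for malformed names.
            return false;
        }
    }

    public static void main(String[] args) {
        String lang = System.getenv("LANG");
        // JDK implementation property; its name and behavior are not part of the Java SE spec.
        String jnu = System.getProperty("sun.jnu.encoding");
        System.out.println("LANG=" + lang + ", sun.jnu.encoding=" + jnu);
        if (!isUtf8(jnu)) {
            System.err.println("WARNING: file names are encoded as " + jnu
                + "; names outside that charset may fail with InvalidPathException."
                + " Is the locale named in LANG installed?");
        }
    }
}
```

On the unpatched image this should print the warning; after installing glibc-langpack-en it should stay silent.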

Fixing the image

Open a shell in the default image:

docker run --rm --platform=linux/amd64 -it --entrypoint=/bin/sh -v ${PWD}/DebugEnv.java:/var/task/DebugEnv.java amazon/aws-lambda-java:21

Install package glibc-langpack-en:

dnf install glibc-langpack-en

Run the demo program:

java DebugEnv.java 

Output:

Environment Variables:
HOME: /root
SHLVL: 1
LAMBDA_RUNTIME_DIR: /var/runtime
PATH: /var/lang/bin:/usr/local/bin:/usr/bin/:/bin:/opt/bin
LAMBDA_TASK_ROOT: /var/task
TZ: :/etc/localtime
LD_LIBRARY_PATH: /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
TERM: xterm
PWD: /var/task
_: /var/lang/bin/java
LANG: en_US.UTF-8
HOSTNAME: 4a0c0c2fbc7f

sun.jnu.encoding: UTF-8

As you can see, this time the JVM has initialized the sun.jnu.encoding system property with UTF-8, and the file write operations succeed without any exceptions.
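To check the fix beyond the absence of an exception, a hypothetical RoundTripCheck like the following writes the two example files into a fresh temporary directory, lists them back, and compares the names. It is a sketch assuming a Linux file system that stores names as raw bytes without Unicode normalization. On the unpatched image it should fail in Path.resolve with the same InvalidPathException as before.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RoundTripCheck {

    // Writes one file per name into a new temp directory, lists the directory,
    // deletes everything, and returns the names the file system reported.
    // If resolve() throws InvalidPathException, the temp directory is left behind.
    static Set<String> roundTrip(List<String> names) {
        try {
            Path dir = Files.createTempDirectory("jnu-check");
            for (String name : names) {
                Files.write(dir.resolve(name), "does not matter".getBytes(StandardCharsets.UTF_8));
            }
            Set<String> listed;
            try (Stream<Path> entries = Files.list(dir)) {
                listed = entries.map(p -> p.getFileName().toString()).collect(Collectors.toSet());
            }
            for (String name : listed) {
                Files.delete(dir.resolve(name));
            }
            Files.delete(dir);
            return listed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "Germ\u00e4n \u00dcml\u00e4\u00fcts.txt", // "Germän Ümläüts.txt"
            "\u96a8\u6a5f\u6587\u5b57.txt");          // "隨機文字.txt"
        Set<String> listed = roundTrip(names);
        System.out.println("round trip ok: " + listed.containsAll(names));
    }
}
```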
