
Commit ee5db79

Authored by Bhanu Teja Goshikonda, Sirut Buasai, and Junpu Fan
[TensorFlow][Inference][Sagemaker] TensorFlow 2.19.0 Currency Release (#4883)
* Building inf image
* added dockerd_ec2_entrypoint.sh file
* fixed syntax
* modified env variable with values directly
* rearranged build_image creation line to top of file
* added setup.sources.sh file to add NVIDIA's package & TensorFlow Serving repositories
* removed libssl1.1 line and modified libreadline-gplv2-dev
* removed libssl1.1 installation and modified libreadline-gplv2-dev to libreadline-dev in gpu file
* changed TF_SERVING_VERSION_GIT_COMMIT value and removed dockerd_ec2_entrypoint file
* added libssl3 since it is a dependency for nginx
* changed nginx installation from Focal (20.04) to Ubuntu Jammy (22.04)
* trying to debug error with libssl1.1 instead of libssl3
* removed libssl1.1
* installing libcudnn and nccl via dpkg after python installation to remove 3.10 conflict
* installing python before installing packages using apt install
* installing wget before installing python
* installing curl, gnupg2, ca-certificates also before python installation
* divided installation of packages into those needed for Python compilation and those not needed
* installing libssl1.1
* added allowlist to remove vulnerability on gpu image
* removed logic for build to only test
* setup nvidia repositories through cuda-keyring and building image and also skipped telemetry tests
* modified wget installation code after installing wget
* added 2.19 to skip framework telemetry
* checking with openssl/libssl1.1.1p to check for CVEs and removed skipping of bashrc and entrypoint telemetry
* Upgraded OpenSSL to a newer version that doesn't have these vulnerabilities & checking only security tests
* building the image
* upgraded libssl1.1 to 1.1.1t
* replace 1.1.1t with 1.1.1o for libssl1.1
* upgraded libssl1.1 version to 1.1.1f-1ubuntu2.24
* changed libssl version to libssl1.1_1.1.1-1ubuntu2.1~18.04.23_amd64.deb
* upgraded libssl1.1 to 1.1.1-1ubuntu2.1~18.04.23
* removed libssl1.1 installation to check if it was even needed to remove the error
* uninstalling libssl1.1 after installing cuda packages to check for errors
* Add license file content test (#4890)
  * Add license file content test
  * use short version
  * test no build
  * print string
  * enable build
  * fix allowlist
  * rebuild
  * buildtest ec2
  * test arm
  * build test inference
  * disable arm64 mode
  * disable build
  * revert toml
* update EFA to 1.41.0 vllm to 0.9.0.1 (#4898)
* upgraded base image to cuda 12.2.2 to check for errors of openssl
* changed cuda base image version to 12.2.0 from 12.2.2
* added allowlist.json file for the openssl CVEs
* enabled some more tests
* removed some commented lines
* reverted back toml file
* Removed all the commented lines for cleaner code

---------

Co-authored-by: Bhanu Teja Goshikonda <[email protected]>
Co-authored-by: Sirut Buasai <[email protected]>
Co-authored-by: Junpu Fan <[email protected]>
1 parent 0f90b4f commit ee5db79

File tree

6 files changed: +1061 −2 lines
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
account_id: &ACCOUNT_ID <set-$ACCOUNT_ID-in-environment>
prod_account_id: &PROD_ACCOUNT_ID 763104351884
region: &REGION <set-$REGION-in-environment>
framework: &FRAMEWORK tensorflow
version: &VERSION 2.19.0
short_version: &SHORT_VERSION 2.19
arch_type: x86
# autopatch_build: "True"

repository_info:
  inference_repository: &INFERENCE_REPOSITORY
    image_type: &INFERENCE_IMAGE_TYPE inference
    root: !join [ *FRAMEWORK, "/", *INFERENCE_IMAGE_TYPE ]
    repository_name: &REPOSITORY_NAME !join [ pr, "-", *FRAMEWORK, "-", *INFERENCE_IMAGE_TYPE ]
    repository: &REPOSITORY !join [ *ACCOUNT_ID, .dkr.ecr., *REGION, .amazonaws.com/, *REPOSITORY_NAME ]
    release_repository_name: &RELEASE_REPOSITORY_NAME !join [ *FRAMEWORK, "-", *INFERENCE_IMAGE_TYPE ]
    release_repository: &RELEASE_REPOSITORY !join [ *PROD_ACCOUNT_ID, .dkr.ecr., *REGION, .amazonaws.com/, *RELEASE_REPOSITORY_NAME ]

context:
  inference_context: &INFERENCE_CONTEXT
    start_cuda_compat:
      source: docker/build_artifacts/start_cuda_compat.sh
      target: start_cuda_compat.sh
    dockerd_entrypoint:
      source: docker/build_artifacts/dockerd_entrypoint.sh
      target: dockerd_entrypoint.sh
    sagemaker_package_name:
      source: docker/build_artifacts/sagemaker
      target: sagemaker
    init:
      source: docker/build_artifacts/__init__.py
      target: __init__.py
    dockerd-entrypoint:
      source: docker/build_artifacts/dockerd-entrypoint.py
      target: dockerd-entrypoint.py
    deep_learning_container:
      source: ../../src/deep_learning_container.py
      target: deep_learning_container.py

images:
  BuildSageMakerTensorflowCPUInferencePy3DockerImage:
    <<: *INFERENCE_REPOSITORY
    build: &TENSORFLOW_CPU_INFERENCE_PY3 false
    image_size_baseline: &IMAGE_SIZE_BASELINE 4899
    framework_version: &FRAMEWORK_VERSION 2.19.0
    device_type: &DEVICE_TYPE cpu
    python_version: &DOCKER_PYTHON_VERSION py3
    tag_python_version: &TAG_PYTHON_VERSION py312
    os_version: &OS_VERSION ubuntu22.04
    tag: !join [ *FRAMEWORK_VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *OS_VERSION, "-sagemaker" ]
    latest_release_tag: !join [ *FRAMEWORK_VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *OS_VERSION, "-sagemaker" ]
    docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /Dockerfile., *DEVICE_TYPE ]
    target: sagemaker
    enable_test_promotion: true
    context:
      <<: *INFERENCE_CONTEXT
  BuildSageMakerTensorflowGPUInferencePy3DockerImage:
    <<: *INFERENCE_REPOSITORY
    build: &TENSORFLOW_GPU_INFERENCE_PY3 false
    image_size_baseline: &IMAGE_SIZE_BASELINE 13100
    framework_version: &FRAMEWORK_VERSION 2.19.0
    device_type: &DEVICE_TYPE gpu
    python_version: &DOCKER_PYTHON_VERSION py3
    tag_python_version: &TAG_PYTHON_VERSION py312
    cuda_version: &CUDA_VERSION cu122
    os_version: &OS_VERSION ubuntu22.04
    tag: !join [ *FRAMEWORK_VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-sagemaker" ]
    latest_release_tag: !join [ *FRAMEWORK_VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-sagemaker" ]
    docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /, *CUDA_VERSION, /Dockerfile., *DEVICE_TYPE ]
    target: sagemaker
    enable_test_promotion: true
    context:
      <<: *INFERENCE_CONTEXT
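The `!join` tags above are not standard YAML; the DLC build tooling registers a custom constructor that concatenates the resolved sequence into a single string. A minimal sketch of how such a constructor could look with PyYAML (the loader class and constructor here are illustrative, not the project's actual code):

```python
import yaml

class JoinLoader(yaml.SafeLoader):
    """SafeLoader subclass that understands the custom !join tag."""

def join_constructor(loader, node):
    # Resolve the sequence (anchors/aliases included) and concatenate as strings.
    parts = loader.construct_sequence(node)
    return "".join(str(p) for p in parts)

JoinLoader.add_constructor("!join", join_constructor)

doc = """
framework: &FRAMEWORK tensorflow
image_type: &IMAGE_TYPE inference
root: !join [ *FRAMEWORK, "/", *IMAGE_TYPE ]
"""
data = yaml.load(doc, Loader=JoinLoader)
print(data["root"])  # tensorflow/inference
```

This is how a value like `root` ends up as `tensorflow/inference` even though the buildspec never writes the joined string literally.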

tensorflow/inference/buildspec.yml

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-buildspec_pointer: buildspec-2-18-sm.yml
+buildspec_pointer: buildspec-2-19-sm.yml
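The pointer file is a one-line YAML document naming the active buildspec, which the build tooling presumably dereferences before loading the real spec. A hedged sketch of that two-step resolution using only the standard library (the helper name and directory layout are assumptions for illustration):

```python
from pathlib import Path
import tempfile

# Hypothetical helper: read the pointer file, return the path of the
# buildspec it names (relative to the pointer's own directory).
def resolve_buildspec(pointer_path: Path) -> Path:
    text = pointer_path.read_text()
    key, _, value = text.partition(":")
    assert key.strip() == "buildspec_pointer"
    return pointer_path.parent / value.strip()

# Demo with a throwaway directory standing in for tensorflow/inference/
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "buildspec.yml").write_text("buildspec_pointer: buildspec-2-19-sm.yml\n")
    target = resolve_buildspec(root / "buildspec.yml")
    print(target.name)  # buildspec-2-19-sm.yml
```

Flipping the pointer from `buildspec-2-18-sm.yml` to `buildspec-2-19-sm.yml` is therefore the one-line switch that makes 2.19.0 the default SageMaker inference build.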
Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
########################################################
#  _____ ____ ____    ___
# | ____/ ___|___ \  |_ _|_ __ ___   __ _  __ _  ___
# |  _|| |    __) |  | || '_ ` _ \ / _` |/ _` |/ _ \
# | |__| |___/ __/   | || | | | | | (_| | (_| |  __/
# |_____\____|_____| |___|_| |_| |_|\__,_|\__, |\___|
#                                         |___/
#  ____           _
# |  _ \ ___  ___(_)_ __   ___
# | |_) / _ \/ __| | '_ \ / _ \
# |  _ <  __/ (__| | |_) |  __/
# |_| \_\___|\___|_| .__/ \___|
#                  |_|
########################################################

FROM tensorflow/serving:2.19.0-devel as build_image

FROM ubuntu:22.04 AS base_image

ENV DEBIAN_FRONTEND=noninteractive \
    LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"

RUN apt-get update \
 && apt-get upgrade -y \
 && apt-get autoremove -y \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

FROM base_image AS ec2

LABEL maintainer="Amazon AI"
LABEL dlc_major_version="1"

ARG PYTHON=python3.12
ARG PYTHON_PIP=python3-pip
ARG PIP=pip3
ARG PYTHON_VERSION=3.12.10
ARG TFS_SHORT_VERSION=2.19

# ENV variables to be passed to the SageMaker stage
ENV PIP=${PIP}
ENV PYTHON=${PYTHON}
ENV PYTHON_VERSION=${PYTHON_VERSION}

# See http://bugs.python.org/issue19846
ENV LANG=C.UTF-8
# Python won't try to write .pyc or .pyo files on the import of source modules
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH='/usr/local/lib:$LD_LIBRARY_PATH'
ENV MODEL_BASE_PATH=/models
# The only required piece is the model name in order to differentiate endpoints
ENV MODEL_NAME=model
ENV DEBIAN_FRONTEND=noninteractive

# First install the basic tools needed to compile Python
RUN apt-get update \
 && apt-get -y install --no-install-recommends \
    ca-certificates \
    curl \
    wget \
    gnupg2 \
    build-essential \
    zlib1g-dev \
    libssl-dev \
    libbz2-dev \
    liblzma-dev \
    libffi-dev \
    libreadline-dev \
    libncursesw5-dev \
    libsqlite3-dev \
    libgdbm-dev \
    tk-dev \
    libc6-dev \
    openssl \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# Install Python 3.12 from source
RUN wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz \
 && tar -xvf Python-${PYTHON_VERSION}.tgz \
 && cd Python-${PYTHON_VERSION} \
 && ./configure && make && make install \
 && rm -rf ../Python-${PYTHON_VERSION}*

# Install the remaining packages
RUN apt-get update \
 && apt-get -y install --no-install-recommends \
    emacs \
    git \
    unzip \
    vim \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

RUN ${PIP} --no-cache-dir install --upgrade \
    pip \
    setuptools

RUN ${PIP} install --no-cache-dir \
    "awscli<2" \
    boto3 \
    "cython<3.0" \
    gevent \
    requests \
    grpcio \
    "protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3" \
    packaging \
 # using --no-dependencies to avoid installing the tensorflow binary
 && ${PIP} install --no-dependencies --no-cache-dir \
    tensorflow-serving-api=="2.19.0"

# Some TF tools expect a "python" binary
RUN ln -s $(which ${PYTHON}) /usr/local/bin/python \
 && ln -s $(which ${PIP}) /usr/bin/pip

# Install the TF Serving binary
COPY --from=build_image /usr/local/bin/tensorflow_model_server /usr/bin/tensorflow_model_server

# Expose ports
# gRPC and REST
EXPOSE 8500 8501

# Set where models should be stored in the container
RUN mkdir -p ${MODEL_BASE_PATH}

ADD https://raw.githubusercontent.com/aws/deep-learning-containers/master/src/deep_learning_container.py /usr/local/bin/deep_learning_container.py

RUN chmod +x /usr/local/bin/deep_learning_container.py

COPY bash_telemetry.sh /usr/local/bin/bash_telemetry.sh

RUN chmod +x /usr/local/bin/bash_telemetry.sh

RUN echo 'source /usr/local/bin/bash_telemetry.sh' >> /etc/bash.bashrc

# Create a script that runs the model server so we can use environment variables
# while also passing in arguments from the docker command line
RUN echo '#!/bin/bash \n\n' > /usr/bin/tf_serving_entrypoint.sh \
 && echo 'bash /usr/local/bin/bash_telemetry.sh >/dev/null 2>&1 || true' >> /usr/bin/tf_serving_entrypoint.sh \
 && echo '/usr/bin/tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"' >> /usr/bin/tf_serving_entrypoint.sh \
 && chmod +x /usr/bin/tf_serving_entrypoint.sh

RUN HOME_DIR=/root \
 && curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \
 && unzip ${HOME_DIR}/oss_compliance.zip -d ${HOME_DIR}/ \
 && cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance \
 && chmod +x /usr/local/bin/testOSSCompliance \
 && chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh \
 && ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
 && rm -rf ${HOME_DIR}/oss_compliance*

RUN curl https://aws-dlc-licenses.s3.amazonaws.com/tensorflow-${TFS_SHORT_VERSION}/license.txt -o /license.txt

RUN rm -rf /tmp/*

CMD ["/usr/bin/tf_serving_entrypoint.sh"]

#################################################################
#  ____                   __  __       _
# / ___|  __ _  __ _  ___|  \/  | __ _| | _____ _ __
# \___ \ / _` |/ _` |/ _ \ |\/| |/ _` | |/ / _ \ '__|
#  ___) | (_| | (_| |  __/ |  | | (_| |   <  __/ |
# |____/ \__,_|\__, |\___|_|  |_|\__,_|_|\_\___|_|
#              |___/
#  ___                            ____           _
# |_ _|_ __ ___   __ _  __ _  ___|  _ \ ___  ___(_)_ __   ___
#  | || '_ ` _ \ / _` |/ _` |/ _ \ |_) / _ \/ __| | '_ \ / _ \
#  | || | | | | | (_| | (_| |  __/  _ <  __/ (__| | |_) |  __/
# |___|_| |_| |_|\__,_|\__, |\___|_| \_\___|\___|_| .__/ \___|
#                      |___/                      |_|
#################################################################

FROM ec2 AS sagemaker

LABEL maintainer="Amazon AI"
LABEL dlc_major_version="1"

# Specify accept-bind-to-port LABEL for inference pipelines to use SAGEMAKER_BIND_TO_PORT
# https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-real-time.html
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
LABEL com.amazonaws.sagemaker.capabilities.multi-models=true

ARG TFS_SHORT_VERSION=2.19
ENV SAGEMAKER_TFS_VERSION="${TFS_SHORT_VERSION}"
ENV PATH="$PATH:/sagemaker"

# nginx + njs
RUN curl -s http://nginx.org/keys/nginx_signing.key | apt-key add - \
 && echo 'deb http://nginx.org/packages/ubuntu/ jammy nginx' >> /etc/apt/sources.list \
 && apt-get update \
 && apt-get -y install --no-install-recommends \
    nginx \
    nginx-module-njs \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# the pins are for the TFS SageMaker Toolkit
RUN ${PIP} install --no-cache-dir \
    falcon==3.1.0 \
    "gunicorn>=22.0.0"

COPY ./sagemaker /sagemaker

# Expose ports
# gRPC and REST
EXPOSE 8500 8501

RUN HOME_DIR=/root \
 && curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \
 && unzip ${HOME_DIR}/oss_compliance.zip -d ${HOME_DIR}/ \
 && cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance \
 && chmod +x /usr/local/bin/testOSSCompliance \
 && chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh \
 && ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
 && rm -rf ${HOME_DIR}/oss_compliance*

RUN rm -rf /tmp/*

CMD ["/usr/bin/tf_serving_entrypoint.sh"]
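The image exposes gRPC on 8500 and REST on 8501, and the entrypoint serves the model named by `MODEL_NAME` (default `model`). A minimal sketch of the REST predict request a client would send to a running container; the URL shape follows TF Serving's documented REST API, while the host, port mapping, and the example instance payload are assumptions:

```python
import json

model_name = "model"  # the image's MODEL_NAME default
url = f"http://localhost:8501/v1/models/{model_name}:predict"

# TF Serving's REST API expects a JSON body with an "instances" list,
# one entry per input example.
payload = {"instances": [[1.0, 2.0, 3.0]]}
body = json.dumps(payload)

# With a container running (e.g. docker run -p 8501:8501 ...), a client
# would POST `body` to `url` and read the "predictions" key of the reply.
print(url)
```

The SageMaker stage fronts the same model server with nginx/gunicorn, so within SageMaker the invocation goes through the platform's own endpoint rather than this raw port.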
