85 | 85 | "source": [
86 | 86 | "import sys\n",
87 | 87 | "\n",
88 |    | - "!{sys.executable} -m pip install \"scikit_learn==0.20.0\" \"sagemaker>=2.75.1\" \"transformers==4.6.1\" \"datasets==1.6.2\" \"nltk==3.4.4\""
   | 88 | + "!{sys.executable} -m pip install \"scikit_learn==0.20.0\" \"sagemaker>=2.86.1\" \"transformers==4.6.1\" \"datasets==1.6.2\" \"nltk==3.4.4\""
89 | 89 | ]
90 | 90 | },
91 | 91 | {
92 | 92 | "cell_type": "markdown",
93 | 93 | "metadata": {},
94 | 94 | "source": [
95 |    | - "Make sure SageMaker version is >= 2.75.1"
   | 95 | + "Make sure SageMaker version is >= 2.86.1"
96 | 96 | ]
97 | 97 | },
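The bumped version floor (">= 2.86.1") can be sanity-checked in the notebook before proceeding. A minimal, dependency-free sketch of that check — the helper names here are assumptions, not notebook code; a real cell would instead compare `sagemaker.__version__` using `packaging.version.parse`:

```python
# Hypothetical helpers (not part of the notebook) that check an installed
# version string against the ">= 2.86.1" floor mentioned in the diff.
def version_tuple(v):
    """Turn a dotted version like '2.86.1' into (2, 86, 1) for comparison."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed, minimum="2.86.1"):
    """True if the installed version satisfies the minimum requirement."""
    return version_tuple(installed) >= version_tuple(minimum)
```

Note this simple tuple comparison only handles plain numeric versions; pre-release suffixes would need `packaging.version` instead.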
98 | 98 | {
1097 | 1097 | "\n",
1098 | 1098 | "#### Concurrent invocations - `max_concurrency`\n",
1099 | 1099 | " \n",
1100 |      | - "Serverless Inference manages predefined scaling policies and quotas for the capacity of your endpoint. Serverless endpoints have a quota for how many concurrent invocations can be processed at the same time. If the endpoint is invoked before it finishes processing the first request, then it handles the second request concurrently. You can set the maximum concurrency for a <b>single endpoint up to 50</b>, and the total number of serverless endpoint variants you can host in a Region is 50. The total concurrency you can share between all serverless endpoints per Region in your account is 200. The maximum concurrency for an individual endpoint prevents that endpoint from taking up all the invocations allowed for your account, and any endpoint invocations beyond the maximum are throttled."
     | 1100 | + "Serverless Inference manages predefined scaling policies and quotas for the capacity of your endpoint. Serverless endpoints have a quota for how many concurrent invocations can be processed at the same time. If the endpoint is invoked before it finishes processing the first request, then it handles the second request concurrently. You can set the maximum concurrency for a <b>single endpoint up to 200</b>, and the total number of serverless endpoint variants you can host in a Region is 50. The total concurrency you can share between all serverless endpoints per Region in your account is 200. The maximum concurrency for an individual endpoint prevents that endpoint from taking up all the invocations allowed for your account, and any endpoint invocations beyond the maximum are throttled."
1101 | 1101 | ]
1102 | 1102 | },
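The quota rules in the updated cell — a per-endpoint `max_concurrency` of up to 200, inside a regional cap of 200 concurrent invocations shared across endpoints — can be sketched as plain arithmetic. This is a hypothetical illustration, not the SageMaker API; the function and constant names are assumptions:

```python
# Illustrative constants mirroring the quotas described in the text
# (assumed names, not SageMaker identifiers).
REGIONAL_CONCURRENCY_CAP = 200   # total concurrent invocations per Region
MAX_ENDPOINT_CONCURRENCY = 200   # upper bound for a single endpoint

def is_throttled(endpoint_in_flight, endpoint_max, region_in_flight):
    """True if one more invocation to this endpoint would be throttled."""
    if not 1 <= endpoint_max <= MAX_ENDPOINT_CONCURRENCY:
        raise ValueError("max_concurrency must be between 1 and 200")
    # An invocation is throttled when either the endpoint's own limit
    # or the Region-wide cap is already saturated.
    return (endpoint_in_flight >= endpoint_max
            or region_in_flight >= REGIONAL_CONCURRENCY_CAP)
```

As the text notes, the per-endpoint limit keeps one busy endpoint from consuming the whole regional allowance.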
1103 | 1103 | {
1114 | 1114 | ")"
1115 | 1115 | ]
1116 | 1116 | },
1117 |      | - {
1118 |      | - "cell_type": "markdown",
1119 |      | - "metadata": {},
1120 |      | - "source": [
1121 |      | - "### HuggingFace Inference Image `URI`\n",
1122 |      | - "\n",
1123 |      | - "In order to deploy the SageMaker Endpoint with Serverless configuration, we will need to supply the HuggingFace Inference Image URI."
1124 |      | - ]
1125 |      | - },
1126 |      | - {
1127 |      | - "cell_type": "code",
1128 |      | - "execution_count": null,
1129 |      | - "metadata": {},
1130 |      | - "outputs": [],
1131 |      | - "source": [
1132 |      | - "image_uri = sagemaker.image_uris.retrieve(\n",
1133 |      | - " framework=\"huggingface\",\n",
1134 |      | - " base_framework_version=\"pytorch1.7\",\n",
1135 |      | - " region=sess.boto_region_name,\n",
1136 |      | - " version=\"4.6\",\n",
1137 |      | - " py_version=\"py36\",\n",
1138 |      | - " instance_type=\"ml.m5.large\",\n",
1139 |      | - " image_scope=\"inference\",\n",
1140 |      | - ")\n",
1141 |      | - "image_uri"
1142 |      | - ]
1143 |      | - },
1144 | 1117 | {
1145 | 1118 | "cell_type": "markdown",
1146 | 1119 | "metadata": {},
1157 | 1130 | "source": [
1158 | 1131 | "%%time\n",
1159 | 1132 | "\n",
1160 |      | - "predictor = huggingface_estimator.deploy(\n",
1161 |      | - " serverless_inference_config=serverless_config, image_uri=image_uri\n",
1162 |      | - ")"
     | 1133 | + "predictor = huggingface_estimator.deploy(serverless_inference_config=serverless_config)"
1163 | 1134 | ]
1164 | 1135 | },
1165 | 1136 | {