Commit e2b6980

contentis and Rudra-Ji authored
v0.2.0 (#217)

* Fix typo (#79)
* Print correct profile when engine is loaded
* Use scripts callbacks
* Increase max resolution
* Add native LoRA support, avoid model.py import error, and some refactoring
* More refactoring and added typing
* Enable torch fallback
* Update install
* Change default XL engine
* CC-independent LoRA
* Update instructions

Co-authored-by: Rudra <[email protected]>

1 parent 4c2bcaf commit e2b6980

File tree

12 files changed: +1504, -2229 lines

README.md

Lines changed: 29 additions & 25 deletions
# TensorRT Extension for Stable Diffusion

This extension enables the best performance on NVIDIA RTX GPUs for Stable Diffusion with TensorRT.

You need to install the extension and generate optimized engines before using the extension. Please follow the instructions below to set everything up.

Supports Stable Diffusion 1.5, 2.1, SDXL, SDXL Turbo, and LCM. For SDXL and SDXL Turbo, we recommend using a GPU with 12 GB or more of VRAM for the best performance due to their size and computational intensity.

## Installation

Example instructions for Automatic1111:

3. Copy the link to this repository and paste it into “URL for extension's git repository”.
4. Click Install.
## How to use

1. Click on the “Generate Default Engines” button. This step takes 2-10 minutes depending on your GPU. You can generate engines for other combinations.
2. Go to Settings → User Interface → Quick Settings List, add sd_unet. Apply these settings, then reload the UI.
3. Back in the main UI, select “Automatic” from the sd_unet dropdown menu at the top of the page if it is not already selected.
4. You can now start generating images accelerated by TRT. If you need to create more engines, go to the TensorRT tab.

Happy prompting!

### LoRA

To use LoRA / LyCORIS checkpoints, they first need to be converted to a TensorRT format. This can be done in the TensorRT extension's Export LoRA tab.

1. Select a LoRA checkpoint from the dropdown.
2. Export. (This does not generate an engine; it only converts the weights, which takes roughly 20 seconds.)
3. You can use the exported LoRAs as usual in your prompts.
## More Information
TensorRT uses optimized engines for specific resolutions and batch sizes. You can generate as many optimized engines as desired. Types:

- The “Export Default Engines” selection adds support for resolutions between `512 x 512` and `768 x 768` for Stable Diffusion 1.5 and 2.1 with batch sizes 1 to 4. For SDXL, this selection generates an engine supporting a resolution of `1024 x 1024` with a batch size of 1.
- Static engines support a single specific output resolution and batch size.
- Dynamic engines support a range of resolutions and batch sizes, at a small cost in performance. Wider ranges will use more VRAM.
- The first time generating an engine for a checkpoint will take longer. Additional engines generated for the same checkpoint will be much faster.

3542
Each preset can be adjusted with the “Advanced Settings” option. More detailed instructions can be found [here](https://nvidia.custhelp.com/app/answers/detail/a_id/5487/~/tensorrt-extension-for-stable-diffusion-web-ui).
3643

3744
### Common Issues/Limitations
**HIRES FIX**: If using the hires.fix option in Automatic1111, you must build engines that match both the starting and ending resolutions. For instance, if the initial size is `512 x 512` and hires.fix upscales to `1024 x 1024`, you must generate a single dynamic engine that covers the whole range.

**Resolution**: When generating images, the resolution needs to be a multiple of 64. This applies to hires.fix as well, requiring both the low-res and high-res dimensions to be divisible by 64.

**Failing CMD arguments**:

- `medvram` and `lowvram` have caused issues when compiling the engine.
- `api` has caused `model.json` to not be updated, resulting in SD UNets not appearing after compilation.

**Failing installation or TensorRT tab not appearing in the UI**: This is most likely due to a failed install. To resolve this manually, use this [guide](https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/issues/27#issuecomment-1767570566).
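The multiple-of-64 constraint can be satisfied by snapping a requested dimension before generating. A minimal sketch (the `round_to_64` helper is hypothetical, not provided by the extension):

```python
def round_to_64(x: int) -> int:
    # Snap a requested image dimension to the nearest multiple of 64
    # (minimum 64), since engines only accept such resolutions.
    return max(64, round(x / 64) * 64)


print(round_to_64(500))   # 512
print(round_to_64(1000))  # 1024
print(round_to_64(768))   # 768
```

For hires.fix, the same snapping would apply to both the low-res and upscaled dimensions.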

## Requirements

Driver:

- Linux: >= 450.80.02
- Windows: >= 452.39

We always recommend keeping the driver up-to-date for system-wide performance improvements.

datastructures.py

Lines changed: 239 additions & 0 deletions
```python
from dataclasses import dataclass
from enum import Enum
from json import JSONEncoder

import torch


class SDVersion(Enum):
    SD1 = 1
    SD2 = 2
    SDXL = 3
    Unknown = -1

    def __str__(self):
        return self.name

    @classmethod
    def from_str(cls, str):
        try:
            return cls[str]
        except KeyError:
            return cls.Unknown

    def match(self, sd_model):
        if sd_model.is_sd1 and self == SDVersion.SD1:
            return True
        elif sd_model.is_sd2 and self == SDVersion.SD2:
            return True
        elif sd_model.is_sdxl and self == SDVersion.SDXL:
            return True
        elif self == SDVersion.Unknown:
            return True
        else:
            return False


class ModelType(Enum):
    UNET = 0
    CONTROLNET = 1
    LORA = 2
    UNDEFINED = -1

    @classmethod
    def from_string(cls, s):
        return getattr(cls, s.upper(), None)

    def __str__(self):
        return self.name.lower()


@dataclass
class ModelConfig:
    profile: dict
    static_shapes: bool
    fp32: bool
    inpaint: bool
    refit: bool
    lora: bool
    vram: int
    unet_hidden_dim: int = 4

    def is_compatible_from_dict(self, feed_dict: dict):
        distance = 0
        for k, v in feed_dict.items():
            _min, _opt, _max = self.profile[k]
            v_tensor = torch.Tensor(list(v.shape))
            r_min = torch.Tensor(_max) - v_tensor
            r_opt = (torch.Tensor(_opt) - v_tensor).abs()
            r_max = v_tensor - torch.Tensor(_min)
            if torch.any(r_min < 0) or torch.any(r_max < 0):
                return (False, distance)
            distance += r_opt.sum() + 0.5 * (r_max.sum() + 0.5 * r_min.sum())
        return (True, distance)

    def is_compatible(
        self, width: int, height: int, batch_size: int, max_embedding: int
    ):
        distance = 0
        sample = self.profile["sample"]
        embedding = self.profile["encoder_hidden_states"]

        batch_size *= 2
        width = width // 8
        height = height // 8

        _min, _opt, _max = sample
        if _min[0] > batch_size or _max[0] < batch_size:
            return (False, distance)
        if _min[2] > height or _max[2] < height:
            return (False, distance)
        if _min[3] > width or _max[3] < width:
            return (False, distance)

        _min_em, _opt_em, _max_em = embedding
        if _min_em[1] > max_embedding or _max_em[1] < max_embedding:
            return (False, distance)

        distance = (
            abs(_opt[0] - batch_size)
            + abs(_opt[2] - height)
            + abs(_opt[3] - width)
            + 0.5 * (abs(_max[2] - height) + abs(_max[3] - width))
        )

        return (True, distance)


class ModelConfigEncoder(JSONEncoder):
    def default(self, o: ModelConfig):
        return o.__dict__


@dataclass
class ProfileSettings:
    bs_min: int
    bs_opt: int
    bs_max: int
    h_min: int
    h_opt: int
    h_max: int
    w_min: int
    w_opt: int
    w_max: int
    t_min: int
    t_opt: int
    t_max: int
    static_shape: bool = False

    def __str__(self) -> str:
        return "Batch Size: {}-{}-{}\nHeight: {}-{}-{}\nWidth: {}-{}-{}\nToken Count: {}-{}-{}".format(
            self.bs_min, self.bs_opt, self.bs_max,
            self.h_min, self.h_opt, self.h_max,
            self.w_min, self.w_opt, self.w_max,
            self.t_min, self.t_opt, self.t_max,
        )

    def out(self):
        return (
            self.bs_min, self.bs_opt, self.bs_max,
            self.h_min, self.h_opt, self.h_max,
            self.w_min, self.w_opt, self.w_max,
            self.t_min, self.t_opt, self.t_max,
        )

    def token_to_dim(self, static_shapes: bool):
        self.t_min = (self.t_min // 75) * 77
        self.t_opt = (self.t_opt // 75) * 77
        self.t_max = (self.t_max // 75) * 77

        if static_shapes:
            self.t_min = self.t_max = self.t_opt
            self.bs_min = self.bs_max = self.bs_opt
            self.h_min = self.h_max = self.h_opt
            self.w_min = self.w_max = self.w_opt
            self.static_shape = True

    def get_latent_dim(self):
        return (
            self.h_min // 8, self.h_opt // 8, self.h_max // 8,
            self.w_min // 8, self.w_opt // 8, self.w_max // 8,
        )

    def get_a1111_batch_dim(self):
        static_batch = self.bs_min == self.bs_max == self.bs_opt
        if self.t_max <= 77:
            return (self.bs_min * 2, self.bs_opt * 2, self.bs_max * 2)
        elif self.t_max > 77 and static_batch:
            return (self.bs_opt, self.bs_opt, self.bs_opt)
        elif self.t_max > 77 and not static_batch:
            if self.t_opt > 77:
                return (self.bs_min, self.bs_opt, self.bs_max * 2)
            return (self.bs_min, self.bs_opt * 2, self.bs_max * 2)
        else:
            raise Exception("Uncovered case in get_batch_dim")


class ProfilePrests:
    def __init__(self):
        self.profile_presets = {
            "512x512 | Batch Size 1 (Static)": ProfileSettings(
                1, 1, 1, 512, 512, 512, 512, 512, 512, 75, 75, 75
            ),
            "768x768 | Batch Size 1 (Static)": ProfileSettings(
                1, 1, 1, 768, 768, 768, 768, 768, 768, 75, 75, 75
            ),
            "1024x1024 | Batch Size 1 (Static)": ProfileSettings(
                1, 1, 1, 1024, 1024, 1024, 1024, 1024, 1024, 75, 75, 75
            ),
            "256x256 - 512x512 | Batch Size 1-4": ProfileSettings(
                1, 1, 4, 256, 512, 512, 256, 512, 512, 75, 75, 150
            ),
            "512x512 - 768x768 | Batch Size 1-4": ProfileSettings(
                1, 1, 4, 512, 512, 768, 512, 512, 768, 75, 75, 150
            ),
            "768x768 - 1024x1024 | Batch Size 1-4": ProfileSettings(
                1, 1, 4, 768, 1024, 1024, 768, 1024, 1024, 75, 75, 150
            ),
        }
        self.default = ProfileSettings(
            1, 1, 4, 512, 512, 768, 512, 512, 768, 75, 75, 150
        )
        self.default_xl = ProfileSettings(
            1, 1, 1, 1024, 1024, 1024, 1024, 1024, 1024, 75, 75, 75
        )

    def get_settings_from_version(self, version: str):
        static = False
        if version == "Default":
            return *self.default.out(), static
        if "Static" in version:
            static = True
        return *self.profile_presets[version].out(), static

    def get_choices(self):
        return list(self.profile_presets.keys()) + ["Default"]

    def get_default(self, is_xl: bool):
        if is_xl:
            return self.default_xl
        return self.default
```
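The token-count conversion in `ProfileSettings.token_to_dim` can be illustrated standalone. A minimal sketch (the `token_dims` helper name is hypothetical), assuming CLIP's convention that each full 75-token prompt chunk becomes a 77-token sequence:

```python
def token_dims(t_min: int, t_opt: int, t_max: int) -> tuple:
    # Each full 75-token prompt chunk maps to a 77-token CLIP sequence
    # (75 tokens plus the begin/end markers), so counts are converted
    # in whole chunks: 75 -> 77, 150 -> 154.
    to_dim = lambda t: (t // 75) * 77
    return to_dim(t_min), to_dim(t_opt), to_dim(t_max)


print(token_dims(75, 75, 150))  # (77, 77, 154)
```

This is why the presets above specify token counts of 75 and 150: they correspond to one and two prompt chunks respectively.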
