User experience improvement for VACE-Wan2.1-1.3B-Preview #27
Amazing work! I have been using mostly pose and depth control to good effect. I have found it does not work too well with some existing LoRAs, especially those trained on styles. Would love to have a tile/upscale method.
Hi there, it would be great to be able to keep the first image passed as input unchanged (as with the Fun Control model), in case we wish to start from an existing scene without altering it. This is, for me, the best video model we currently have in the open-source world, so a huge thanks for your work.
Hey, thanks for the amazing model(s)! One thing that has come up a couple of times in feedback: people would like some way to better choose which control type is actually used, referring to the method used in union ControlNets; I don't know if it's possible here, though. Another thing that may already be possible, but that I'm unsure how to do, would be adjusting the strength of control and reference separately. And related to that, using multiple separate VACE inputs, for which I have used a workaround of simply applying VACE multiple times, but it's somewhat clumsy.

As for the memory use of VACE, there are some things I don't fully understand: the initial memory use was very high. The problem may have been caused by me or by something other than your implementation; however, when I made these changes, the memory usage was reduced considerably without any drawback that I can see:
It might be important to be able to edit the strength, so I could blend the original with the reference. It would also be nice to be able to add more than one reference for different purposes. I also thought it would be nice if we could do it with styles: for example, if you add a comic-book image, it would take the style of the image and not just what you see. I don't know if I'm perhaps rambling. Thank you very much for your work.
First, thank you! This is one of the coolest toolkits to happen for open source! The work you're doing is incredible.
For me, a focus on maintaining identity consistency would be amazing! I find identity often changes through generations. Thank you for your excellent work 👍
Many thanks for your great model.
Still, great thanks to the team that made and open-sourced this amazing tool. This IS the most versatile image/video creation tool that I've ever encountered, and it has already achieved quite a lot even in the preview version.
@kijai Thanks for your contribution in making VACE run perfectly in ComfyUI workflows!
VACE supports multiple control types in a single run, but only supports spatially separated control signals. For example, you can control a person with pose and the background with depth, but with no overlap. You can try overlapping control signals and it may work, but we did not explicitly train the model that way (see the sketch after this reply for what spatially separated signals look like).
We didn't fully realize its importance here, and we only tested
You did it right. The original implementation is inherited from the training code; for inference, it can be much simpler.
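To illustrate what spatially separated control signals mean in practice, here is a minimal sketch (the helper name and array shapes are my assumptions, not part of the VACE codebase): a pose map drives the person region and a depth map drives the background, with the two regions never overlapping.

```python
import numpy as np

def compose_spatial_controls(pose_frames, depth_frames, person_masks):
    """Hypothetical helper: combine two control signals without overlap.

    pose_frames, depth_frames: (T, H, W, 3) uint8 control maps
    person_masks: (T, H, W, 1) float32 in {0, 1}, 1 where the person is
    """
    # Pose controls the person region; depth controls everything else.
    composed = pose_frames * person_masks + depth_frames * (1.0 - person_masks)
    return composed.astype(np.uint8)
```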
Maybe you should try DWPose with facial points. Please refer to the bottom-right case of Figure 1 in our arXiv paper.
We just add VACE blocks to the Wan 1.3B T2V model and didn't tune the main Wan T2V backbone. Any LoRA that worked before should work here. Maybe add more details to your prompt.
If you want something to remain unchanged as part of the output video, just put it into the input context frames (not into the reference images, which will not be part of the output video). And don't forget to set the corresponding mask region to black, which means: do not change the pixels under black masks.
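As a minimal sketch of that recipe (array names, shapes, and values are illustrative assumptions, not the actual VACE interface):

```python
import numpy as np

T, H, W = 81, 480, 832  # example clip size

# Stand-in for your existing scene; load a real (H, W, 3) uint8 image here.
first_scene_image = np.zeros((H, W, 3), dtype=np.uint8)

# Neutral gray frames everywhere the model should generate content.
input_frames = np.full((T, H, W, 3), 127, dtype=np.uint8)
# White mask: "change/generate these pixels".
input_mask = np.full((T, H, W, 1), 255, dtype=np.uint8)

# Put the existing scene into the first context frame...
input_frames[0] = first_scene_image
# ...and mark its mask black so those pixels stay unchanged in the output.
input_mask[0] = 0
```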
Yes, I think
I believe our
You can just tell the model the style you want in the prompt, in that case.
I believe VACE provides a lot of possibilities to control identity. Are you doing long video generation through multiple runs?
I have implemented a similar fix in WanGP. The old code created a huge temporary tensor that fragmented the VRAM.
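For illustration, the general pattern behind this kind of fix might look like the following (a sketch, not the actual VACE or WanGP code): write results chunk by chunk into a preallocated output instead of materializing one large intermediate tensor.

```python
import torch

def apply_in_chunks(block, hidden, chunk=4):
    """Illustrative pattern: bound peak VRAM by avoiding one huge temporary.

    block:  any module mapping (b, L, D) -> (b, L, D)
    hidden: (B, L, D) activations
    """
    out = torch.empty_like(hidden)
    for start in range(0, hidden.shape[0], chunk):
        # Each iteration only allocates a chunk-sized intermediate.
        out[start:start + chunk] = block(hidden[start:start + chunk])
    return out
```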
Thank you for your great work. During my usage, the biggest need I've felt is for both start and end frames plus process control. Currently, in Kijai's workflow, we can only use a single reference frame and control frames, which makes character features uncontrollable when the person turns around or undergoes significant changes. Edit:
Thank you, it works well!
You can indeed have a reference image and, additionally, any number of input images in the input_frames batch, at any index. You mark the frames you want to keep as black in the input_mask, and the frames you want to change as white.
Thank you very much for the clarification! I didn't expect that the feature I wanted most had already been implemented. Now I'm just looking forward to the final 14B model.
Sorry, we only tested multi-control with separated foreground/background and with multiple objects. In your case, the scribble part is a bit too detailed to be an independent object, and I believe expanding it to the whole head would be better.
No, we didn't include camera motion data in VACE training.
Hello, I would like to ask whether the layout_track task can automatically obtain the bbox and label from the original video?
Another use case I tried was creating a seamless transition between two different videos. I added a few frames of the first video at the beginning of the source video, followed by a solid gray frame, and then a few frames from the second video at the end. Then I created the corresponding mask. I tried about 15 times with both simple and complex prompts, but most of the time it just did a quick fade between the two videos. Only once did I get a result that was somewhat okay. Would training a LoRA with transition videos help, or would that just be a waste of time?
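For reference, the input described above could be assembled roughly like this (a sketch under assumed shapes; `clip_a` and `clip_b` are the two source videos as (T, H, W, 3) uint8 arrays, and the helper name and default lengths are mine):

```python
import numpy as np

def build_transition_input(clip_a, clip_b, ctx=8, gap=65):
    """Sketch: a few real frames from each clip bracket a gray gap
    that the model is asked to fill with a transition."""
    H, W = clip_a.shape[1:3]
    gray = np.full((gap, H, W, 3), 127, dtype=np.uint8)
    frames = np.concatenate([clip_a[-ctx:], gray, clip_b[:ctx]], axis=0)

    mask = np.full((len(frames), H, W, 1), 255, dtype=np.uint8)  # white: generate
    mask[:ctx] = 0          # keep the frames from the first video
    mask[ctx + gap:] = 0    # keep the frames from the second video
    return frames, mask
```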
Can't wait for the 14B version... the 1.3B preview is really fantastic! Thanks for this!
Yes, check this out: Line 28 in 9ea71b5. There are different ways to do this.
In my opinion, the prompt is underestimated. Wan2.1 is very sensitive to the prompt. I suggest adjusting the prompt and seeing what happens.
Why is the 14B version listed as 720x1080 and not as 720x1280?
I've tested DWPose with facial points, and I personally do not think it is as good as training with MediaPipe Face instead.
Is it possible to use control frames as the initial frame and end frame instead of images? Does VACE have start-frame and end-frame control?
You mean using control frames such as pose/depth instead of natural video frames? Yes, VACE is very versatile and flexible to use. You might explore boldly and see what you find.
Thank you for the information. MediaPipe Face is great for lip sync.
Oh, sorry for the mistake. To set this straight, the resolution of the 14B version is 720x1280.
Does it have start-frame and end-frame control? I don't think I've seen that in the project.
We didn't exhaustively implement every preprocessing step for all tasks, because there are so many ways to use VACE. Basically, you can compose and input three types of frames in any order: natural video frames (with black masks to keep them in the output, or white masks to apply colorization), control frames (with white masks), and masked in/outpainting frames (with gray pixels and a white mask covering those pixels).
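A compact sketch of that composition rule (the kind names and this helper are my own shorthand, not VACE identifiers):

```python
import numpy as np

def compose_input(frames_with_kinds, H, W):
    """frames_with_kinds: list of (frame, kind) pairs, where kind is
    'keep' (natural frame, black mask), 'colorize' (natural frame, white
    mask), 'control' (pose/depth/etc. map, white mask), or 'inpaint'
    (frame ignored; gray pixels under a white mask)."""
    frames, masks = [], []
    for frame, kind in frames_with_kinds:
        if kind == "inpaint":
            # Gray pixels mark the region to be generated from scratch.
            frame = np.full((H, W, 3), 127, dtype=np.uint8)
        frames.append(frame)
        # Black (0) keeps pixels; white (255) asks the model to change them.
        masks.append(np.full((H, W, 1), 0 if kind == "keep" else 255,
                             dtype=np.uint8))
    return np.stack(frames), np.stack(masks)
```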
We are now working on the final 14B version, and we appreciate the voices of users and developers. Let us know if you have experienced issues with VACE-Wan2.1-1.3B-Preview, or if you have other interesting new ideas that you believe VACE can achieve.
Describe the task you are doing with VACE in detail below, and you may see improvements on it in the final version of VACE!
Please give us more feedback!