Hi, I was reading your code and found a part that disagrees with my understanding.
In compute_loss in CogVideo/finetune/models/cogvideox_i2v/lora_trainer.py, there is a denoising step:
```python
# Denoise
latent_pred = self.components.scheduler.get_velocity(predicted_noise, latent_noisy, timesteps)

alphas_cumprod = self.components.scheduler.alphas_cumprod[timesteps]
weights = 1 / (1 - alphas_cumprod)
while len(weights.shape) < len(latent_pred.shape):
    weights = weights.unsqueeze(-1)

loss = torch.mean((weights * (latent_pred - latent) ** 2).reshape(batch_size, -1), dim=1)
loss = loss.mean()
return loss
```
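For reference, my understanding is that get_velocity follows the standard diffusers v-prediction convention, v_t = sqrt(alpha_bar_t) * noise - sqrt(1 - alpha_bar_t) * sample. That is an assumption on my part (not taken from this repository), sketched roughly as:

```python
import torch

def assumed_get_velocity(scheduler, sample: torch.Tensor, noise: torch.Tensor,
                         timesteps: torch.Tensor) -> torch.Tensor:
    # Sketch of the diffusers-style definition (my assumption, not copied from this repo):
    #   v_t = sqrt(alpha_bar_t) * noise - sqrt(1 - alpha_bar_t) * sample
    a_bar = scheduler.alphas_cumprod.to(device=sample.device, dtype=sample.dtype)[timesteps]
    while len(a_bar.shape) < len(sample.shape):
        a_bar = a_bar.unsqueeze(-1)  # broadcast per-timestep scalars over latent dims
    return a_bar ** 0.5 * noise - (1 - a_bar) ** 0.5 * sample
```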
In the last part, you define a loss between latent_pred and latent. latent_pred corresponds to the velocity at timestep t, whereas latent corresponds to the ground-truth latent image representation.
As far as I know, the velocity does not directly correspond to the latent image representation, so I think there should be one more line reconstructing the latent from the velocity, as sketched below.
If this reconstruction is not necessary, could you explain how this loss works during training?
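To make the suggestion concrete, the kind of reconstruction I have in mind is the standard v-prediction inversion x0 = sqrt(alpha_bar_t) * x_t - sqrt(1 - alpha_bar_t) * v_t. The helper below is hypothetical (my own names, not from the repository):

```python
import torch

def velocity_to_latent(scheduler, velocity_pred: torch.Tensor, latent_noisy: torch.Tensor,
                       timesteps: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: recover the predicted clean latent x0 from a velocity prediction
    #   x0 = sqrt(alpha_bar_t) * x_t - sqrt(1 - alpha_bar_t) * v_t
    a_bar = scheduler.alphas_cumprod.to(device=latent_noisy.device, dtype=latent_noisy.dtype)[timesteps]
    while len(a_bar.shape) < len(latent_noisy.shape):
        a_bar = a_bar.unsqueeze(-1)  # broadcast per-timestep scalars over latent dims
    return a_bar ** 0.5 * latent_noisy - (1 - a_bar) ** 0.5 * velocity_pred

# e.g. the loss would then compare the reconstruction against the ground-truth latent:
# latent_x0_pred = velocity_to_latent(self.components.scheduler, latent_pred, latent_noisy, timesteps)
# loss = torch.mean((weights * (latent_x0_pred - latent) ** 2).reshape(batch_size, -1), dim=1)
```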
Thank you!!