Unable to see training process #1060
Replies: 2 comments
@ismailkattar You can add print statements at certain steps, like this:

```python
import torch

def train_step(model, dataloader, loss_fn, optimizer, device, epoch):
    model.train()

    # Setup train loss and accuracy values
    train_loss, train_acc = 0, 0

    # Loop through dataloader data batches
    for step, (X, y) in enumerate(dataloader):
        # Send data to the target device
        X, y = X.to(device), y.to(device)

        # Forward pass and loss calculation
        y_pred = model(X)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()

        # Optimizer zero grad, backward pass, optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Calculate accuracy across the batch
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item() / len(y_pred)

        # Print running averages every 500 steps
        if step % 500 == 0:
            avg_loss = train_loss / (step + 1)
            avg_acc = train_acc / (step + 1)
            print(f"Step {step+1}/{len(dataloader)}: Avg Loss = {avg_loss:.4f}, Avg Accuracy = {avg_acc:.4f}")

    # Adjust metrics to get average loss and accuracy per batch
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc
```

This should print the running metrics every 500 steps of the total steps. You can change the value as per your need.
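If it helps, here is a minimal sketch of how that function could be called per epoch. The toy model, random data, loss function, and optimizer below are placeholders I've made up just so the snippet runs end to end; swap them for your own setup.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup (placeholders) so train_step above can be exercised end to end
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 2).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random data standing in for a real dataset
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000,))
train_dataloader = DataLoader(TensorDataset(X, y), batch_size=32)

epochs = 3
for epoch in range(epochs):
    train_loss, train_acc = train_step(model, train_dataloader, loss_fn, optimizer, device, epoch)
    print(f"Epoch {epoch+1}/{epochs} | train_loss: {train_loss:.4f} | train_acc: {train_acc:.4f}")
```

Note that with a small loader like this one (about 32 batches per epoch) the step-level print only fires at step 0, so lower the 500 to something like 10 if you want to see it more often.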
Hey @ismailkattar, you can wrap your dataloader with `tqdm` to get a live progress bar:

```python
import torch
from tqdm import tqdm

# Training loop with tqdm
epochs = 5
for epoch in range(epochs):
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    # Use tqdm to wrap the dataloader
    progress_bar = tqdm(dataloader, desc=f"Epoch {epoch+1}/{epochs}")

    for inputs, labels in progress_bar:
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = loss_fn(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        # Update running loss (weighted by batch size)
        running_loss += loss.item() * inputs.size(0)

        # Calculate accuracy
        _, predicted = torch.max(outputs, 1)
        correct_predictions += (predicted == labels).sum().item()
        total_predictions += labels.size(0)

        # Update tqdm description with metrics
        progress_bar.set_postfix({
            "loss": running_loss / total_predictions,
            "accuracy": correct_predictions / total_predictions
        })

    # Print epoch summary
    print(f"Epoch {epoch+1}/{epochs} - Loss: {running_loss / total_predictions:.4f}, "
          f"Accuracy: {correct_predictions / total_predictions:.4f}")
```

This should continually print out metric updates to the progress bar as the model trains.
As you can see, I can only see how long an epoch took after its training has ended. What can I do to see the progress while the epoch is still running?