Why is the length of token_log_probabilities always one greater than the length of predictions? 