How to use model for making predictions? #6

Open
adityakapri opened this issue Sep 12, 2019 · 6 comments


@adityakapri

Once the model has been trained, how do I use it for prediction? I have examples with no labels, and I need to find the predicted labels for all of them.

@ThilinaRajapakse
Owner

ThilinaRajapakse commented Sep 12, 2019

The easiest way to do it would probably be something like this. I am setting the label to 0 for all the examples, but the labels have no effect on the predictions.

import torch
from torch.utils.data import TensorDataset

# InputExample, convert_examples_to_features, tokenizer, max_seq_len, output_mode and
# model_type (a string such as 'bert', 'xlnet' or 'roberta') are expected to be
# defined already, as in the training code.

def tokenize(all_data):
    # all_data is an iterable of sentences; every example gets the dummy label '0'
    test_examples = [InputExample(0, sentence, None, '0') for sentence in all_data]
    label_list = ["0", "1"]

    test_features = convert_examples_to_features(test_examples, label_list, max_seq_len, tokenizer, output_mode,
        cls_token_at_end=bool(model_type == 'xlnet'),            # xlnet has a cls token at the end
        cls_token=tokenizer.cls_token,
        cls_token_segment_id=2 if model_type == 'xlnet' else 0,
        sep_token=tokenizer.sep_token,
        sep_token_extra=bool(model_type == 'roberta'),
        pad_on_left=bool(model_type == 'xlnet'),                 # pad on the left for xlnet
        pad_token=tokenizer.convert_tokens_to_ids([tokenizer.pad_token])[0],
        pad_token_segment_id=4 if model_type == 'xlnet' else 0)

    # Stack the per-example features into tensors
    all_input_ids = torch.tensor([f.input_ids for f in test_features], dtype=torch.long)
    all_input_mask = torch.tensor([f.input_mask for f in test_features], dtype=torch.long)
    all_segment_ids = torch.tensor([f.segment_ids for f in test_features], dtype=torch.long)
    all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.long)

    test_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids)
    return test_data

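import numpy as np
import torch
from torch.utils.data import DataLoader, SequentialSampler

# eval_batch_size is expected to be defined already, as in the training code.

def get_predictions(model, test_data):
    model.eval()
    device = next(model.parameters()).device  # run the batches on the model's device
    test_sampler = SequentialSampler(test_data)
    eval_dataloader = DataLoader(test_data, sampler=test_sampler, batch_size=eval_batch_size)
    preds = None
    for batch in eval_dataloader:
        with torch.no_grad():
            batch = tuple(t.to(device) for t in batch)
            inputs = {'input_ids': batch[0],
                      'attention_mask': batch[1],
                      'token_type_ids': batch[2],
                      'labels': batch[3]}

            outputs = model(**inputs)
            _, logits = outputs[:2]  # with labels supplied, the model returns (loss, logits)
        if preds is None:
            preds = logits.detach().cpu().numpy()
        else:
            preds = np.append(preds, logits.detach().cpu().numpy(), axis=0)

    # Pick the highest-scoring class only after all batches have been collected
    preds = np.argmax(preds, axis=1)

    return preds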

You can use the tokenize() function to prepare the data, send it to get_predictions() and collect the predictions.
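If you want label strings rather than class indices at the end, the indices can be mapped back through the same label_list that tokenize() uses. This is only a minimal sketch: the helper name indices_to_labels, the example sentences and the example indices are all made up for illustration and are not part of the repository.

```python
label_list = ["0", "1"]  # same order as the label_list inside tokenize()


def indices_to_labels(pred_indices, label_list):
    """Map predicted class indices (e.g. argmax output) to label strings."""
    return [label_list[int(i)] for i in pred_indices]


# Made-up stand-ins for the real inputs and for the output of get_predictions()
sentences = ["first example", "second example"]
pred_indices = [1, 0]

for sentence, label in zip(sentences, indices_to_labels(pred_indices, label_list)):
    print(f"{label}\t{sentence}")
```

The int() call is there so that NumPy integer types coming out of argmax index the list without surprises.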

There may be cleaner ways of doing this, but it didn't seem worth the trouble to me (the class specification for InputExample says label can be set to None for test data, but that would also require a lot more changes to the code). These two functions are adapted from something similar I wrote for an API that generates predictions. The API is working, so the approach is sound. However, I haven't tested the specific code I provided here, so let me know if it throws any errors and I can see about fixing them.

@Magpi007

Isn't the get_mismatched function already pulling out the wrong predictions? Would it be possible to just adjust this function to get both the right and the wrong predictions?

@ThilinaRajapakse
Owner

It's certainly possible. Its original purpose was to give insight into examples that the model was getting wrong.
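For reference, splitting examples into right and wrong predictions takes only a few lines once the true labels and the predictions are both at hand. This is a standalone sketch and not the repository's get_mismatched; the function name split_by_correctness and the (text, truth, prediction) tuple layout are assumptions made for illustration.

```python
def split_by_correctness(texts, true_labels, pred_labels):
    """Return two lists of (text, truth, prediction) tuples: correct and wrong."""
    correct, wrong = [], []
    for text, truth, pred in zip(texts, true_labels, pred_labels):
        bucket = correct if truth == pred else wrong
        bucket.append((text, truth, pred))
    return correct, wrong


# Made-up data for illustration
texts = ["good product", "too expensive", "works fine"]
true_labels = [1, 0, 1]
pred_labels = [1, 1, 1]

correct, wrong = split_by_correctness(texts, true_labels, pred_labels)
print(len(correct), "correct;", len(wrong), "wrong")
```

This only applies to labelled data such as a dev set; for unlabelled examples there is nothing to compare the predictions against.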

@Mahhos

Mahhos commented Jan 28, 2020

I've got two questions.

  1. What is the format of all_data in the tokenize(all_data) function? Is it a ".tsv" file in the same format as "train.tsv" and "dev.tsv"?
  2. Where should these functions go, and how should they be called?
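Judging from the code earlier in the thread, tokenize() simply iterates over all_data and treats each item as one sentence, so a plain list of strings (rather than a file path) appears to be what it expects. A minimal sketch of building such a list from tab-separated text follows; the helper name read_sentences, the text_column parameter and the sample rows are hypothetical, and the column holding the text in your own file may differ.

```python
import csv
import io


def read_sentences(tsv_file, text_column=0):
    """Read one column of a tab-separated file object into a list of strings."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row[text_column] for row in reader if row]  # skip blank lines


# In-memory stand-in for a real file opened with open(path, newline="", encoding="utf-8")
sample = io.StringIO("The product is good\nThe price is very high\n")
all_data = read_sentences(sample)

# all_data is now a plain list of strings, the shape tokenize(all_data) iterates over
print(all_data)
```

As for placement, the two functions rely on names such as tokenizer and model already being defined, so they would most likely go after the training code in the same notebook or script.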

@Mahhos

Mahhos commented Jan 29, 2020

When I run the tokenize function, I get ValueError: Number of processes must be at least 1. However, when I print os.cpu_count(), it shows 2. Do you have any idea why?
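That message is what Python's multiprocessing.Pool raises when it is asked for fewer than one worker. One plausible cause, which is an assumption and not confirmed anywhere in this thread, is that the feature-conversion code sizes its worker pool as the CPU count minus a small reserve (say 2), which comes out to zero on a two-core machine. A minimal sketch of clamping the count; safe_process_count and the reserved parameter are hypothetical names.

```python
import os


def safe_process_count(cpu_count, reserved=2):
    """Leave `reserved` cores free, but never return fewer than one worker."""
    return max(1, (cpu_count or 1) - reserved)


# On the two-core machine from the report, 2 - 2 would be 0; the clamp gives 1 instead
print(safe_process_count(2))
print(safe_process_count(os.cpu_count()))
```

If the conversion function accepts a process-count argument, passing a value computed like this may avoid the error; otherwise the place where the pool is created would need the same clamp.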

@djSharma7

Can we get classification results for each label (aspect) along with its polarity?
For example: The product is good, but the price is very high.
Results:
Product - Positive (polarity)
Price - Negative (polarity)
