Converting Colab notebook results to CoNLL format #74


Open
linguist89 opened this issue Dec 16, 2020 · 11 comments


@linguist89

I've been running the notebook and getting the results to work fine, but I want to convert the results into the CoNLL format so that I can compare documents from the CRAFT corpus using the LEA metric. Is there any way to convert the output file (i.e. sample.out.txt) to CoNLL format?

@mandarjoshi90
Owner

mandarjoshi90 commented Dec 16, 2020

I haven't used the notebook, so I might be missing something. If I understand this right, you're trying to convert the jsonlines output of predict.py. Perhaps you could directly use evaluate.py? It will create a temp file in CoNLL format (look through the log of evaluate.py for a file in /tmp), which is then processed by the Perl scorer.

@linguist89
Author

Thanks for the response. I've been trying that a few different ways, but it doesn't save any temp files. I'm running the evaluate script like this, with $CHOSEN_MODEL being bert_base:

```
!python evaluate.py $CHOSEN_MODEL
```

and the paths in the environment.conf file are as follows:

```
train_path = ${data_dir}/train.english.128.jsonlines
eval_path = ${data_dir}/dev.english.128.jsonlines
conll_eval_path = ${data_dir}/dev.english.v4_gold_conll
```

I have set eval_mode to false in evaluate.py because I don't need to compare anything against CoNLL; I just need that temp file (the CoNLL-format one you mentioned). It does not produce any file, though; it just prints the results (which are 0%, but that's expected). I have saved the output of predict.py as dev.english.128.jsonlines, so the predictions are loaded as the dev set. Is this the correct way to use it?
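For reference, this is roughly how I'm sanity-checking the predict.py output before using it as the dev set (my own sketch; the doc_key and predicted_clusters field names are what I see in my jsonlines file, adjust if yours differs):

```python
import json

# Inspect the predict.py jsonlines output before using it as the dev set.
# Field names (doc_key, predicted_clusters) are taken from my output file;
# adjust them if your jsonlines looks different.
def inspect_predictions(path):
    keys = []
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            # doc_key has to match the key used in the gold CoNLL file
            keys.append((example.get("doc_key"),
                         len(example.get("predicted_clusters", []))))
    return keys
```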

@mandarjoshi90
Owner

Yeah that's correct. The specific lines you need should be something like this:

```
Use standard file APIs to check for files with this prefix.
Loaded 343 eval examples.
Predicted conll file: /tmp/tmpi6yrp0jo
Official result for muc
version: 8.01 /data/BERT-coref/coref/conll-2012/scorer/v8.01/lib/CorScorer.pm

====== TOTALS =======
...
```

IIRC, this should be the relevant code: https://github.com/mandarjoshi90/coref/blob/master/conll.py#L92

@linguist89
Author

I get that output and the file in the tmp directory, but the file is always empty, so I'm not sure what's going on.
Also, with the conll.py file, I can't really make heads or tails of how the input data should be structured. Is there a way to figure that out? I've tried to work through the code but can't get results from it. Any advice?

@mandarjoshi90
Owner

I see. I'm not quite sure why you're getting the output in the jsonlines file but not the tmp file. I can't think of anything that's obviously wrong. Have you tried stepping through this function? At the very least, the variables predictions and subtoken_map should be populated, and if so, that would indicate a problem further down the pipeline.

https://github.com/mandarjoshi90/coref/blob/master/conll.py#L17
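Something like this hypothetical helper might narrow it down (a sketch, not code from the repo): it checks that both dicts are keyed by the same doc_key strings that appear in the gold file, and that the predictions aren't empty.

```python
# Hypothetical debugging helper (not from the repo): before output_conll
# runs, predictions and subtoken_maps should both be keyed by the same
# doc_key strings that appear in the gold CoNLL file.
def check_alignment(predictions, subtoken_maps, gold_doc_keys):
    return {
        "missing_in_predictions": [k for k in gold_doc_keys
                                   if k not in predictions],
        "empty_predictions": [k for k, v in predictions.items() if not v],
        "missing_subtoken_maps": [k for k in gold_doc_keys
                                  if k not in subtoken_maps],
    }
```

If any of the three lists comes back non-empty, that would indicate a mismatch between the jsonlines doc_keys and the gold file rather than a problem in output_conll itself.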

@handesirikci

@linguist89 I’m also dealing with the same problem as you. Did you find any solution?

@linguist89
Author

@handesirikci I haven't revisited the problem in a while because my time has been taken up by a different project. I'm going to have to get back to it soon, though. In the meantime, here is something that could be tried:

  1. Using the Colab Notebook's code for converting the BERT output into human-readable clusters, I think you could add those clusters to a CoNLL-parsed (without coref) version of the output.
  2. People format CoNLL in many different ways (i.e. the number of columns is not consistent), but from what I've seen, the coref annotation is always the last column.

This might be an approach you could try. I will be revisiting this problem in a week or so, but if you manage to figure it out before then, please post it here.
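For point 2 above, here is a rough sketch (my own, not code from any repo) of how cluster spans map onto that final coref column: "(id" opens a mention, "id)" closes one, "(id)" marks a single-token mention, and "-" means no mention on that token.

```python
# My own sketch of the last CoNLL column: "(id" opens a mention, "id)"
# closes one, "(id)" marks a single-token mention, and "-" means none.
# clusters is a list of clusters, each a list of (start, end) token spans.
def coref_column(num_tokens, clusters):
    labels = [[] for _ in range(num_tokens)]
    for cluster_id, spans in enumerate(clusters):
        for start, end in spans:
            if start == end:
                labels[start].append("({})".format(cluster_id))
            else:
                labels[start].append("({}".format(cluster_id))
                labels[end].append("{})".format(cluster_id))
    return ["|".join(cell) if cell else "-" for cell in labels]
```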

@handesirikci

@linguist89 We finally found a way to get the results printed to the tmp directory and also obtained the evaluation results. We found thirty tokenized, gold-annotated files in this repo. We gave the already-tokenized data to the model as input and put the gold-annotated version of the article in the file named "dev.english.v4_gold_conll". But you have to change the first column of the gold-annotated file from the doc id to the genre name, which is "nw" in our case.

@linguist89
Author

@handesirikci Thanks for the update. I've done everything you specified, but I get the following error:

```
Traceback (most recent call last):
  File "evaluate.py", line 26, in
    model.evaluate(session, official_stdout=True, eval_mode=True)
  File "/content/coref/independent.py", line 564, in evaluate
    conll_results = conll.evaluate_conll(self.config["conll_eval_path"], coref_predictions, self.subtoken_maps, official_stdout)
  File "/content/coref/conll.py", line 95, in evaluate_conll
    output_conll(gold_file, prediction_file, predictions, subtoken_maps)
  File "/content/coref/conll.py", line 46, in output_conll
    start_map, end_map, word_map = prediction_map[doc_key]
KeyError: 'nw_0'
```

I changed the name of the first column, but it's giving me this error. Did you just change it to "nw" or were there other characters as well?

@handesirikci

@linguist89 If you got this error message, you have to change the genre name to "nw_0". Hope it works!
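The rename is roughly this (a hypothetical rename_first_column helper, just to illustrate; not code we actually ran):

```python
# Hypothetical helper illustrating the rename: set the first column of every
# token line in the gold CoNLL file to the doc_key ("nw_0") that the
# predictions dict is keyed on. Comment lines (#begin/#end) and blank lines
# are left unchanged; the document id in the #begin header may also need to
# match, depending on your setup.
def rename_first_column(lines, new_key="nw_0"):
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
        else:
            cols = stripped.split()
            cols[0] = new_key
            out.append(" ".join(cols))
    return out
```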

@sayalighodekar

> I've been running the notebook and getting the results to work fine, but I want to convert the results into the CoNLL format so that I can compare documents from the CRAFT corpus using the LEA metric. Is there any way to convert the output file (i.e. sample.out.txt) to CoNLL format?

https://github.com/boberle/corefconversion
Hope this repo helps.
