Converting Colab notebook results to CoNLL format #74
Comments
I haven't used the notebook, so I might be missing something. If I understand this right, you're trying to convert the notebook's output into CoNLL format.
Thanks for the response. I've been trying that a few different ways, but it doesn't save any temp files. I'm running the evaluate script like this, with `$CHOSEN_MODEL` being `bert_base` and `train_path = ${data_dir}/train.english.128.jsonlines`:

`!python evaluate.py $CHOSEN_MODEL`

I have set `eval_mode` to false in evaluate.py because I don't need it to compare anything to CoNLL; I just need that temp file (the CoNLL-format one you mentioned). It doesn't produce any file, though, it just outputs the results (which are 0%, but that's expected). I have saved the output of predict.py as dev.english.128.jsonlines, so it's loading the output of predict as the dev set. Is this the correct way to use it?
Yeah that's correct. The specific lines you need should be something like this:
IIRC, this should be the relevant code: https://github.com/mandarjoshi90/coref/blob/master/conll.py#L92
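If it helps, here is a minimal sketch of driving that conversion directly from Python instead of going through evaluate.py. It assumes the jsonlines written by predict.py contain `doc_key`, `predicted_clusters`, and `subtoken_map` fields, and that the linked conll.py exposes an `output_conll(gold_file, out_file, predictions, subtoken_maps)`-style helper; the exact names and argument order may differ in your checkout, so treat this as a starting point rather than the repo's official API:

```python
import json
import conll  # conll.py from the repo root (the file linked above)

predictions, subtoken_maps = {}, {}
with open("dev.english.128.jsonlines") as f:
    for line in f:
        example = json.loads(line)
        # Field names below are assumptions based on predict.py's output format.
        predictions[example["doc_key"]] = example["predicted_clusters"]
        subtoken_maps[example["doc_key"]] = example.get("subtoken_map")

# Merge the predicted clusters into the gold CoNLL skeleton and write the
# result to a regular file instead of a tempfile under /tmp.
with open("dev.english.v4_gold_conll") as gold_f, \
     open("predictions.v4_conll", "w") as out_f:
    conll.output_conll(gold_f, out_f, predictions, subtoken_maps)
```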
I get that output and the file in the tmp directory, but it's always empty, so I'm not sure what's going on.
I see. I'm not quite sure why you're getting the output in the jsonlines file but not the tmp file. I can't think of anything that's obviously wrong. Have you tried stepping through this function? At the very least, check the variables around https://github.com/mandarjoshi90/coref/blob/master/conll.py#L17
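One quick sanity check while stepping through (a minimal sketch, not code from the repo): an empty output file is often caused by the prediction doc keys not matching the document ids in the gold CoNLL file. This assumes the jsonlines use a `doc_key` field and that predictions are keyed as `<doc_id>_<part>`, which may not match your setup:

```python
import json
import re

# Collect the doc keys present in the predictions.
pred_keys = set()
with open("dev.english.128.jsonlines") as f:
    for line in f:
        pred_keys.add(json.loads(line)["doc_key"])

# Collect the document ids declared in the gold CoNLL file.
gold_keys = set()
with open("dev.english.v4_gold_conll") as f:
    for line in f:
        m = re.match(r"#begin document \((.*)\); part (\d+)", line)
        if m:
            # Assumes predictions are keyed as "<doc_id>_<part>".
            gold_keys.add("{}_{}".format(m.group(1), int(m.group(2))))

print("prediction doc_keys:", sorted(pred_keys)[:5])
print("gold doc_keys:", sorted(gold_keys)[:5])
print("overlap:", len(pred_keys & gold_keys))
```

If the overlap is zero, nothing will be written to the tmp file even though evaluation runs to completion.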
@linguist89 I’m also dealing with the same problem as you. Did you find any solution? |
@handesirikci I have not revisited the problem in a while because I've had my time taken up with a different project. I am going to have to get back to it soon though. However, I did come up with something that can be tried:
This might be an approach you could try. I will be revisiting this problem in a week or so, but if you manage to figure it out before then, please post it here.
@linguist89 We finally found a way to print results in the tmp directory and also obtained the results of the evaluation. We found the thirty tokenized and gold-annotated files in this repo. We gave the already-tokenized data to the model as input and put the gold annotation of the article in a file named "dev.english.v4_gold_conll". But you have to change the first column of the gold-annotated file from the doc id to the genre name, which is "nw" in our case.
@handesirikci Thanks for the update. I've done everything you specified, but I get the following error:

Traceback (most recent call last):

I changed the name of the first column, but it's giving me this error. Did you just change it to "nw" or were there other characters as well?
@linguist89 If you got this error message, you have to change the name of the genre to "nw_0". Hope it works.
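For anyone hitting the same thing, here is a rough sketch of the rewrite described in the last two comments: replacing the document id in the gold file with a genre-style id like "nw", which the evaluation code then sees as "nw_0" for part 0. The file paths and the exact column layout are assumptions; adjust them to your data:

```python
# Rewrite the document id in a *_gold_conll file to a genre-style id ("nw").
in_path = "dev.english.v4_gold_conll"
out_path = "dev.english.v4_gold_conll.nw"

with open(in_path) as src, open(out_path, "w") as dst:
    for line in src:
        if line.startswith("#begin document"):
            # "#begin document (bc/cctv/00/cctv_0000); part 000"
            #   -> "#begin document (nw); part 000"
            _, tail = line.split(")", 1)
            dst.write("#begin document (nw)" + tail)
        elif line.strip() and not line.startswith("#"):
            # Data rows are whitespace-separated; the first column is the doc id.
            cols = line.split()
            cols[0] = "nw"
            dst.write(" ".join(cols) + "\n")
        else:
            # Blank lines and "#end document" pass through unchanged.
            dst.write(line)
```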
https://github.com/boberle/corefconversion |
I've been running the notebook and getting the results to work fine, but I want to convert the results into CoNLL format so that I can compare documents from the CRAFT corpus using the LEA metric. Is there any way to convert the output (i.e. the sample.out.txt file) to CoNLL format?
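In case it helps anyone landing here with the same question, here is a rough, self-contained sketch of the conversion without going through evaluate.py. It assumes the output file is in the same jsonlines format that predict.py writes, with `doc_key`, `sentences`, and `predicted_clusters` fields, and that cluster spans are inclusive (start, end) token indices; remapping subword indices back to words via `subtoken_map` is left out for brevity:

```python
import json
from collections import defaultdict

def clusters_to_coref_column(clusters, num_tokens):
    """Build the CoNLL coreference column from (start, end) mention spans."""
    starts, ends, singles = defaultdict(list), defaultdict(list), defaultdict(list)
    for cluster_id, mentions in enumerate(clusters):
        for start, end in mentions:
            if start == end:
                singles[start].append(cluster_id)
            else:
                starts[start].append(cluster_id)
                ends[end].append(cluster_id)
    column = []
    for i in range(num_tokens):
        parts = ["({}".format(c) for c in starts[i]]
        parts += ["({})".format(c) for c in singles[i]]
        parts += ["{})".format(c) for c in ends[i]]
        column.append("|".join(parts) if parts else "-")
    return column

with open("sample.out.txt") as f_in, open("sample.conll", "w") as f_out:
    for line in f_in:
        doc = json.loads(line)
        tokens = [t for sentence in doc["sentences"] for t in sentence]
        coref = clusters_to_coref_column(doc["predicted_clusters"], len(tokens))
        f_out.write("#begin document ({}); part 000\n".format(doc["doc_key"]))
        for i, (tok, tag) in enumerate(zip(tokens, coref)):
            # Minimal columns: doc id, token index, word, coreference tag.
            f_out.write("{}\t{}\t{}\t{}\n".format(doc["doc_key"], i, tok, tag))
        f_out.write("#end document\n")
```

This only produces the coreference column needed by coreference scorers such as the LEA implementation, not the full set of CoNLL-2012 columns.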