
Add an LSTM-CRF model on the Conll2003 dataset #122

Open · wants to merge 7 commits into master

Conversation

hazelnutsgz (Contributor)

Description

Add an LSTM-CRF model for the Conll2003 dataset under the reproduction directory, built on the fastNLP library and inspired by the paper https://arxiv.org/pdf/1508.01991.pdf

Main reason

Provide a new demo of how fastNLP can facilitate the development of deep learning models. FYI:
https://github.com/hazelnutsgz/fastNLP/tree/hazelnutsgz-crf-lstm/reproduction/LSTM-CRF
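
For readers unfamiliar with the architecture, here is a minimal sketch (plain PyTorch, not the code in this branch) of the BiLSTM half of the BiLSTM-CRF tagger described in the paper above; the class name and layer sizes are illustrative assumptions, and the CRF layer is only described in comments because the PR delegates it to fastNLP's CRF module.

```python
# Minimal sketch of the BiLSTM encoder behind a BiLSTM-CRF tagger
# (illustrative only; names and sizes are assumptions, not the PR's code).
import torch
import torch.nn as nn

class BiLSTMEmitter(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(hidden_dim, num_tags)

    def forward(self, tokens):
        # tokens: [batch, seq_len] token ids
        emb = self.embed(tokens)      # [batch, seq_len, embed_dim]
        feats, _ = self.lstm(emb)     # [batch, seq_len, hidden_dim]
        return self.proj(feats)       # per-token emission scores over tags

# A CRF layer on top of these emission scores adds tag-transition scores,
# computes the sequence-level negative log-likelihood during training, and
# runs Viterbi decoding at test time; in this PR that part is provided by
# fastNLP's CRF module.
```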

Checklist (check whether each item below is complete)

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (e.g. [bugfix] fixes a bug, [new] adds a new feature, [test] modifies tests, [rm] removes old code)
  • Changes are complete (i.e. I finished coding on this PR); submit the PR only after the changes are finished
  • All changes have test coverage; the modified parts pass the tests. For changes under fastnlp/fastnlp/, test code must be provided under fastnlp/test/
  • Code is well-documented; comments are written carefully, since the API documentation is extracted from them
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change; if the change affects the examples or tutorials, please contact a core developer

Changes

  • An interactive Jupyter notebook
  • A well-structured codebase for training & testing
  • A README file with instructions

Mention:

@yhcc @xpqiu @FengZiYjun @2017alan

@hazelnutsgz changed the title from "Hazelnutsgz crf lstm" to "Add an LSTM-CRF model on the Conll2003 dataset" on Jan 10, 2019
@codecov-io commented Jan 10, 2019

Codecov Report

Merging #122 into master will decrease coverage by 0.11%.
The diff coverage is n/a.


@@            Coverage Diff            @@
##           master    #122      +/-   ##
=========================================
- Coverage   70.31%   70.2%   -0.12%     
=========================================
  Files          82      82              
  Lines        5407    5407              
=========================================
- Hits         3802    3796       -6     
- Misses       1605    1611       +6
Impacted Files Coverage Δ
fastNLP/models/biaffine_parser.py 94.44% <0%> (-2.23%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@xpqiu (Member) commented Jan 10, 2019

Great!
The logs and binary files don't need to be committed.

Update README
@hazelnutsgz (Contributor, Author)

OK, I have updated my commit just now, thanks for your careful review.

@xuyige (Member) commented Jan 11, 2019

I think the data, as well as the training code, may not be necessary in the reproduction.
The reproduction should contain a trained model that can be used directly.

@hazelnutsgz (Contributor, Author)

I think the data, as well as the training code, may not be necessary in the reproduction.
The reproduction should contain a trained model that can be used directly.

Thanks for your comments @xuyige; the following are my replies and proposals:

Reply

  1. Could you please elaborate on the meaning of "the training code"? I am a little confused.
  2. I think the dataset, say Conll2003, is necessary for fastNLP users to reproduce the work, for the following reasons:
    1. Acquiring the Conll2003 dataset with fastNLP is not as easy as fetching the MNIST dataset through the network APIs provided by frameworks such as TensorFlow and PyTorch, so a preloaded dataset helps NLP novices ramp up with the project.
    2. Some other projects under the reproduction directory, e.g. Char-aware_NLM, also include train.txt and test.txt.

Proposal

Based on how TensorFlow and PyTorch load the MNIST dataset (over the network), fastNLP may want to provide data-downloading APIs for some widely used NLP datasets, e.g. SQuAD.
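
As a rough illustration of the proposal (nothing like this existed in fastNLP at the time of this PR; the function name, cache directory, and URL below are all hypothetical), such a download helper might look like:

```python
# Hypothetical sketch of a dataset-download helper; the names and the URL
# are illustrative only, not part of fastNLP's actual API.
import os
import urllib.request

def download_dataset(filename, url, cache_dir="~/.fastNLP/datasets"):
    """Fetch a dataset file once and cache it locally for later runs."""
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, filename)
    if not os.path.exists(target):
        # Download only on first use, mirroring how torchvision handles MNIST.
        urllib.request.urlretrieve(url, target)
    return target

# Usage (hypothetical URL):
# path = download_dataset("train-v1.1.json",
#                         "https://example.org/squad/train-v1.1.json")
```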

@xuyige (Member) commented Jan 12, 2019


I regret to point out that Char-aware_NLM was borrowed from another project.
It is an outdated version; its code hasn't been updated for months.
Data downloading has been considered, but a server is needed to host the downloads, so we put it on the todo list.
A preloaded dataset is a good suggestion; we will discuss it soon.

@xpqiu xpqiu requested a review from LinyangLee January 17, 2019 07:03
@LinyangLee (Collaborator) left a comment


It should be working; the code seems to be fine,
but things still don't add up.
I am currently working on this one.

@hazelnutsgz (Contributor, Author)

It should be working; the code seems to be fine,
but things still don't add up.
I am currently working on this one.

Thanks for your review~

@Hou-jing

Is the util file missing here? When I run it, I get an error saying there is no load_data function.
[screenshot of the error]

@yhcc (Member) commented Jun 27, 2022

Yes, we don't have a load_data function; you may be using an old version.
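
For anyone hitting this, below is a minimal self-contained sketch of what a load_data-style helper for CoNLL-2003-formatted files could look like (the function name and return format are assumptions, not fastNLP's API; newer fastNLP versions ship their own CoNLL loaders instead):

```python
# Minimal sketch of a CoNLL-2003 reader; a stand-in for the missing
# load_data helper mentioned above, not fastNLP's actual API.
def load_data(path):
    """Read a CoNLL-2003 file into (tokens, ner_tags) pairs per sentence."""
    sentences, words, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Blank lines separate sentences; -DOCSTART- lines mark documents.
            if not line or line.startswith("-DOCSTART-"):
                if words:
                    sentences.append((words, tags))
                    words, tags = [], []
                continue
            cols = line.split()
            words.append(cols[0])   # token is the first column
            tags.append(cols[-1])   # NER tag is the last column
    if words:
        sentences.append((words, tags))
    return sentences
```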
