
fine tune on a new dataset #9

Open
un-lock-me opened this issue Oct 7, 2021 · 2 comments

un-lock-me commented Oct 7, 2021

Hi @thu-coai @hzhwcmhf @MaLiN2223 @zqwerty @xiaotianzi @truthless11 and thanks so much for making your code available.
I want to fine-tune the code on a new dataset whose format is very similar to the IMDB dataset (each example has a couple of sentences, and the label is positive/negative/neutral). Could you please advise on what changes I need to make?

I appreciate your time and help :).

un-lock-me (Author) commented:

Another question: to preprocess the new dataset, do I need to run all the scripts at this link: https://github.com/thu-coai/SentiLARE/tree/master/preprocess?
If so, is there a particular order for running them?

Thanks :)

kepei1106 (Member) commented:
Hi, I suggest following these steps to adapt our code to your own dataset:

  1. Prepare your own dataset in the same format as our provided raw dataset, such as IMDB. The links to download the raw and preprocessed datasets are provided in the README.
  2. Preprocess the raw dataset with our code. If your task is sentence-level sentiment classification, refer to prep_sent.py. You may need additional files such as SentiWordNet and the representation of its glosses; we mention this in our code.
  3. Run the classification code on your own dataset just as on IMDB. Some arguments may need to be modified, such as the data path.
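As a rough illustration of step 1, here is a minimal Python sketch that writes (text, label) pairs as tab-separated rows. The label ids and the column order/file name below are assumptions for illustration only; please check them against the raw IMDB files linked in the README before preprocessing.

```python
# Hypothetical sketch: convert a custom sentence-level sentiment dataset
# into an IMDB-like "text<TAB>label" layout before preprocessing.
# The label ids and column order are assumptions; verify against the
# raw IMDB files referenced in the SentiLARE README.
import csv

LABELS = {"positive": 1, "negative": 0, "neutral": 2}  # assumed label ids

def to_imdb_like(examples, out_path):
    """Write (text, label) pairs as tab-separated rows."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for text, label in examples:
            # Strip stray tabs so the two-column layout stays intact.
            writer.writerow([text.replace("\t", " ").strip(), LABELS[label]])

examples = [
    ("A couple of sentences praising the film.", "positive"),
    ("Flat plot and weak acting.", "negative"),
]
to_imdb_like(examples, "train.tsv")
```

Once the files match the raw IMDB layout, the preprocessing scripts in step 2 should pick them up without code changes.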

Hope this can help you.
