
fine tune on a new dataset #9

Open
un-lock-me opened this issue Oct 7, 2021 · 2 comments

un-lock-me commented Oct 7, 2021

Hi @thu-coai @hzhwcmhf @MaLiN2223 @zqwerty @xiaotianzi @truthless11 and thanks so much for making your code available.
I want to fine-tune the code on a new dataset whose format is very similar to the IMDB dataset (each example has a couple of sentences, and the label is positive/negative/neutral). Could you please advise on what changes I need to make?

I appreciate your time and help :).

un-lock-me (Author) commented:

Another question: to preprocess the new dataset, do I need to run all the scripts at this link: https://github.com/thu-coai/SentiLARE/tree/master/preprocess?
If so, is there a particular order for running them?

Thanks :)

kepei1106 (Member) commented:
Hi, I suggest following these steps to adapt our code to your own dataset:

  1. Prepare your own dataset in the same format as our provided raw dataset, such as IMDB. The links to download the raw and preprocessed datasets are provided in the README.
  2. Preprocess the raw dataset with our code. If your task is sentence-level sentiment classification, refer to prep_sent.py. You may need additional files such as SentiWordNet and the representation of its glosses; we mention this in our code.
  3. Run the classification code on your own dataset just as on IMDB. Some arguments may need to be modified, such as the data path.
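As a rough illustration of step 1, here is a minimal Python sketch that writes (text, label) pairs as tab-separated rows. The label ids and the column order/file name below are assumptions for illustration only; please check them against the raw IMDB files linked in the README before preprocessing.

```python
# Hypothetical sketch: convert a custom sentence-level sentiment dataset
# into an IMDB-like "text<TAB>label" layout before preprocessing.
# The label ids and column order are assumptions; verify against the
# raw IMDB files referenced in the SentiLARE README.
import csv

LABELS = {"positive": 1, "negative": 0, "neutral": 2}  # assumed label ids

def to_imdb_like(examples, out_path):
    """Write (text, label) pairs as tab-separated rows."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for text, label in examples:
            # Strip stray tabs so the two-column layout stays intact.
            writer.writerow([text.replace("\t", " ").strip(), LABELS[label]])

examples = [
    ("A couple of sentences praising the film.", "positive"),
    ("Flat plot and weak acting.", "negative"),
]
to_imdb_like(examples, "train.tsv")
```

Once the files match the raw IMDB layout, the preprocessing scripts in step 2 should pick them up without code changes.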

Hope this can help you.
