
Solving SAT takes forever #20

Open
hankcs opened this issue Apr 4, 2022 · 2 comments
hankcs commented Apr 4, 2022

Dear authors,

Thank you for releasing your wonderful code; it really helped my understanding of your paper. If you don't mind, I have a question regarding data preprocessing: solving SAT just takes forever with the base_amr.yaml config.

Console logs:

Loading the cached dataset
Max number of permutations to resolve assignment ambiguity: 165198623617843200000
... reduced to 2048 permutations with max of 24 greedily resolved assignments
0 erroneously matched sentences with companion

57274 sentences in the train split
3460 sentences in the validation split
2457 sentences in the test split
789678 nodes in the train split
properties:  ['transformed']
Edge frequency: 5.17 %
4319 words in the relative label vocabulary
114 words in the edge label vocabulary
242 characters in the vocabulary
Caching the dataset...

0 erroneously matched sentences with companion
Generating possible rules using 4 CPUs...
Solving SAT...

It has been hanging on this line for days.
I'm using a server with powerful CPUs (Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz) and hundreds of GBs of memory.

davda54 (Collaborator) commented Apr 4, 2022

Hi, thanks for using PERIN! Generally, solving the SAT problem can take a couple of hours, and for a custom dataset it's quite possible that the algorithm won't find any solution in a reasonable time. The SAT heuristics can be very unpredictable. Are you using the official AMR dataset from MRP2020 or some other one?

For this exact reason, you can also fall back to a greedy search for a suboptimal solution: just call the function get_smallest_rule_set with approximate=True and it will run a faster algorithm. The solution will not be as good, but it shouldn't lead to any significant performance drop.
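
If it helps, here is a rough sketch of the change. Only get_smallest_rule_set and approximate=True are taken from the comment above; the import path and surrounding names are placeholders and will need to be adapted to wherever the function is actually called in this repository.

```python
# Placeholder sketch: only get_smallest_rule_set and approximate=True are real;
# adjust the (hypothetical) import path and variable names to the actual code.
# from preprocessing.rules import get_smallest_rule_set  # hypothetical path


def build_rule_set(candidate_rules):
    # approximate=True swaps the exact SAT search for the faster greedy search,
    # trading a minimal rule set for one that is found in much less time.
    return get_smallest_rule_set(candidate_rules, approximate=True)
```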


hankcs commented Apr 4, 2022

Thanks for your prompt reply. I'm using MRP2020_Train_Dev-2020CoNLL_CFMRP_LDC2020E05.tgz from LDC, which might not be exactly the same as the one used in the MRP2020 competition. Maybe split_dataset.sh creates a random split, too?

I'll try approximate=True and other solvers.
