Fix ambiguous features on Swedish verbs #6

Open
2 tasks done
andrewtavis opened this issue Jan 3, 2022 · 8 comments
Labels: data (Relates to data or Wikidata), help wanted (Extra attention is needed)

Comments

@andrewtavis
Member

andrewtavis commented Jan 3, 2022

Terms

Languages

Swedish

Description

Many Swedish verbs have ambiguous features that prevent their conjugations from being properly classified. Specifically, many feature sets appear more than once, as can be seen on the Wikidata page for the verb överge. These duplicates should be distinguished, and the formatting script for Swedish verbs should then be updated, as it currently removes any verb that has a duplicate value caused by ambiguous features.
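To make the problem concrete, here is a minimal sketch of how such duplicates could be detected. The `(representation, features)` input shape, the function name `find_ambiguous_forms`, and the sample forms are all assumptions for illustration; they are not taken from the Scribe formatting script or from the actual Wikidata entry.

```python
from collections import defaultdict


def find_ambiguous_forms(forms):
    """Return feature sets that are claimed by more than one distinct form.

    `forms` is a list of (representation, features) pairs. This shape is a
    hypothetical simplification, not the structure the real script uses.
    """
    by_features = defaultdict(list)
    for representation, features in forms:
        # frozenset makes the feature set hashable and order-independent.
        by_features[frozenset(features)].append(representation)

    return {
        features: representations
        for features, representations in by_features.items()
        if len(set(representations)) > 1
    }


# Illustrative data only (not copied from Wikidata).
example_forms = [
    ("överge", ["infinitive", "active"]),
    ("övergiva", ["infinitive", "active"]),
    ("överger", ["present", "active"]),
]

ambiguous = find_ambiguous_forms(example_forms)
for features, representations in ambiguous.items():
    print(sorted(features), representations)
```

A check like this would let a script report which feature sets are ambiguous instead of dropping the whole verb.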

@andrewtavis added the good first issue, help wanted, and data labels on Jan 3, 2022
@andrewtavis transferred this issue from scribe-org/Scribe-iOS on Mar 29, 2022
@Ainali

Ainali commented Aug 12, 2022

Just a quick note that the formatting script for Swedish verbs has moved to src/scribe_data/extract_transform/Swedish/verbs/format_verbs.py.

@andrewtavis
Member Author

Updated, @Ainali! Thank you 🙏

@Ainali

Ainali commented Aug 12, 2022

I'll be using this query to clean up most of the data errors. Around 80-85% of the results there should be split into two separate lexemes. The rest are cases where there really are two acceptable forms. However, in several of these, one of the forms is not modern and should be marked as such. The query should probably check for language style (P6191) and filter some values.
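As a rough sketch of the filtering idea, the snippet below drops forms whose language style (P6191) matches an excluded value. The dictionary shape of a form, the function name `drop_non_modern_forms`, and the item ID `Q_ARCHAIC_PLACEHOLDER` are hypothetical; the real filter would live in the query and would use whichever Wikidata items are chosen for the styles to exclude.

```python
# Placeholder ID: replace with the real Wikidata items for the styles to exclude.
EXCLUDED_STYLES = frozenset({"Q_ARCHAIC_PLACEHOLDER"})


def drop_non_modern_forms(forms, excluded_styles=EXCLUDED_STYLES):
    """Keep only forms whose P6191 (language style) values are not excluded.

    Each form is assumed to be a dict with an optional "P6191" list of item
    IDs. This is an illustrative shape, not the Wikidata JSON format.
    """
    return [
        form
        for form in forms
        if not excluded_styles.intersection(form.get("P6191", []))
    ]


# Illustrative data only.
sample_forms = [
    {"representation": "form_a", "P6191": []},
    {"representation": "form_b", "P6191": ["Q_ARCHAIC_PLACEHOLDER"]},
    {"representation": "form_c"},
]

modern = drop_non_modern_forms(sample_forms)
print([form["representation"] for form in modern])
```

Forms without any language style statement are kept, on the assumption that unmarked forms are modern.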

@andrewtavis
Member Author

This is so epic, @Ainali 😊 Thanks so much! Would be happy to talk with you after the hackathon about what changes need to happen to the query. After a check in I can try to make the changes, or we can do a quick call to talk over what needs to change. Whatever works best for you :)

Really happy to have this issue getting some love!

@Ainali

Ainali commented Aug 17, 2022

I have now split all the ones that needed to be split into different lexemes. The ones that are left (21 in the query above) are probably mostly synonyms, but I have asked around to see whether something grammatical could be added to them to highlight any nuance between them.

@andrewtavis
Member Author

I was thinking about messaging you about this 😊 Thank you so much for your efforts!

Do I need to change anything in the query, or can I just run the normal update process? We still have some minor bug fixes for autocomplete and will add in a basic autosuggest prior to the next release, but we should have it out by say the end of next week :)

@Ainali

Ainali commented Aug 19, 2022

For now, it will just be an improvement if you run the normal update process. But I think we should keep the issue open to figure out the last remaining part.

@andrewtavis
Member Author

Sounds great, thanks @Ainali :)

2 participants