Proposal 0002 - Labelling UI & Managing user/AI labels #37

Open
Bossett opened this issue Jul 10, 2023 · 0 comments
Bossett commented Jul 10, 2023

If the approach is federated moderation services, there is an opportunity to enforce some rules that enhance the user experience when posting content that will be moderated by automated tools.

New required endpoints

Labelling services should be required to implement a moderation check endpoint, which should return either a label or a non-label response indicating that the content will be human-reviewed (or cannot be reviewed in real time). This endpoint should take a complete post object, but not one that has already been committed to a repo.

The purpose of this endpoint is to allow an end user, through their app, to 'pre-check' a post before it gets posted.
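
A minimal sketch of what the request/response shapes for such a pre-check endpoint could look like. All names here (`PreCheckRequest`, `PreCheckResult`, `checkPost`, the `moderation.checkPost` XRPC path) are illustrative assumptions, not part of any existing API:

```ts
// Hypothetical shapes for a labelling service's pre-check endpoint.
interface PreCheckRequest {
  // A complete, not-yet-committed post object (text, embeds, langs, ...).
  post: Record<string, unknown>;
}

type PreCheckResult =
  | { kind: 'label'; label: string } // e.g. 'nudity'
  | { kind: 'deferred' }             // will be human-reviewed
  | { kind: 'unavailable' };         // cannot be reviewed in real time

async function checkPost(
  serviceUrl: string,
  req: PreCheckRequest,
): Promise<PreCheckResult> {
  const res = await fetch(`${serviceUrl}/xrpc/moderation.checkPost`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  return (await res.json()) as PreCheckResult;
}
```

The 'deferred' and 'unavailable' variants are the "non-label" cases: the app can still post, but gets no real-time rating to compare against.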

New UI functions

On posting, similar to language selection, users should be prompted to assign a label they think is appropriate. This should follow the model set by the language dialog, but might display as a menu of content types, etc.

Submission logic

During the post process, the app should check the endpoint, and if the user's chosen label is of a lower 'level' than the value returned by the endpoint, the user should be warned and allowed to 'upgrade' the rating. The user can still choose to 'post anyway'.
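
As a sketch of that flow, assuming a hypothetical numeric severity ordering (`LABEL_LEVELS`), stubbed UI/repo hooks (`promptUpgrade`, `publish`), and the `checkPost` sketch above:

```ts
// Hypothetical severity ordering; higher numbers are more restrictive.
const LABEL_LEVELS: Record<string, number> = {
  none: 0,
  suggestive: 1,
  nudity: 2,
  'graphic-violence': 3,
};

// UI hook and repo write, stubbed for the sketch.
declare function promptUpgrade(current: string, suggested: string): Promise<string>;
declare function publish(post: unknown, label: string): Promise<void>;

async function submitWithPreCheck(
  post: Record<string, unknown>,
  userLabel: string,
): Promise<void> {
  const result = await checkPost('https://labeller.example', { post });
  if (
    result.kind === 'label' &&
    LABEL_LEVELS[result.label] > LABEL_LEVELS[userLabel]
  ) {
    // The service suggests a stricter rating: warn the user, who may
    // 'upgrade' to the suggested label or keep their own and post anyway.
    const finalLabel = await promptUpgrade(userLabel, result.label);
    return publish(post, finalLabel);
  }
  // User label meets or exceeds the suggested level, or no real-time label.
  return publish(post, userLabel);
}
```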

The app and services should treat the user-provided label as correct. Human moderators (or reporting/feedback loops, etc.) can then focus on posts where the user kept a lower rating than the endpoint suggested, to identify problem users; where moderators agree with a user's rating, that agreement can be used to train moderation models.

New Policy

The AI/etc. does not need to be 100% accurate in this model. It just needs to be good enough that it doesn't overwhelm moderators with bad decisions.

Implementation will require specific new policy that describes the different ratings. Because labels are user-provided, there should be an easy basis for appeals, etc., if the rules for each rating are objective enough.

Future application & interaction with multiple labelling services

When federating labelling services, there will be conflicts between results from different labelling tools. When a user subscribes to a specific labelling service, it should be possible to place it in a ranked list (possibly per category, though this may be a UI nightmare). The highest-ranked service wins and gets to label the post. "User Provided" should be in this ranked list and default to the top.

Labelling services should be allowed to return "I won't label this" for some material (hence the ranks: a service can focus specifically on hate speech without being required to know about impersonation, and the post can 'fall through' to the next service).
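
A minimal sketch of that rank resolution, assuming hypothetical `RankedSource` and `resolveLabel` names that are not part of any existing API:

```ts
// A source either labels the post or declines, letting it fall through.
type SourceResult = { label: string } | 'wont-label';

interface RankedSource {
  name: string; // e.g. 'User Provided', which defaults to the top rank
  getLabel: (post: unknown) => SourceResult;
}

// Walk the user's ordered list and take the first source willing to label.
function resolveLabel(
  post: unknown,
  sources: RankedSource[],
): string | undefined {
  for (const source of sources) {
    const result = source.getLabel(post);
    if (result !== 'wont-label') return result.label; // highest willing rank wins
    // 'wont-label' falls through to the next-ranked source.
  }
  return undefined; // no subscribed source labelled the post
}
```

With "User Provided" at the top of the list, a user's own label always wins when present, matching the default described above.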
