KOCOH (KOrean COntext-dependent Hate speech) Dataset

KOCOH is a Korean context-dependent hate speech dataset.
A Korean-language description (한국어 설명) is also available.

Main concept

  • Hate speech
    • Verbal or non-verbal expressions that propagate or promote prejudice or discrimination against minorities (groups with a common identity and relatively diminished political and social power); that denigrate, insult, or threaten individuals or groups based on their attributes as minorities; or that incite discrimination, hostility, or violence against them (Hong, 2018)
  • Context-dependent hate speech
    • Hate speech that is recognized as hateful through contextual factors rather than through its explicit content alone
    • In other words, without its context, the statement itself may read as neutral or even positive
    • Examples
      | Context | Comment | Hate |
      |---|---|---|
      | 여성 고용이 미흡해 정부가 불이익을 준 기업의 이름이 공개되었다. (A list of companies penalized by the government for inadequate female employment was released.) | 믿을 만한 그룹이라는 얘기네 (This means they're a trustworthy group.) | 1 |
      | 직원에게 높은 수준의 복지를 제공한 기업의 이름이 공개되었다. (A list of companies that provide high-level welfare benefits to employees was released.) | 믿을 만한 그룹이라는 얘기네 (This means they're a trustworthy group.) | 0 |
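
The contrast in the example above can be sketched in a few lines of Python. This is only an illustration, not part of the dataset tooling: the tuple layout and the helper names `labels_by_comment` and `context_dependent_comments` are assumptions made here. It groups rows by comment text and reports comments that carry both labels, which is exactly the case a comment-only classifier cannot resolve.

```python
from collections import defaultdict

# The two rows mirror the example table above: (context, comment, hate label).
examples = [
    (
        "A list of companies penalized by the government for inadequate "
        "female employment was released.",
        "This means they're a trustworthy group.",
        1,
    ),
    (
        "A list of companies that provide high-level welfare benefits to "
        "employees was released.",
        "This means they're a trustworthy group.",
        0,
    ),
]


def labels_by_comment(rows):
    """Collect the set of hate labels observed for each comment text."""
    labels = defaultdict(set)
    for _context, comment, hate in rows:
        labels[comment].add(hate)
    return dict(labels)


def context_dependent_comments(rows):
    """Return comments labeled both 0 and 1, i.e. whose label depends on context."""
    return [c for c, seen in labels_by_comment(rows).items() if seen == {0, 1}]


print(context_dependent_comments(examples))
```

Because the comment string is identical in both rows, any model that ignores the context column would have to assign the same label to both.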

Data Description

  • Source: DC Inside Real-time Best Gallery
  • Period: 2024/06/02 to 2024/06/23
  • Size
    | Total | Type 1 | Type 2 | Type 3 |
    |---|---|---|---|
    | 2,005 | 539 | 539 | 927 |
  • Types and examples
    | Type | Context | Comment | Hate |
    |---|---|---|---|
    | Type 1 | Actually written context A: 광주광역시의 맛있는 음식 다섯 가지를 소개했다. (Introduced five delicious foods from Gwangju Metropolitan City.) | Actually written comment C: 먹고 싶은데 여권 들고 가기 귀찮아:: (I want to eat them but it's annoying to bring my passport.) | 1 |
    | Type 2 | Created context B: 일본 오사카의 맛있는 음식 다섯 가지를 소개했다. (Introduced five delicious foods from Osaka, Japan.) | Comment D (same as comment C): 먹고 싶은데 여권 들고 가기 귀찮아:: (I want to eat them but it's annoying to bring my passport.) | 0 |
    | Type 3 | Actually written context A: 광주광역시의 맛있는 음식 다섯 가지를 소개했다. (Introduced five delicious foods from Gwangju Metropolitan City.) | Actually written comment E: 역시 광주다 (That's Gwangju for you.) | 0 |
  • Columns
    | Column | Description |
    |---|---|
    | index | Data index |
    | set | Post index |
    | type | Type number (1 to 3) |
    | date | Date the DC Inside post was written |
    | link | Link to the DC Inside post |
    | title | Title of the DC Inside post |
    | context | Summary of the DC Inside post content, or the created context |
    | comment | Comment collected from DC Inside |
    | hate speech | Whether the comment is hate speech (0 or 1) |
    | gender | Target: gender (0 or 1) |
    | disability | Target: disability (0 or 1) |
    | race/nation | Target: race/nation (0 or 1) |
    | region | Target: region (0 or 1) |
    | age | Target: age (0 or 1) |
    | note | Ikiyano style (-no) |
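
As a rough sketch of how the columns above might be consumed, the Python below checks a CSV for the listed column names and counts rows per type. The file format is an assumption: this README does not state how the data is distributed, so the example builds a tiny in-memory CSV with placeholder values instead of reading a real file. The helper names (`make_sample_csv`, `load_rows`, `type_counts`) are hypothetical.

```python
import csv
import io
from collections import Counter

# Column names as listed in the Columns table above.
EXPECTED_COLUMNS = [
    "index", "set", "type", "date", "link", "title", "context", "comment",
    "hate speech", "gender", "disability", "race/nation", "region", "age", "note",
]

# Per-type sizes as reported in the Size table above.
EXPECTED_SIZES = {1: 539, 2: 539, 3: 927}


def make_sample_csv():
    """Build a three-row stand-in CSV; unlisted fields are left empty (placeholders)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=EXPECTED_COLUMNS, restval="")
    writer.writeheader()
    writer.writerow({"index": 1, "set": 1, "type": 1, "context": "context A",
                     "comment": "comment C", "hate speech": 1})
    writer.writerow({"index": 2, "set": 1, "type": 2, "context": "context B",
                     "comment": "comment C", "hate speech": 0})
    writer.writerow({"index": 3, "set": 1, "type": 3, "context": "context A",
                     "comment": "comment E", "hate speech": 0})
    return buf.getvalue()


def load_rows(csv_text):
    """Parse CSV text, verify the expected columns, and cast the label columns to int."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["type"] = int(row["type"])
        row["hate speech"] = int(row["hate speech"])
        rows.append(row)
    return rows


def type_counts(rows):
    """Count rows per type; on the full dataset this should match EXPECTED_SIZES."""
    return Counter(row["type"] for row in rows)


rows = load_rows(make_sample_csv())
print(type_counts(rows))
print(sum(EXPECTED_SIZES.values()))  # total size implied by the Size table
```

On the full dataset, comparing `type_counts(rows)` against `EXPECTED_SIZES` would be a quick sanity check that nothing was dropped during loading.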

Reference

Hong, S. (2018). When Words Hurt. Across.

Citation

@article{ART003173761,
author={박은아 and 송상헌},
title={KOCOH: 맥락 의존적 혐오 표현 탐지를 위한 데이터 세트},
journal={한국어학},
issn={1226-9123},
year={2025},
volume={106},
pages={251-277}
}

License

TBD
