Utsuho

Utsuho is a Python module that facilitates bidirectional conversion between half-width katakana and full-width katakana in Japanese. Furthermore, it offers bidirectional conversion between hiragana and katakana.

Note

From version 2.0.0 onward, you can now perform bidirectional conversion between hiragana and katakana.

Background and Goals

Japanese text can be written with either half-width or full-width characters, and the same data can be represented in either form. Because no standard dictates which form to use, Japanese data often mixes half-width and full-width characters inconsistently.

In the Python standard library, Unicode string normalization (unicodedata.normalize) can convert half-width katakana to full-width katakana. However, it also applies unwanted transformations, such as decomposing composite characters and converting full-width alphanumerics and symbols to half-width. In addition, it does not support the reverse conversion from full-width katakana to half-width katakana.
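The side effects described above can be seen with a short standard-library snippet. This is only an illustration of NFKC normalization and does not use Utsuho; the sample string is made up for the demonstration.

```python
import unicodedata

# Half-width katakana, full-width Latin letters and digits, and two
# compatibility characters (a circled digit and a squared era name).
text = "ｷﾞﾝｶｸｼﾞ ＡＢＣ１２３ ① ㍻"

# NFKC widens the katakana, but it also narrows the alphanumerics and
# replaces the compatibility characters with their decompositions.
normalized = unicodedata.normalize("NFKC", text)
print(normalized)  # ギンカクジ ABC123 1 平成
```

Only the katakana change is wanted here; the other three are the kind of unnecessary transformation that Utsuho is designed to avoid.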

Utsuho supports bidirectional conversion between half-width and full-width katakana in Japanese without unnecessary transformations. Additionally, by providing a means to unify various Japanese representations, Utsuho aims to enhance the utility of Japanese data.

Origin of the Name "Utsuho"

The name "Utsuho" originates from "Utsuho Monogatari," believed to have been composed during the Heian period. This narrative includes descriptions related to learning katakana.

Installing

Install and update using pip:

pip install Utsuho

Usage

Using Utsuho, you can perform bidirectional conversion between half-width and full-width characters, as well as between hiragana and katakana.

Conversion from Half-Width to Full-Width Characters

To convert from half-width to full-width characters, use the following code:

from utsuho import HalfToFullConverter

halfwidth = 'ｷｮｳﾄｼ ｻｷｮｳｸ ｷﾞﾝｶｸｼﾞﾁｮｳ 2'
cnv = HalfToFullConverter()
fullwidth = cnv.convert(halfwidth)

In the above example, the variable fullwidth will store the conversion result as "キョウトシ　サキョウク　ギンカクジチョウ　２".

Conversion from Full-Width to Half-Width Characters

To convert from full-width to half-width characters, use the following code:

from utsuho import FullToHalfConverter

fullwidth = 'キョウトシ サキョウク ギンカクジチョウ 2'
cnv = FullToHalfConverter()
halfwidth = cnv.convert(fullwidth)

In the above example, the variable halfwidth will store the conversion result as "ｷｮｳﾄｼ ｻｷｮｳｸ ｷﾞﾝｶｸｼﾞﾁｮｳ 2".
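To give a sense of what the full-width to half-width direction involves, here is a minimal standard-library sketch. It is not Utsuho's implementation: the table FULL_TO_HALF and the helper full_to_half_katakana are hypothetical names, and the approach (inverting NFKC over the half-width katakana block) is just one assumed way to do it.

```python
import unicodedata

# Invert NFKC over the half-width katakana block (U+FF61..U+FF9F) to get a
# full-width -> half-width table. The half-width sound marks end up keyed
# by the combining marks U+3099 and U+309A. The block's punctuation and
# long vowel mark are included as well.
FULL_TO_HALF = {
    unicodedata.normalize("NFKC", chr(cp)): chr(cp)
    for cp in range(0xFF61, 0xFFA0)
}


def full_to_half_katakana(text: str) -> str:
    """Hypothetical helper: convert full-width katakana to half-width."""
    out = []
    for ch in text:
        if "\u30a1" <= ch <= "\u30fa":
            # NFD splits a voiced kana into base kana + combining mark,
            # matching the two-character half-width form.
            ch = unicodedata.normalize("NFD", ch)
        out.append("".join(FULL_TO_HALF.get(part, part) for part in ch))
    return "".join(out)


print(full_to_half_katakana("ギンカクジ ＡＢＣ"))  # ｷﾞﾝｶｸｼﾞ ＡＢＣ
```

Note that the full-width letters pass through unchanged, since the table only covers the half-width katakana block.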

Conversion from Hiragana to Katakana

To convert from hiragana to katakana, use the following code:

from utsuho import HiraganaToKatakanaConverter

hiragana = 'きょうとし さきょうく ぎんかくじちょう 2'
cnv = HiraganaToKatakanaConverter()
katakana = cnv.convert(hiragana)

In the above example, the variable katakana will store the conversion result as "キョウトシ サキョウク ギンカクジチョウ 2".

Conversion from Katakana to Hiragana

To convert from katakana to hiragana, use the following code:

from utsuho import KatakanaToHiraganaConverter

katakana = 'キョウトシ サキョウク ギンカクジチョウ 2'
cnv = KatakanaToHiraganaConverter()
hiragana = cnv.convert(katakana)

In the above example, the variable hiragana will store the conversion result as "きょうとし さきょうく ぎんかくじちょう 2".
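For intuition about the two hiragana and katakana conversions, the following standard-library sketch relies on the fact that the two scripts occupy parallel Unicode ranges. It is an assumption-laden illustration, not Utsuho's implementation, and the names KANA_OFFSET, hiragana_to_katakana, and katakana_to_hiragana are hypothetical.

```python
# Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 are laid out in the
# same order, 0x60 code points apart.
KANA_OFFSET = 0x60
HIRAGANA_TO_KATAKANA = {cp: cp + KANA_OFFSET for cp in range(0x3041, 0x3097)}
KATAKANA_TO_HIRAGANA = {cp: cp - KANA_OFFSET for cp in range(0x30A1, 0x30F7)}


def hiragana_to_katakana(text: str) -> str:
    """Hypothetical helper: shift hiragana code points into katakana."""
    return text.translate(HIRAGANA_TO_KATAKANA)


def katakana_to_hiragana(text: str) -> str:
    """Hypothetical helper: shift katakana code points into hiragana."""
    return text.translate(KATAKANA_TO_HIRAGANA)


print(hiragana_to_katakana("きょうとし さきょうく ぎんかくじちょう 2"))
print(katakana_to_hiragana("キョウトシ サキョウク ギンカクジチョウ 2"))
```

Characters outside those ranges, such as the spaces and the digit, are left untouched by str.translate.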

Configuration of Conversion Behaviors Between Half-Width and Full-Width Characters

The conversion behaviors can be configured by passing an instance of the WidthConverterConfig class as an argument to the constructors of either the HalfToFullConverter class or the FullToHalfConverter class.

Note

From version 2.0.0 onward, you must use WidthConverterConfig instead of ConverterConfig.

from utsuho import WidthConverterConfig, HalfToFullConverter

conf = WidthConverterConfig(
  ascii_symbol=False,   # Disable conversion of ASCII symbols
  ascii_digit=False,    # Disable conversion of ASCII digits
  ascii_alphabet=False, # Disable conversion of ASCII alphabets
)
cnv = HalfToFullConverter(conf)

The conversion behaviors that can be configured with the WidthConverterConfig class are as follows:

| Parameter | Default Value | Description |
| --- | --- | --- |
| punctuation | True | Whether to convert punctuation marks. |
| corner_bracket | True | Whether to convert corner brackets. |
| conjunction_mark | True | Whether to convert conjunction marks. |
| length_mark | True | Whether to convert length marks. |
| space | True | Whether to convert spaces. |
| ascii_symbol | True | Whether to convert ASCII symbols. |
| ascii_digit | True | Whether to convert ASCII digits. |
| ascii_alphabet | True | Whether to convert ASCII letters. |
| wave_dash | False | Whether to convert wave dashes. |

CLI (Command Line Interface)

Utsuho not only serves as a Python library but also provides a straightforward command-line interface that can be used interactively.

Syntax

You can use the --help option to show the CLI syntax.

% utsuho --help
Usage: utsuho [OPTIONS] COMMAND [ARGS]...

  Utsuho is a Python module that facilitates bidirectional conversion between
  half-width katakana and full-width katakana in Japanese. Furthermore, it
  offers bidirectional conversion between hiragana and katakana.

Options:
  --version  Show the version.
  --help     Show this message and exit.

Commands:
  full-to-half          Convert from full-width to half-width characters.
  half-to-full          Convert from half-width to full-width characters.
  hiragana-to-katakana  Convert from hiragana to katakana.
  katakana-to-hiragana  Convert from katakana to hiragana.

--version Option

You can use the --version option to display the version of Utsuho. Utsuho exits after displaying the version, even if other options or commands are specified.

% utsuho --version
Utsuho x.x.x

Commands

Utsuho provides the following commands:

  • full-to-half Command

    This command performs the conversion from full-width to half-width characters.

  • half-to-full Command

    This command performs the conversion from half-width to full-width characters.

  • hiragana-to-katakana Command

    This command executes the conversion from hiragana to katakana.

  • katakana-to-hiragana Command

    This command executes the conversion from katakana to hiragana.

full-to-half Command

This command performs the conversion from full-width to half-width characters.

You can use the --help option to show the command syntax.

% utsuho full-to-half --help
Usage: utsuho full-to-half [OPTIONS] TEXT

  Convert from full-width to half-width characters.

Options:
  -f, --file  Whether to use TEXT as a file path.
  --help      Show this message and exit.

You can convert full-width characters contained in the TEXT to half-width characters. The conversion result will be output to the standard output.

% utsuho full-to-half "キョウトシ サキョウク ギンカクジチョウ 2"
ｷｮｳﾄｼ ｻｷｮｳｸ ｷﾞﾝｶｸｼﾞﾁｮｳ 2

--file Option

When the --file option is specified, the TEXT parameter is treated as the path to a file that contains the string to be converted.

You can convert full-width characters in the file to half-width characters. The conversion result will be output to the standard output.

Create a file named "full.txt" containing full-width characters.

full.txt:

キョウトシ サキョウク ギンカクジチョウ 2

Execute the command with the --file option and the file path "full.txt":

% utsuho full-to-half --file full.txt
ｷｮｳﾄｼ ｻｷｮｳｸ ｷﾞﾝｶｸｼﾞﾁｮｳ 2

half-to-full Command

This command performs the conversion from half-width to full-width characters.

You can use the --help option to show the command syntax.

% utsuho half-to-full --help
Usage: utsuho half-to-full [OPTIONS] TEXT

  Convert from half-width to full-width characters.

Options:
  -f, --file  Whether to use TEXT as a file path.
  --help      Show this message and exit.

You can convert half-width characters contained in the TEXT to full-width characters. The conversion result will be output to the standard output.

% utsuho half-to-full "ｷｮｳﾄｼ ｻｷｮｳｸ ｷﾞﾝｶｸｼﾞﾁｮｳ 2"
キョウトシ　サキョウク　ギンカクジチョウ　２

--file Option

When the --file option is specified, the TEXT parameter is treated as the path to a file that contains the string to be converted.

You can convert half-width characters in the file to full-width characters. The conversion result will be output to the standard output.

Create a file named "half.txt" containing half-width characters.

half.txt:

ｷｮｳﾄｼ ｻｷｮｳｸ ｷﾞﾝｶｸｼﾞﾁｮｳ 2

Execute the command with the --file option and the file path "half.txt":

% utsuho half-to-full --file half.txt
キョウトシ　サキョウク　ギンカクジチョウ　２

hiragana-to-katakana Command

This command performs the conversion from hiragana to katakana.

You can use the --help option to show the command syntax.

% utsuho hiragana-to-katakana --help
Usage: utsuho hiragana-to-katakana [OPTIONS] TEXT

  Convert from hiragana to katakana.

Options:
  -f, --file  Whether to use TEXT as a file path.
  --help      Show this message and exit.

You can convert hiragana contained in the TEXT to katakana. The conversion result will be output to the standard output.

% utsuho hiragana-to-katakana "きょうとし さきょうく ぎんかくじちょう 2"
キョウトシ サキョウク ギンカクジチョウ 2

--file Option

When the --file option is specified, the TEXT parameter is treated as the path to a file that contains the string to be converted.

You can convert hiragana in the file to katakana. The conversion result will be output to the standard output.

Create a file named "hiragana.txt" containing hiragana.

hiragana.txt:

きょうとし さきょうく ぎんかくじちょう 2

Execute the command with the --file option and the file path "hiragana.txt":

% utsuho hiragana-to-katakana --file hiragana.txt
キョウトシ サキョウク ギンカクジチョウ 2

katakana-to-hiragana Command

This command performs the conversion from katakana to hiragana.

You can use the --help option to show the command syntax.

% utsuho katakana-to-hiragana --help
Usage: utsuho katakana-to-hiragana [OPTIONS] TEXT

  Convert from katakana to hiragana.

Options:
  -f, --file  Whether to use TEXT as a file path.
  --help      Show this message and exit.

You can convert katakana contained in the TEXT to hiragana. The conversion result will be output to the standard output.

% utsuho katakana-to-hiragana "キョウトシ サキョウク ギンカクジチョウ 2"
きょうとし さきょうく ぎんかくじちょう 2

--file Option

When the --file option is specified, the TEXT parameter is treated as the path to a file that contains the string to be converted.

You can convert katakana in the file to hiragana. The conversion result will be output to the standard output.

Create a file named "katakana.txt" containing katakana.

katakana.txt:

キョウトシ サキョウク ギンカクジチョウ 2

Execute the command with the --file option and the file path "katakana.txt":

% utsuho katakana-to-hiragana --file katakana.txt
きょうとし さきょうく ぎんかくじちょう 2

License

This project is licensed under the terms of the Apache License 2.0.

See the "LICENSE" file for license rights and limitations.
