[Feature] Unicode characters in text file [sf#3] #47

Open
teambob opened this issue Jun 7, 2015 · 5 comments

@teambob
Owner

teambob commented Jun 7, 2015

Reported by andrewpunch on 2004-06-09 03:25 UTC
Non-ASCII characters are discarded when
writing to a text file.

There is no way around this while we write to an ASCII
file.

There are some other options for file formats:

  • mapping to a code page (e.g. ANSI)
  • quoted-printable (as used in email)
  • UTF-8
  • Straight Unicode

From a design perspective this could be achieved by
creating maps from a single Unicode character to one or
more bytes.

There could be a map for:

  • ASCII
  • ANSI
  • ISO 8859 standards
  • UTF-8
  • straight Unicode
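
As a rough illustration of the "map" idea above, here is a minimal Python sketch. It is not the project's code: `CODECS` and `map_char` are hypothetical names, reading "ANSI" as Windows-1252 and "straight Unicode" as UTF-16 little endian are assumptions, and Python's built-in codecs stand in for the maps.

```python
# Hypothetical table of maps: each entry names a Python codec that turns
# one Unicode character into one or more bytes.
CODECS = {
    "ASCII": "ascii",
    "ANSI": "cp1252",              # assumption: "ANSI" read as Windows-1252
    "ISO 8859-1": "latin-1",
    "UTF-8": "utf-8",
    "straight Unicode": "utf-16-le",  # assumption: one possible reading
}

def map_char(ch: str, map_name: str) -> bytes:
    """Return the bytes for a single character under the named map.

    Raises UnicodeEncodeError if the character is not mappable.
    """
    return ch.encode(CODECS[map_name])

for name in CODECS:
    try:
        print(name, map_char("é", name).hex())
    except UnicodeEncodeError:
        print(name, "unmappable")
```

The same character comes out as one byte under a code page, two bytes under UTF-8, and not at all under ASCII, which is why the choice of map matters.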

The map need not be static. It may be dynamic. For
example the ASCII map may allow through all characters
with a Unicode code point below 0x0080.
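
A dynamic map of this kind could be a function rather than a lookup table. The sketch below is only an illustration; the name `ascii_map` and the choice of `None` to signal "not mappable" are assumptions, not part of the proposal.

```python
from typing import Optional

def ascii_map(ch: str) -> Optional[bytes]:
    """Dynamic ASCII map: pass code points below 0x0080, reject the rest."""
    code = ord(ch)
    if code < 0x0080:
        return bytes([code])  # one character becomes exactly one byte
    return None               # not mappable under this map

print(ascii_map("A"))   # mappable
print(ascii_map("é"))   # not mappable
```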

There must be a process for when a Unicode character is
not mappable using the current map.
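
The issue does not say what that process should be. As one possible set of options, Python's built-in codec error handlers show the usual policies; `encode_with_policy` is a hypothetical helper used only for this sketch.

```python
def encode_with_policy(text: str, encoding: str, policy: str) -> bytes:
    """Encode text, applying one named policy to unmappable characters.

    policy is one of Python's built-in codec error handlers:
    "strict", "ignore", "replace", "backslashreplace", "xmlcharrefreplace".
    """
    return text.encode(encoding, errors=policy)

sample = "café"
# "ignore" reproduces the behaviour reported here: the character vanishes.
for policy in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
    print(policy, encode_with_policy(sample, "ascii", policy))

# "strict" refuses to write anything lossy and raises instead.
try:
    encode_with_policy(sample, "ascii", "strict")
except UnicodeEncodeError as err:
    print("strict raised:", err.reason)
```

Dropping silently is the only policy that loses information without leaving a trace; each of the others leaves a visible marker or stops.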

Created on behalf of David at Nutmeg.

@teambob teambob changed the title Unicode characters in text file [Feature] Unicode characters in text file [sf#3] Jun 7, 2015
@teambob
Owner Author

teambob commented Jun 7, 2015

Commented by andrewpunch on 2004-06-10 10:50 UTC
Logged In: YES
user_id=928005

Another quick possibility is to use UTF-32 (little endian)
encoding. This allows access to all the characters in a
document without loss of information.

The technical specification is here:
http://www.unicode.org/faq/specifications-jda.html
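
To make the UTF-32 suggestion concrete, here is a small Python sketch using the standard `utf-32-le` codec. The helper names are hypothetical, and this codec writes no byte order mark, so a real file format would need to decide whether to add one.

```python
def to_utf32_le(text: str) -> bytes:
    """Encode text as UTF-32 little endian (no byte order mark)."""
    return text.encode("utf-32-le")

def from_utf32_le(data: bytes) -> str:
    """Decode UTF-32 little endian bytes back to text."""
    return data.decode("utf-32-le")

text = "A€😀"
data = to_utf32_le(text)
print(data.hex(" ", 4))                # four bytes per character
print(len(data) == 4 * len(text))
print(from_utf32_le(data) == text)     # round trip keeps every character
```

Every character costs four bytes, so plain English text grows to four times its ASCII size.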

@teambob
Owner Author

teambob commented Jun 7, 2015

Commented by andrewpunch on 2004-10-11 06:43 UTC
Logged In: YES
user_id=928005

Determination:
Text writer will output UTF-8 by default in next version.
This will be compatible with ASCII for English characters,
but will keep other characters.

ASCII, UTF-16/32 and other mappings will be available as
options in later versions.
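
The ASCII compatibility of UTF-8 mentioned above can be checked with a short Python sketch. The function names are hypothetical and this is not the text writer's code.

```python
def same_as_ascii(text: str) -> bool:
    """True if the UTF-8 bytes equal the ASCII bytes for this text."""
    try:
        return text.encode("utf-8") == text.encode("ascii")
    except UnicodeEncodeError:
        return False  # text contains non-ASCII characters

def utf8_round_trip(text: str) -> bool:
    """True if encoding to UTF-8 and decoding again loses nothing."""
    return text.encode("utf-8").decode("utf-8") == text

print(same_as_ascii("Hello, world"))   # pure ASCII text is byte-identical
print(same_as_ascii("naïve café"))     # differs once non-ASCII appears
print(utf8_round_trip("naïve café"))   # but nothing is lost
```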

@teambob
Owner Author

teambob commented Jun 7, 2015

Updated by andrewpunch on 2004-10-11 06:43 UTC

  • summary: Non-ASCII characters in text file --> Unicode characters in text file

@teambob
Owner Author

teambob commented Jun 7, 2015

Updated by andrewpunch on 2004-10-11 06:45 UTC

  • priority: 5 --> 8

@teambob
Owner Author

teambob commented Jun 7, 2015

Commented by andrewpunch on 2005-04-11 12:41 UTC
Logged In: YES
user_id=928005

This is scheduled for inclusion in 3.2.0 as UTF-8 output for
"text" files.
