[Feature] Unicode characters in text file [sf#3] #47

teambob · 2015-06-07T02:16:03Z

Reported by andrewpunch on 2004-06-09 03:25 UTC
Non-ASCII characters are discarded when writing to a
text file.

This cannot be avoided while the output is an ASCII
file.

There are some other options for file formats:

mapping to a code page (e.g. ANSI)
quoted-printable (as used in email)
UTF-8
straight Unicode

From a design perspective, this could be achieved by
creating maps from a single Unicode character to one or
more bytes.

There could be a map for:

ASCII
ANSI
ISO 8859 standards
UTF-8
straight Unicode

The map need not be static; it may be dynamic. For
example, the ASCII map may pass through every character
with a Unicode code point below 0x0080.

There must be a defined process for handling a Unicode
character that cannot be mapped using the current map.
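
The map idea described above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the names `CharMap`, `ascii_map`, `utf8_map` and `encode_text`, and the choice of `?` as the replacement for unmappable characters, are all assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical type: a map takes one Unicode character and returns
# its byte sequence, or None when the character is not mappable.
CharMap = Callable[[str], Optional[bytes]]


def ascii_map(ch: str) -> Optional[bytes]:
    # Dynamic rule: pass through any code point below 0x0080.
    code = ord(ch)
    return bytes([code]) if code < 0x80 else None


def utf8_map(ch: str) -> Optional[bytes]:
    # UTF-8 maps a character to between one and four bytes.
    return ch.encode("utf-8")


def encode_text(text: str, char_map: CharMap, fallback: bytes = b"?") -> bytes:
    # Apply the map character by character; unmappable characters
    # are replaced by the fallback bytes (an assumed policy).
    out = bytearray()
    for ch in text:
        mapped = char_map(ch)
        out += mapped if mapped is not None else fallback
    return bytes(out)
```

With this sketch, `encode_text("caf\u00e9", ascii_map)` substitutes the fallback for the accented letter, while the same call with `utf8_map` keeps it as two bytes. Other policies for unmappable characters (dropping them, raising an error) would fit the same interface.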

Created on behalf of David at Nutmeg.

teambob · 2015-06-07T02:18:08Z

Commented by andrewpunch on 2004-06-10 10:50 UTC
Logged In: YES
user_id=928005

Another quick possibility is to use UTF-32 (little endian)
encoding. This allows access to all the characters in a
document without loss of information.

The technical specification is here:
http://www.unicode.org/faq/specifications-jda.html
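
As a rough illustration of the UTF-32 (little endian) suggestion, the Python sketch below encodes every character as exactly four bytes and shows that decoding recovers the original text. The function names are hypothetical, and the sketch uses Python's built-in `utf-32-le` codec, which writes no byte order mark.

```python
def to_utf32_le(text: str) -> bytes:
    # Each code point becomes four bytes, least significant byte first.
    return text.encode("utf-32-le")


def from_utf32_le(data: bytes) -> str:
    # Inverse of to_utf32_le; raises UnicodeDecodeError on invalid input.
    return data.decode("utf-32-le")
```

The fixed width makes the output simple to index, at the cost of files four times the size of ASCII for English text.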

teambob · 2015-06-07T02:18:13Z

Commented by andrewpunch on 2004-10-11 06:43 UTC
Logged In: YES
user_id=928005

Determination:
The text writer will output UTF-8 by default in the next version.
This is compatible with ASCII for English characters
and preserves all other characters.

ASCII, UTF-16/32 and other mappings will be available as
options in later versions.
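
A minimal Python sketch of what this decision implies, assuming a hypothetical `write_text` helper with an `encoding` option that defaults to UTF-8 (the actual text writer's interface is not shown in this thread):

```python
import io
from typing import BinaryIO


def write_text(stream: BinaryIO, text: str, encoding: str = "utf-8") -> None:
    # UTF-8 is the default; other encodings are selected by name.
    stream.write(text.encode(encoding))


# English text: the UTF-8 output is byte-for-byte the same as ASCII.
plain = io.BytesIO()
write_text(plain, "Hello")

# Non-ASCII text: the accented character is kept as a two-byte sequence.
accented = io.BytesIO()
write_text(accented, "na\u00efve")

# An alternative mapping chosen as an option.
wide = io.BytesIO()
write_text(wide, "Hi", encoding="utf-16-le")
```

Because the ASCII range is encoded identically in UTF-8, existing consumers of English-only output files should see no change.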

teambob · 2015-06-07T02:18:17Z

Updated by andrewpunch on 2004-10-11 06:43 UTC

summary: Non-ASCII characters in text file --> Unicode characters in text file

teambob · 2015-06-07T02:18:21Z

Updated by andrewpunch on 2004-10-11 06:45 UTC

priority: 5 --> 8

teambob · 2015-06-07T02:18:25Z

Commented by andrewpunch on 2005-04-11 12:41 UTC
Logged In: YES
user_id=928005

This is scheduled for inclusion in 3.2.0 as UTF-8 output for
"text" files.

teambob changed the title ~~Unicode characters in text file~~ [Feature] Unicode characters in text file [sf#3] Jun 7, 2015
