-
Notifications
You must be signed in to change notification settings - Fork 14
/
README.af
238 lines (150 loc) · 7.38 KB
/
README.af
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
=for humans
This file is written in the POD format, use pod2html or similar to view it.
=head1 NOTE
This file is now really historical. Unicode support in DBD::ODBC has
matured greatly since Alexander's initial patch although I greatly
appreciate it.
=head1 Unicode patch for DBD::ODBC 1.13
Date: 2006-03-04
=head2 Goals
=over 4
=item * Allow an application on Win32 using DBD::ODBC to store and fetch Unicode
strings in a MS SQL Server 2000 database.
=item * Minimal changes to the existing driver code.
=item * Don't break working code.
=item * Make the DBD::ODBC "do the right thing" with the UTF8 flag.
=item * A first step towards proper Unicode support for DBD:ODBC.
=back
=head2 Legal disclaimer
The Unicode patch for DBD::ODBC was initially written by Alexander Foken
(alexander at foken dot de). I wrote this patch during my working hours for a
project that needs Unicode support in DBD::ODBC on Windows, and I have the
permission of my former employer to publish this patch.
You may distribute under the terms of either the GNU General Public License or
the Artistic License, as specified in the Perl README file.
=head2 Applying the Patch
=over 4
=item 1. get DBD::ODBC 1.13 from CPAN
=item 2. unpack it
=item 3. C<patch -p1 E<lt> DBD-ODBC-1.13-Unicode.patch>
=item 4. build DBD::ODBC as usual (C<perl Makefile.pl>, C<make>, C<make test>,
C<make install>)
=back
=head2 Tested Databases
=over 4
=item Microsoft SQL Server 2000 SP3
Works, using the ODBC driver from the MS SQL 2000 CDROM.
You need to install SP1, SP2, and SP3 on both client and server, because
DBD::ODBC needs a recent set of MDAC libraries.
=item Oracle 9.2
Works, using the ODBC driver from the Oracle 9.2 CDROM.
You have to set the environment variables C<NLS_NCHAR=AL32UTF8> and
C<NLS_LANG=AMERICAN_AMERICA.AL32UTF8> (or any other language setting ending with
"C<.AL32UTF8>") before loading the patched DBD::ODBC to make Oracle return
Unicode data. (See also "Oracle and Unicode" in the POD of DBD::Oracle.)
And you have to enable the "C<Force SQL_WCHAR support>" Workaround in the Oracle
ODBC Driver Configuration to make Oracle return Unicode to a non-Unicode
application. Alternatively, you can include "C<FWC=T>" in your connect string.
Better try to use DBD::Oracle to access Oracle with less overhead and better
support for Oracle-specific stuff.
=item Oracle 8.x
Not tested but known not to support Unicode very well.
Quoting the DBD::Oracle man page: "I<Oracle 8 client libraries have a number of
bugs related to character set handling, especially when connected to an Oracle
9+ server.> [...] I<If you wish to use Unicode, I recommend upgrading client and
server to Oracle 9 or later.>"
=item MS Access 2000
Unicode tests fail because MS Access 2000 seems not to give any hints about
Unicode, so all data is treated as non-Unicode. You do not want to use this
combination.
You may want to try the MSDE, it has the SQL Server engine, but with a lower
connection limit and without GUI tools. There are several 3rd party GUIs for the
MSDE.
=item PostgreSQL 8.0.3
Some tests from the original DBD::ODBC 1.13 fail with PostgreSQL 8.0.3, so you
may not want to use DBD::ODBC to connect to PostgreSQL 8.0.3.
Unicode tests fail because PostgreSQL seems not to give any hints about Unicode,
so all data is treated as non-Unicode.
Better try to use DBD::Pg to access PostgreSQL with less overhead and better
support for PostgreSQL-specific stuff. DBD::Pg has a driver attribute named
C<pg_enable_utf8>, set it to 1 and you have proper Unicode support.
=back
=head2 Tested Operating Systems and ODBC Managers
=over 4
=item MS Windows 2000 Professional and Server, using the standard ODBC Manager
from Microsoft.
=back
(Yes, this list should be longer.)
=head2 Known Problems
Perl 5.8.1 or newer is required. Older Perl before 5.8.0 lacked proper Unicode
support. Perl 5.8.0 lacks some auxillary functions for Unicode.
Unicode is supported only for SQL statement parameters and data returned by the
fetch methods, SQL statements are still treated as native encoding. If you need
a unicode constant in an SQL statement, you have to pass it as parameter or use
SQL functions to convert your constant from native encoding to Unicode.
All data passed to the patched DBD::ODBC for C<SQL_C_WCHAR>, C<SQL_WCHAR>,
C<SQL_WVARCHAR>, and C<SQL_WLONGVARCHAR> is treated as Unicode, even if it is
not Unicode. F<unicode_helper.c> should check the UTF8 flag of the scalar and
pass a value different from C<CP_UTF8> as first argument to
C<WideCharToMultiByte()>. The problem is to know what encoding is used for the
data in the scalar.
Binding of unicode output parameters is untested (I don't need them) and likely
to fail.
The patched DBD::ODBC may fail to compile on non-Win32 platforms. It needs a
header file named F<wchar.h> defining at least the following:
=over 4
=item A C<WCHAR> data type capable of storing a single Unicode character.
Microsoft uses C<typedef wchar_t WCHAR> in F<wchar.h>, and C<typedef unsigned
short wchar_t;> in F<ctype.h>.
=item C<WCHAR * wcscpy(WCHAR *, const WCHAR *)>
C<strcpy()> for wide characters. Microsoft declares this function in both
F<string.h> and F<wchar.h>.
=item C<size_t wcslen(const WCHAR *)>
C<strlen()> for wide characters, returns character count, not bytes. Microsoft
declares this function in both F<string.h> and F<wchar.h>.
=item WideCharToMultiByte() and MultiByteToWideChar() functions
Encoding converter functions. WideChar in this context means the native Unicode
representation of the ODBC API (UCS-2 or UTF-16LE for Windows), MultiByte in
this context means Perls native Unicode representation (UTF-8). Microsoft
declares the two functions in F<winnls.h>. The C<CP_UTF8> argument tells the
function that the MultiByte string is in UTF-8 encoding.
=back
=head2 Technical
This patch adds support for C<SQL_C_WCHAR>, C<SQL_WCHAR>, C<SQL_WVARCHAR>, and
C<SQL_WLONGVARCHAR>. Strings written to columns that are reported as
C<SQL_WCHAR>, C<SQL_WVARCHAR>, or C<SQL_WLONGVARCHAR> are automatically
converted to 16 bit Unicode using the Windows API function
C<WideCharToMultiByte()>, return values reported as C<SQL_C_WCHAR>,
C<SQL_WCHAR>, C<SQL_WVARCHAR>, or C<SQL_WLONGVARCHAR> are converted back to 8
bit Unicode (UTF-8) using the Windows API function C<MultiByteToWideChar()> and
have Perl's UTF8 flag set except for empty strings.
=head2 Tests
This patch adds two new tests, F<t/40UnicodeRoundTrip.t> and F<t/41Unicode.t>.
Test 40 checks that Unicode strings can be entered as bound parameters and are
returned unmodified. Test 41 creates a table, writes and reads unicode data
using various bind variants.
When using Oracle, the empty string test in F<t/40UnicodeRoundTrip.t> is skipped
because Oracle converts empty strings to NULL in this situation.
I had to add C<SQL_WCHAR>, C<SQL_WVARCHAR>, C<SQL_WLONGVARCHAR> to
F<t/ODBCTEST.pm>, because Oracle in the setup described above returns Unicode
more often than expected.
I added F<t/UChelp.pm>, that exports two utility functions for unicode string
tests:
=over 4
=item C<dumpstr($)>
Dumps a string, indicating its Unicode flag, length and all characters in ASCII
notation.
=item C<utf_eq_ok($$$)>
Compares two strings that may be contain Unicode, and calls C<pass()> or
C<cmp_ok()>.
=back
=head2 See also
=over 4
=item * Microsoft ODBC documentation
=item * Microsoft API documentation
=item * http://www.unicode.org/
=item * DBI
=item * DBD::ODBC
=item * DBD::Oracle
=item * DBD::Pg
=back