Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression tests for different fetch size on 32-bit platforms #227

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

df7cb
Copy link
Contributor

@df7cb df7cb commented Oct 1, 2021

Net diff between server_options.out and server_options_1.out is:

--- expected/server_options.out 2021-10-01 16:19:20.320972814 +0200
+++ expected/server_options_1.out 2021-10-01 16:22:04.428838428 +0200
@@ -235,11 +235,11 @@

-- Negative test cases for fetch_size option, should error out.
ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '-60000');
-ERROR: "fetch_size" requires an integer value between 1 to 18446744073709551615
+ERROR: "fetch_size" requires an integer value between 1 to 4294967295
ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '123abc');
-ERROR: "fetch_size" requires an integer value between 1 to 18446744073709551615
+ERROR: "fetch_size" requires an integer value between 1 to 4294967295
ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '999999999999999999999');
-ERROR: "fetch_size" requires an integer value between 1 to 18446744073709551615
+ERROR: "fetch_size" requires an integer value between 1 to 4294967295
-- Cleanup fetch_size test objects.
DROP FOREIGN TABLE table30000;
DROP SERVER fetch101;

Net diff between server_options.out and server_options_1.out is:

--- expected/server_options.out	2021-10-01 16:19:20.320972814 +0200
+++ expected/server_options_1.out	2021-10-01 16:22:04.428838428 +0200
@@ -235,11 +235,11 @@

 -- Negative test cases for fetch_size option, should error out.
 ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '-60000');
-ERROR:  "fetch_size" requires an integer value between 1 to 18446744073709551615
+ERROR:  "fetch_size" requires an integer value between 1 to 4294967295
 ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '123abc');
-ERROR:  "fetch_size" requires an integer value between 1 to 18446744073709551615
+ERROR:  "fetch_size" requires an integer value between 1 to 4294967295
 ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '999999999999999999999');
-ERROR:  "fetch_size" requires an integer value between 1 to 18446744073709551615
+ERROR:  "fetch_size" requires an integer value between 1 to 4294967295
 -- Cleanup fetch_size test objects.
 DROP FOREIGN TABLE  table30000;
 DROP SERVER fetch101;
@df7cb
Copy link
Contributor Author

df7cb commented Oct 1, 2021

This fixes the regression tests on 32-bit platforms.

A more correct fix would be to make the max fetch size int64_max instead of INT_MAX, but that's more invasive.

@jeevanladhe
Copy link
Contributor

Thanks, Christoph for your patch. But as of now, we do not test mysql_fdw
on 32-bit platforms.

mysql_fdw runs as a part of the PostgreSQL or EDB Postgres Advanced
Server. PostgresSQL server also actively tests only a couple of 32-bit
platforms[1]. EDB Postgres Advanced Server does not support 32-bit
platforms starting version 11 onwards[2].

We do not have any near-term plans to do active testing and certify
mysql_fdw for 32-bit platforms. But, having said that, if we notice that
a substantial user base is looking forward to the same then we may
consider the same.

Regards,
Jeevan Ladhe

[1] https://buildfarm.postgresql.org/cgi-bin/show_status.pl
[2] https://www.enterprisedb.com/product-compatibility

@df7cb
Copy link
Contributor Author

df7cb commented Oct 4, 2021

Even if you don't test it there, you could still merge it?

@jeevanladhe
Copy link
Contributor

Even if you don't test it there, you could still merge it?

Well, that would fetch some additional maintenance every time we make
changes or add some additional test to server_options.sql. Every time we
do so, we will have to add the expected output to this new file as well,
and will also have to make sure this test also passes. For doing that
we will need to set up a 32-bit machine. This collectively indirectly
implies we do half of the job if we would have to call it official
support.

Regards,
Jeevan Ladhe

@df7cb
Copy link
Contributor Author

df7cb commented Oct 4, 2021

Alternatively, you could drop the offending test (the 3 fetch size errors), or fix the variable to always be 64 bit.

@df7cb
Copy link
Contributor Author

df7cb commented Dec 6, 2021

Fwiw the 32-bit problem got worse with 2.7.0:

13:20:40 --- /tmp/autopkgtest.Hiip7V/tree/expected/aggregate_pushdown_4.out	2021-11-18 08:59:41.000000000 +0000
13:20:40 +++ /tmp/autopkgtest.Hiip7V/tree/results/aggregate_pushdown.out	2021-12-06 12:20:40.513243199 +0000
13:20:40 @@ -629,11 +629,8 @@
13:20:40  (7 rows)
13:20:40  
13:20:40  SELECT sum(c2) * (random() <= 1)::int AS sum FROM fdw132_t1 ORDER BY 1;
13:20:40 - sum 
13:20:40 ------
13:20:40 - 300
13:20:40 -(1 row)
13:20:40 -
13:20:40 +WARNING:  failed to resolve name __mulodi4
13:20:40 +ERROR:  failed to look up symbol "evalexpr_19_0"

That's on PG12 and 13 where the remaining tests pass; on PG14 it's even worse since the connection gets terminated because of a JIT error.

I haven't read the code yet, but this smells like a bug.

@df7cb
Copy link
Contributor Author

df7cb commented Dec 9, 2021

The same regression diff appears on armhf with PG14:

diff -U3 /tmp/autopkgtest-lxc.ar7wrbxy/downtmp/build.DUL/src/expected/aggregate_pushdown.out /tmp/autopkgtest-lxc.ar7wrbxy/downtmp/build.DUL/src/results/aggregate_pushdown.out
--- /tmp/autopkgtest-lxc.ar7wrbxy/downtmp/build.DUL/src/expected/aggregate_pushdown.out	2021-11-18 08:59:41.000000000 +0000
+++ /tmp/autopkgtest-lxc.ar7wrbxy/downtmp/build.DUL/src/results/aggregate_pushdown.out	2021-12-08 22:29:30.341358246 +0000
@@ -629,11 +629,8 @@
 (7 rows)
 
 SELECT sum(c2) * (random() <= 1)::int AS sum FROM fdw132_t1 ORDER BY 1;
- sum 
------
- 300
-(1 row)
-
+WARNING:  failed to resolve name __mulodi4
+ERROR:  failed to look up symbol "evalexpr_19_0"
 -- LATERAL join, with parameterization
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT c2, sum FROM fdw132_t1 t1, lateral (SELECT sum(t2.c1 + t1.c1) sum FROM fdw132_t2 t2 GROUP BY t2.c1) qry WHERE t1.c2 * 2 = qry.sum and t1.c2 > 10 ORDER BY 1;
### End 14 installcheck (FAILED with exit code 1) ###

@cpaelzer
Copy link

I just wanted to state that (for obvious reasons) this affects Ubuntu just as much.

After reading the discussion so far I wanted to ask - as it seems armhf (or 32bit in general) shall not be supported - if you'd want us (= the Distributions) to disable building and providing mysql_fdw on armhf?

@df7cb
Copy link
Contributor Author

df7cb commented Jan 12, 2022

"failed to resolve name" and "failed to look up symbol" are from PG's jit machinery.

https://github.com/postgres/postgres/blob/a18b6d2dc288dfa6e7905ede1d4462edd6a8af47/src/backend/jit/llvm/llvmjit.c#L1100-L1102

@anarazel, is that a bug in PG?

@df7cb
Copy link
Contributor Author

df7cb commented Jan 12, 2022

https://codesearch.debian.net/search?q=mulodi4&literal=1 has some insights that this is a general problem with clang on 32-bit:

webkit2gtk_2.34.3-1/Source/WTF/wtf/CheckedArithmetic.h

/* On Linux with clang, libgcc is usually used instead of compiler-rt, and it does
 * not provide the __mulodi4 symbol used by clang for __builtin_mul_overflow

@anarazel
Copy link

@df7cb Hm. If you compile a c program using __builtin_mul_overflow on those platforms with clang, does it end up using libgcc? Does it make a difference if you add --rtlib=libgcc or --rtlib=compiler-rt to the clang invocations?

@df7cb
Copy link
Contributor Author

df7cb commented Jan 13, 2022

It does not, but my test might be too simple:

[25] 10:08 [email protected]:~/overflow 1j $ cat overflow.c 
int mul(int a, int b, int *c)
{
	__builtin_mul_overflow(a, b, c);
	return *c;
}

int main ()
{
	int a;
	mul(5, 5, &a);
	return a;
}
[0] 10:12 [email protected]:~/overflow 1j $ clang --rtlib=libgcc -Wall -O2 overflow.c -lgcc
[0] 10:12 [email protected]:~/overflow 1j $ ./a.out ; echo $?
25
[0] 10:12 [email protected]:~/overflow 1j $ ldd a.out 
	linux-gate.so.1 (0xf7fb2000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7db2000)
	/lib/ld-linux.so.2 (0xf7fb4000)
[0] 10:12 [email protected]:~/overflow 1j $ clang --version
Debian clang version 13.0.1-+rc1-1~exp4
Target: i386-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

--rtlib doesn't make a difference.

@anarazel
Copy link

@df7cb Oh, I thought this was arm specific... Do you see failures on 32bit x86?

I think the example fails to fail to generate a reference to mulodi only because a) the overflow path isn't actually used b) it's only 32bit.

$ cat /tmp/test_mul.c
#include <stdint.h>
#include <stdbool.h>

bool mul(int64_t a, int64_t b, int64_t *c)
{
  return __builtin_mul_overflow(a, b, c);
}

$ clang -m32 -Wall -O2 /tmp/test_mul.c -o - -S|grep mulodi
	calll	__mulodi4

@df7cb
Copy link
Contributor Author

df7cb commented Jan 13, 2022

I've seen it on both i386 and armhf.

Your code now shows the problem in the 32-bit chroot:

[25] 10:51 [email protected]:~/overflow 1j $ cat test_mul.c 
#include <stdint.h>
#include <stdbool.h>

bool mul(int64_t a, int64_t b, int64_t *c)
{
  return __builtin_mul_overflow(a, b, c);
}

int main()
{
	int64_t c;
	mul(5, 5, &c);
	return c;
}
[0] 10:51 [email protected]:~/overflow 1j $ clang -m32 -Wall -O2 test_mul.c -c
[0] 10:52 [email protected]:~/overflow 1j $ clang -Wall -O2 test_mul.o
/usr/bin/ld: test_mul.o: in function `mul':
test_mul.c:(.text+0x2f): undefined reference to `__mulodi4'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[1] 10:52 [email protected]:~/overflow 1j $ clang --rtlib=libgcc -Wall -O2 test_mul.o
/usr/bin/ld: test_mul.o: in function `mul':
test_mul.c:(.text+0x2f): undefined reference to `__mulodi4'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
[1] 10:52 [email protected]:~/overflow 1j $ clang --rtlib=compiler-rt -Wall -O2 test_mul.o
[0] 10:52 [email protected]:~/overflow 1j $ ldd a.out 
	linux-gate.so.1 (0xf7f7c000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7d7c000)
	/lib/ld-linux.so.2 (0xf7f7e000)

@anarazel
Copy link

It appears to be getting fixed in clang. See https://bugs.llvm.org/show_bug.cgi?id=28629 - sid's clang-14 doesn't fail anymore.

As the above example quite clearly shows, this is unrelated to postgres / jit...

@df7cb
Copy link
Contributor Author

df7cb commented Jan 13, 2022

Thanks @anarazel !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants