[WIP] Upgrade to Hive 4.0.1 #24571
base: master
7ec86c0 to 954a77b
7e6d265 to 0448c62
21c80f6 to 282e2c4
Looking into why the tests are failing, starting with the presto-orc module.
from pyarrow import orc
table2 = orc.read_table('/tmp/3420396529049254202/data.orc')
print(table2)
Whereas the ORC record reader in Presto reads it differently.
ORC files generated by the version of Hive in the PR and on master:
This is a clue that something has changed between the versions. Does it mean that files written by an older version of Hive + ORC will give incorrect output?
Another interesting find: there is a difference between the row indices and the stripe information.
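One way to make the row index / stripe comparison repeatable is to dump the file-level metadata of both files and diff it. The sketch below is only an illustration: the helper names and the two paths in the usage comment are hypothetical, and it assumes a recent pyarrow whose `ORCFile` exposes the listed attributes (any that are missing are reported as `None`).

```python
# Attributes assumed to exist on pyarrow.orc.ORCFile in recent releases.
FIELDS = ("nrows", "nstripes", "row_index_stride",
          "writer", "writer_version", "file_version")


def orc_summary(path):
    """Collect file-level ORC metadata into a plain dict (hypothetical helper)."""
    from pyarrow import orc  # imported lazily so the diff helper works without pyarrow

    orc_file = orc.ORCFile(path)
    # getattr with a default keeps the sketch usable on older pyarrow versions.
    return {name: getattr(orc_file, name, None) for name in FIELDS}


def diff_summaries(left, right):
    """Return {field: (left_value, right_value)} for every field that differs."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


# Usage (paths are placeholders, not the files from the failing tests):
# print(diff_summaries(orc_summary("master.orc"), orc_summary("pr.orc")))
```

This only covers file-level metadata; it does not decode the row index streams themselves.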
13d9947 to 5c04f64
@ethanyzhang imported this issue as lakehouse/presto #24571
The test failures come down to the way Presto reads and writes the data. Even before the data is interpreted as a timestamp type, i.e. while it is still a long, the value has already been adjusted to the system timezone. Why this happens is not yet clear to me. When data written by Presto is read via an external ORC reader, it has a 6h adjustment applied to it. A similar thing happens when Presto reads data written by Hive. There are no issues reading other data types, e.g. long/int; the problem seems to be specific to timestamps. @imjalpreet agrees with this.
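To make the 6h shift concrete, here is a minimal sketch of how such an offset can arise when a wall-clock timestamp is converted to a long in a local zone and later read back as UTC. It does not claim this is what Presto or the ORC writer actually does; the fixed UTC-6 zone and the sample timestamp are assumptions chosen only to reproduce a 6h difference.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical writer zone: a fixed UTC-6 offset, picked only to match the 6h shift.
WRITER_ZONE = timezone(timedelta(hours=-6))


def wall_clock_to_epoch(wall_clock, zone):
    """Interpret a naive wall-clock timestamp in `zone` and return epoch seconds."""
    return int(wall_clock.replace(tzinfo=zone).timestamp())


wall_clock = datetime(2020, 1, 1, 12, 0, 0)  # sample value, not from the tests

as_utc = wall_clock_to_epoch(wall_clock, timezone.utc)   # zone-agnostic encoding
as_local = wall_clock_to_epoch(wall_clock, WRITER_ZONE)  # encoding adjusted to the local zone

# The stored longs already differ before any timestamp type is involved.
shift_hours = (as_local - as_utc) / 3600

# A reader that treats the stored long as UTC sees a shifted wall clock.
read_back = datetime.fromtimestamp(as_local, tz=timezone.utc).replace(tzinfo=None)

print(shift_hours, read_back)
```

If the real cause is similar, the stored long values in the two files should differ by exactly the zone offset, which is easy to check by reading the column as raw integers.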
5b896d6 to 6db405e
894b0aa to 5a2117a
71ed5ae to c2e4072
This reverts commit 6880dd2.
c2e4072 to 602229d


Description
Upgrade to Hive 4.0.1
Depends on prestodb/presto-hive-apache#65 and prestodb/presto-hive-dwrf#12
Motivation and Context
#24435
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.