
ANALYZE command fails with Spark server #41

Open · gabbasb opened this issue Aug 23, 2017 · 0 comments


gabbasb commented Aug 23, 2017

ANALYZE jobhist;
ERROR: failed to fetch execute query: This function is supposed to execute queries that do not generate any result set
ANALYZE emp(empno);
ERROR: failed to fetch execute query: This function is supposed to execute queries that do not generate any result set
VACUUM ANALYZE emp;
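
One way to confirm how the foreign server was defined when this error appears is to inspect its options in the catalog. This is plain PostgreSQL, not specific to hdfs_fdw; the server name hdfs_svr is illustrative:

-- Inspect the foreign server's options; client_type appears in
-- srvoptions only if it was set explicitly at CREATE SERVER time.
SELECT srvname, srvoptions
FROM pg_foreign_server
WHERE srvname = 'hdfs_svr';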

gabbasb added a commit to gabbasb/hdfs_fdw that referenced this issue Aug 23, 2017
Problem Statement:
Hive and Spark both support HiveQL and are compatible except for
the behaviour of the ANALYZE command. The difference is as follows:
in Hive, ANALYZE is a utility command and does not return any
result set, whereas in Spark it returns a result set.
For example:
In Hive we get this output:
--------------------------
0: jdbc:hive2://localhost:10000/testdb>  analyze table names_tab compute statistics;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1488090103001_0007
INFO  : The url to track the job: http://localhost:8088/proxy/application_1488090103001_0007/
INFO  : Starting Job = job_1488090103001_0007, Tracking URL = http://localhost:8088/proxy/application_1488090103001_0007/
INFO  : Kill Command = /home/abbasbutt/Projects/hadoop_fdw/hadoop/bin/hadoop job  -kill job_1488090103001_0007
INFO  : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO  : 2017-08-22 19:08:11,328 Stage-0 map = 0%,  reduce = 0%
No rows affected (11.949 seconds)
INFO  : 2017-08-22 19:08:15,465 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 0.93 sec
INFO  : MapReduce Total cumulative CPU time: 930 msec
INFO  : Ended Job = job_1488090103001_0007
INFO  : Table testdb.names_tab stats: [numFiles=2, numRows=12, totalSize=76, rawDataSize=64]
0: jdbc:hive2://localhost:10000/testdb> [abbasbutt@localhost bin]$

In Spark we get this output:
---------------------------
0: jdbc:hive2://localhost:10000/my_spark_db> analyze table junk_table compute statistics;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (1.462 seconds)

Solution:
The CREATE SERVER command already has a client_type option that
currently supports one value, 'hiveserver2'.
To support ANALYZE on Spark, client_type can also take the value
'spark'.
If client_type is not specified, it defaults to 'hiveserver2', and
the ANALYZE command will fail when the server is actually Spark.
If the correct client_type is specified, ANALYZE works fine with
Spark.

For example:
postgres=# CREATE EXTENSION hdfs_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER hdfs_svr FOREIGN DATA WRAPPER hdfs_fdw OPTIONS (host '127.0.0.1',port '10000',client_type 'spark');
CREATE SERVER
postgres=# CREATE USER MAPPING FOR abbasbutt server hdfs_svr OPTIONS (username 'ldapadm', password 'ldapadm');
CREATE USER MAPPING
postgres=# CREATE FOREIGN TABLE fnt( a int, name varchar(255)) SERVER hdfs_svr OPTIONS (dbname 'my_spark_db', table_name 'junk_table');
CREATE FOREIGN TABLE
postgres=# ANALYZE fnt;
ANALYZE
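
For an existing foreign server that was created without client_type, the option can be added without dropping and recreating the server, using standard PostgreSQL ALTER SERVER syntax. This is a sketch against the example server above; whether the FDW applies the new value to already-open connections is an assumption worth verifying:

postgres=# -- Use SET instead of ADD if client_type was already present
postgres=# -- at CREATE SERVER time.
postgres=# ALTER SERVER hdfs_svr OPTIONS (ADD client_type 'spark');
ALTER SERVER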
ibrarahmad added a commit that referenced this issue Aug 23, 2017
Issue - (#41, #43) - Analyze command fix for Spark server.