Commit 91623db

author
James Malone
committed
Add HCatalog init action & typo fix
- Add Hive HCatalog initialization action
- Typo fix for Flink readme
1 parent fec8cf4 commit 91623db

4 files changed: 56 additions, 1 deletion

README.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ This repository presently offers the following actions for use with Cloud Datapr
 * [Apache Zeppelin](http://zeppelin.apache.org)
 * [Apache ZooKeeper](http://zookeeper.apache.org)
 * [Google Cloud Datalab](https://cloud.google.com/datalab/)
+* [Hive HCatalog](https://cwiki.apache.org/confluence/display/Hive/HCatalog)
 * [Hue](http://gethue.com)
 * [IPython](http://ipython.org)
 * [Presto](http://prestodb.io)

flink/flink.sh

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 # To use this script, you will need to configure the following variables to
 # match your cluster. For information about which software components
 # (and their version) are included in Cloud Dataproc clusters, see the
-# Cloud Dataproc Image Version informayion:
+# Cloud Dataproc Image Version information:
 # https://cloud.google.com/dataproc/concepts/dataproc-versions
 
 set -x -e

hive-hcatalog/README.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Hive HCatalog Initialization Action
+
+This initialization action installs [Hive HCatalog](https://cwiki.apache.org/confluence/display/Hive/HCatalog) on a [Google Cloud Dataproc](https://cloud.google.com/dataproc) cluster. The script also configures Pig to use HCatalog.
+
+## Using this initialization action
+
+You can use this initialization action to create a new Cloud Dataproc cluster with HCatalog installed by doing the following:
+
+1. Upload a copy of this initialization action (`hive-hcatalog.sh`) to [Google Cloud Storage](https://cloud.google.com/storage). Alternatively, you can use the Google-hosted copy of this initialization action at `gs://dataproc-initialization-actions/hive-hcatalog/hive-hcatalog.sh`.
+2. Use the `gcloud` command to create a new cluster with this initialization action. The following command creates a new cluster named `<CLUSTER-NAME>` and specifies the initialization action stored in `<GCS-BUCKET>`:
+
+```bash
+gcloud dataproc clusters create <CLUSTER-NAME> \
+    --initialization-actions gs://<GCS-BUCKET>/hive-hcatalog.sh
+```
+3. Once the cluster has been created, HCatalog should be installed and configured for use with Pig.
+
+You can find more information about using initialization actions with Dataproc in the [Dataproc documentation](https://cloud.google.com/dataproc/init-actions).
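
The README steps above offer two sources for the script: your own bucket or the Google-hosted copy. A minimal sketch of how that choice could be wrapped in a helper is shown here. It is not part of the commit; the function names `init_action_uri` and `print_create_command` are hypothetical, and the helper only prints the `gcloud` command so it can be reviewed before running.

```bash
#!/bin/bash
# Sketch only, not part of the commit. Both function names are hypothetical.

# Google-hosted copy of the script, as given in the README.
HOSTED_URI="gs://dataproc-initialization-actions/hive-hcatalog/hive-hcatalog.sh"

# Print the URI of the init action: the hosted copy by default, or the
# copy in your own bucket when a bucket name is passed.
init_action_uri() {
  local bucket="${1:-}"
  if [ -n "$bucket" ]; then
    printf 'gs://%s/hive-hcatalog.sh\n' "$bucket"
  else
    printf '%s\n' "$HOSTED_URI"
  fi
}

# Print (rather than run) the cluster-creation command for review.
print_create_command() {
  local cluster_name="$1"
  local bucket="${2:-}"
  printf 'gcloud dataproc clusters create %s --initialization-actions %s\n' \
    "$cluster_name" "$(init_action_uri "$bucket")"
}
```

Calling `print_create_command my-cluster` prints a command that uses the hosted copy; `print_create_command my-cluster my-bucket` points at `gs://my-bucket/hive-hcatalog.sh` instead. Depending on your `gcloud` version and project defaults, additional flags (for example a region) may be needed.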

hive-hcatalog/hive-hcatalog.sh

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+#!/bin/bash
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS-IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This script installs Hive HCatalog
+# (https://cwiki.apache.org/confluence/display/Hive/HCatalog) on a Google Cloud
+# Dataproc cluster.
+#
+# To use this script, you will need to configure the following variables to
+# match your cluster. For information about which software components
+# (and their version) are included in Cloud Dataproc clusters, see the
+# Cloud Dataproc Image Version information:
+# https://cloud.google.com/dataproc/concepts/dataproc-versions
+
+set -x -e
+
+# Install the hive-hcatalog package
+apt-get -q -y install hive-hcatalog
+
+# Configure Pig to use HCatalog
+cat >>/etc/pig/conf/pig-env.sh <<EOF
+#!/bin/bash
+
+includeHCatalog=true
+
+EOF
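
The script above appends to `pig-env.sh` with `>>`, so running it a second time on the same node would append the same block again. A minimal sketch of a guarded append follows, assuming that a duplicate entry is undesirable. It is not part of the commit, and the function name `enable_hcatalog_for_pig` is hypothetical.

```bash
#!/bin/bash
# Sketch only, not part of the commit. The function name is hypothetical.

# Append the HCatalog setting to a Pig environment file only if the exact
# line is not already present.
enable_hcatalog_for_pig() {
  local env_file="$1"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  if ! grep -qxF 'includeHCatalog=true' "$env_file" 2>/dev/null; then
    printf '\nincludeHCatalog=true\n' >>"$env_file"
  fi
}
```

In the init action this would be called as `enable_hcatalog_for_pig /etc/pig/conf/pig-env.sh` after the `apt-get` step; calling it repeatedly leaves a single `includeHCatalog=true` line in the file.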
