Word Count Output

Ronen Botzer edited this page Jan 5, 2020 · 1 revision

In this example, we read log text data from /tmp/input in HDFS and run a word count, but instead of writing the results back to HDFS, we push them to test:counts (namespace test, set counts) in Aerospike. We achieve this by setting the output format class in the map-reduce job driver to an extension of the AerospikeOutputFormat class. Since the input data is the same as in the Word Count Input Example, we expect the same results.

Word Count Output Example

Each record in test:counts has two bins, word and count. The word bin value is also the Primary Key for that record, so it can be used in AQL select statements to examine individual records.
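
To make the record layout concrete, here is a minimal in-memory sketch in plain Java. It does not use Hadoop or the Aerospike client; the class name WordCountRecords and its methods are hypothetical and only illustrate the shape described above: one record per word, keyed by the word, holding a word bin and a count bin.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job's output: no Hadoop or Aerospike calls.
public class WordCountRecords {

    // Count whitespace-separated tokens, as a word-count job would.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Build one record per word: key = word, bins = {word, count}.
    public static Map<String, Map<String, Object>> toRecords(Map<String, Integer> counts) {
        Map<String, Map<String, Object>> records = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            Map<String, Object> bins = new LinkedHashMap<>();
            bins.put("word", entry.getKey());    // same value as the record key
            bins.put("count", entry.getValue());
            records.put(entry.getKey(), bins);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> records =
                toRecords(countWords("writes_master {bar} writes_master"));
        // Looking a record up by its key mirrors "where PK = 'writes_master'" in AQL.
        System.out.println(records.get("writes_master")); // {word=writes_master, count=2}
    }
}
```

In the real job, the extension of AerospikeOutputFormat is what writes each (word, count) pair to Aerospike; the map here only models the result.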

Source code can be reviewed here:

~/aerospike-hadoop/examples/word_count_output/src/main/java/com/aerospike/hadoop/examples/wordcountoutput/WordCountOutput.java

Running the test:

hdclient@ztg-client:~/aerospike-hadoop$ cd ~
hdclient@ztg-client:~$ cd aerospike-hadoop/
hdclient@ztg-client:~/aerospike-hadoop$ hadoop jar ./examples/word_count_output/build/libs/word_count_output.jar -D aerospike.output.namespace=test -D aerospike.output.setname=counts /tmp/words
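
The command passes two -D key=value properties followed by one positional argument. Assuming the driver relies on Hadoop's generic option handling (check the driver source to confirm), those properties end up in the job configuration and only /tmp/words is left as an input path. The plain Java sketch below imitates that split; GenericOptionsSketch is a hypothetical name and is not Hadoop's own parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: separates "-D key=value" pairs from other args.
public class GenericOptionsSketch {
    public final Map<String, String> properties = new LinkedHashMap<>();
    public final List<String> remaining = new ArrayList<>();

    public static GenericOptionsSketch parse(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                // Split on the first '=' only, so values may contain '='.
                String[] keyValue = args[++i].split("=", 2);
                parsed.properties.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
            } else {
                parsed.remaining.add(args[i]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = parse(new String[] {
            "-D", "aerospike.output.namespace=test",
            "-D", "aerospike.output.setname=counts",
            "/tmp/words"
        });
        System.out.println(parsed.properties);
        System.out.println(parsed.remaining);
    }
}
```

Here the namespace and set name become configuration properties, which is why the results land in test:counts.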

Inspect the result in Aerospike using AQL:

hdclient@ztg-client:~/aerospike-hadoop$ aql
aql> select * from test.counts where PK = 'writes_master'
+-----------------+-------+
| word            | count |
+-----------------+-------+
| "writes_master" | 2040  |
+-----------------+-------+
1 row in set (0.000 secs)

aql> select * from test.counts where PK = '{bar}'
+---------+-------+
| word    | count |
+---------+-------+
| "{bar}" | 340   |
+---------+-------+
1 row in set (0.001 secs)

aql> exit

hdclient@ztg-client:~/aerospike-hadoop$

The resulting counts are the same as in the Word Count Input Example.