For the purpose of these exercises we will be working with a basic table which has two column families. The first column family is “personal” and will contain first_name, last_name, age, gender and martial_status. The second column family is “professional” and will contain “occupation” and “education”.
We will walk through all of the steps: creating the table and its column families, populating data, changing data, deleting data and dropping the table. We will first show you this in the HBase shell so you can become familiar with the data we are working with. In Part 3 we will start to do these tasks programmatically using Java.
We are making a very brisk journey through the shell, with little to no explanation of the various parts of HBase; we assume you have learned the basics from reading the documentation. Our goal is just to show some basic, common HBase table operations using the shell, and then replicate them using Java.
Our employee table will look like so:

   | personal                                                | professional           |
ID | first_name | last_name | age | gender | martial_status | occupation | education |
First we fire up the HBase shell:

    Brian-Feenys-Mac-Pro:hbase-1.2.6 bfeeny$ hbase shell
    2018-03-07 20:37:42,618 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/Users/bfeeny/Documents/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/Users/bfeeny/Documents/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.2.6, rUnknown, Mon May 29 02:25:32 CDT 2017

    hbase(main):001:0>
We can request the basic status from HBase:

    hbase(main):001:0> status
    1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
We ask it who we are, similar to the whoami UNIX command, and get a list of any tables:

    hbase(main):002:0> whoami
    bfeeny (auth:SIMPLE)
    groups: staff, everyone, localaccounts, _appserverusr, admin, _appserveradm, _lpadmin, _appstore, _lpoperator, _developer, _analyticsusers, com.apple.access_ftp, com.apple.access_screensharing, com.apple.access_ssh

    hbase(main):005:0* list
    TABLE
    0 row(s) in 0.0450 seconds

    => []
We see that there are no tables. We create the employee table with two column families, personal and professional.

    hbase(main):008:0> create 'employee', 'personal', 'professional'
    0 row(s) in 1.2420 seconds

    => Hbase::Table - employee
    hbase(main):009:0> list
    TABLE
    employee
    1 row(s) in 0.0130 seconds

    => ["employee"]
We can use the describe command to get more detailed information about the table. Many of these parameters relate to the underlying storage layer and are not important for our exercises.
    hbase(main):010:0> describe 'employee'
    Table employee is ENABLED
    employee
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'personal', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    {NAME => 'professional', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    2 row(s) in 0.0300 seconds
We can see we have no actual records in the table:

    hbase(main):011:0> count 'employee'
    0 row(s) in 0.0380 seconds

    => 0
Let’s insert a single record with an ID (row key) of 1.
    hbase(main):012:0> put 'employee', 1, 'personal:first_name', 'John'
    0 row(s) in 0.0730 seconds

    hbase(main):013:0> list
    TABLE
    employee
    1 row(s) in 0.0130 seconds

    => ["employee"]
    hbase(main):014:0> get 'employee', 1
    COLUMN CELL
    personal:first_name timestamp=1520557083015, value=John
    1 row(s) in 0.0280 seconds
You can see we inserted a single record with a column first_name inside of the column family personal. The table places no constraints on the columns within a column family, so we can leave entire columns out or create new ones on the fly. Only the column families themselves have to be defined up front.
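The idea of fixed column families with free-form columns inside them can be sketched in plain Java. This is only a toy in-memory model, not the HBase client API: the class name SparseTableSketch, its methods and the nickname column are all made up for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy in-memory model of a sparse table; NOT the HBase client API. */
class SparseTableSketch {
    // Column families are fixed when the table is created.
    private final Set<String> families;
    // row key -> ("family:qualifier" -> value); columns a row lacks are simply not stored.
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    SparseTableSketch(String... families) {
        this.families = Set.of(families);
    }

    void put(String rowKey, String column, String value) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected family:qualifier, got: " + column);
        }
        String family = column.substring(0, colon);
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        // Any qualifier is accepted as long as its family exists.
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, Map.of());
    }

    public static void main(String[] args) {
        SparseTableSketch employee = new SparseTableSketch("personal", "professional");
        employee.put("1", "personal:first_name", "John");
        employee.put("1", "personal:nickname", "Johnny"); // hypothetical column, created on the fly
        System.out.println(employee.get("1"));
        try {
            employee.put("1", "contact:email", "john@example.com"); // family was never declared
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A row here only holds the columns that were actually written, which is why row 1 can exist with nothing but a first_name.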
We will add a bunch more data:

    hbase(main):015:0> put 'employee', 3, 'personal:first_name', 'Sally'
    0 row(s) in 0.0150 seconds

    hbase(main):016:0> put 'employee', 3, 'personal:last_name', 'Jones'
    0 row(s) in 0.0050 seconds

    hbase(main):017:0> put 'employee', 3, 'personal:age', '32'
    0 row(s) in 0.0030 seconds

    hbase(main):018:0> put 'employee', 3, 'personal:gender', 'female'
    0 row(s) in 0.0030 seconds

    hbase(main):019:0> put 'employee', 3, 'personal:martial_status', 'divorced'
    0 row(s) in 0.0040 seconds

    hbase(main):020:0>
    hbase(main):021:0* put 'employee', 3, 'professional:occupation', 'doctor'
    0 row(s) in 0.0040 seconds

    hbase(main):022:0> put 'employee', 3, 'professional:education', 'MD'
    0 row(s) in 0.0030 seconds
Notice I inserted the above record using a row key of 3, skipping 2. HBase doesn’t care what you make the row key; it is stored as a plain array of bytes, so it can be built from a string, a number, or any other sequence of bytes.
    hbase(main):023:0> put 'employee', 2, 'personal:first_name', 'Alex'
    0 row(s) in 0.0150 seconds

    hbase(main):024:0> put 'employee', 2, 'personal:last_name', 'Wright'
    0 row(s) in 0.0050 seconds

    hbase(main):025:0> put 'employee', 2, 'personal:age', '24'
    0 row(s) in 0.0030 seconds

    hbase(main):026:0> put 'employee', 2, 'personal:gender', 'male'
    0 row(s) in 0.0030 seconds

    hbase(main):027:0> put 'employee', 2, 'personal:martial_status', 'single'
    0 row(s) in 0.0030 seconds

    hbase(main):028:0>
    hbase(main):029:0* put 'employee', 2, 'professional:occupation', 'cab driver'
    0 row(s) in 0.0030 seconds

    hbase(main):030:0> put 'employee', 2, 'professional:education', 'high school'
    0 row(s) in 0.0140 seconds
Now that we have inserted information regarding three employees, let’s take a look at our table.
    hbase(main):031:0> count 'employee'
    3 row(s) in 0.0170 seconds

    => 3
    hbase(main):032:0> scan 'employee'
    ROW COLUMN+CELL
    1 column=personal:first_name, timestamp=1520557083015, value=John
    2 column=personal:age, timestamp=1520557533759, value=24
    2 column=personal:first_name, timestamp=1520557533683, value=Alex
    2 column=personal:gender, timestamp=1520557533790, value=male
    2 column=personal:last_name, timestamp=1520557533725, value=Wright
    2 column=personal:martial_status, timestamp=1520557533821, value=single
    2 column=professional:education, timestamp=1520557553040, value=high school
    2 column=professional:occupation, timestamp=1520557533875, value=cab driver
    3 column=personal:age, timestamp=1520557387868, value=32
    3 column=personal:first_name, timestamp=1520557387787, value=Sally
    3 column=personal:gender, timestamp=1520557387897, value=female
    3 column=personal:last_name, timestamp=1520557387827, value=Jones
    3 column=personal:martial_status, timestamp=1520557387926, value=divorced
    3 column=professional:education, timestamp=1520557388005, value=MD
    3 column=professional:occupation, timestamp=1520557387976, value=doctor
    3 row(s) in 0.0310 seconds
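The scan lists the rows as 1, 2, 3 even though row 3 was written before row 2, because HBase keeps rows sorted by the bytes of their row key. The plain-Java sketch below imitates that ordering with a sorted map. It is a toy model rather than HBase code: the class RowKeyOrderSketch and its scanOrder method are invented here, and it assumes each key is stored as the bytes of its string form, which is how the integer keys in this walkthrough appear to be handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Toy illustration of byte-wise row key ordering; NOT the HBase client API. */
class RowKeyOrderSketch {

    /** Returns the keys in the order a byte-sorted store would scan them. */
    static List<String> scanOrder(String... insertedKeys) {
        // Compare keys byte by byte, treating each byte as unsigned.
        Comparator<byte[]> byteOrder = Arrays::compareUnsigned;
        TreeMap<byte[], String> store = new TreeMap<>(byteOrder);
        for (String key : insertedKeys) {
            store.put(key.getBytes(StandardCharsets.UTF_8), key);
        }
        return new ArrayList<>(store.values());
    }

    public static void main(String[] args) {
        // Insertion order does not matter; rows come back sorted by key.
        System.out.println(scanOrder("1", "3", "2"));  // prints [1, 2, 3]
        // Byte order is not numeric order: "10" sorts before "2".
        System.out.println(scanOrder("1", "2", "10")); // prints [1, 10, 2]
    }
}
```

The second call is the practical catch: with string-encoded numbers, a key of 10 lands between 1 and 2, which is one reason numeric row keys are often zero-padded.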
We can easily make changes to any information:
    hbase(main):034:0> put 'employee', 3, 'personal:martial_status', 'married'
    0 row(s) in 0.0150 seconds

    hbase(main):035:0> get 'employee', 3
    COLUMN CELL
    personal:age timestamp=1520557387868, value=32
    personal:first_name timestamp=1520557387787, value=Sally
    personal:gender timestamp=1520557387897, value=female
    personal:last_name timestamp=1520557387827, value=Jones
    personal:martial_status timestamp=1520557942262, value=married
    professional:education timestamp=1520557388005, value=MD
    professional:occupation timestamp=1520557387976, value=doctor
    7 row(s) in 0.0050 seconds
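Note that the change is just another put: the cell gets a new value with a newer timestamp, and get returns the newest one. Since the describe output showed VERSIONS => '1' for both families, only one version is kept. The sketch below models a single cell this way in plain Java. The class VersionedCellSketch is made up for illustration and trims old versions immediately, which is a simplification of when HBase itself discards them.

```java
import java.util.Comparator;
import java.util.TreeMap;

/** Toy model of one versioned cell; NOT the HBase client API. */
class VersionedCellSketch {
    private final int maxVersions;
    // timestamp -> value, sorted newest first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Writes a value at the given timestamp and drops versions beyond the limit. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry is the oldest timestamp
        }
    }

    /** Returns the newest value, or null if the cell has never been written. */
    String get() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    int storedVersions() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCellSketch status = new VersionedCellSketch(1); // VERSIONS => '1'
        status.put(1520557387926L, "divorced");
        status.put(1520557942262L, "married");
        System.out.println(status.get()); // prints married
    }
}
```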
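We can delete just a single cell if we wish. (Row 1 was filled in with its remaining columns in steps not shown here, which is why it has more than a first_name below.)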
    hbase(main):044:0> delete 'employee', 1, 'personal:martial_status'
    0 row(s) in 0.0140 seconds

    hbase(main):045:0> get 'employee', 1
    COLUMN CELL
    personal:age timestamp=1520558063384, value=50
    personal:first_name timestamp=1520558063320, value=John
    personal:gender timestamp=1520558063410, value=male
    personal:last_name timestamp=1520558063358, value=Smith
    4 row(s) in 0.0130 seconds
We can use the exists command to see if a table exists:

    hbase(main):046:0> exists 'employee'
    Table employee does exist
    0 row(s) in 0.0180 seconds
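We have to disable a table before we can drop it. Disabling a table flushes all the data in memory to disk. Below we disable the table, re-enable it to show the error that drop gives on an enabled table, and then disable and drop it for real.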
    hbase(main):047:0> disable 'employee'
    0 row(s) in 2.2590 seconds

    hbase(main):048:0> enable 'employee'
    0 row(s) in 1.2500 seconds

    hbase(main):049:0> drop 'employee'

    ERROR: Table employee is enabled. Disable it first.

    Here is some help for this command:
    Drop the named table. Table must first be disabled:
    hbase> drop 't1'
    hbase> drop 'ns1:t1'

    hbase(main):050:0> disable 'employee'
    0 row(s) in 2.2350 seconds

    hbase(main):051:0> drop 'employee'
    0 row(s) in 1.2380 seconds
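The disable-before-drop rule seen in that session amounts to a small state machine, sketched here in plain Java. This is only an illustration of the rule: TableLifecycleSketch, its states and its error messages are invented for this example and are not part of the HBase API.

```java
/** Toy model of the enable/disable/drop rule; NOT the HBase Admin API. */
class TableLifecycleSketch {
    enum State { ENABLED, DISABLED, DROPPED }

    // A newly created table starts out enabled, as the describe output showed.
    private State state = State.ENABLED;

    State state() {
        return state;
    }

    void disable() {
        requireNotDropped();
        state = State.DISABLED;
    }

    void enable() {
        requireNotDropped();
        state = State.ENABLED;
    }

    /** Dropping is only allowed once the table has been disabled. */
    void drop() {
        requireNotDropped();
        if (state == State.ENABLED) {
            throw new IllegalStateException("Table is enabled. Disable it first.");
        }
        state = State.DROPPED;
    }

    private void requireNotDropped() {
        if (state == State.DROPPED) {
            throw new IllegalStateException("Table does not exist.");
        }
    }

    public static void main(String[] args) {
        TableLifecycleSketch employee = new TableLifecycleSketch();
        employee.disable();
        employee.enable();
        try {
            employee.drop(); // fails: the table was re-enabled
        } catch (IllegalStateException e) {
            System.out.println("ERROR: " + e.getMessage());
        }
        employee.disable();
        employee.drop();
        System.out.println(employee.state());
    }
}
```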
We can see there are no more tables:

    hbase(main):052:0> list
    TABLE
    0 row(s) in 0.0010 seconds

    => []
We will repeat these commands using Java in the next part, Basic HBase Java Classes and Methods – Part 3: Table Creation.