We will cover these basic steps:
Instantiating a configuration object
Establishing a connection to HBase
Manipulating tables using an administration object
Manipulating data within a table using a table instance
Creating our table
I am using Maven; below is the pom.xml I will be using for all of these examples.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.foo</groupId>
    <artifactId>hbase-examples</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <repositories>
        <repository>
            <id>hbase-local</id>
            <name>Local HBase</name>
            <url>file:///Users/bfeeny/Documents/hbase/hbase-1.2.6/lib</url>
            <layout>default</layout>
        </repository>
    </repositories>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.2.6</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.2.6</version>
        </dependency>
    </dependencies>
</project>
```
If you are using Maven, you may also want to create a log4j.properties file, which log4j reads for its logging configuration. We will create a basic properties file and store it in our src/main/resources folder. For more information on log4j, visit the project site. Here is the basic properties file we will use.
```properties
log4j.rootLogger=DEBUG, CA
log4j.appender.CA=org.apache.log4j.ConsoleAppender
log4j.appender.CA.layout=org.apache.log4j.PatternLayout
log4j.appender.CA.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
```
AdminCreateTable
We start with a skeleton class, AdminCreateTable, containing the basic imports and main method.
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

import java.io.IOException;

public class AdminCreateTable {

    public static void main(String[] args) throws IOException {

    }
}
```
Instantiating a Configuration object
We will instantiate a Configuration object using the HBaseConfiguration class's static create() method. The Configuration class is the base class for all configuration objects in the Hadoop ecosystem.
```java
Configuration conf = HBaseConfiguration.create();
```
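The create() call loads hbase-default.xml and hbase-site.xml from the classpath, but values can also be overridden programmatically with Configuration.set(). A minimal sketch of that pattern is below; the class name and the quorum/port values are placeholders for illustration, not part of this tutorial's setup — substitute your own ZooKeeper hosts.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ConfigExample {

    public static void main(String[] args) {
        // create() merges hbase-default.xml and hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();

        // Programmatic overrides take effect for this Configuration instance only.
        // "localhost" and "2181" are placeholder values for a local standalone HBase.
        conf.set("hbase.zookeeper.quorum", "localhost");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        System.out.println(conf.get("hbase.zookeeper.quorum"));
    }
}
```

This is handy when you cannot place an hbase-site.xml on the classpath, such as in some test harnesses.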
Establish a connection to the HBase cluster
We use the ConnectionFactory to create a Connection object by passing in our Configuration object.
```java
Connection connection = ConnectionFactory.createConnection(conf);
```
All code which uses our connection should be placed in a try/finally block, so that you manage the connection manually and always close it when you are done using it.
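Because Connection implements java.io.Closeable, the same guarantee can also be had with a try-with-resources statement (Java 7+), which closes the connection automatically. A minimal sketch, with a hypothetical class name:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

import java.io.IOException;

public class TryWithResourcesExample {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();

        // Connection implements Closeable, so it is closed automatically
        // when this block exits, even if an exception is thrown.
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            Admin admin = connection.getAdmin();
            // ... administrative work goes here ...
        }
    }
}
```

The explicit try/finally form used in this tutorial behaves the same way; try-with-resources simply removes the boilerplate close() call.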
Instantiate an Admin object
Because functions such as creating or removing tables are administrative, they must be performed through an Admin object. We will create an Admin object from our Connection object.
```java
try {
    Admin admin = connection.getAdmin();
} finally {
    connection.close();
}
```
You can see we put the instantiation of the Admin object inside of our try/finally clause. The rest of our commands will also go inside this clause. We added a command to close our connection in the finally clause.
Create the table schema using an HTableDescriptor
We use an HTableDescriptor to define our table and its properties, such as column families and performance settings. We will create a table named employee with two column families: personal and professional.
```java
HTableDescriptor tableName = new HTableDescriptor(TableName.valueOf("employee"));
tableName.addFamily(new HColumnDescriptor("personal"));
tableName.addFamily(new HColumnDescriptor("professional"));
```
Create the table
We check whether the table exists using our Admin object. If it does not exist, we create it; otherwise, we print that it already exists. We use the createTable method on our Admin object and pass in the HTableDescriptor we created previously.
```java
if (!admin.tableExists(tableName.getTableName())) {
    System.out.print("Creating the employee table. ");
    admin.createTable(tableName);
    System.out.println("Done.");
} else {
    System.out.println("Table already exists");
}
```
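For completeness, the reverse operation can be sketched as well. HBase requires a table to be disabled before it can be deleted, and the Admin object provides disableTable and deleteTable methods for this. The class name below is hypothetical; the rest follows the same pattern as our create-table program.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

import java.io.IOException;

public class AdminDeleteTable {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(conf);
        try {
            Admin admin = connection.getAdmin();
            TableName employee = TableName.valueOf("employee");

            if (admin.tableExists(employee)) {
                // A table must be disabled before it can be deleted.
                if (admin.isTableEnabled(employee)) {
                    admin.disableTable(employee);
                }
                admin.deleteTable(employee);
                System.out.println("Table deleted.");
            } else {
                System.out.println("Table does not exist.");
            }
        } finally {
            connection.close();
        }
    }
}
```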
Putting it all together we have the following:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

import java.io.IOException;

public class AdminCreateTable {

    public static void main(String[] args) throws IOException {

        Configuration conf = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(conf);

        try {
            Admin admin = connection.getAdmin();

            HTableDescriptor tableName = new HTableDescriptor(TableName.valueOf("employee"));
            tableName.addFamily(new HColumnDescriptor("personal"));
            tableName.addFamily(new HColumnDescriptor("professional"));

            if (!admin.tableExists(tableName.getTableName())) {
                System.out.print("Creating the employee table. ");
                admin.createTable(tableName);
                System.out.println("Done.");
            } else {
                System.out.println("Table already exists");
            }
        } finally {
            connection.close();
        }
    }
}
```
Build and Run the code. You can verify the table has been created by going to the HBase Shell and verifying it exists.
```
Brian-Feenys-Mac-Pro:hbase-examples bfeeny$ hbase shell
2018-03-11 20:55:32,254 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/bfeeny/Documents/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/bfeeny/Documents/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.6, rUnknown, Mon May 29 02:25:32 CDT 2017

hbase(main):001:0> list
TABLE
employee
1 row(s) in 0.1720 seconds

=> ["employee"]

hbase(main):002:0> describe 'employee'
Table employee is ENABLED
employee
COLUMN FAMILIES DESCRIPTION
{NAME => 'personal', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'professional', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0960 seconds

hbase(main):003:0>
See you in the next part: Basic HBase Java Classes and Methods – Part 4: Putting Data into a Table.