About this blog
This blog is mostly about my pursuits in Data Science. Previous blog entries also dealt with storage, compute, virtualization and professional services. Currently the focus is on Data Science, including Big Data, Hadoop, Business Intelligence, Data Warehouse, Data Integration and Visualization. From time to time I will blog about other things of interest. The opinions expressed in this blog are entirely my own and should not be taken as the opinion of my employer.
Category Archives: Data Analytics
Custom Pytorch Dataset Class for Timeseries Sequence Windows
Recently I was working on a problem with time series. Time series can quickly add up to a lot of data, as you are using previous intervals to predict future intervals. What some people do is create a very … Continue reading
Finding the ideal num_workers for Pytorch Dataloaders
One of the biggest bottlenecks in Deep Learning is loading data. Having fast drives and fast access to the data is important, especially if you are trying to saturate a GPU or multiple processors. Pytorch has Dataloaders, which help you manage … Continue reading
Basic HBase Java Classes and Methods – Part 8: Disable and Delete a Table
In order to delete a table in HBase it must be disabled first. This forces any data in memory to be flushed to disk. Because this is an admin operation, we must create an Admin object, similar to how we … Continue reading
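A rough sketch of the idea, using the standard HBase client Admin API (the table name test_table below is only illustrative), might look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import java.io.IOException;

public class DisableDeleteTable {

    public static void main(String[] args) throws IOException {

        Configuration conf = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(conf);
        // The Admin object handles administrative operations such as disable and delete
        Admin admin = connection.getAdmin();

        try {
            TableName tableName = TableName.valueOf("test_table"); // illustrative table name
            admin.disableTable(tableName); // take the table offline; in-memory data is flushed
            admin.deleteTable(tableName);  // only a disabled table can be deleted
        } finally {
            admin.close();
            connection.close();
        }
    }
}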
Basic HBase Java Classes and Methods – Part 7: Delete from a Table
Deleting data from an HBase table is very similar in overall structure to many of our previous operations. First we have our general skeleton code. As before we use static variable declarations to make the code look a lot nicer. … Continue reading
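As a rough sketch of the pattern (the table name, row key, column family and qualifier below are purely illustrative), a Delete might be built and applied like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

public class DeleteFromTable {

    public static void main(String[] args) throws IOException {

        Configuration conf = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(conf);
        Table table = null;

        try {
            table = connection.getTable(TableName.valueOf("test_table"));
            // A Delete is keyed on a single row; adding a column restricts it to that cell
            Delete delete = new Delete(Bytes.toBytes("row1"));
            delete.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"));
            table.delete(delete);
        } finally {
            if (table != null) {
                table.close();
            }
            connection.close();
        }
    }
}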
Basic HBase Java Classes and Methods – Part 6: Scan a Table
In HBase, a Scan is similar to a SELECT in SQL. Again we return to skeleton code, which is very similar to what we have seen before. I will put comments into the three areas we will be addressing:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.*;

import java.io.IOException;

public class ScanTable {

    public static void main(String[] args) throws IOException {

        Configuration conf = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(conf);
        Table table = null;

        /* ResultScanner instantiation */

        try {
            /* Table Scan Code */
        } finally {
            /* Close scanResult */
            if (table != null) {
                table.close();
            }
            connection.close();
        }
    }
}
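The post works through each of the three commented areas in turn. As a rough sketch (the table name test_table and column family cf are only placeholders, and org.apache.hadoop.hbase.util.Bytes would also need to be imported), they might be filled in along these lines:

/* ResultScanner instantiation */
table = connection.getTable(TableName.valueOf("test_table"));
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("cf"));              // limit the scan to one column family
ResultScanner scanner = table.getScanner(scan);

/* Table Scan Code */
for (Result result : scanner) {                   // one Result per row
    for (Cell cell : result.rawCells()) {         // one Cell per column value
        System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + " "
                + Bytes.toString(CellUtil.cloneQualifier(cell)) + " = "
                + Bytes.toString(CellUtil.cloneValue(cell)));
    }
}

/* Close scanResult */
scanner.close();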