user-avatar
Today is Thursday
November 21, 2024

Tag: hbase

August 23, 2015

Running GeoWave from command line for HBase options

by viggy — Categories: Uncategorized — Tags: , , , , , , Leave a comment

GeoWave provides GeoWaveMain class which can be used to run GeoWave from command line. For regular users GeoWave provides a wrapper script for the same which would be installed in case you are using the rpm or geowave through other packages. In my case, since I was running in a development environment, I am directly using the GeoWaveMain class with options from the command line. Here I document the results that I got while running the hbase related commands from the GeoWaveMain class from command line. The GeoWave documentation talks about running the commands from command line here. I used the same and the existing tests in GeoWave source code to deduce the commands.

In the following examples, I am running a local hbase and zookeeper instance.

First, to run the commands, we need to generate the tools jar using the command in the deploy source directory,

mvn package -P geowave-tools-singlejar  -Dmaven.test.skip=true

I am skipping tests as some of the tests fail in my local setup due to various reason and I do not want that to be reason for the maven build to fail.

Next I also generate the hbase datastore jar using the following mvn build command

mvn package -P hbase-container-singlejar -Dmaven.test.skip=true -Dfindbugs.onlyAnalyze=true

Here I am also asking FindBugs just to analyze as currently FindBugs fails the build due to the autogenerated code in FilterProtos, which is used for Filters in HBase. I have not yet been able to find a solution to fix the same and general solution was to skip findbugs for the build process using this switch. This would still generate the FindBugs.xml which can be analysed later using FindBugs GUI separately.

The last two command should generate geowave-deploy-0.8.8-SNAPSHOT-tools.jar and geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar in the target directory. Since we also use geotools-vector and gpx as formats in following commands, we need to copy their jar also here or refer to them in the classpath. In our case, we just copy the jar to the target directory.

Assuming that user’s current working directory is target, following command is used to run localhbaseingest:

java -cp geowave-deploy-0.8.8-SNAPSHOT-tools.jar:geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar:geowave-format-vector-0.8.8-SNAPSHOT.jar mil.nga.giat.geowave.core.cli.GeoWaveMain -localhbaseingest -z localhost:2181 -n geowave -f geotools-vector -b ../../test/data/hail_test_case/hail-box-temporal-filter.shp

To get hdfshbaseingest to work, you need to start Yarn from your hadoop installation. In this, we also use “gpx” format rather than “geotools-vector”(I need to understand why?). Hence we use the jar of gpx format and copy the same in our current working directory. Also this command expects that the directory gpx is already exists in HDFS file system. To create this, you can run the following command from your hadoop installation:

<Hadoop-Installation-Dir>/bin/hadoop fs -mkdir /gpx

Then we run the following command:

java -cp geowave-deploy-0.8.8-SNAPSHOT-tools.jar:geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar:geowave-format-gpx-0.8.8-SNAPSHOT.jar: mil.nga.giat.geowave.core.cli.GeoWaveMain -hdfshbaseingest -z localhost:2181 -n geowave -f gpx -b ../../test/data/hail_test_case/hail-box-temporal-filter.shp -hdfs localhost:9000 -hdfsbase “/” -jobtracker “localhost:8032”

-hdfs is the hostname:port of the hadoop installation

-hdfsbase is the parent directory in which we want to ingest

-jobtracker is the hostname:port of the yarn installation.

Currently for hdfshbasestage, I am getting the following error which need to be fixed:

➜  target git:(GEOWAVE-406) ✗ java -cp geowave-deploy-0.8.8-SNAPSHOT-tools.jar:geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar:geowave-format-gpx-0.8.8-SNAPSHOT-tools.jar:. mil.nga.giat.geowave.core.cli.GeoWaveMain -hdfshbasestage -b ~/workspace/geowave/extensions/formats/gpx/src/test/resources/ -hdfs localhost:9000 -hdfsbase “/gpx/” -f gpx
2015-08-23 04:41:27,085 FATAL [main] ingest.AbstractIngestHBaseCommandLineDriver (AbstractIngestHBaseCommandLineDriver.java:applyArguments(146)) – Error parsing plugins
java.lang.IllegalArgumentException: Unable to find SPI plugin provider for ingest format ‘gpx’
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.getPluginProviders(AbstractIngestHBaseCommandLineDriver.java:196)
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.applyArguments(AbstractIngestHBaseCommandLineDriver.java:141)
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.run(AbstractIngestHBaseCommandLineDriver.java:75)
at mil.nga.giat.geowave.core.cli.GeoWaveMain.main(GeoWaveMain.java:48)

Similarly for posthbasestage I am getting the same error:

➜  target git:(GEOWAVE-406) ✗ java -cp geowave-deploy-0.8.8-SNAPSHOT-tools.jar:geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar:geowave-format-gpx-0.8.8-SNAPSHOT-tools.jar:. mil.nga.giat.geowave.core.cli.GeoWaveMain -posthbasestage -hdfs localhost:9000 -hdfsbase “/gpx/” -f gpx  -z localhost:2181 -n geowave -jobtracker “localhost:8032”
2015-08-23 04:44:16,261 FATAL [main] ingest.AbstractIngestHBaseCommandLineDriver (AbstractIngestHBaseCommandLineDriver.java:applyArguments(146)) – Error parsing plugins
java.lang.IllegalArgumentException: Unable to find SPI plugin provider for ingest format ‘gpx’
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.getPluginProviders(AbstractIngestHBaseCommandLineDriver.java:196)
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.applyArguments(AbstractIngestHBaseCommandLineDriver.java:141)
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.run(AbstractIngestHBaseCommandLineDriver.java:75)
at mil.nga.giat.geowave.core.cli.GeoWaveMain.main(GeoWaveMain.java:48)

August 23, 2015

Using Google’s Protocol Buffer library to write GeoWave Filters for HBase datastore

by viggy — Categories: Uncategorized — Tags: , , , , Leave a comment

Accumulo provides Iterators which can be run on Tablet Servers as Filters during Scan. GeoWave uses this in form of local Client Filters and Distributable Filters which run on Tablet Servers when any scan is performed. As part of adding support for HBase, I needed to implement these filters in HBase. I have currently implemented two Filters, SingleEntryFilter and CqlHBaseQueryFilter which are counterparts for SingleEntryFilterIterator and CqlQueryFilterIterator in Accumulo.

Hbase makes use of Google’s Protocol Buffer to serialize data on client side and send it across to tablet servers. In this blog, I explain how I used the protobuf-java library to write the SingleEntryFilter for HBase in GeoWave.

Protobuf auto generates part of the code by using .proto file and its own code generator. The .proto file needs to contain the information about the arguments your class accepts in its constructor, which package the class needs to be generated in, etc. The arguments that class supports needs to be serializable. Since in our case, we are migrating from iterators, all the needed data are expected to be serializable as they need to be serialized even in case of iterators for Accumulo. I created a ‘protobuf’ directory inside local source directory ‘extensions/datastores/hbase/src/main/’ in GeoWave source code and in that created the following SingleEntryFilters.proto file.

 

option java_package = “mil.nga.giat.geowave.datastore.hbase.query.generated”;
option java_outer_classname = “FilterProtos”;
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for = SPEED;

message SingleEntryFilter {
required bytes adapterId = 1;
required bytes dataId = 2;
}

 

Now to generate the classes using protobuf, we need to install protobuf compiler on the machine. You can download the compiler from here.The README.txt given along with the compiler is quite explanatory for installing it.

After successful installation, by default the protoc compiler executable would be in src directory. Go to the source directory in which you want the generated package to be added. In my case, it was in <geowave-src-directory>/extensions/datastores/hbase/src/main/ .

Now, you can the following command.

<path-to-protoc-installation-dir>/src/protoc -I=. –java_out=java/ protobuf/SingleEntryFilters.proto

protobuf/SingleEntryFilter.proto is the path to the .proto file from your current directory.

 

This generated the necessary FilterProtos class. Now we need to create the SingleEntryFilter class. We use the FilterBase class provided by HBase to create new custom classes. Lars George’s book, HBase: The Definitive Guide  explains developing custom Filters for hbase and I used the example shared in the github repo for the book to develop Custom Filter for developing SingleEntryFilter.

It was only through that example that I came to know that I need to also implement toByteArray and parseFrom methods in the Custom Filter. Later I also found in HBase log that parseFrom method generates a DeserializationException which informs user about extending it in derived class.

June 23, 2015

Accumulo and Hbase – Differences in its API, classes

by viggy — Categories: tech — Tags: , , , , , , , , , Leave a comment

As part of my GSoC project, I am working on adding support of HBase in GeoWave. Currently Accumulo is already well supported. Hence one of my main task is to understand the existing tests and examples and add support for the same for a HBase store.
One of the main challenge I am facing is there is no easily available mapping of Accumulo Classes to HBase classes. So in this blog, I will try to do the same.

Basic Entity Classes
Accumulo HBase Remarks
Key KeyValue HBase doesnt have separate entity for Key and Value. It stores everything as byte[].
Value KeyValue
Mutation RowMutations Accumulo exposes put/putDelete method whereas HBase has Put and Delete classes to Put in and Delete a single row respectively.
Scanner/Writer Classes
Accumulo HBase Remarks
Scanner/ScannerBase Scan Accumulo has a structured class heirarchy for scanning where in HBase I have only used Scan as of now.

TODO: To add about establishing connection, authentication, Write to store, Iterators/Coprocessors