user-avatar
Today is Thursday
April 25, 2024

Tag: commandline

August 23, 2015

Running GeoWave from command line for HBase options

by viggy — Categories: Uncategorized — Tags: , , , , , , Leave a comment

GeoWave provides GeoWaveMain class which can be used to run GeoWave from command line. For regular users GeoWave provides a wrapper script for the same which would be installed in case you are using the rpm or geowave through other packages. In my case, since I was running in a development environment, I am directly using the GeoWaveMain class with options from the command line. Here I document the results that I got while running the hbase related commands from the GeoWaveMain class from command line. The GeoWave documentation talks about running the commands from command line here. I used the same and the existing tests in GeoWave source code to deduce the commands.

In the following examples, I am running a local hbase and zookeeper instance.

First, to run the commands, we need to generate the tools jar using the command in the deploy source directory,

mvn package -P geowave-tools-singlejar  -Dmaven.test.skip=true

I am skipping tests as some of the tests fail in my local setup due to various reason and I do not want that to be reason for the maven build to fail.

Next I also generate the hbase datastore jar using the following mvn build command

mvn package -P hbase-container-singlejar -Dmaven.test.skip=true -Dfindbugs.onlyAnalyze=true

Here I am also asking FindBugs just to analyze as currently FindBugs fails the build due to the autogenerated code in FilterProtos, which is used for Filters in HBase. I have not yet been able to find a solution to fix the same and general solution was to skip findbugs for the build process using this switch. This would still generate the FindBugs.xml which can be analysed later using FindBugs GUI separately.

The last two command should generate geowave-deploy-0.8.8-SNAPSHOT-tools.jar and geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar in the target directory. Since we also use geotools-vector and gpx as formats in following commands, we need to copy their jar also here or refer to them in the classpath. In our case, we just copy the jar to the target directory.

Assuming that user’s current working directory is target, following command is used to run localhbaseingest:

java -cp geowave-deploy-0.8.8-SNAPSHOT-tools.jar:geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar:geowave-format-vector-0.8.8-SNAPSHOT.jar mil.nga.giat.geowave.core.cli.GeoWaveMain -localhbaseingest -z localhost:2181 -n geowave -f geotools-vector -b ../../test/data/hail_test_case/hail-box-temporal-filter.shp

To get hdfshbaseingest to work, you need to start Yarn from your hadoop installation. In this, we also use “gpx” format rather than “geotools-vector”(I need to understand why?). Hence we use the jar of gpx format and copy the same in our current working directory. Also this command expects that the directory gpx is already exists in HDFS file system. To create this, you can run the following command from your hadoop installation:

<Hadoop-Installation-Dir>/bin/hadoop fs -mkdir /gpx

Then we run the following command:

java -cp geowave-deploy-0.8.8-SNAPSHOT-tools.jar:geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar:geowave-format-gpx-0.8.8-SNAPSHOT.jar: mil.nga.giat.geowave.core.cli.GeoWaveMain -hdfshbaseingest -z localhost:2181 -n geowave -f gpx -b ../../test/data/hail_test_case/hail-box-temporal-filter.shp -hdfs localhost:9000 -hdfsbase “/” -jobtracker “localhost:8032”

-hdfs is the hostname:port of the hadoop installation

-hdfsbase is the parent directory in which we want to ingest

-jobtracker is the hostname:port of the yarn installation.

Currently for hdfshbasestage, I am getting the following error which need to be fixed:

➜  target git:(GEOWAVE-406) ✗ java -cp geowave-deploy-0.8.8-SNAPSHOT-tools.jar:geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar:geowave-format-gpx-0.8.8-SNAPSHOT-tools.jar:. mil.nga.giat.geowave.core.cli.GeoWaveMain -hdfshbasestage -b ~/workspace/geowave/extensions/formats/gpx/src/test/resources/ -hdfs localhost:9000 -hdfsbase “/gpx/” -f gpx
2015-08-23 04:41:27,085 FATAL [main] ingest.AbstractIngestHBaseCommandLineDriver (AbstractIngestHBaseCommandLineDriver.java:applyArguments(146)) – Error parsing plugins
java.lang.IllegalArgumentException: Unable to find SPI plugin provider for ingest format ‘gpx’
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.getPluginProviders(AbstractIngestHBaseCommandLineDriver.java:196)
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.applyArguments(AbstractIngestHBaseCommandLineDriver.java:141)
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.run(AbstractIngestHBaseCommandLineDriver.java:75)
at mil.nga.giat.geowave.core.cli.GeoWaveMain.main(GeoWaveMain.java:48)

Similarly for posthbasestage I am getting the same error:

➜  target git:(GEOWAVE-406) ✗ java -cp geowave-deploy-0.8.8-SNAPSHOT-tools.jar:geowave-deploy-0.8.8-SNAPSHOT-hbase-singlejar.jar:geowave-format-gpx-0.8.8-SNAPSHOT-tools.jar:. mil.nga.giat.geowave.core.cli.GeoWaveMain -posthbasestage -hdfs localhost:9000 -hdfsbase “/gpx/” -f gpx  -z localhost:2181 -n geowave -jobtracker “localhost:8032”
2015-08-23 04:44:16,261 FATAL [main] ingest.AbstractIngestHBaseCommandLineDriver (AbstractIngestHBaseCommandLineDriver.java:applyArguments(146)) – Error parsing plugins
java.lang.IllegalArgumentException: Unable to find SPI plugin provider for ingest format ‘gpx’
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.getPluginProviders(AbstractIngestHBaseCommandLineDriver.java:196)
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.applyArguments(AbstractIngestHBaseCommandLineDriver.java:141)
at mil.nga.giat.geowave.core.ingest.AbstractIngestHBaseCommandLineDriver.run(AbstractIngestHBaseCommandLineDriver.java:75)
at mil.nga.giat.geowave.core.cli.GeoWaveMain.main(GeoWaveMain.java:48)