HBase integration with PySpark
I am trying to access HBase from PySpark on HDP 2.3, executing the sample program shipped in the Spark examples directory with the following command:
    spark-submit --driver-class-path /usr/hdp/current/spark-client/lib/spark-examples-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar --jars /root/{user}/hbase-0.94.0.jar /usr/hdp/current/spark-client/examples/src/main/python/hbase_inputformat.py 10.77.36.78 iemployee
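For context, hbase_inputformat.py essentially builds a NewAPIHadoopRDD over HBase's TableInputFormat. A minimal sketch of the equivalent PySpark code (the host and table name are taken from my command line; the converter classes live in the spark-examples jar passed on the driver classpath):

    from pyspark import SparkContext

    sc = SparkContext(appName="HBaseInputFormat")

    host = "10.77.36.78"   # ZooKeeper quorum host
    table = "iemployee"    # HBase table to scan

    # TableInputFormat opens the table named here in setConf(); if that
    # fails (e.g. the NoServerForRegionException in the log below), the
    # input format is left without a table and getSplits() later throws
    # "No table was provided."
    conf = {
        "hbase.zookeeper.quorum": host,
        "hbase.mapreduce.inputtable": table,
    }
    keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter=keyConv,
        valueConverter=valueConv,
        conf=conf,
    )
    print(hbase_rdd.take(1))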
Initially this failed with a ClassNotFoundException; after I downloaded hbase-0.94.0.jar and added it via --jars, the previous error went away, but now I am getting the error below.
Error log:
    08:59:49 ERROR TableInputFormat: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for iemployee,,99999999999999 after 10 tries.
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:926)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:832)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:933)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
        at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:133)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:96)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:91)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1156)
        at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:205)
        at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:499)
        at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
    Traceback (most recent call last):
      File "/usr/hdp/current/spark-client/examples/src/main/python/hbase_inputformat.py", line 74, in <module>
        conf=conf)
      File "/usr/hdp/2.3.0.0-2130/spark/python/pyspark/context.py", line 547, in newAPIHadoopRDD
        jconf, batchSize)
      File "/usr/hdp/2.3.0.0-2130/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
      File "/usr/hdp/2.3.0.0-2130/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
    : java.io.IOException: No table was provided.
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:143)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:95)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1156)
        at org.apache.spark.api.python.SerDeUtil$.pairRDDToPython(SerDeUtil.scala:205)
        at org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:499)
        at org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
While searching, I found that this is a common issue many people are facing, with no solution posted yet. I have tried multiple approaches with no luck. Thanks in advance.
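One suspicion is the jar version: hbase-0.94.0 is several major releases older than the HBase shipped with HDP 2.3 (1.1.x), and 0.94 clients still look up regions through the old .META. catalog table rather than hbase:meta, which could explain the NoServerForRegionException. A variant I plan to try is shipping the cluster's own client jars instead; this sketch assumes the standard HDP layout under /usr/hdp/current/hbase-client/lib (the exact jar names are an assumption; list that directory to confirm):

    # jar names below are assumptions; check /usr/hdp/current/hbase-client/lib first
    spark-submit \
      --driver-class-path /usr/hdp/current/spark-client/lib/spark-examples-1.3.1.2.3.0.0-2557-hadoop2.7.1.2.3.0.0-2557.jar \
      --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar \
      /usr/hdp/current/spark-client/examples/src/main/python/hbase_inputformat.py 10.77.36.78 iemployee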
Possible reasons for this issue are (each can be checked from the HBase shell, as sketched after this list):
- the table named iemployee has not been created
- the RegionServer is not running properly
- an incompatible version of the HBase jar
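A quick checklist from the HBase shell on the cluster (the column family 'cf' in the create command is just a placeholder):

    $ hbase shell
    hbase> status 'simple'                  # is the master up, and are RegionServers checked in?
    hbase> list                             # does 'iemployee' appear in the table list?
    hbase> scan 'iemployee', {LIMIT => 1}   # can a region be located and read end to end?
    hbase> # only if the table is missing: create it with a placeholder column family
    hbase> create 'iemployee', 'cf'

If status and list look fine but the scan hangs or throws the same NoServerForRegionException, the problem is more likely the client jar version than the table itself.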