Errors when running Pig from Oozie, accessing HCatalog

First, make sure your workflow is using the Oozie Shared Libraries.

Your Oozie workflow properties should include the setting: `oozie.use.system.libpath=true`.
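As a rough illustration, a minimal job.properties carrying that setting might look like the fragment below. The host names, ports, and application path are placeholders invented for this sketch; only `oozie.use.system.libpath` comes from the text above.

```properties
# Hypothetical cluster endpoints; replace with the values for your cluster.
nameNode=hdfs://namenode.example.com:8020
jobTracker=jobtracker.example.com:8021

# Ask Oozie to add the system Shared Libraries to the job's classpath.
oozie.use.system.libpath=true

# Hypothetical HDFS location of the workflow application (workflow.xml, lib/, scripts).
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/pig-hcat
```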

Case #1:

Description:

The job fails with a "class not found" error for the Derby driver. This is related to Hive statistics: https://cwiki.apache.org/confluence/display/Hive/StatsDev

Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("org.apache.derby.jdbc.EmbeddedDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
        at org.datanucleus.store.rdbms.datasource.AbstractDataSourceFactory.loadDriver(AbstractDataSourceFactory.java:57)
        at org.datanucleus.store.rdbms.datasource.DBCPDataSourceFactory.makePooledDataSource(DBCPDataSourceFactory.java:54)
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:182)
        ... 107 more

Solution:

HCatalog tries to record statistics for running jobs and, by default, uses a local Derby instance to do so. See the Hive statistics link above for details.

To resolve this, add the Derby library to the job's classpath. See the Oozie Shared Libs configuration.

Case #2:

Description:

Table not found. Without a configuration, the Pig script has no way of knowing how to reach HCatalog.

Example:
Loading fails with the following exception even though the table mars_fdr_pig.mars_3g exists.

Caused by: NoSuchObjectException(message:mars_fdr_pig.mars_3g table not found)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1380)

Solution:

Pig can't find 'hive-site.xml', so it cannot reach the metastore. Add 'hive-site.xml' to the Oozie Shared Libs.

Case #3:

Description:

HCatalog classes fail to load. The HCatalog jar files are not visible from the Pig context.

Solution:

The HCatalog libraries are not part of the default installation. See Oozie Shared Libs for instructions to complete the setup and resolve this issue.

Case #4:

Description:

UDF error - Can't Load Class / UDF

Solution:

There are a few possible scenarios, but in every case the UDF jar is not visible to the process running Pig.

You can either:

    • Place the UDF library in the workflow's "lib" directory so that it is automatically included in the classpath
    • Add the jar to the Oozie Shared Libs
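A related variation on the second option, not described above, is to keep the UDF jars in an HDFS directory of your own and point the job at it with the `oozie.libpath` job property. The fragment below is only a sketch: the directory name is hypothetical, and it assumes `nameNode` is already defined elsewhere in the same job.properties.

```properties
# Keep using the system Shared Libraries alongside the extra directory below.
oozie.use.system.libpath=true

# Hypothetical HDFS directory holding UDF jars such as piggybank.jar.
# Jars found here are added to the classpath of the workflow's actions.
oozie.libpath=${nameNode}/user/${user.name}/udf-lib
```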

Case #5:

Description:

UDF error - Registration Failure

Script excerpt:
register '/usr/lib/pig/piggybank.jar'
Resulting error when run via Oozie:
2014-01-08 14:15:59,271 [main] ERROR org.apache.pig.tools.grunt.Grunt  - ERROR 101: file '/usr/lib/pig/piggybank.jar' does not exist.


Solution:

Related to Case #4, this happens when a Pig script is developed and run from the command line and then moved into an Oozie workflow.

Under Oozie, Pig runs in a different context, so any register call that points at the local file system will fail (most of the time). Follow the solutions in this cookbook (see Case #4) to make the jars you need to register available.