HTTPFS - Configure and Run with HDP 2.2.4.x

httpfs is needed to provide a centralized WebHDFS interface to a cluster with HA-enabled NameNodes.  It can be used by Hue or any other WebHDFS-enabled client that needs to talk to a cluster configured with a High-Availability NameNode.

The installation is a piece of cake:

yum install hadoop-httpfs

But that's where the fun ends!!!  Configuring is a whole other thing.  It's not hard, if you know the right buttons to push.  Unfortunately, the buttons and the directions for pushing them can be hard to find.

The httpfs service is a Tomcat application that relies on having the Hadoop libraries and configuration available, so it can resolve your HDP installation.

The installation above puts a few items in place:

/usr/hdp/2.2.x.x-x/hadoop-httpfs
/etc/hadoop-httpfs/conf
/etc/hadoop-httpfs/tomcat-deployment
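
If you want to confirm that those locations really exist before going further, something like the following should do. This is only a sketch: `check_httpfs_paths` is a made-up helper name (not part of HDP), and the paths are passed as arguments because the version directory depends on your build.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with HDP): report whether each
# expected httpfs location exists. Returns 1 if any path is missing.
check_httpfs_paths() {
  local missing=0
  local p
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "found:   $p"
    else
      echo "missing: $p"
      missing=1
    fi
  done
  return "$missing"
}
```

Call it with the three paths listed above, substituting your real version directory for the first one.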

Configuring - The Short Version

Point the "current" links at your installed version with:

hdp-select set hadoop-httpfs 2.2.x.x-x

From this point on, many of our changes are designed to "fix" the poor "hardcoded" implementations in the deployed scripts.

Adjust the /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh script

vi httpfs.sh
#!/bin/bash
# Autodetect JAVA_HOME if not defined
if [ -e /usr/libexec/bigtop-detect-javahome ]; then
  . /usr/libexec/bigtop-detect-javahome
elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
  . /usr/lib/bigtop-utils/bigtop-detect-javahome
fi


### Added to assist with locating the right configuration directory
export HTTPFS_CONFIG=/etc/hadoop-httpfs/conf


### Remove the original HARD CODED Version reference...  I mean, really???
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec

exec /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh.distro "$@"

 

Now let's create a few symlinks to connect the pieces together

Required Symlinks
cd /usr/hdp/current/hadoop-httpfs
ln -s /etc/hadoop-httpfs/tomcat-deployment/conf conf
ln -s ../hadoop/libexec libexec
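
A dangling symlink here tends to show up later as a confusing startup failure, so it can be worth checking that both links resolve. The sketch below assumes nothing beyond `readlink` and the standard `test` operators; `verify_links` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: for each name under a base directory, print where
# the symlink points, or complain if it is not a symlink or is dangling.
# Returns 1 if any link is bad.
verify_links() {
  local base="$1"
  shift
  local rc=0
  local name
  for name in "$@"; do
    # -L: is a symlink; -e: the target it points at actually exists
    if [ -L "$base/$name" ] && [ -e "$base/$name" ]; then
      echo "$name -> $(readlink "$base/$name")"
    else
      echo "$name is missing or dangling"
      rc=1
    fi
  done
  return "$rc"
}
```

For the links created above, that would be `verify_links /usr/hdp/current/hadoop-httpfs conf libexec`.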
 

 

Like all the other Hadoop components, httpfs uses *-env.sh files to control the startup environment.  Above, in the httpfs.sh script, we set the location of the configuration directory.  That is used to find and load the httpfs-env.sh file we'll modify below.

Setup your httpfs-env.sh file
# Add these to control and set the Catalina directories for starting and finding the httpfs application
export CATALINA_BASE=/usr/hdp/current/hadoop-httpfs
export HTTPFS_CATALINA_HOME=/etc/hadoop-httpfs/tomcat-deployment
 
# Set a log directory that matches your standards
export HTTPFS_LOG=/var/log/hadoop/httpfs
 
# Set a tmp directory for httpfs to store interim files
export HTTPFS_TEMP=/tmp/httpfs
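
The log and temp directories named above may not exist yet. Whether the startup script creates them for you is not something these notes confirm, so creating them up front is the safe route. A minimal sketch, assuming the env file sets both HTTPFS_LOG and HTTPFS_TEMP as shown; `prepare_httpfs_dirs` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: source an httpfs-env.sh file and create the log
# and temp directories it names.
prepare_httpfs_dirs() {
  local env_file="$1"
  # Run in a subshell so the sourced variables do not leak into the caller.
  (
    . "$env_file" || exit 1
    # Abort with a message if either variable is unset or empty.
    : "${HTTPFS_LOG:?HTTPFS_LOG is not set in $env_file}"
    : "${HTTPFS_TEMP:?HTTPFS_TEMP is not set in $env_file}"
    mkdir -p "$HTTPFS_LOG" "$HTTPFS_TEMP"
  )
}
```

Run it as `prepare_httpfs_dirs /etc/hadoop-httpfs/conf/httpfs-env.sh`. If you later run httpfs under a dedicated user, that user also needs write access to both directories.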

 

That's it!!  Now run it!

Manually Starting
cd /usr/hdp/current/hadoop-httpfs/sbin
./httpfs.sh start
 
# To Stop
./httpfs.sh stop 

Try it out!!

http://m1.hdp.local:14000/webhdfs/v1/user?user.name=hdfs&op=LISTSTATUS
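
The same request works from the command line with curl. The one gotcha is the `&` in the query string, which the shell treats as "run in background" unless the URL is quoted. Here is a small sketch that builds the URL from its parts; `httpfs_url` is a hypothetical helper, and the host, port, path, and user are just the values from the example above.

```shell
#!/bin/bash
# Hypothetical helper: build a WebHDFS URL for an httpfs endpoint.
# Arguments: host, port, HDFS path (with leading slash), user name, operation.
httpfs_url() {
  printf 'http://%s:%s/webhdfs/v1%s?user.name=%s&op=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example call (needs a running httpfs instance, so it is left commented out).
# Note the double quotes around the URL.
# curl -s "$(httpfs_url m1.hdp.local 14000 /user hdfs LISTSTATUS)"
```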

Obviously, swap in your own target host.  The default port is 14000.  If you want to change that, add the following to:

/etc/hadoop-httpfs/conf/httpfs-env.sh
export HTTPFS_HTTP_PORT=<new_port>
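
If you script this change, it helps to make it repeatable so that running it twice does not leave two competing export lines. A minimal sketch, assuming the setting lives on a line starting with `export HTTPFS_HTTP_PORT=`; `set_httpfs_port` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: set HTTPFS_HTTP_PORT in an env file, replacing any
# existing export of that variable instead of appending a duplicate.
set_httpfs_port() {
  local env_file="$1"
  local port="$2"
  local tmp_file
  tmp_file=$(mktemp) || return 1
  # Keep every line except an existing port export
  # (grep exits 1 when it outputs nothing, which is fine here).
  grep -v '^export HTTPFS_HTTP_PORT=' "$env_file" > "$tmp_file" || true
  echo "export HTTPFS_HTTP_PORT=$port" >> "$tmp_file"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp_file" > "$env_file"
  rm -f "$tmp_file"
}
```

For example, `set_httpfs_port /etc/hadoop-httpfs/conf/httpfs-env.sh 14001`. Since the env file is read at startup, stop and start httpfs afterwards for the new port to take effect.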

Want to Add httpfs as a Service (auto-start)?

The HDP installation puts a set of init.d files in the version-specific directory.

Secret Location for Service Startup Scripts
cd /usr/hdp/<hdp.version>/etc/rc.d/init.d

Create a symlink to this in /etc/init.d

Symlink Service Script
ln -s /usr/hdp/<hdp.version>/etc/rc.d/init.d/hadoop-httpfs /etc/init.d/hadoop-httpfs

Then set up the service to start automatically at boot

Automate
# As Root User
chkconfig --add hadoop-httpfs

Controlling the Service
# Start Service
service hadoop-httpfs start
 
# Stop Service
service hadoop-httpfs stop

This method will run the service as the 'httpfs' user. Ensure that the 'httpfs' user has permissions to write to the log directory (/var/log/hadoop/httpfs if you followed these directions).

 

A Little More Detail 

Proxies are fun, aren't they?  Well, they'll affect you here as well.  The httpfs directions mention these proxy settings in core-site.xml.

HDFS Proxy Setting (core-site.xml)
<property>
 <name>hadoop.proxyuser.httpfs.groups</name>
 <value>*</value>
</property>

<property>
 <name>hadoop.proxyuser.httpfs.hosts</name>
 <value>*</value>
</property>

Proxy Settings

This means that httpfs.sh must be run as the httpfs user in order to work.  If you want to run the service as another user, adjust the proxy settings above so the property names carry that user's name instead of 'httpfs'.

Proxy Setting For 'root'
<property>
 <name>hadoop.proxyuser.root.groups</name>
 <value>*</value>
</property>

<property>
 <name>hadoop.proxyuser.root.hosts</name>
 <value>*</value>
</property>
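
To see which proxy values a given core-site.xml actually contains, you can pull them out with a few lines of awk. This is a sketch, not a real XML parser: it assumes each `<name>` and `<value>` element sits on a single line, as in the snippets above. `get_hadoop_prop` is a hypothetical helper name, and the core-site.xml path in the usage line is an assumption about where your client configuration lives.

```shell
#!/bin/bash
# Hypothetical helper: print the <value> of a named property from a
# Hadoop-style XML file. Prints nothing if the property is absent.
# Arguments: file, property name.
get_hadoop_prop() {
  awk -v want="$2" '
    /<name>/ {
      # Remember the most recent property name seen.
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == want {
      # First value following the wanted name: print it and stop.
      val = $0
      sub(/.*<value>/, "", val)
      sub(/<\/value>.*/, "", val)
      print val
      exit
    }
  ' "$1"
}
```

For example, `get_hadoop_prop /etc/hadoop/conf/core-site.xml hadoop.proxyuser.httpfs.hosts` (adjust the path to your configuration directory). An empty result means the property is not set in that file.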