HTTPFS - Configure and Run with HDP 2.3.x

httpfs is needed to provide a centralized WebHDFS interface to an HA-enabled NameNode cluster.  It can be used by Hue or any other WebHDFS-enabled client that needs to talk to a cluster configured with a High-Availability NameNode.

The installation is a piece of cake:

yum install hadoop-httpfs

But that's where the fun ends!!!  Configuring is a whole other thing.  It's not hard, if you know the right buttons to push.  Unfortunately, the buttons and directions for doing this can be quite elusive.

The httpfs service is a Tomcat application that relies on having the Hadoop libraries and configuration available, so it can resolve your HDP installation.

When you do the installation (above), a few items are installed:

/usr/hdp/2.3.x.x-x/hadoop-httpfs
/etc/hadoop-httpfs/conf
/etc/hadoop-httpfs/tomcat-deployment
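
If you're curious exactly what landed on the box, the RPM database will list it. This assumes the installed package answers to the 'hadoop-httpfs' name used in the yum command above; the whatprovides lookup covers the case where the real package name is versioned.

# Find the installed httpfs package and list what it delivered
rpm -qa | grep -i httpfs
rpm -ql $(rpm -q --whatprovides hadoop-httpfs) | head -20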

Configuring - The Short Version

Set the version for 'current' with:

hdp-select set hadoop-httpfs 2.3.x.x-x
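
If you want to double-check, the 'current' entry is just a symlink under /usr/hdp, so a quick ls shows which version it resolves to:

# Confirm 'current' now points at the version you selected
ls -ld /usr/hdp/current/hadoop-httpfs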

Fix the ownership of the /etc/hadoop-httpfs directory, because httpfs will be run by the 'httpfs' user.

cd /etc
chown -R httpfs hadoop-httpfs
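
A quick sanity check that the ownership change took:

# Both the conf directory and its parent should now be owned by 'httpfs'
ls -ld /etc/hadoop-httpfs /etc/hadoop-httpfs/conf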

 

Now let's create a symlink to connect the pieces together

Required Symlink
# Point to the 'webapps' in current.
cd /etc/hadoop-httpfs/tomcat-deployment
ln -s /usr/hdp/current/hadoop-httpfs/webapps webapps
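
It's worth confirming the link resolves before moving on (a dangling webapps link leaves the Tomcat instance with nothing to deploy):

# Verify the symlink and that the target actually exists
ls -l /etc/hadoop-httpfs/tomcat-deployment/webapps
ls /usr/hdp/current/hadoop-httpfs/webapps
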
Like all the other Hadoop components, httpfs uses *-env.sh files to control the startup environment.  The httpfs.sh script sets the location of the configuration directory, which is used to find and load the httpfs-env.sh file we'll modify below.

Set up your httpfs-env.sh file
 
# Set JAVA_HOME
export JAVA_HOME=/usr/java/latest
 
# Set a log directory that matches your standards
export HTTPFS_LOG=/var/log/hadoop/httpfs
 
# Set a tmp directory for httpfs to store interim files
export HTTPFS_TEMP=/tmp/httpfs
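
The log and tmp locations above aren't guaranteed to exist yet, so it's worth creating them up front and handing them to the 'httpfs' user (paths assume the values set above):

# Create the directories referenced in httpfs-env.sh and give them to 'httpfs'
mkdir -p /var/log/hadoop/httpfs /tmp/httpfs
chown -R httpfs /var/log/hadoop/httpfs /tmp/httpfs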

 

That's it!!  Now run it!

Manually Starting
sudo su - httpfs
cd /usr/hdp/current/hadoop-httpfs/sbin
./httpfs.sh start
 
# To Stop
./httpfs.sh stop 
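
Before pointing a client at it, you can confirm the process is up and something is listening on the default port of 14000 (adjust if you changed it):

# Confirm the Tomcat process is running
ps -ef | grep -i [h]ttpfs
 
# Confirm the port is open (use 'ss -tln' if netstat isn't installed)
netstat -tln | grep 14000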

Try it out!!

http://m1.hdp.local:14000/webhdfs/v1/user?user.name=hdfs&op=LISTSTATUS

Obviously, swap in your target host.  The default port is 14000.  If you want to change that, add the following to:

/etc/hadoop-httpfs/conf/httpfs-env.sh
export HTTPFS_HTTP_PORT=<new_port>
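
From a shell, the same "Try it out" check can be done with curl; the host and user below come from the example above, so swap in your own:

# List the contents of /user through httpfs (the response is JSON)
curl "http://m1.hdp.local:14000/webhdfs/v1/user?user.name=hdfs&op=LISTSTATUS"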

Want to Add httpfs as a Service (auto-start)?

The HDP installation puts a set of init.d scripts in the specific version's directory.

Secret Location for Service Startup Scripts
cd /usr/hdp/<hdp.version>/etc/rc.d/init.d

Create a symlink to this in /etc/init.d

Symlink Service Script
ln -s /usr/hdp/<hdp.version>/etc/rc.d/init.d/hadoop-httpfs /etc/init.d/hadoop-httpfs

Then set up the service to run on restart

Automate
# As Root User
chkconfig --add hadoop-httpfs

Controlling the Service
# Start Service
service hadoop-httpfs start
 
# Stop Service
service hadoop-httpfs stop
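
To verify the runlevels were registered, chkconfig can list the entry it just added:

# Confirm the service is registered to start on boot
chkconfig --list hadoop-httpfs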

This method will run the service as the 'httpfs' user. Ensure that the 'httpfs' user has permissions to write to the log directory (/var/log/hadoop/httpfs if you followed these directions).

 

A Little More Detail 

Proxies are fun, aren't they?  Well, they'll affect you here as well.  The directions here rely on these proxy settings in core-site.xml:

HDFS Proxy Setting (core-site.xml)
<property>
 <name>hadoop.proxyuser.httpfs.groups</name>
 <value>*</value>
</property>

<property>
 <name>hadoop.proxyuser.httpfs.hosts</name>
 <value>*</value>
</property>
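
These proxyuser settings are read by the NameNode, so after editing core-site.xml they need to be picked up.  Restarting the NameNode(s) works; in most builds you can also refresh them without a restart using dfsadmin (shown below as an option; in an HA setup you may need to run it against each NameNode):

# As the hdfs user, push the updated proxyuser settings to the NameNode
sudo su - hdfs -c "hdfs dfsadmin -refreshSuperUserGroupsConfiguration"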

Proxy Settings

This means that httpfs.sh must be run as the 'httpfs' user in order to work.  If you want to run the service as another user, adjust the proxy settings accordingly.  For example, to run it as 'root':

Proxy Setting For 'root'
<property>
 <name>hadoop.proxyuser.root.groups</name>
 <value>*</value>
</property>

<property>
 <name>hadoop.proxyuser.root.hosts</name>
 <value>*</value>
</property>