httpfs is needed to support a centralized WebHDFS interface to an HA-enabled NameNode cluster. It can be used by Hue or any other WebHDFS-enabled client that needs to talk to a cluster configured with High-Availability NameNodes.
The installation is a piece of cake:
yum install hadoop-httpfs
But that's where the fun ends!!! Configuring is a whole other thing. It's not hard, if you know the right buttons to push. Unfortunately, the buttons and directions for doing this can be quite elusive.
The httpfs service is a Tomcat application that relies on having the Hadoop libraries and configuration available, so it can find your HDP installation.
When you do the installation (above), a few items are installed.
Configuring - The Short Version
Point the 'current' symlink at your installed version with
hdp-select set hadoop-httpfs 2.3.x.x-x
Fix the ownership of the /etc/hadoop-httpfs directory, because httpfs will be run by the 'httpfs' user.
chown -R httpfs /etc/hadoop-httpfs
Now let's create a few symlinks to connect the pieces together
# Point to the 'webapps' in current.
ln -s /usr/hdp/current/hadoop-httpfs/webapps webapps
Like all the other Hadoop components, httpfs uses *-env.sh files to control the startup environment. Above, in the httpfs.sh script, we set the location of the configuration directory. That is used to find and load the httpfs-env.sh file we'll modify below.
# Set JAVA_HOME
# Set a log directory that matches your standards
# Set a tmp directory for httpfs to store interim files
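Put together, a minimal httpfs-env.sh might look like this. The paths are examples only; adjust them to your own standards (HTTPFS_LOG and HTTPFS_TEMP are the variable names from the Apache httpfs server setup docs):

```shell
# Example httpfs-env.sh -- paths below are illustrative, not prescriptive.
export JAVA_HOME=/usr/lib/jvm/java          # your JDK location
export HTTPFS_LOG=/var/log/hadoop/httpfs    # log directory matching your standards
export HTTPFS_TEMP=/var/tmp/httpfs          # tmp directory for interim files
```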
That's it!! Now run it as the 'httpfs' user:
sudo su - httpfs
httpfs.sh start
# To Stop
httpfs.sh stop
Try it out!!
Obviously, swap in your target host. The default port is 14000. If you want to change that, set the port in httpfs-env.sh.
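A quick smoke test from the command line might look like the following; the host name and user.name value are placeholders, and LISTSTATUS is the standard WebHDFS operation for listing a directory:

```shell
# List the HDFS root through httpfs on the default port (14000).
# Replace 'httpfs-host' and the user.name value with your own.
curl "http://httpfs-host:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"
```

A JSON FileStatuses response means httpfs is up and can reach the NameNode.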
Want to Add httpfs as a Service (auto-start)?
The HDP installation puts a set of init.d files in the version-specific directory.
Create a symlink to this in /etc/init.d
ln -s /usr/hdp/<hdp.version>/etc/rc.d/init.d/hadoop-httpfs /etc/init.d/hadoop-httpfs
Then set up the service to start on reboot
# As Root User
chkconfig --add hadoop-httpfs
# Start Service
service hadoop-httpfs start
# Stop Service
service hadoop-httpfs stop
This method will run the service as the 'httpfs' user. Ensure that the 'httpfs' user has permissions to write to the log directory (/var/log/hadoop/httpfs if you followed these directions).
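For example, creating the log directory and handing it to the 'httpfs' user could be done like this (run as root; the path assumes you followed the directions above):

```shell
# Create the log directory and make the 'httpfs' user its owner.
mkdir -p /var/log/hadoop/httpfs
chown -R httpfs /var/log/hadoop/httpfs
```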
A Little More Detail
Proxies are fun, aren't they? Well, they'll affect you here as well. The directions here mention these proxy settings in core-site.xml.
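The settings in question are presumably the standard proxyuser properties for the 'httpfs' user, per the Apache httpfs setup docs. The wildcard values below are wide open and should be tightened for production:

```xml
<!-- core-site.xml: allow the 'httpfs' user to impersonate end users -->
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
```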
This means that httpfs.sh must be run as the 'httpfs' user in order to work. If you want to run the service as another user, adjust the proxy settings above.