httpfs is needed to support a centralized WebHDFS interface to an HA-enabled NameNode cluster. It can be used by Hue or any other WebHDFS-enabled client that needs to talk to a cluster configured with NameNode High Availability.
The installation is a piece of cake:
yum install hadoop-httpfs
But that's where the fun ends!!! Configuring it is a whole other thing. It's not hard, if you know the right buttons to push. Unfortunately, the buttons and directions for doing this can be quite elusive.
The httpfs service is a Tomcat application that relies on having the Hadoop libraries and configuration available, so it can locate your HDP installation.
When you do the installation (above), a few items are laid down: the Tomcat deployment, the httpfs scripts, and a configuration directory under /etc/hadoop-httpfs.
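If you want to see exactly what landed on the box, the package manager can tell you (assuming an RPM-based install):

# List the files delivered by the hadoop-httpfs package
rpm -ql hadoop-httpfs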
Configuring - The Short Version
Set the version that 'current' points to with hdp-select:
hdp-select set hadoop-httpfs 2.2.x.x-x
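You can verify the pointer took; hdp-select reports the active version per package:

# Confirm the 'current' symlink now targets your version
hdp-select status hadoop-httpfs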
From this point on, many of our changes are designed to "fix" the poor "hardcoded" implementations in the deployed scripts.
Adjust the /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh script so it reads like this:

# Autodetect JAVA_HOME if not defined
if [ -e /usr/libexec/bigtop-detect-javahome ]; then
  . /usr/libexec/bigtop-detect-javahome
elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
  . /usr/lib/bigtop-utils/bigtop-detect-javahome
fi

### Added to assist with locating the right configuration directory
export HTTPFS_CONFIG=/etc/hadoop-httpfs/conf

### Remove the original HARD CODED Version reference... I mean, really???
exec /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh.distro "$@"
Now let's create a few symlinks to connect the pieces together. They're relative links, so run them from the httpfs home directory:

cd /usr/hdp/current/hadoop-httpfs
ln -s /etc/hadoop-httpfs/tomcat-deployment/conf conf
ln -s ../hadoop/libexec libexec
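A quick sanity check that both links resolve:

ls -l /usr/hdp/current/hadoop-httpfs/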
Like all the other Hadoop components, httpfs uses *-env.sh files to control its startup environment. Above, in the httpfs.sh script, we set the location of the configuration directory. That is used to find and load the httpfs-env.sh file (/etc/hadoop-httpfs/conf/httpfs-env.sh) we'll modify below. The values shown assume the default HDP layout; adjust the paths to match your standards.
# Add these to control and set the Catalina directories for starting and finding the httpfs application
export CATALINA_BASE=/etc/hadoop-httpfs/tomcat-deployment
export HTTPFS_CATALINA_HOME=/etc/hadoop-httpfs/tomcat-deployment

# Set a log directory that matches your standards
export HTTPFS_LOG=/var/log/hadoop/httpfs

# Set a tmp directory for httpfs to store interim files
export HTTPFS_TEMP=/tmp/httpfs
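Those directories won't exist by default; create them before starting the service (paths assume the values above):

mkdir -p /var/log/hadoop/httpfs /tmp/httpfs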
That's it!! Now run it!

# To Start
/usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh start

# To Stop
/usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh stop
Try it out!!
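A minimal smoke test with curl, assuming simple (pseudo) authentication; the path and user.name value here are just examples:

curl -i "http://<hostname>:14000/webhdfs/v1/tmp?op=LISTSTATUS&user.name=hdfs"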
Obviously, change out <hostname> with your target host. The default port is 14000. If you want to change that, add the following to httpfs-env.sh.
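HTTPFS_HTTP_PORT is the standard setting for this; 14001 below is just an example value:

# Override the default HTTP port (14000)
export HTTPFS_HTTP_PORT=14001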
Want to Add httpfs as a Service (auto-start)?
The HDP installation puts a set of init.d scripts in the version-specific directory, not in /etc/init.d where the system would pick them up.
Create a symlink to this in /etc/init.d
ln -s /usr/hdp/<hdp.version>/etc/rc.d/init.d/hadoop-httpfs /etc/init.d/hadoop-httpfs
Then set up the service to run on restart
# As Root User
chkconfig --add hadoop-httpfs
# Start Service
service hadoop-httpfs start
# Stop Service
service hadoop-httpfs stop
This method will run the service as the 'httpfs' user. Ensure that the 'httpfs' user has permissions to write to the log directory (/var/log/hadoop/httpfs if you followed these directions).
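For example, assuming the log directory configured earlier:

# Give the httpfs service user ownership of its log directory
chown -R httpfs:httpfs /var/log/hadoop/httpfs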
A Little More Detail
Proxies are fun, aren't they? Well, they'll affect you here as well. The directions here call for these proxy settings in core-site.xml:
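These are the standard Hadoop proxyuser entries for the 'httpfs' user. The wildcard values are wide open; tighten hosts and groups to match your environment:

<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>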
This means that httpfs.sh must be run as the 'httpfs' user in order to work. If you want to run the service as another user, adjust the proxy settings above to name that user instead.