Build a Local Repo for HDP with Chef

Building out a local repository while on-site at a client can be a real PITA.  Done manually, the process is error prone and can waste valuable time at the client site.

These instructions will help you build out a NEW repository from scratch, on a basic "minimal" installation of CentOS 6.4.  I'm sure it would work with RHEL 6.4 and other 6.x variants as well.

We'll be using a provisioning tool called "Chef".  I won't bore you with the details of Chef (if you're interested, follow the link), but basically it's a scripted provisioning system that will install and configure a system based on a selected set of "recipes".  These "recipes" contain the logic required to install the various components you identify.  In our case, I've created a Chef recipe that builds out an HDP local repository, which we will use to automate this buildout process.

Let's Get Started

Step #1: OS Environment:

Build out a 'minimal' installation of the OS with network connectivity and access to the internet to pull down all the bits and pieces.  You won't need to do anything else beyond the basic installation and network configuration.

Make sure you establish a FQDN for the host.  This will be used later in the process, and also to find the repo host during our HDP installation.
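On CentOS 6, that usually means touching two files.  A sketch only; the hostname "repo.hwx.test" (used again later in this article) and the IP address are example values:

```
# /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=repo.hwx.test

# /etc/hosts -- add an entry for the host's own IP
192.168.56.10   repo.hwx.test   repo
```

Restart networking (or reboot) after the change so the new hostname takes effect.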

Assumptions:
  • All commands are run as 'root' or as a 'sudo' user.
  • Network connectivity
  • The OS has at least 35GB (should make it 50GB) of available disk space to complete the repository buildout.
  • Access to the Internet (at least during the time you're building the repo)
Step #2: Additional Components to Install on the OS:
Step #3: Control Files

Next we need to build a few control files that will help our system find the bits and pieces needed to build out the system.

Pick a root directory; its location isn't important, as long as we are consistent from this point on:

mkdir hdp_repo_init
cd hdp_repo_init

The assumption going forward is that all commands will be run from this newly created directory.

Create a file called 'Cheffile'. This tells librarian-chef which 'recipes' to pull down when building out the local Chef repository (different from our HDP repo) used for the provisioning process.  Place the following contents in it:

Cheffile
#!/usr/bin/env ruby
#^syntax detection

site 'http://community.opscode.com/api/v1'

cookbook 'apache2'
cookbook 'iptables'

cookbook 'hdp-repo', :git => 'https://github.com/dstreev/chef_recipes', :path => 'hdp-repo'

Create a file called 'solo.rb'.  This tells Chef where to find its cookbooks.  These are installed later by librarian-chef. Place the following contents in it:

solo.rb
root = File.absolute_path(File.dirname(__FILE__))

file_cache_path  root
cookbook_path    root + '/cookbooks'
role_path        root + '/roles'

log_level        :info
log_location     STDOUT
ssl_verify_mode  :verify_none 

Create a file called 'solo.json'. This controls what 'recipes' and 'roles' we will be using.

solo.json
{ "run_list": "role[local_repo]" } 

In the file below, replace the value of default_attributes.hdp_repo.location.host with the FQDN of the target local repo you're building.

This Chef script is capable of bringing down multiple versions of HDP, as you can see below.  With a configuration like the one below (multiple Ambari versions, plus several versions each of HDP 1.3 and 2.0), 50GB of space was NOT enough.  If you need all of these, you will need to create a 75GB drive (at least).

The "default_version" elements for Ambari and HDP_Utils will be used to construct a template "ambari.repo" file in /var/www/html/repos/local.yum.repos.d that has the entries to point to the repo you are building.

roles/local_repo.json
{
    "name": "local_repo",
    "default_attributes": {
        "hdp_repo": {
            "os_base": {
                "items": ["centos6"]
            },
            "location": {
                "host": "repo.hwx.test"
            },
            "ambari": {
                "default_version": "1.5.1",
                "versions": ["1.4.4.23", "1.4.3.38", "1.4.2.104"]
            },
            "hdp_utils": {
                "default_version": "1.1.0.17",
                "versions": ["1.1.0.16", "1.1.0.17"]
            },
            "hdp_1.3": {
                "versions": ["1.3.3.2"]
            },
            "hdp_2.0": {
                "versions": ["2.0.6.1", "2.0.10.0"]
            },
            "hdp_2.1": {
                "versions": {
                    "GA": ["2.1.1.0"],
                    "updates": []
                }
            }
        },
        "apache": {
            "default_site_enabled": true
        }
    },
    "json_class": "Chef::Role",
    "description": "This is the base role for an HDP Repo Buildout",
    "chef_type": "role",
    "run_list": [
        "recipe[hdp-repo]"
    ]
}

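For reference, the generated ambari.repo template mentioned above might look roughly like this.  This is an illustrative sketch only; the exact section names and URL layout are determined by the 'hdp-repo' recipe, while the host and versions come from the role file:

```
[ambari-1.5.1]
name=Ambari 1.5.1
baseurl=http://repo.hwx.test/repos/centos6/ambari/1.5.1
gpgcheck=0
enabled=1

[HDP-UTILS-1.1.0.17]
name=HDP Utils 1.1.0.17
baseurl=http://repo.hwx.test/repos/centos6/hdp-utils/1.1.0.17
gpgcheck=0
enabled=1
```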

Step #4: Initialize the Chef Recipe Local Repository

Based on the settings in the 'Cheffile', we will bring down the recipes required for this installation.

librarian-chef install --verbose

Check back here for updates to the recipes and other various links. If we update the 'hdp-repo' recipe, it will not be picked up UNLESS you delete the 'Cheffile.lock' file AND run the above command again to refresh the local copy of the recipe. The Cheffile.lock file remembers the SHA1 of the release pulled down during the initial fetch. Just delete the file and run the above command again to ensure you get the latest recipes.
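Put together, a refresh looks like this (run from the hdp_repo_init directory):

```shell
# Drop the pinned SHA1s, then re-fetch everything listed in the Cheffile
rm -f Cheffile.lock
librarian-chef install --verbose
```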

 

Step #5: Build out our Local Repository

This process will take some time.  If you can run it overnight, you'll be better off.  It will download as much as 30GB, so depending on your connection, you could be waiting a while.

Because it attaches to and downloads from many repositories, it has a tendency to time out or fail during the buildout process. It is OK to run the command below several times, until the entire repo buildout completes successfully. The repo buildout is designed to be restartable; when run again it will:

  • Complete what wasn't completed before
  • Update the repos with the most current versions found on the mirrors.
chef-solo -c solo.rb -j solo.json
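Since reruns are safe, you can script the retries instead of babysitting the buildout.  A sketch of a small helper; the function name, retry count, and delay are mine, not part of the recipe:

```shell
# Retry a command until it succeeds, up to a given number of attempts,
# pausing between tries so transient mirror timeouts can clear.
run_with_retry() {
    attempts=$1
    shift
    for i in $(seq 1 "$attempts"); do
        "$@" && return 0
        echo "Attempt $i of $attempts failed; retrying in 60 seconds..." >&2
        sleep 60
    done
    return 1
}

# The buildout itself, with up to 10 attempts:
# run_with_retry 10 chef-solo -c solo.rb -j solo.json
```

Uncomment the last line (or call the helper directly) to kick off the buildout.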

You can review the actual process used to build out and create the local repository that you'll point to from Ambari (1.4.2 and above).

Extras:

The local repo buildout process will make a copy of the JDK that Hortonworks keeps on its repo site, which is used by the default Ambari installation process.  It stores it in the "/var/www/html/repos/artifacts" directory.  I also use this directory to store "other" JDKs that I might use during the installation process. These JDKs will need to be copied over manually.

If you are using my "vagrant" VM cluster buildout on github.com https://github.com/dstreev/vagrant , which uses a few Chef recipes I've created for configuring an HDP node (https://github.com/dstreev/chef_recipes), you'll see that I use JDK 7 as the standard for my installations.  To support that process, you'll need to get the JDK (tar.gz) from Oracle and put it in the "artifacts" directory.

Save some space

During the buildout process, Chef will download the rpm tar.gz files from the Hortonworks repo and store them in /var/www/html/tgz.  This is the staging directory used to build out the yum repos.  While these files are no longer used after the buildout, I do NOT recommend simply deleting them.  If you were to rerun the process to download updates, the process will look for these files and download them again if they are not there.  To save space AND prevent them from being downloaded again, I just replace them with zero-byte files.

echo -n "" > xxx.tar.gz

This will fool the check, so the process proceeds to the next step.  In return, you reclaim nearly 40% of your drive space.
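To do that for every staged tarball in one shot, a tiny helper works.  A sketch; the function name is mine, while the /var/www/html/tgz path comes from the buildout above:

```shell
# Replace every .tar.gz in a directory with a zero-byte file of the same
# name, so re-runs of the buildout see the file and skip the download.
zero_out_tarballs() {
    dir=$1
    for f in "$dir"/*.tar.gz; do
        if [ -f "$f" ]; then
            : > "$f"
        fi
    done
}

zero_out_tarballs /var/www/html/tgz
```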