To do any development on Hadoop effectively, it's important to have a dataset that you can work with. Here are a few links to datasets on the web. Start with basic CSV datasets and work your way up to more complicated formats, like XML and JSON.
A full Employee Dataset for MySQL: https://dev.mysql.com/doc/employee/en/employees-installation.html
XML Datasets: http://www.cs.washington.edu/research/xmldatasets/
Build your own: https://github.com/dstreev/hdp-data-gen
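
If you only need a small CSV file to get started, a few lines of Python are enough. The sketch below is not related to the hdp-data-gen project linked above; the function name `generate_employee_csv`, the column names, and the value ranges are all made up for illustration.

```python
import csv
import io
import random


def generate_employee_csv(num_rows, seed=42):
    """Return CSV text: one header row plus num_rows synthetic employee records.

    The columns (emp_no, department, salary) are hypothetical and chosen
    only to give a simple dataset to experiment with.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    departments = ["Engineering", "Finance", "Marketing", "Sales"]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["emp_no", "department", "salary"])
    for emp_no in range(1, num_rows + 1):
        writer.writerow(
            [emp_no, rng.choice(departments), rng.randint(40000, 120000)]
        )
    return buffer.getvalue()
```

Write the returned text to a local file (for example `employees.csv`) and copy it into HDFS with `hdfs dfs -put`. Increase `num_rows` when you want a larger file to test with.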