Common issue in the Software Development Lifecycle is the need to quickly bootstrap vanilla environment, deploy some code onto it, run it and then scrap it. This is a core concept in Continuous Integration / Continuous Delivery (CI/CD). It is a stepping stone towards immutable infrastructure. Properly automated implementation can also save time (no need to configure it manually) and money (no need to track potential regression issues in the development process).

Over the course of several years, found this to be extremely useful when used in BigData projects that use Hadoop. Installation of Hadoop is not always straight-forward. It depends on various internal and external components (JDK, Map-Reduce Framework, HDFS, etc). It can be messy. Different components communicate over various ports and protocols. HDFS uses somewhat clumsy semantics to deal with files and directories. For those and similar reasons we decided to present our take on Hadoop installation on a single node for development purposes.

The following shell script is simplified, fully functional skeleton implementation that will install Hadoop on a c3.xlarge, Fedora 20 node in AWS and run a test job on it:

Additional notes:

  • Please, edit the AWS_PROFILE variable. AWS CLI commands depend on this!
  • Activity log is kept in /tmp/test-hadoop-setup.log and will be recreated with every new run of the script.
  • In case of normal execution, all allocated resources will be cleaned upon termination.
  • This script is ready to be used as Jenkins build-and-deploy job.
  • Since the single-node Hadoop/HDFS is terminated, output data that goes to HDFS should be transferred out of the instance before termination!

Example run should look like:

Hopefully, this short introduction will advance your efforts to automate development tasks in BigData projects!

If you want to discuss more complex scenarios including automated deployments over multi-node Hadoop clusters, AWS Elastic MapReduce, AWS DataPipeline or other components of the BigData ecosystem, do not hesitate to Contact Us!

References

Share on LinkedIn0Tweet about this on TwitterShare on Facebook0Share on Google+0Email this to someone