How to Install Hadoop on Windows with Cloudera VM
Let’s take a look at how to install Hadoop on Windows to practice Hadoop programming.
Processing large data sets in Hadoop requires a full installation on a real cluster, with anywhere from tens to several thousand nodes. However, we can start experimenting with Hadoop right away by downloading a sandbox installation onto our own computer. A sandbox installation of Hadoop is a ready-to-run installation with the core Hadoop modules and other related Hadoop software packages bundled in a virtual machine (VM) image. It typically runs on a single node, which is good enough for learning Hadoop.
The three main sandbox distributions of Hadoop are:
- Cloudera QuickStart VM
- Hortonworks Sandbox
- MapR Sandbox for Hadoop
All of the above sandbox distributions can be downloaded for free from their respective websites.
We will go ahead and install Cloudera QuickStart VM on Windows for learning Hadoop. Cloudera QuickStart VM comes with the CentOS 6 operating system and the following Hadoop ecosystem and development tools pre-installed.
Apache Hadoop Ecosystem Tools | Development Tools |
---|---|
Apache Hadoop | JDK 7 |
Apache Spark | Eclipse IDE (Luna) with Maven |
Apache Pig | MySQL database |
Apache Hive | Git Command Line |
Apache HBase | Perl |
Apache Impala | Python |
Hue | PHP |
Apache Oozie | |
Apache Solr | |
So, there is no need for us to worry about installing all of this software separately. Instead, we can simply install Cloudera QuickStart VM and get our hands dirty developing Hadoop MapReduce code.
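To give a feel for the kind of MapReduce code we will eventually write, here is a minimal sketch of the map, shuffle and reduce steps for a word count, written in plain Python. It does not use any Hadoop API and runs on a single machine; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this illustration, and it assumes a recent Python 3 interpreter, which may differ from the Python version bundled in the VM.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for demonstration.
    sample = ["Hadoop runs MapReduce jobs", "MapReduce jobs process data"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster, Hadoop distributes the map and reduce steps across nodes and performs the shuffle itself; this sketch only mimics that flow in memory.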
Before we can install and configure Cloudera QuickStart VM, we need VirtualBox to run it.
Note: VirtualBox allows us to run multiple operating systems as virtual machines on our computer at the same time. For instance, we can run Linux on our Windows PC, or run Windows and Linux on our Mac.
Let’s watch the following video tutorial to install VirtualBox and to install and configure Cloudera QuickStart VM 5.8.0 on Windows.
Because the host and guest machines run different operating systems, files are not shared between our Windows host and the Linux guest (VM) by default. This video tutorial will show us how to share our computer’s files with the virtual machine.
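For readers who prefer the command line, a shared folder can usually also be registered with VirtualBox's `VBoxManage sharedfolder add` command. The sketch below only builds that command as a list of arguments and does not run it. The VM name, share name and host path are hypothetical placeholders, and the video may use the VirtualBox GUI instead. Shared folders generally also require the VirtualBox Guest Additions to be installed inside the guest.

```python
def shared_folder_command(vm_name, share_name, host_path):
    """Build the VBoxManage argument list that registers a host folder with a VM.

    All three values are supplied by the caller; nothing is executed here.
    """
    if not (vm_name and share_name and host_path):
        raise ValueError("vm_name, share_name and host_path must all be non-empty")
    return [
        "VBoxManage", "sharedfolder", "add", vm_name,
        "--name", share_name,      # name the guest uses to refer to the share
        "--hostpath", host_path,   # folder on the Windows host
        "--automount",             # ask VirtualBox to mount it automatically
    ]


if __name__ == "__main__":
    # Hypothetical values: replace them with your own VM name and folder.
    command = shared_folder_command(
        "cloudera-quickstart-vm", "shared", r"C:\hadoop-share"
    )
    print(command)
```

To actually apply the setting, the resulting list could be passed to `subprocess.run(command, check=True)` on the host while the VM is powered off, assuming `VBoxManage` is on the `PATH`.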
In this post, we learned how to install VirtualBox and Cloudera QuickStart VM 5.8.0 on Windows, and how to share files between the Windows host and the virtual machine. If you have any questions or comments about this blog post, or would like to suggest another way to share files with the VM, please feel free to post them in the comment section below.