How to Install Hadoop on Windows with Cloudera VM

Let’s take a look at how to install Hadoop on Windows to practice Hadoop programming.

To process large data sets with Hadoop, we need a full Hadoop installation on a real cluster of anywhere from tens to several thousand computers (nodes). However, we can start experimenting with Hadoop right away by downloading a sandbox installation onto our own computer. A Hadoop sandbox is a ready-to-run installation with the core Hadoop modules and other related software packages bundled in a virtual machine (VM) image. It typically runs on a single node, which is good enough for learning Hadoop.

The three main Hadoop sandbox distributions are:

  • Cloudera QuickStart VM
  • Hortonworks Sandbox
  • MapR Sandbox for Hadoop

All of the above sandbox distributions can be downloaded for free from their respective vendors' websites.

We will install Cloudera QuickStart VM on Windows for our Hadoop learning. Cloudera QuickStart VM comes with the CentOS 6 operating system and the following Hadoop ecosystem and development tools pre-installed:

Apache Hadoop ecosystem tools:

  • Apache Hadoop
  • Apache Spark
  • Apache Pig
  • Apache Hive
  • Apache HBase
  • Apache Impala
  • Hue
  • Apache Oozie
  • Apache Solr

Development tools:

  • JDK 7
  • Eclipse IDE (Luna) with Maven
  • MySQL database
  • Git command line
  • Perl
  • Python
  • PHP

So there is no need to install all of this software separately. Instead, we can simply install Cloudera QuickStart VM and get our hands dirty developing Hadoop MapReduce code.
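To give a flavor of the MapReduce code we will be writing, here is a minimal sketch of the classic word-count pattern in plain Python. It only simulates the map, shuffle/sort, and reduce phases inside a single process, so it runs without a cluster. The function names (mapper, reducer, run_local) are hypothetical illustrations, not Hadoop's own API. On the QuickStart VM, similar mapper and reducer logic could be run as separate scripts through Hadoop Streaming, but that wiring is not shown here.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every whitespace-separated word, lowercased."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1


def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    Assumes the pairs arrive sorted by word, which is what Hadoop's
    shuffle/sort step guarantees between the map and reduce phases.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process (hypothetical helper)."""
    mapped = sorted(mapper(lines))  # sorting stands in for Hadoop's shuffle
    return dict(reducer(mapped))


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    for word, count in sorted(run_local(sample).items()):
        # Tab-separated key/value output, the usual MapReduce text format
        print("%s\t%d" % (word, count))
```

Sorting the mapper output before reducing is the key design point: because every occurrence of a word ends up adjacent, the reducer can total each word with a single pass using groupby.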

Before we can install and configure Cloudera QuickStart VM, we need to install VirtualBox to run it.

Note: VirtualBox allows us to run multiple operating systems as virtual machines on our computer at the same time. For instance, we can run Linux on a Windows PC, or run Windows and Linux on a Mac.

Let’s watch the following video tutorial to install VirtualBox and then install and configure Cloudera QuickStart VM 5.8.0 on Windows.

Since the host and guest machines run different operating systems isolated from each other, we cannot share files between our Windows host and the Linux guest (VM) out of the box. The following video tutorial shows how to share our computer’s files with a virtual machine.

In this post, we learned how to install VirtualBox and Cloudera QuickStart VM 5.8.0 on Windows, and how to share files between the Windows host and the virtual machine. If you have any questions or comments about this blog post, or would like to suggest another way to share files with the VM, please feel free to post them in the comments section below.

At ByteQuest, we are planning to offer face-to-face (in-person) Big Data training courses in the Bay Area, CA. If you are interested in enrolling, please click here to learn more. If you would like to receive our latest posts and updates on Big Data training directly in your inbox, please subscribe. If you have any questions or suggestions for us, please feel free to contact us.

About Sivagami Ramiah

Sivagami Ramiah is the founder and primary instructor of ByteQuest, a Big Data training institution that grew out of her passion for teaching Big Data and Machine Learning. She has 20 years of experience in software application development, the majority of which was spent leading an enterprise application development team. As part of the Mining Massive Data Sets Graduate Certificate Program at Stanford University, she had the opportunity to work on projects in Machine Learning and Social Network Analysis. In addition to being Chief Instructor at ByteQuest, she currently consults for corporate clients on building end-to-end Industrial Internet of Things (IIoT) solutions. She enjoys speaking at tech meetups. In her spare time, she loves applying machine learning algorithms to Kaggle open data sets.

About

ByteQuest is a Big Data and Machine Learning Training institution helping teach the next generation of Data Engineers and Data Scientists.