Let’s take a look at how to install Hadoop on Windows to practice Hadoop programming.
To process large data sets with Hadoop, we would normally install a full version of Hadoop on a real cluster, which can range from tens to several thousand nodes. However, we can start experimenting with Hadoop right away by downloading a sandbox installation onto our own computer. A sandbox installation of Hadoop is a ready-to-run installation, with the core Hadoop modules and related Hadoop software packages bundled in a virtual machine (VM) image. It typically runs on a single node, which is good enough for learning Hadoop.
The three main sandbox distributions of Hadoop are:
- Cloudera QuickStart VM
- Hortonworks Sandbox
- MapR Sandbox for Hadoop
All of the above sandbox distributions can be downloaded for free from their respective websites.
We will go ahead and install Cloudera QuickStart VM on Windows for our Hadoop learning purposes. Cloudera QuickStart VM comes with the CentOS 6 operating system and the following Hadoop ecosystem and development tools pre-installed:
| Apache Hadoop Ecosystem Tools | Development Tools |
| --- | --- |
| Apache Hadoop | JDK 7 |
| Apache Spark | Eclipse IDE (Luna) with Maven |
| Apache Pig | MySQL database |
| Apache Hive | Git command line |
So, there is no need for us to worry about installing all of this software separately. Instead, we can simply install Cloudera QuickStart VM and get our hands dirty developing Hadoop MapReduce code.
Before we can install and configure Cloudera QuickStart VM, we need VirtualBox to run it.
Note: VirtualBox allows us to run multiple operating systems as virtual machines on our computer at the same time. For instance, we can run Linux on a Windows PC, or run Windows and Linux on a Mac.
Let’s watch the following video tutorial to install VirtualBox and then install and configure Cloudera QuickStart VM 5.8.0 on Windows.
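As an alternative to clicking through the VirtualBox interface, the import can also be scripted with VirtualBox's `VBoxManage` command-line tool. The Python sketch below only builds the commands and prints them by default. The appliance path, the VM name, and the 4096 MB / 2 CPU sizing are assumptions for illustration, not values taken from the video, so adjust them to your own download and hardware.

```python
import shutil
import subprocess

# Hypothetical values: point these at the appliance file you downloaded.
OVF_PATH = r"C:\Downloads\cloudera-quickstart-vm-5.8.0-0-virtualbox.ovf"
VM_NAME = "cloudera-quickstart"


def build_import_commands(ovf_path, vm_name, memory_mb=4096, cpus=2):
    """Return the VBoxManage invocations that import, size, and start a VM."""
    return [
        # Import the appliance and give the new VM a predictable name.
        ["VBoxManage", "import", ovf_path, "--vsys", "0", "--vmname", vm_name],
        # Assign RAM (in MB) and CPU cores; the defaults here are assumptions.
        ["VBoxManage", "modifyvm", vm_name,
         "--memory", str(memory_mb), "--cpus", str(cpus)],
        # Boot the VM in a normal window.
        ["VBoxManage", "startvm", vm_name, "--type", "gui"],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            if shutil.which(cmd[0]) is None:
                raise FileNotFoundError(cmd[0] + " is not on PATH")
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be executed without touching VirtualBox.
    run_commands(build_import_commands(OVF_PATH, VM_NAME))
```

Calling `run_commands(..., dry_run=False)` would actually execute the commands, provided `VBoxManage` is on the `PATH` (on Windows it lives in the VirtualBox installation folder).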
Because the Windows host and the Linux guest (VM) each have their own file system, files are not shared between them by default. This video tutorial will show us how to share our computer’s files with the virtual machine.
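One common way to do this is a VirtualBox shared folder, though the video may use a different method. As a rough sketch, the Python snippet below builds the host-side `VBoxManage sharedfolder add` command and a guest-side mount command. The VM name, share name, and paths are hypothetical, and the guest-side step assumes the VirtualBox Guest Additions are installed in the VM.

```python
def host_share_command(vm_name, share_name, host_path):
    """Command to run on the Windows host to register a shared folder."""
    return ["VBoxManage", "sharedfolder", "add", vm_name,
            "--name", share_name, "--hostpath", host_path, "--automount"]


def guest_mount_command(share_name, mount_point):
    """Command to run inside the Linux guest (requires Guest Additions)."""
    return ["sudo", "mount", "-t", "vboxsf", share_name, mount_point]


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    host_cmd = host_share_command(
        "cloudera-quickstart", "hadoop-share", r"C:\Users\me\hadoop-share")
    guest_cmd = guest_mount_command("hadoop-share", "/home/cloudera/share")
    print("On the host: ", " ".join(host_cmd))
    print("In the guest:", " ".join(guest_cmd))
```

With `--automount`, Linux guests typically expose the share under `/media/sf_<share name>` without a manual mount, and the guest user usually needs to belong to the `vboxsf` group to read it.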
In this post, we learned how to install VirtualBox and Cloudera QuickStart VM 5.8.0 on Windows, and how to share files between the Windows host and the virtual machine. If you have any questions or comments about this blog post, or would like to suggest another way to share files with the VM, please feel free to post them in the comment section below.
- How to Install Hadoop on Windows with Cloudera VM - February 2, 2017