As you probably know, to operate with Big Data we need a cluster of several nodes. Unfortunately, most people don't have access to one. If we want to learn how to use the technologies behind it, we need to make use of VMs with a pseudo-cluster assembled in them and a set of tools pre-installed and ready to use. Such is the case of Cloudera's Quickstart VM, which incorporates the Cloudera distribution of Hadoop (CDH) and a set of tools like Spark, HDFS, Hive, and Flume, among others.

During this post, I am going to show you how to set up and configure the latest Cloudera's Quickstart VM (CDH 5.13) to get it ready to use with the latest version of Spark (2.4) and Kafka. The environment set up here is going to be used in the personal Big Data projects you'll be able to find in this blog.

Downloading the VM

First of all, we need to download Cloudera's Quickstart VM (CDH 5.13) from here. In this case, I opted for the VMware option.

After the file is downloaded, uncompress the rar file and mount the VM. Before launching it, I recommend doing some configuration tuning, like increasing the RAM size (Cloudera Manager requires at least 8GB) and the number of processor cores used. Once launched, we will see CentOS running with Cloudera's CDH Hadoop distribution and the most common toolset for this kind of environment.
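If you prefer not to click through VMware's settings dialog, the same tuning can be done by editing the VM's .vmx file before the first boot. A minimal sketch, assuming the archive extracts to a .vmx file (the exact file name depends on your download):

    memsize = "8192"
    numvcpus = "2"

Here memsize sets the guest RAM in megabytes (8192 to cover the 8GB that Cloudera Manager requires) and numvcpus sets the number of virtual CPU cores exposed to the guest. Make the edits while the VM is powered off, since VMware reads these values at power-on; the equivalent controls live under "Edit virtual machine settings" in VMware Workstation/Player.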