Posts

Showing posts from February, 2018

Assignment 04: Using Kafka And Zookeeper

Steps to get Kafka up and running
Step 1: Run this after logging into the sandbox: cd /usr/hdp/current/kafka-broker
Step 2:
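The excerpt cuts off here, so what follows is only a minimal sketch of the usual next steps on an HDP sandbox using the stock Kafka console scripts; the topic name "test", the ZooKeeper address localhost:2181, and the broker port 6667 are assumptions, not taken from the original post.

# Create a topic to work with (run from /usr/hdp/current/kafka-broker)
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
# Type messages into the topic from one terminal
bin/kafka-console-producer.sh --broker-list localhost:6667 --topic test
# Read the messages back from another terminal
bin/kafka-console-consumer.sh --bootstrap-server localhost:6667 --topic test --from-beginning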

Assignment 06 Kafka Demo

Quick demo of Kafka in Action
Step 1: Start VM / Start Sandbox / log into sandbox
Step 2: Run this script: spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0
Step 3:
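The excerpt ends before Step 3. As a rough sketch, something like the following Scala could be pasted into that spark-shell session to read a Kafka topic with Structured Streaming; the broker address sandbox-hdp.hortonworks.com:6667 and the topic name "test" are assumptions.

// spark is predefined in spark-shell; subscribe to the assumed "test" topic as a streaming DataFrame
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "sandbox-hdp.hortonworks.com:6667")
  .option("subscribe", "test")
  .load()

// Kafka delivers keys and values as bytes; cast them to strings for display
val messages = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

// Print each incoming message to the console as it arrives
val query = messages.writeStream
  .outputMode("append")
  .format("console")
  .start()

query.awaitTermination()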

Assignment 03 Using Sqoop

SQOOP - reads data out of a SQL database and loads it into HDFS. Very reliable; the original tool for this job.
Suggested reading:
https://www.techrepublic.com/article/why-streaming-data-is-the-future-of-big-data-and-apache-kafka-is-leading-the-charge/
https://www.infoworld.com/article/3212204/big-data/all-your-streaming-data-are-belong-to-kafka.html
Streaming Data: https://www.manning.com/books/streaming-data
Confluent: https://www.confluent.io/blog/
Confluent is the "commercial" backing for Kafka, started by Kafka's original developers. Basically, what Databricks is to Spark, Confluent is to Kafka.
Scenario: “80% of your sales come from 20% of your customers” - the Pareto Principle. Customer segmentation has been a marketing tactic in use f