BIG DATA ANALYTICS
[R17A0528]
LECTURE NOTES
B.TECH IV YEAR I SEM (R17)
(2020-2021)
MALLA REDDY
COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution - UGC, Govt. of India)
Recognized under 2(f) and 12(B) of UGC Act 1956
(Affiliated to JNTUH, Hyderabad, Approved by AICTE - Accredited by NBA & NAAC ‘A’ Grade - ISO 9001:2015 Certified)
Maisammaguda, Dhulapally (Post Via. Hakimpet), Secunderabad 500100, Telangana State, India
(R17A0528) BIG DATA ANALYTICS
UNIT I
INTRODUCTION TO BIG DATA AND ANALYTICS
Classification of Digital Data, Structured and Unstructured Data -
Introduction to Big Data: Characteristics - Evolution - Definition - Challenges with Big Data
- Other Characteristics of Data - Why Big Data - Traditional Business Intelligence versus Big
Data - Data Warehouse and Hadoop Environment. Big Data Analytics: Classification of
Analytics - Challenges - Importance of Big Data Analytics - Data Science - Data Scientist -
Terminologies used in Big Data Environments - Basically Available, Soft State, Eventual
Consistency (BASE) - Top Analytics Tools
UNIT II
INTRODUCTION TO TECHNOLOGY LANDSCAPE
NoSQL, Comparison of SQL and NoSQL, Hadoop - RDBMS Versus Hadoop - Distributed
Computing Challenges - Hadoop Overview - Hadoop Distributed File System - Processing
Data with Hadoop - Managing Resources and Applications with Hadoop YARN -
Interacting with Hadoop Ecosystem
UNIT III
INTRODUCTION TO MONGODB AND MAPREDUCE PROGRAMMING
MongoDB: Why MongoDB - Terms used in RDBMS and MongoDB - Data Types -
MongoDB Query Language
MapReduce: Mapper - Reducer - Combiner - Partitioner - Searching - Sorting - Compression
UNIT IV
INTRODUCTION TO HIVE AND PIG
Hive: Introduction - Architecture - Data Types - File Formats - Hive Query Language
Statements - Partitions - Bucketing - Views - Sub-Query - Joins - Aggregations - Group by
and Having - RCFile Implementation - Hive User Defined Function - Serialization and
Deserialization. Pig: Introduction - Anatomy - Features - Philosophy - Use Case for Pig - Pig
Latin Overview - Pig Primitive Data Types - Running Pig - Execution Modes of Pig - HDFS
Commands - Relational Operators - Eval Function - Complex Data Types - Piggy Bank -
User-Defined Functions - Parameter Substitution - Diagnostic Operator - Word Count
Example using Pig - Pig at Yahoo! - Pig Versus Hive
UNIT V
INTRODUCTION TO DATA ANALYTICS WITH R
Machine Learning: Introduction, Supervised Learning, Unsupervised Learning, Machine
Learning Algorithms: Regression Model, Clustering, Collaborative Filtering, Association Rule
Mining, Decision Tree, Big Data Analytics with BigR.
Reference Books:
1. Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman, “Big Data For
Dummies”, John Wiley & Sons, Inc., 2013
2. Tom White, “Hadoop: The Definitive Guide”, O’Reilly Publications, Fourth
Edition, 2015
3. Dirk deRoos, Paul C. Zikopoulos, Roman B. Melnyk, Bruce Brown, Rafael Coss,
“Hadoop For Dummies”, Wiley Publications, 2014
4. Robert D. Schneider, “Hadoop For Dummies”, John Wiley & Sons, Inc., 2012
5. Paul Zikopoulos, “Understanding Big Data: Analytics for Enterprise Class
Hadoop and Streaming Data”, McGraw Hill, 2012
6. Chuck Lam, “Hadoop in Action”, Dreamtech Publications, 2010
Text Book:
1. Seema Acharya, Subhashini Chellappan, “Big Data and Analytics”, Wiley
Publications, First Edition, 2015
INDEX
S. No | Unit | Topic | Pg. No
1 | I | INTRODUCTION TO BIG DATA AND ANALYTICS. Classification of Digital Data, Structured and Unstructured Data - Introduction to Big Data | 1
2 | I | Why Big Data - Traditional Business Intelligence versus Big Data - Data Warehouse and Hadoop Environment | 4
3 | I | Big Data Analytics: Classification of Analytics - Challenges - Importance of Big Data Analytics | 5
4 | I | Data Science - Data Scientist - Terminologies used in Big Data Environments | 10
5 | I | Basically Available, Soft State, Eventual Consistency (BASE) - Top Analytics Tools | 12
7 | II | INTRODUCTION TO TECHNOLOGY LANDSCAPE. NoSQL, Comparison of SQL and NoSQL, Hadoop - RDBMS Versus Hadoop - Distributed Computing Challenges | 15
8 | II | Hadoop Overview - Hadoop Distributed File System - Processing Data with Hadoop | 20
9 | II | Managing Resources and Applications with Hadoop YARN - Interacting with Hadoop Ecosystem | 22
10 | III | INTRODUCTION TO MONGODB AND MAPREDUCE PROGRAMMING. MongoDB: Why MongoDB - Terms used in RDBMS and MongoDB - Data Types - MongoDB Query Language | 24
11 | III | MapReduce: Mapper - Reducer - Combiner - Partitioner - Searching - Sorting - Compression | 36
12 | IV | INTRODUCTION TO HIVE AND PIG. Hive: Introduction - Architecture - Data Types - File Formats - Hive Query Language Statements | 52
13 | IV | Partitions - Bucketing - Views - Sub-Query - Joins - Aggregations - Group by and Having - RCFile Implementation | 70
14 | IV | Hive User Defined Function - Serialization and Deserialization. Pig: Introduction | 75
15 | IV | Anatomy - Features - Philosophy - Use Case for Pig - Pig Latin Overview - Pig Primitive Data Types | 76
16 | IV | Running Pig - Execution Modes of Pig - HDFS Commands - Relational Operators - Eval Function | 79
17 | IV | Complex Data Types - Piggy Bank - User-Defined Functions - Parameter Substitution - Diagnostic Operator | 82
18 | IV | Word Count Example using Pig - Pig at Yahoo! - Pig Versus Hive | 93
19 | V | INTRODUCTION TO DATA ANALYTICS WITH R. Machine Learning: Introduction, Supervised Learning, Unsupervised Learning | 96
20 | V | Machine Learning Algorithms: Regression Model, Clustering, Collaborative Filtering, Association Rule Mining, Decision Tree, Big Data Analytics with BigR | 97