Table of Contents
At a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Chapter 1: Big Data, Hadoop, and HDInsight; What Is Big Data?; The Scale-Up and Scale-Out Approaches; Apache Hadoop; A Brief History of Hadoop; HDFS; MapReduce; YARN; Hadoop Cluster Components; HDInsight; The Advantages of HDInsight; Summary; Chapter 2: Provisioning an HDInsight Cluster; An Azure Subscription; Creating the First Cluster; Basic Configuration Options; Creating a Cluster Using the Azure Portal; Connecting to a Cluster Using RDP; Connecting to a Cluster Using SSH
Creating a Cluster Using PowerShell; Creating a Cluster Using an Azure Command-Line Interface; Creating a Cluster Using .NET SDK; The Resource Manager Template; HDInsight in a Sandbox Environment; Hadoop on a Virtual Machine; Hadoop on Windows; Preparing the Host Machine; Installing and Configuring Java JDK; Installing and configuring Python 2.7.x; Download and Install HDP for Windows; Summary; Chapter 3: Working with Data in HDInsight; Azure Blob Storage; The Benefits of Blob Storage; Uploading Data; Using Azure Command-Line Interface; Using Windows PowerShell
Using Microsoft Azure Storage Explorer; Running MapReduce Jobs; Using PowerShell; Using .NET SDK; Hadoop Streaming; Streaming Mapper and Reducer; Serialization with Avro Library; Data Serialization; Binary Encoding; JSON Encoding; Using Microsoft Avro Library; Summary; Chapter 4: Querying Data with Hive; Hive Essentials; Hive Architecture; Submitting a Hive Query; Using Hive View; Using Secure Shell (SSH); Using Visual Studio; Using .NET SDK; Writing HiveQL; Data Types; Create/Drop/Alter/Use Database; The Hive Table; Internal Tables; External Tables; Storage Formats; Row Formats and SerDe
Partitioned Tables; Create Table Options; Temporary Tables; Data Retrieval; Hive Metastore; Apache Tez; Connecting to Hive Using ODBC and Power BI; ODBC and Power BI Configuration; Prepare Data for Analysis; Creating Hive Tables; Analyzing Data Using Power BI; Hive UDFs in C#; User Defined Function (UDF); User Defined Aggregate Functions (UDAF); User Defined Tabular Functions (UDTF); Summary; Chapter 5: Using Pig with HDInsight; Understanding Relations, Bags, Tuples, and Fields; Data Types; Connecting to Pig; Operators and Commands; Executing Pig Scripts; Summary; Chapter 6: Working with HBase
Overview; Where to Use HBase?; The Architecture of HBase; HBase HMaster; HRegion and HRegion Server; ZooKeeper; HBase Meta Table; Read and Write to an HBase Cluster; HFile; Major and Minor Compaction; Creating an HBase Cluster; Working with HBase; HBase Shell; Create Tables and Insert Data; HBase Shell Commands; Using .NET SDK to read/write Data; Writing Data; Reading/Querying Data; Summary; Chapter 7: Real-Time Analytics with Storm; Overview; Storm Topology; Stream Groupings; Storm Architecture; Nimbus; Supervisor Node; ZooKeeper; Worker, Executor, and Task; Creating a Storm Cluster