Table of Contents
Intro; Table of Contents; About the Authors; About the Technical Reviewer; Acknowledgments; Introduction; Chapter 1: Introduction to PySpark SQL; Introduction to Big Data; Volume; Velocity; Variety; Veracity; Introduction to Hadoop; Introduction to HDFS; Introduction to MapReduce; Introduction to Apache Hive; Introduction to Apache Pig; Introduction to Apache Kafka; Producer; Broker; Consumer; Introduction to Apache Spark; PySpark SQL: An Introduction; Introduction to DataFrames; SparkSession; Structured Streaming; Catalyst Optimizer; Introduction to Cluster Managers
Standalone Cluster Manager; Apache Mesos Cluster Manager; YARN Cluster Manager; Introduction to PostgreSQL; Introduction to MongoDB; Introduction to Cassandra; Chapter 2: Installation; Recipe 2-1. Install Hadoop on a Single Machine; Problem; Solution; How It Works; Step 2-1-1. Creating a New CentOS User; Step 2-1-2. Adding a CentOS user to sudo; Step 2-1-3. Installing Java; Step 2-1-4. Creating Password-Less Logging from pysparksqlbook; Step 2-1-5. Downloading Hadoop; Step 2-1-6. Moving Hadoop Binaries to the Installation Directory; Step 2-1-7. Modifying the Hadoop Environment File
Step 2-1-8. Modifying the Hadoop Properties Files; Step 2-1-9. Updating the .bashrc File; Step 2-1-10. Running the Namenode Format; Step 2-1-11. Starting Hadoop; Step 2-1-12. Checking the Installation of Hadoop; Step 2-1-13. Stopping the Hadoop Processes; Recipe 2-2. Install Spark on a Single Machine; Problem; Solution; How It Works; Step 2-2-1. Downloading Apache Spark; Step 2-2-2. Extracting the .tgz File of Spark; Step 2-2-3. Moving the Extracted Spark Directory to /allBigData; Step 2-2-4. Changing the Spark Environment File; Step 2-2-5. Amending the .bashrc File
Step 2-2-6. Starting the PySpark Shell; Recipe 2-3. Use the PySpark Shell; Problem; Solution; How It Works; Recipe 2-4. Install Hive on a Single Machine; Problem; Solution; How It Works; Step 2-4-1. Downloading Hive; Step 2-4-2. Extracting Hive; Step 2-4-3. Moving the Extracted Hive Directory; Step 2-4-4. Updating hive-site.xml; Step 2-4-5. Updating the .bashrc File; Step 2-4-6. Creating Datawarehouse Directories of Hive; Step 2-4-7. Initiating the Metastore Database; Step 2-4-8. Checking the Hive Installation; Recipe 2-5. Install PostgreSQL; Problem; Solution; How It Works
Step 2-5-1. Installing PostgreSQL; Step 2-5-2. Initializing the Database; Step 2-5-3. Enabling and Starting the Database; Recipe 2-6. Configure the Hive Metastore on PostgreSQL; Problem; Solution; How It Works; Step 2-6-1. Downloading the PostgreSQL JDBC Connector; Step 2-6-2. Copying the JDBC Connector to the Hive lib Directory; Step 2-6-3. Connecting to PostgreSQL; Step 2-6-4. Creating the Required User and Database; Step 2-6-5. Populating Data in the pymetastore Database; Step 2-6-6. Granting Permissions; Step 2-6-7. Changing the pg_hba.conf File; Step 2-6-8. Testing Our User