Table of Contents
About the Author
About the Technical Reviewers
Acknowledgments
Introduction
Chapter 1: Introduction to Apache Spark
Overview
History
Spark Core Concepts and Architecture
Spark Cluster and Resource Management System
Spark Applications
Spark Drivers and Executors
Spark Unified Stack
Spark Core
Spark SQL
Spark Structured Streaming
Spark MLlib
Spark GraphX
SparkR
Apache Spark 3.0
Adaptive Query Execution Framework
Dynamic Partition Pruning (DPP)
Accelerator-aware Scheduler
Apache Spark Applications
Spark Example Applications
Apache Spark Ecosystem
Delta Lake
Koalas
MLflow
Summary
Chapter 2: Working with Apache Spark
Downloading and Installation
Downloading Spark
Installing Spark
Spark Scala Shell
Spark Python Shell
Having Fun with the Spark Scala Shell
Useful Spark Scala Shell Commands and Tips
Basic Interactions with Scala and Spark
Basic Interactions with Scala
Spark UI and Basic Interactions with Spark
Spark UI
Basic Interactions with Spark
Introduction to Collaborative Notebooks
Create a Cluster
Create a Folder
Create a Notebook
Setting up Spark Source Code
Summary
Chapter 3: Spark SQL: Foundation
Understanding RDD
Introduction to the DataFrame API
Creating a DataFrame
Creating a DataFrame from RDD
Creating a DataFrame from a Range of Numbers
Creating a DataFrame from Data Sources
Creating a DataFrame by Reading Text Files
Creating a DataFrame by Reading CSV Files
Creating a DataFrame by Reading JSON Files
Creating a DataFrame by Reading Parquet Files
Creating a DataFrame by Reading ORC Files
Creating a DataFrame from JDBC
Working with Structured Operations
Working with Columns
Working with Structured Transformations
select(columns)
selectExpr(expressions)
filter(condition), where(condition)
distinct, dropDuplicates
sort(columns), orderBy(columns)
limit(n)
union(otherDataFrame)
withColumn(colName, column)
withColumnRenamed(existingColName, newColName)
drop(columnName1, columnName2)
sample(fraction), sample(fraction, seed), sample(withReplacement, fraction, seed)
randomSplit(weights)
Working with Missing or Bad Data
Working with Structured Actions
describe(columnNames)
Introduction to Datasets
Creating Datasets
Working with Datasets
Using SQL in Spark SQL
Running SQL in Spark
Writing Data Out to Storage Systems
The Trio: DataFrame, Dataset, and SQL
DataFrame Persistence
Summary
Chapter 4: Spark SQL: Advanced
Aggregations
Aggregation Functions
Common Aggregation Functions
count(col)
countDistinct(col)
min(col), max(col)
sum(col)
sumDistinct(col)
avg(col)
skewness(col), kurtosis(col)
variance(col), stddev(col)
Aggregation with Grouping
Multiple Aggregations per Group