Table of Contents
At a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Part I: Concepts; Chapter 1: Overview: Building Data Analytic Systems with Hadoop; 1.1 A Need for Distributed Analytical Systems; 1.2 The Hadoop Core and a Small Amount of History; 1.3 A Survey of the Hadoop Ecosystem; 1.4 AI Technologies, Cognitive Computing, Deep Learning, and Big Data Analysis; 1.5 Natural Language Processing and BDAs; 1.6 SQL and NoSQL Querying; 1.7 The Necessary Math; 1.8 A Cyclic Process for Designing and Building BDA Systems.
1.9 How The Hadoop Ecosystem Implements Big Data Analysis; 1.10 The Idea of "Images as Big Data" (IABD); 1.10.1 Programming Languages Used; 1.10.2 Polyglot Components of the Hadoop Ecosystem; 1.10.3 Hadoop Ecosystem Structure; 1.11 A Note about "Software Glue" and Frameworks; 1.12 Apache Lucene, Solr, and All That: Open Source Search Components; 1.13 Architectures for Building Big Data Analytic Systems; 1.14 What You Need to Know; 1.15 Data Visualization and Reporting; 1.15.1 Using the Eclipse IDE as a Development Environment; 1.15.2 What This Book Is Not; 1.16 Summary.
Chapter 2: A Scala and Python Refresher; 2.1 Motivation: Selecting the Right Language(s) Defines the Application; 2.1.1 Language Features-a Comparison; 2.2 Review of Scala; 2.2.1 Scala and its Interactive Shell; 2.3 Review of Python; 2.4 Troubleshoot, Debug, Profile, and Document; 2.4.1 Debugging Resources in Python; 2.4.2 Documentation of Python; 2.4.3 Debugging Resources in Scala; 2.5 Programming Applications and Example; 2.6 Summary; 2.7 References; Chapter 3: Standard Toolkits for Hadoop and Analytics; 3.1 Libraries, Components, and Toolkits: A Survey.
3.2 Using Deep Learning with the Evaluation System; 3.3 Use of Spring Framework and Spring Data; 3.4 Numerical and Statistical Libraries: R, Weka, and Others; 3.5 OLAP Techniques in Distributed Systems; 3.6 Hadoop Toolkits for Analysis: Apache Mahout and Friends; 3.7 Visualization in Apache Mahout; 3.8 Apache Spark Libraries and Components; 3.8.1 A Variety of Different Shells to Choose From; 3.8.2 Apache Spark Streaming; 3.8.3 Sparkling Water and H20 Machine Learning; 3.9 Example of Component Use and System Building; 3.10 Packaging, Testing and Documentation of the Example System; 3.11 Summary.
3.12 References; Chapter 4: Relational, NoSQL, and Graph Databases; 4.1 Graph Query Languages: Cypher and Gremlin; 4.2 Examples in Cypher; 4.3 Examples in Gremlin; 4.4 Graph Databases: Apache Neo4J; 4.5 Relational Databases and the Hadoop Ecosystem; 4.6 Hadoop and Unified Analytics (UA) Components; 4.7 Summary; 4.8 References; Chapter 5: Data Pipelines and How to Construct Them; 5.1 The Basic Data Pipeline; 5.2 Introduction to Apache Beam; 5.3 Introduction to Apache Falcon; 5.4 Data Sources and Sinks: Using Apache Tika to Construct a Pipeline; 5.5 Computation and Transformation.