Pro Hadoop data analytics : designing and building big data systems using the Hadoop ecosystem / Kerry Koitzsch.

Koitzsch, Kerry.

doi:10.1007/978-1-4842-1910-2

2017

QA76.9.D3

Available Online

Linked e-resources

Linked Resource

Online Access

Concurrent users

Unlimited

Authorized users

Document Delivery Supplied

Can lend chapters, not whole ebooks

Details

Title

Pro Hadoop data analytics : designing and building big data systems using the Hadoop ecosystem / Kerry Koitzsch.

Author

Koitzsch, Kerry.

ISBN

9781484219102 (electronic book)
1484219104 (electronic book)
9781484219096
1484219090

DOI

https://doi.org/10.1007/978-1-4842-1910-2

Publication Details

[United States] : Apress, 2017.

Language

English

Description

1 online resource

Item Number

10.1007/978-1-4842-1910-2 doi

Call Number

QA76.9.D3

Dewey Decimal Classification

005.74

Summary

Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation. In Pro Hadoop Data Analytics, best practices are emphasized to ensure coherent, efficient development. A complete example system will be developed using standard third-party components, consisting of toolkits, libraries, visualization and reporting code, and support glue to provide a working and extensible end-to-end system. The book emphasizes four important topics: (1) The importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results. Deep-dive topics will include Spark, H2O, Vowpal Wabbit (NLP), Stanford NLP, and other appropriate toolkits and plugins. (2) Best practices and structured design principles. This will include strategic topics as well as the how-to example portions. (3) The importance of mix-and-match or hybrid systems, using different analytical components in one application to accomplish application goals. The hybrid approach will be prominent in the examples. (4) Use of existing third-party libraries, which is key to effective development. Deep-dive examples of the functionality of some of these toolkits will be showcased as you develop the example system.

Note

Includes index.

Access Note

Access limited to authorized users.

Digital File Characteristics

text file PDF

Source of Description

Description based on print version record.

Available in Other Form

Pro Hadoop data analytics.

Linked Resources

Online Access

Record Appears in

Online Resources > Ebooks
All Resources

At a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Part I: Concepts; Chapter 1: Overview: Building Data Analytic Systems with Hadoop; 1.1 A Need for Distributed Analytical Systems; 1.2 The Hadoop Core and a Small Amount of History; 1.3 A Survey of the Hadoop Ecosystem; 1.4 AI Technologies, Cognitive Computing, Deep Learning, and Big Data Analysis; 1.5 Natural Language Processing and BDAs; 1.6 SQL and NoSQL Querying; 1.7 The Necessary Math; 1.8 A Cyclic Process for Designing and Building BDA Systems.

1.9 How The Hadoop Ecosystem Implements Big Data Analysis; 1.10 The Idea of "Images as Big Data" (IABD); 1.10.1 Programming Languages Used; 1.10.2 Polyglot Components of the Hadoop Ecosystem; 1.10.3 Hadoop Ecosystem Structure; 1.11 A Note about "Software Glue" and Frameworks; 1.12 Apache Lucene, Solr, and All That: Open Source Search Components; 1.13 Architectures for Building Big Data Analytic Systems; 1.14 What You Need to Know; 1.15 Data Visualization and Reporting; 1.15.1 Using the Eclipse IDE as a Development Environment; 1.15.2 What This Book Is Not; 1.16 Summary.

Chapter 2: A Scala and Python Refresher; 2.1 Motivation: Selecting the Right Language(s) Defines the Application; 2.1.1 Language Features-a Comparison; 2.2 Review of Scala; 2.2.1 Scala and its Interactive Shell; 2.3 Review of Python; 2.4 Troubleshoot, Debug, Profile, and Document; 2.4.1 Debugging Resources in Python; 2.4.2 Documentation of Python; 2.4.3 Debugging Resources in Scala; 2.5 Programming Applications and Example; 2.6 Summary; 2.7 References; Chapter 3: Standard Toolkits for Hadoop and Analytics; 3.1 Libraries, Components, and Toolkits: A Survey.

3.2 Using Deep Learning with the Evaluation System; 3.3 Use of Spring Framework and Spring Data; 3.4 Numerical and Statistical Libraries: R, Weka, and Others; 3.5 OLAP Techniques in Distributed Systems; 3.6 Hadoop Toolkits for Analysis: Apache Mahout and Friends; 3.7 Visualization in Apache Mahout; 3.8 Apache Spark Libraries and Components; 3.8.1 A Variety of Different Shells to Choose From; 3.8.2 Apache Spark Streaming; 3.8.3 Sparkling Water and H2O Machine Learning; 3.9 Example of Component Use and System Building; 3.10 Packaging, Testing and Documentation of the Example System; 3.11 Summary.

3.12 References; Chapter 4: Relational, NoSQL, and Graph Databases; 4.1 Graph Query Languages: Cypher and Gremlin; 4.2 Examples in Cypher; 4.3 Examples in Gremlin; 4.4 Graph Databases: Apache Neo4J; 4.5 Relational Databases and the Hadoop Ecosystem; 4.6 Hadoop and Unified Analytics (UA) Components; 4.7 Summary; 4.8 References; Chapter 5: Data Pipelines and How to Construct Them; 5.1 The Basic Data Pipeline; 5.2 Introduction to Apache Beam; 5.3 Introduction to Apache Falcon; 5.4 Data Sources and Sinks: Using Apache Tika to Construct a Pipeline; 5.5 Computation and Transformation.
