001448190 000__ 05007cam\a2200529\a\4500 001448190 001__ 1448190 001448190 003__ OCoLC 001448190 005__ 20230310004226.0 001448190 006__ m\\\\\o\\d\\\\\\\\ 001448190 007__ cr\un\nnnunnun 001448190 008__ 220718s2022\\\\xxu\\\\\o\\\\\001\0\eng\d 001448190 019__ $$a1336590483$$a1336990616 001448190 020__ $$a9781484282335$$q(electronic bk.) 001448190 020__ $$a1484282337$$q(electronic bk.) 001448190 020__ $$z1484282329 001448190 020__ $$z9781484282328 001448190 0247_ $$a10.1007/978-1-4842-8233-5$$2doi 001448190 035__ $$aSP(OCoLC)1336459705 001448190 040__ $$aYDX$$beng$$cYDX$$dORMDA$$dGW5XE$$dEZ9$$dEBLCP$$dOCLCF$$dN$T$$dOCLCQ 001448190 049__ $$aISEA 001448190 050_4 $$aTK5105.88813 001448190 08204 $$a006.7/6$$223/eng/20220719 001448190 1001_ $$aL'Esteve, Ron. 001448190 24514 $$aThe Azure data lakehouse toolkit :$$bbuilding and scaling data lakehouses with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake /$$cRon L'Esteve. 001448190 260__ $$a[United States] :$$bApress,$$c2022. 001448190 300__ $$a1 online resource 001448190 336__ $$atext$$btxt$$2rdacontent 001448190 337__ $$acomputer$$bc$$2rdamedia 001448190 338__ $$aonline resource$$bcr$$2rdacarrier 001448190 347__ $$atext file 001448190 347__ $$bPDF 001448190 500__ $$aIncludes index. 
001448190 5050_ $$aPart I: Getting Started -- Chapter 1: The Data Lakehouse Paradigm -- Part II: Data Platforms -- Chapter 2: Snowflake -- Chapter 3: Databricks -- Chapter 4: Synapse Analytics -- Part III: Apache Spark ELT -- Chapter 5: Pipelines and Jobs -- Chapter 6: Notebook Code -- Part IV: Delta Lake -- Chapter 7: Schema Evolution -- Chapter 8: Change Feed -- Chapter 9: Clones -- Chapter 10: Live Tables -- Chapter 11: Sharing -- Part V: Optimizing Performance -- Chapter 12: Dynamic Partition Pruning for Querying Star Schemas -- Chapter 13: Z-Ordering & Data Skipping -- Chapter 14: Adaptive Query Execution -- Chapter 15: Bloom Filter Index -- Chapter 16: Hyperspace -- Part VI: Advanced Capabilities -- Chapter 17: Auto Loader -- Chapter 18: Python Wheels -- Chapter 19: Security & Controls. 001448190 506__ $$aAccess limited to authorized users. 001448190 520__ $$aDesign and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities with Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. And you will follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance, and secure, share, and manage a high volume, high velocity, and high variety of data in your lakehouse with ease. The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. 
Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open source storage layer can offer. In addition to the deep examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs. After reading this book, you will be able to implement Delta Lake capabilities, including Schema Evolution, Change Feed, Live Tables, Sharing, and Clones to enable better business intelligence and advanced analytics on your data within the Azure Data Platform. What You Will Learn: Implement the Data Lakehouse Paradigm on Microsoft's Azure cloud platform -- Benefit from the new Delta Lake open-source storage layer for data lakehouses -- Take advantage of schema evolution, change feeds, live tables, and more -- Write functional PySpark code for data lakehouse ELT jobs -- Optimize Apache Spark performance through partitioning, indexing, and other tuning options -- Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake. Who This Book Is For: Data, analytics, and AI professionals at all levels, including data architect and data engineer practitioners. Also for data professionals seeking patterns of success by which to remain relevant as they learn to build scalable data lakehouses for their organizations and customers who are migrating into the modern Azure Data Platform. 001448190 650_0 $$aMicrosoft Azure (Computing platform) 001448190 650_0 $$aCloud computing. 001448190 650_0 $$aElectronic data processing. 001448190 650_0 $$aDatabases. 001448190 655_0 $$aElectronic books. 
001448190 77608 $$iPrint version:$$z1484282329$$z9781484282328$$w(OCoLC)1310397015 001448190 852__ $$bebk 001448190 85640 $$3Springer Nature$$uhttps://univsouthin.idm.oclc.org/login?url=https://link.springer.com/10.1007/978-1-4842-8233-5$$zOnline Access$$91397441.1 001448190 909CO $$ooai:library.usi.edu:1448190$$pGLOBAL_SET 001448190 980__ $$aBIB 001448190 980__ $$aEBOOK 001448190 982__ $$aEbook 001448190 983__ $$aOnline 001448190 994__ $$a92$$bISE