Big Data Processing with Apache Spark: Efficiently tackle large datasets and big data analysis with Spark and Python

English | October 31, 2018 | ASIN: B07HRTNFZ9 | 142 Pages | AZW3 | 1.89 MB

No need to spend hours ploughing through endless data - let Spark, one of the fastest big data processing engines available, do the hard work for you.
Key Features
Get up and running with Apache Spark and Python
Integrate Spark with AWS for real-time analytics
Apply processed data streams to machine learning APIs of Apache Spark

    Book Description
    Processing big data in real time is challenging because of the demands of scalability, information consistency, and fault tolerance. This book teaches you how to use Spark to make your overall analytical workflow faster and more efficient. You'll explore the core concepts and tools within the Spark ecosystem, such as Spark Streaming, the Spark Streaming API, the machine learning extension, and Structured Streaming.
    You'll begin by learning data processing fundamentals using Resilient Distributed Datasets (RDDs) and the SQL, Dataset, and DataFrame APIs. After grasping these fundamentals, you'll move on to using the Spark Streaming APIs to consume data in real time from TCP sockets, and integrate Amazon Web Services (AWS) for stream consumption.
    By the end of this book, you'll not only understand how to use the machine learning extensions and structured streams, but you'll also be able to apply Spark in your own upcoming big data projects.
    What you will learn
    Write your own Python programs that can interact with Spark
    Implement data stream consumption using Apache Spark
    Recognize common operations in Spark to process known data streams
    Integrate Spark streaming with Amazon Web Services (AWS)
    Create a collaborative filtering model with the movielens dataset
    Apply processed data streams to Spark machine learning APIs
      Who this book is for
      Big Data Processing with Apache Spark is for you if you are a software engineer, architect, or IT professional who wants to explore distributed systems and big data analytics. Although you don't need any knowledge of Spark, prior experience with Python is recommended.
      Table of Contents
      Introduction to Spark Distributed Processing
      Introduction to Spark Streaming
      Spark Streaming Integration with AWS
      Spark Streaming, ML, and Windowing Operations

