"Good introduction to Apache Spark. The trainer was great at talking us through the information, specifically optimisation methods. He spoke slowly and concisely which really got his points across. He effectively tailored the course to our specifications which we also appreciated."
RL, Financial Crime Technologist, Apache Spark, April 2021
Apache Spark Architecture: Distributed Processing
• Distributed Processing: How Apache Spark Runs On A Cluster
• Azure Databricks: How To Create A Cluster
• Databricks Community Edition: How To Create A Cluster
• How does Apache Spark run on a cluster?
Apache Spark Architecture: Distributed Data
• Distributed Data: The DataFrame
• How To Define The Structure Of A DataFrame
DataFrame Transformations
• Selecting Columns
• Renaming Columns
• Changing Column Data Types
• How To Access Columns
• Adding Columns to a DataFrame
• Removing Columns from a DataFrame
• Basic Arithmetic with DataFrames
• Apache Spark Architecture: DataFrame Immutability
• How To Filter A DataFrame
• Apache Spark Architecture: Narrow Transformations
• Dropping Rows
• Handling Null Values Part I - Null Functions
• Handling Null Values Part II - DataFrameNaFunctions
• Sort and Order Rows - Sort & OrderBy
• Creating Groups of Rows: GroupBy
DataFrame Statistics
• Group and Order
• Joining DataFrames - Inner Join
• Joining DataFrames - Right Outer Join
• Joining DataFrames - Left Outer Join
• Appending Rows to a DataFrame - Union
• Can you Join two DataFrames?
• Caching a DataFrame
• DataFrameWriter Part I
• DataFrameWriter Part II - PartitionBy
• User Defined Functions
• Do you know how to save the result of your work?
Apache Spark Architecture: Execution
• Query Planning
• Execution Hierarchy
• Partitioning a DataFrame
• Adaptive Query Execution - An Introduction
• How Apache Spark Runs
Attendees should have the following:
“JBI did a great job of customizing their syllabus to suit our business needs and also bringing our team up to speed on the current best practices. Our teams varied widely in terms of experience and the Instructor handled this particularly well - very impressive”
Brian F, Team Lead, RBS, Data Analysis Course, 20 April 2022
On our Apache Spark Databricks Certified Developer training course, you will explore the fundamentals of Apache Spark and Delta Lake on Databricks. You will learn the architectural components of Spark, the DataFrame and Structured Streaming APIs, and how Delta Lake can improve your data pipelines. Lastly, you will execute streaming queries to process streaming data and understand the advantages of using Delta Lake.
CONTACT
+44 (0)20 8446 7555
JB International Training Ltd - Company number 08458005
Registered address Wohl Enterprise Hub 2B Redbourne Avenue London N3 2BS