Hands-on with Apache Spark for Beginners
Apache Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. Spark is a popular tool used by data scientists, and is associated with higher salaries according to O'Reilly.
For this workshop, we are fortunate to have consultant Akmal Chaudhri visiting from the United Kingdom to facilitate a hands-on session for those embarking on their data science journeys. Attendees will use Apache Spark to undertake some simple calculations and solve some data manipulation problems. Through Python programming exercises, they will get hands-on experience with Spark in a cloud-based environment. The goal is to show the power of Spark without requiring attendees to understand its complexity.
Following the workshop, we will have a use case with Spark shared by Ghazal Ghalebandi, Software Research Engineer at Jobstreet (SEEK Asia).
Workshop restricted to 25 pax only.
Attendees should have some Python programming knowledge or experience. If you are already quite familiar with Spark, have previously taken some Spark training or have taken the edX course CS105x Introduction to Apache Spark, then congratulations, you do not need to attend this workshop.
Please scroll to the bottom to see the preparations you need to do before the workshop.
6pm - 7pm Dinner and Networking
7pm - 9pm Hands-on Workshop
9pm - 9:30pm Spark Case Study Presentation, SEEK Asia
9:30pm - 10pm Q&A and Networking
Akmal Chaudhri is an Independent Consultant, specializing in Big Data, NoSQL and NewSQL database technologies. His current interests also include Apache Spark, Machine Learning, Data Science and how to become a Data Scientist. He has over 25 years' experience in IT and has previously held roles as a developer, consultant, product strategist and technical trainer. He has worked for several blue-chip companies, such as Reuters and IBM, and also for the Big Data startups Hortonworks (Hadoop) and DataStax (Cassandra NoSQL Database). He has regularly presented at international conferences and served on the program committees of a number of major conferences and workshops. He has published and presented widely and edited or co-edited 10 books. He holds a BSc (1st Class Hons.) in Computing and Information Systems, an MSc in Business Systems Analysis and Design and a PhD in Computer Science. He is a Member of the British Computer Society (MBCS) and a Chartered IT Professional (CITP).
Ghazal Ghalebandi is a Software Research Engineer at Jobstreet (SEEK Asia). Her role is to develop and build machine learning models that integrate intelligence into products. She is also a PhD candidate at the Faculty of Computer Science & IT, University of Malaya. As a leading recruitment company in the South East Asia region, SEEK Asia holds a substantial database of candidate and hirer information. The goal of her research team is to embed intelligence in the matching capabilities of the jobstreet.com and jobsDB.com services. Ghazal will share some of the SEEK Asia research team's use cases for Apache Spark, including how they use Spark for user behavior monitoring, ETL processes, data manipulation and processing, and machine learning.
Preparation for workshop:
1. Create a free Databricks Community Edition (CE) account by going to:
2. Bring a laptop computer with you. Please do not use a phone or tablet computer. You will access the online Spark environment using a web browser. No software installation on your computer is required. The preferred web browsers are Google Chrome or Mozilla Firefox. Other browsers have not been fully tested with Databricks CE.
3. Log in to your free Databricks CE account by going to:
4. Download and save the following files to your laptop computer, as you will use them for the labs:
Setting up the environment:
Your fee covers:
-Room rental with high-speed internet (100 Mbps)
-Light dinner and beverages
-Workshop facilitation and presentation
-Certificate of Participation from DataViz My
-Participation prizes from Databricks (T-shirts / Book)
Fee is non-refundable after 22 March 2017
Numbers strictly limited to ensure efficient Wi-Fi use
6:00 PM - 10:00 PM MYT
- The Co
Online: RM100.00 (SOLD OUT); Student: RM50.00 (SOLD OUT)
- Venue Address
- 8, Lengkok Abdullah, Bangsar, 59000 Kuala Lumpur, Wilayah Persekutuan Kuala Lumpur, Malaysia
Several paid parking lots are available in the area, including parking at Bangsar LRT. Use of public transport is encouraged.
Big Data Malaysia (1,037 followers)