Description

Skimlinks, a long-time supporter of the Spark London meetup, reports back on its year-long experiment with Spark. Martin Goodson will give an overview of Skimlinks' experience with Spark and his verdict from a data scientist's perspective. Maria Mestre will talk about how some key components of a large-scale machine learning pipeline, from feature extraction to labelling data, can be built using PySpark and scikit-learn. Sahan Bulathwela will detail the construction of a big data product using PySpark. He will outline the journey from spending days on one-off statistical analyses to running hundreds of analyses on 30 TB+ datasets every day.
