Enabling Exploratory Data Science with Spark and R


Speaker: Hossein Falaki
Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that, he was a data scientist on Apple’s personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).
His Ph.D. research focused on making mobile phones smarter networked devices when used in health applications. His Ph.D. dissertation is available online. As a Master's student at the University of Waterloo, he was a member of the Tetherless Computing Lab, where he worked on the KioskNet Project with Prof. S. Keshav. He also studied scanning strategies for opportunistic communication over Wi-Fi on mobile devices.



R is a favorite language of many data scientists. Beyond the language and runtime, R offers a rich ecosystem of libraries for use cases ranging from statistical inference to data visualization. However, handling large datasets with R is challenging, especially when data scientists combine R with frameworks or tools written in other languages. In this mode, most of the friction arises at the interface between R and the other systems. For example, when data is sampled by a big data platform, the results must be transferred to R and imported as native data structures. In this talk we show how SparkR solves these problems to enable a much smoother experience. We will present an overview of the SparkR architecture, including how data and control are transferred between R and the JVM. This knowledge will help data scientists make better decisions when using SparkR. We will also demo and explain some of the existing and supported use cases with real large datasets inside a notebook.


The demo will highlight how a Spark cluster, R, and an interactive notebook environment such as Jupyter or Databricks together facilitate exploratory analysis of big data.


Enabling Exploratory Data Science with Spark and R [video talk, Chinese subtitles]


Attachment: 使用Spark和R语言进行探索性数据科学.pdf (1.12 MB)



This talk is from Spark Summit Europe 2015.

