Introduction

The world of data science is evolving so fast that it’s not easy to find real-world use cases that are relevant to what you’re working on. That’s why we’ve collected these blog posts from industry thought leaders, with practical use cases you can put to work right now. This how-to reference guide provides everything you need, including code samples, so you can get your hands dirty working with the Databricks platform.

Democratizing Financial Time Series Analysis With Databricks

The role of data scientists, data engineers, and analysts at financial institutions includes (but is not limited to) protecting hundreds of billions of dollars’ worth of assets and shielding investors from trillion-dollar impacts, such as those of a flash crash. One of the biggest technical challenges underlying these problems is scaling time series manipulation. Tick data, alternative data sets such as geospatial or transactional data, and fundamental economic data are examples of the rich data sources available to financial institutions, all of which are naturally indexed by timestamp. Solving business problems in finance such as risk, fraud, and compliance ultimately rests on being able to aggregate and analyze thousands of time series in parallel. Older RDBMS-based technologies do not scale easily when analyzing trading strategies or conducting regulatory analyses over years of historical data. Moreover, many existing time series technologies use specialized languages instead of standard SQL or Python-based APIs.

Fortunately, Apache Spark™ contains plenty of built-in functionality, such as windowing, which naturally parallelizes time series operations. Moreover, Koalas, an open-source project that allows you to execute distributed machine learning queries via Apache Spark using the familiar pandas syntax, helps extend this power to data scientists and analysts.
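To make the windowing idea concrete, here is a minimal sketch of a per-symbol moving average written in plain pandas syntax. It is an illustration only, not code from the whitepaper: the column names (`symbol`, `event_ts`, `price`), the sample values, and the helper `add_moving_average` are all hypothetical, and the snippet runs on a single machine rather than on a Spark cluster. The grouping by symbol plays the role that a partitioned, time-ordered window plays in Spark, where each partition can be processed independently.

```python
import pandas as pd

# Hypothetical tick data: two symbols, each with three timestamped prices.
ticks = pd.DataFrame({
    "symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "event_ts": pd.to_datetime([
        "2020-01-01 09:30:00", "2020-01-01 09:30:01", "2020-01-01 09:30:02",
        "2020-01-01 09:30:00", "2020-01-01 09:30:01", "2020-01-01 09:30:02",
    ]),
    "price": [10.0, 11.0, 12.0, 20.0, 22.0, 24.0],
})


def add_moving_average(df, window=2):
    """Add a trailing moving average of price, computed per symbol.

    Each symbol is treated as its own time series, ordered by timestamp,
    so no window ever mixes prices from different symbols.
    """
    out = df.sort_values(["symbol", "event_ts"]).reset_index(drop=True)
    out["moving_avg"] = out.groupby("symbol")["price"].transform(
        # min_periods=1 lets the first row of each symbol average over itself.
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return out


result = add_moving_average(ticks)
print(result)
```

Because every symbol is handled on its own, the same calculation can in principle be spread across many series at once, which is the property the paragraph above attributes to Spark's windowing.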

To read the full story, download the whitepaper:

The Big Book of Data Science Use Cases
