Bleacher Report Engineer
Bleacher Report is one of the most heavily used sports sites on the web, and our users generate a massive amount of data every day. Last year we overhauled our data pipeline to make it easier to gather, store, and access this information. At the same time, Elixir added the experimental GenStage behaviour, which provides a way to manage back-pressure between producer and consumer processes. In this talk I will describe our pipeline architecture, highlight areas where Elixir offers improvements over our initial choices, and show how to set up a supervised GenStage application to process data in flight. I will also show how the related GenStage.Flow functions can be used to run ad hoc queries on the data at rest.
- 1) Describe Bleacher Report's data pipeline.
- 2) Show how Elixir's GenStage behaviour can be used with the ex_aws library to move data through the pipeline.
- 3) Show how the producer_consumer and consumer stages can be scaled within a supervision tree.
- 4) Describe performance improvements gained by replacing the existing Ruby component with this Elixir application.
- 5) Demonstrate the benefit of GenStage.Flow for ad hoc queries on large data sets.
Data engineers and Elixir users unfamiliar with the new GenStage behaviour.
Although I originally joined Bleacher Report as a frontend developer, for the last three years I've worked with a variety of backend and database technologies to wrangle the vast amounts of data our users generate every day. Recently that has meant creating Elixir microservices to complement our existing Node.js- and Ruby-based data pipeline. Prior to joining Bleacher Report I worked on a variety of interactive applications for Bay Area companies. I decided to pursue a software development career after creating programs to analyze data generated from my cell biology research.
Twitter: @sillypog