AWS and Elastic MapReduce (EMR): Netflix

Why Use Elastic MapReduce (EMR)?

EMR enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.

By reducing the cost and complexity of analyzing huge data sets, EMR also enables greater experimentation and innovation.

Case Study: Netflix

50 billion events arrive every day from Netflix-enabled televisions, mobile devices, and laptops. How do you collect and store all of that data?

Netflix streams 8 TB of data into the cloud per day. This is collected, aggregated, and pushed to Amazon S3 via a fleet of EC2 servers running Apache Chukwa.
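To get a feel for those numbers, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes the 50 billion events and the 8 TB describe the same daily stream, that "TB" means decimal terabytes, and that traffic is spread evenly over the day (real traffic will peak in the evening). The function and constant names are made up for this example.

```python
# Figures quoted in the post.
EVENTS_PER_DAY = 50_000_000_000   # 50 billion events per day
BYTES_PER_DAY = 8 * 10**12        # 8 TB per day, assuming decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60


def daily_averages(events_per_day, bytes_per_day):
    """Return average events/s, bytes/s and bytes/event over one day."""
    return {
        "events_per_second": events_per_day / SECONDS_PER_DAY,
        "bytes_per_second": bytes_per_day / SECONDS_PER_DAY,
        "bytes_per_event": bytes_per_day / events_per_day,
    }


averages = daily_averages(EVENTS_PER_DAY, BYTES_PER_DAY)
print(f"{averages['events_per_second']:,.0f} events/s")
print(f"{averages['bytes_per_second'] / 10**6:.1f} MB/s")
print(f"{averages['bytes_per_event']:.0f} bytes/event")
```

Under those assumptions the average works out to roughly 580,000 events per second, about 93 MB per second, and 160 bytes per event.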

The processed data is then streamed back into Amazon S3, where it is accessible to other teams, including personalization/recommendation services, and to analysts through a real-time custom visualization tool called Sting.

Netflix can run its production cluster with 300 nodes during the day and expand it to 400+ in the evenings and on weekends. Using EMR's alarm capabilities, this can also be set up to happen automatically based on the load on the cluster.
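The resizing rule described here can be sketched in a few lines of Python. This is not Netflix's actual policy or an EMR API: the 300 and 400 node counts come from the post, while the evening start hour, the load threshold, and the function name `target_node_count` are assumptions chosen for illustration.

```python
from datetime import datetime

DAYTIME_NODES = 300   # from the post
PEAK_NODES = 400      # the post says "400+"; 400 is used as the peak size here

# Assumptions for this sketch, not from the post.
EVENING_START_HOUR = 18   # treat 18:00 onwards as "evening"
LOAD_THRESHOLD = 0.8      # fraction of cluster capacity in use


def target_node_count(now, cluster_load=0.0):
    """Pick a cluster size from the time of day, day of week, and load."""
    is_weekend = now.weekday() >= 5          # Saturday is 5, Sunday is 6
    is_evening = now.hour >= EVENING_START_HOUR
    if is_weekend or is_evening or cluster_load >= LOAD_THRESHOLD:
        return PEAK_NODES
    return DAYTIME_NODES


print(target_node_count(datetime(2024, 1, 3, 10)))   # a Wednesday morning
print(target_node_count(datetime(2024, 1, 6, 10)))   # a Saturday morning
```

In a real setup the load figure would come from a monitoring alarm rather than a function argument, and the result would be passed to whatever resizes the cluster.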

And for jobs with specific hardware or capacity requirements, analysts can spin up their own query clusters, again streaming from the same data source.
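As a rough idea of what requesting such a query cluster might look like, the sketch below builds the keyword arguments for boto3's EMR `run_job_flow` call without actually sending them. The helper name, the instance type, the release label, the choice of Hive, and the default role names are all assumptions for illustration; the post does not say how Netflix configures these clusters.

```python
def build_query_cluster_request(name, node_count, instance_type="m5.xlarge"):
    """Build keyword arguments for an EMR run_job_flow request (not sent here)."""
    if node_count < 2:
        raise ValueError("need one master node and at least one core node")
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",          # assumed release label
        "Applications": [{"Name": "Hive"}],    # assumed query engine
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "master",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": node_count - 1,
                },
            ],
            # Keep the cluster running so analysts can submit ad hoc queries.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default EC2 role
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


request = build_query_cluster_request("analyst-query-cluster", node_count=10)
print(request["Instances"]["InstanceGroups"][1]["InstanceCount"])
```

With AWS credentials configured, the resulting dictionary could be passed along as `boto3.client("emr").run_job_flow(**request)`. Because every such cluster reads from the same S3 data, each analyst can size a cluster for their own job without affecting the production cluster.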
