Data Engineering

157 bookmarks
How to trigger a spark job from AWS Lambda
Wondering how to execute a Spark job on an AWS EMR cluster in response to a file upload event on S3? Then this post is for you. In this post we go over how to trigger Spark jobs on an AWS EMR cluster using AWS Lambda. The Lambda function executes in response to an S3 upload event. We will go over this event-driven pattern with code snippets and set up a fully functioning pipeline.
·startdataengineering.com·
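As a rough illustration of the pattern this bookmark describes (not the linked post's own code), here is a minimal Python sketch of a Lambda handler that turns an S3 upload event into a Spark step on an existing EMR cluster. The environment variable names `EMR_CLUSTER_ID` and `SPARK_SCRIPT_URI`, the helper names, and the choice to pass the uploaded file's S3 URI as the script's only argument are all assumptions made for the example.

```python
import os
from urllib.parse import unquote_plus


def build_spark_step(bucket, key, script_uri):
    """Build one EMR step that runs spark-submit on the uploaded object.

    Hypothetical helper: the step name and argument layout are assumptions.
    """
    return {
        "Name": f"process-{key}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                f"s3://{bucket}/{key}",
            ],
        },
    }


def steps_from_event(event, script_uri):
    """Create one step per S3 record in the Lambda event."""
    steps = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        steps.append(build_spark_step(bucket, key, script_uri))
    return steps


def lambda_handler(event, context, emr_client=None):
    """Entry point; emr_client can be injected so the logic runs without AWS."""
    if emr_client is None:
        import boto3  # imported lazily so the helpers above work without boto3
        emr_client = boto3.client("emr")

    steps = steps_from_event(event, os.environ["SPARK_SCRIPT_URI"])
    if not steps:
        return {"StepIds": []}

    response = emr_client.add_job_flow_steps(
        JobFlowId=os.environ["EMR_CLUSTER_ID"],  # assumed: ID of a running cluster
        Steps=steps,
    )
    return {"StepIds": response["StepIds"]}
```

In a real deployment the function's IAM role would also need permission to add steps to the cluster, and the S3 bucket would need an event notification pointing at the function; neither is shown here.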
Data Engineering Project: Stream Edition · Start Data Engineering
Data engineering project for beginners, stream edition. In this post we design and build a simple data streaming pipeline using Apache Kafka, Apache Flink and PostgreSQL DB. We will also review the design and understand some common issues to avoid while building distributed stream processing systems.
·startdataengineering.com·
Uber's Journey Toward Better Data Culture From First Principles
Data powers Uber: Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science that powers everything that Uber does, such as better pricing and matching, fraud detection, lowering ETAs, and experimentation. Petabytes of data are collected and processed per day and thousands of users derive insights and make decisions from this data to build/improve these products. Problems beyond scale: While we are able to scale our data systems, we previously didn’t focus enough
·eng.uber.com·