Data Engineering

157 bookmarks

How to trigger a spark job from AWS Lambda

Wondering how to execute a Spark job on an AWS EMR cluster, based on a file upload event on S3? Then this post is for you. In this post we go over how to trigger Spark jobs on an AWS EMR cluster, using AWS Lambda. The Lambda function will execute in response to an S3 upload event. We will go over this event-driven pattern with code snippets and set up a fully functioning pipeline.

Orchestration

·startdataengineering.com·Apr 8, 2021
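For orientation, the pattern this bookmark describes (an S3 upload event invoking a Lambda function that submits a Spark step to a running EMR cluster) might look roughly like the sketch below. It is not taken from the linked post. The environment variable names `EMR_CLUSTER_ID` and `SPARK_SCRIPT_S3_URI`, the helper `build_spark_step`, and the optional `emr_client` parameter are assumptions made for illustration; the only AWS call used is boto3's EMR `add_job_flow_steps`.

```python
import os
from urllib.parse import unquote_plus


def build_spark_step(input_uri, script_uri):
    """Describe one spark-submit step in the shape EMR's AddJobFlowSteps expects."""
    return {
        "Name": f"process {input_uri}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, input_uri],
        },
    }


def lambda_handler(event, context, emr_client=None):
    """Submit one Spark step per uploaded object listed in the S3 event."""
    # Hypothetical configuration: both names are assumptions, not from the post.
    cluster_id = os.environ["EMR_CLUSTER_ID"]
    script_uri = os.environ["SPARK_SCRIPT_S3_URI"]

    if emr_client is None:
        # Imported lazily so the step-building logic can be exercised without AWS.
        import boto3
        emr_client = boto3.client("emr")

    steps = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        steps.append(build_spark_step(f"s3://{bucket}/{key}", script_uri))

    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return {"step_ids": response["StepIds"]}
```

Passing the client in as an argument is only a convenience for local testing; in a deployed function the default boto3 client would be used, and the function's role would need permission to add steps to the cluster.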

Data Engineering Project: Stream Edition · Start Data Engineering

Data engineering project for beginners, stream edition. In this post we design and build a simple data streaming pipeline using Apache Kafka, Apache Flink and PostgreSQL DB. We will also review the design and understand some common issues to avoid while building distributed stream processing systems.

Orchestration

·startdataengineering.com·Mar 29, 2021

Become a Data Engineer with this Complete List of Resources

Want to know how to become a data engineer? Here is a list of resources, certifications and other important links that will help you to get started with it.

Roadmaps

·analyticsvidhya.com·Mar 24, 2021

What Skills Do You Need to Become a Data Engineer?

What skills do you need to become a data engineer? Learn how to grow your data engineer skillset with this introductory guide.

Roadmaps

·springboard.com·Mar 24, 2021

Uber's Journey Toward Better Data Culture From First Principles

Data powers Uber: Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science that powers everything that Uber does, such as better pricing and matching, fraud detection, lowering ETAs, and experimentation. Petabytes of data are collected and processed per day and thousands of users derive insights and make decisions from this data to build/improve these products. Problems beyond scale: While we are able to scale our data systems, we previously didn’t focus enough

Architecture