Pipeline execution and triggers - Azure Data Factory & Azure Synapse
This article provides information about how to execute a pipeline in Azure Data Factory or Azure Synapse Analytics, either on-demand or by creating a trigger.
Copy activity - Azure Data Factory & Azure Synapse
Learn about the Copy activity in Azure Data Factory and Azure Synapse Analytics. You can use it to copy data from a supported source data store to a supported sink data store.
Copy data from/to a file system - Azure Data Factory & Azure Synapse
Learn how to copy data from a file system to supported sink data stores, or from supported source data stores to a file system, using Azure Data Factory or Azure Synapse Analytics pipelines.
Azure Synapse Analytics provides a shared metadata model where creating a Lake database in an Apache Spark pool will make it accessible from its serverless SQL pool engine.
Azure Synapse Analytics provides a shared metadata model where creating a table in a serverless Apache Spark pool will make it accessible from the serverless SQL pool and dedicated SQL pool without duplicating the data.
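A minimal PySpark sketch of that flow, assuming a Synapse Spark pool notebook where the built-in `spark` session is available; the database and table names ("demo_lake", "sales") are placeholders:

```python
# Run inside a Synapse Spark pool notebook (the `spark` session is
# predefined there). Names below are illustrative.
spark.sql("CREATE DATABASE IF NOT EXISTS demo_lake")

df = spark.createDataFrame(
    [(1, "widget", 9.99), (2, "gadget", 19.99)],
    ["id", "product", "price"],
)

# Saving as a managed Spark table registers it in the shared
# metastore, which is what makes it visible to the SQL engines.
df.write.mode("overwrite").saveAsTable("demo_lake.sales")
```

Once the metadata has synchronized, the table can typically be queried from the serverless SQL endpoint as demo_lake.dbo.sales.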
Spark Streaming - Different Output modes explained - Spark by {Examples}
This article describes the usage of and differences between the complete, append, and update output modes in Apache Spark Streaming. outputMode describes what data is written to the data sink (console, Kafka, etc.) when new data is available in the streaming input (Kafka, socket, etc.).
Use complete as the output mode, outputMode("complete"), when you want to aggregate the data and output the entire result set to the sink every time.
Update mode is similar to complete mode, with one exception: outputMode("update") outputs only the updated aggregated results to the data sink each time new data arrives.
Use append as output mode outputMode("append") when you want to output only new rows to the output sink.
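A short PySpark Structured Streaming sketch of how the mode is set, using the local socket source for experimentation; the host, port, and word-count query are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("output-modes").getOrCreate()

# Local-only socket source for experimentation (e.g. `nc -lk 9999`).
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

counts = lines.groupBy("value").count()

# "complete": rewrite the entire aggregated result on every trigger.
# "update":   write only the rows whose aggregate changed.
# "append" would be rejected for this query: without a watermark the
# aggregates can still change, so append suits non-aggregated queries.
query = (counts.writeStream
         .outputMode("complete")   # or "update"
         .format("console")
         .start())
query.awaitTermination()
```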
Continuous integration and delivery - Azure Data Factory
Learn how to use continuous integration and delivery to move Azure Data Factory pipelines from one environment (development, test, production) to another.
Replicated Tables now generally available in Azure SQL Data Warehouse
We are excited to announce that Replicated Tables are generally available in Azure SQL Data Warehouse. A key to performance for large-scale data warehouses is how data is distributed across the system.
Design a scalable partitioning strategy for Azure Table storage (REST API) - Azure Storage
This article discusses partitioning a table in Azure Table storage and strategies you can use to ensure efficient scalability.
Azure Table storage is designed to store structured data.
Table entities represent the units of data that are stored in a table. Table entities are similar to rows in a typical relational database table. Each entity defines a collection of properties. Each property is defined as a key/value pair by its name, value, and the value's data type.
PartitionKey: The PartitionKey property stores string values that identify the partition that an entity belongs to.
RowKey: The RowKey property stores string values that uniquely identify entities within each partition. The PartitionKey and the RowKey together form the primary key for the entity.
Timestamp: The Timestamp property provides traceability for an entity.
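As a sketch, here is how those properties surface in the azure-data-tables Python SDK; the connection string, table name, and entity values are placeholders:

```python
from azure.data.tables import TableClient

# Placeholder connection string and table name.
table = TableClient.from_connection_string(
    conn_str="<storage-connection-string>", table_name="Sensors")

# PartitionKey + RowKey together form the primary key; the service
# sets the Timestamp property automatically on every write.
entity = {
    "PartitionKey": "building-42",   # identifies the partition
    "RowKey": "sensor-0007",         # unique within that partition
    "Temperature": 21.5,
    "Unit": "C",
}
table.create_entity(entity=entity)
```

Choosing a PartitionKey that spreads traffic across many partitions, while keeping entities that are queried together in the same partition, is the core of the scalability strategy the article discusses.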
Create schedule triggers - Azure Data Factory & Azure Synapse
Learn how to create a trigger in Azure Data Factory or Azure Synapse Analytics that runs a pipeline on a schedule.
Specify the start datetime of the trigger for Start Date. It's set to the current datetime in Coordinated Universal Time (UTC) by default.
When creating a schedule trigger, you specify a schedule (start date, recurrence, end date, etc.) for the trigger and associate it with a pipeline. Pipelines and triggers have a many-to-many relationship: multiple triggers can kick off a single pipeline, and a single trigger can kick off multiple pipelines.
Specify the Recurrence for the trigger. Select one of the values from the drop-down list (Every minute, Hourly, Daily, Weekly, or Monthly) and enter the multiplier in the text box. For example, if you want the trigger to run once every 15 minutes, select Every minute and enter 15 in the text box.
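Behind the UI, those settings resolve to a trigger definition in JSON. A sketch of a 15-minute trigger, written here as a Python dict; the trigger name, pipeline name, and start time are placeholders:

```python
# Illustrative shape of a ScheduleTrigger definition.
schedule_trigger = {
    "name": "Every15MinutesTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Minute",               # "Every minute" in the UI
                "interval": 15,                      # the multiplier text box
                "startTime": "2024-01-01T00:00:00Z", # UTC by default
                "timeZone": "UTC",
            }
        },
        # Many-to-many: list several pipelines here, or point
        # several triggers at the same pipeline.
        "pipelines": [{
            "pipelineReference": {
                "type": "PipelineReference",
                "referenceName": "CopyPipeline",
            }
        }],
    },
}
```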
Incrementally copy data using Change Tracking using Azure portal - Azure Data Factory
In this tutorial, you create an Azure Data Factory with a pipeline that loads delta data, based on change tracking information in a source database in Azure SQL Database, to Azure Blob storage.
In some cases, the changed data within a period in your source data store can easily be sliced up (for example, by LastModifyTime or CreationTime). In other cases, there is no explicit way to identify the delta data since the last time you processed the data. The Change Tracking technology supported by data stores such as Azure SQL Database and SQL Server can be used to identify that delta data.
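A sketch of reading that delta with SQL Server's change tracking functions from Python via pyodbc; the connection string, table, and column names are illustrative, and `last_sync_version` stands in for the version your pipeline persisted after its previous run:

```python
import pyodbc

conn = pyodbc.connect("<azure-sql-connection-string>")
last_sync_version = 42  # persisted high-water mark from the last run

# CHANGETABLE returns only the rows changed since last_sync_version,
# which is the delta the incremental pipeline copies.
rows = conn.execute(
    """
    SELECT ct.SYS_CHANGE_OPERATION, ct.PersonID, s.Name
    FROM CHANGETABLE(CHANGES dbo.data_source_table, ?) AS ct
    LEFT JOIN dbo.data_source_table AS s ON s.PersonID = ct.PersonID
    """,
    last_sync_version,
).fetchall()

# Record the new high-water mark for the next run.
new_version = conn.execute(
    "SELECT CHANGE_TRACKING_CURRENT_VERSION()").fetchval()
```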
Learn how to create a dependency on a tumbling window trigger in Azure Data Factory and Azure Synapse Analytics.
Offset: Provide a value in time span format; both negative and positive offsets are allowed. This property is mandatory if the trigger depends on itself; in all other cases it is optional. A self-dependency should always use a negative offset. If no value is specified, the window is the same as the trigger itself.
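For illustration, the dependency property in the trigger JSON might look like the following, shown here as a Python dict; the offset and size values are placeholders:

```python
# Sketch of a tumbling window trigger's self-dependency; the time
# span values are illustrative.
self_dependency = {
    "dependsOn": [{
        "type": "SelfDependencyTumblingWindowTriggerReference",
        "offset": "-24:00:00",  # self-dependency requires a negative offset
        "size": "24:00:00",     # optional; defaults to the trigger's window
    }]
}
```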
Surrogate key transformation in mapping data flow - Azure Data Factory & Azure Synapse
Learn how to use the mapping data flow Surrogate Key Transformation to generate sequential key values in Azure Data Factory and Synapse Analytics.
Use the surrogate key transformation to add an incrementing key value to each row of data. This is useful when designing dimension tables in a star schema analytical data model. In a star schema, each member in your dimension tables requires a unique key that is a non-business key.
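Outside mapping data flows, the same idea can be sketched in PySpark; the table and column names here are invented:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("surrogate-key").getOrCreate()

dim = spark.createDataFrame(
    [("WA", "Washington"), ("OR", "Oregon")], ["code", "state"])

# A sequential, non-business key comparable to the transformation's
# output column. A global window collapses to one partition, so this
# is an illustration, not a pattern for large data.
w = Window.orderBy("code")
dim_with_key = dim.withColumn("state_sk", row_number().over(w))
```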
Sink transformation in mapping data flow - Azure Data Factory & Azure Synapse
Learn how to configure a sink transformation in mapping data flow.
A cache sink writes data into the Spark cache instead of a data store. In mapping data flows, you can reference this data within the same flow many times using a cache lookup. This is useful when you want to reference data as part of an expression but don't want to explicitly join the columns to it. Common examples where a cache sink can help are looking up a max value on a data store and matching error codes to an error message database.
Assert data transformation in mapping data flow - Azure Data Factory
Set assertions for mapping data flows
The assert transformation enables you to build custom rules inside your mapping data flows for data quality and data validation. You can build rules that will determine whether values meet an expected value domain. Additionally, you can build rules that check for row uniqueness. The assert transformation will help to determine if each row in your data meets a set of criteria. The assert transformation also allows you to set custom error messages when data validation rules are not met.
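As a rough PySpark analogue of two such rules, a value-domain check and a row-uniqueness check; the dataset and column names are invented:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("assert-sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, "open"), (2, "closed"), (2, "bogus")], ["order_id", "status"])

# Rule 1: values must fall in an expected domain.
# Rule 2: order_id must be unique across rows.
checked = (orders
    .withColumn("ok_status", F.col("status").isin("open", "closed"))
    .withColumn("ok_unique",
                F.count("*").over(Window.partitionBy("order_id")) == 1))

failures = checked.filter(~F.col("ok_status") | ~F.col("ok_unique"))
```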
Alter row transformation in mapping data flow - Azure Data Factory & Azure Synapse
How to update a database target using the alter row transformation in mapping data flows in Azure Data Factory and Azure Synapse Analytics pipelines.
Use the Alter Row transformation to set insert, delete, update, and upsert policies on rows. You can add one-to-many conditions as expressions. These conditions should be specified in order of priority, as each row will be marked with the policy corresponding to the first-matching expression. Each of those conditions can result in a row (or rows) being inserted, updated, deleted, or upserted. Alter Row can produce both DDL & DML actions against your database.
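The first-match semantics can be sketched in PySpark with a chained when/otherwise expression; the conditions and column names are invented:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("alter-row-sketch").getOrCreate()

rows = spark.createDataFrame(
    [(1, "active", True), (2, "deleted", False), (3, "new", False)],
    ["id", "status", "exists_in_target"])

# Conditions are evaluated in priority order; the first match wins,
# mirroring how Alter Row marks each row with exactly one policy.
policy = (F.when(F.col("status") == "deleted", "delete")
           .when(F.col("exists_in_target"), "update")
           .otherwise("insert"))

tagged = rows.withColumn("row_policy", policy)
```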
Real-time data visualization of data from Azure IoT Hub – Power BI
Use Power BI to visualize temperature and humidity data that is collected from the sensor and sent to your Azure IoT hub.
Create a consumer group on your IoT hub.
Create and configure an Azure Stream Analytics job to read temperature telemetry from your consumer group and send it to Power BI.
Create a report of the temperature data in Power BI and share it to the web.
Incrementally copy data using Change Tracking using PowerShell - Azure Data Factory
In this tutorial, you create an Azure Data Factory pipeline that copies delta data incrementally from multiple tables in a SQL Server database to Azure SQL Database.
You perform the following steps in this tutorial:
Prepare the source data store.
Create a data factory.
Create linked services.
Create source, sink, and change tracking datasets.
Create, run, and monitor the full copy pipeline.
Add or update data in the source table.
Create, run, and monitor the incremental copy pipeline.
Create edge jobs in Azure Stream Analytics and deploy them to devices running Azure IoT Edge.
A cloud part that is responsible for the job definition: users define inputs, outputs, the query, and other settings, such as out-of-order events, in the cloud.
A module running on your IoT devices. The module contains the Stream Analytics engine and receives the job definition from the cloud.
For both inputs and outputs, CSV and JSON formats are supported.
Manufacturing safety systems must respond to operational data with ultra-low latency. With Stream Analytics on IoT Edge, you can analyze sensor data in near real-time, and issue commands when you detect anomalies to stop a machine or trigger alerts.
Mission critical systems, such as remote mining equipment, connected vessels, or offshore drilling, need to analyze and react to data even when cloud connectivity is intermittent.
Azure IoT Edge moves cloud analytics and custom business logic to devices so that your organization can focus on business insights instead of data management.
Azure IoT Edge allows you to deploy complex event processing, machine learning, image recognition, and other high-value AI without writing it in-house.
Installs and updates workloads on the device.
Maintains Azure IoT Edge security standards on the device.
Ensures that IoT Edge modules are always running.
Reports module health to the cloud for remote monitoring.
Manages communication between downstream leaf devices and an IoT Edge device, between modules on an IoT Edge device, and between an IoT Edge device and the cloud.