Practice Exams | Microsoft Azure DP-203 Data Engineering

Be prepared for the Microsoft Azure Exam DP-203: Data Engineering on Microsoft Azure

Last updated 9/2021

English

https://www.udemy.com/course/practice-exams-microsoft-azure-dp-203-data-engineering/

Description

To set realistic expectations, please note: these questions are NOT official questions that you will find on the official exam. They DO cover all the material outlined in the knowledge sections below. Many of the questions are based on fictitious scenarios.

The official knowledge requirements for the exam are reviewed routinely so that the practice questions incorporate the latest requirements. Updates to content are often made without prior notification and are subject to change at any time.

Each question has a detailed explanation and links to reference materials that support the answers and help ensure the accuracy of the solutions.

The questions will be shuffled each time you repeat the tests so you will need to know why an answer is correct, not just that the correct answer was item "B" last time you went through the test.

Candidates for this exam should have subject matter expertise integrating, transforming, and consolidating data from various structured and unstructured data systems into a structure that is suitable for building analytics solutions.

Azure Data Engineers help stakeholders understand the data through exploration, and they build and maintain secure and compliant data processing pipelines by using different tools and techniques. These professionals use various Azure data services and languages to store and produce cleansed and enhanced datasets for analysis.

Azure Data Engineers also help ensure that data pipelines and data stores are high-performing, efficient, organized, and reliable, given a set of business requirements and constraints. They deal with unanticipated issues swiftly, and they minimize data loss. They also design, implement, monitor, and optimize data platforms to meet the needs of the data pipelines.

A candidate for this exam must have strong knowledge of data processing languages such as SQL, Python, or Scala, and they need to understand parallel processing and data architecture patterns.

Skills measured on Microsoft Azure DP-203 Exam

Design and Implement Data Storage (40-45%)

Design a data storage structure

design an Azure Data Lake solution

recommend file types for storage

recommend file types for analytical queries

design for efficient querying

design for data pruning

design a folder structure that represents the levels of data transformation

design a distribution strategy

design a data archiving solution

Design a partition strategy

design a partition strategy for files

design a partition strategy for analytical workloads

design a partition strategy for efficiency/performance

design a partition strategy for Azure Synapse Analytics

identify when partitioning is needed in Azure Data Lake Storage Gen2

Design the serving layer

design star schemas

design slowly changing dimensions

design a dimensional hierarchy

design a solution for temporal data

design for incremental loading

design analytical stores

design metastores in Azure Synapse Analytics and Azure Databricks

Implement physical data storage structures

implement compression

implement partitioning

implement sharding

implement different table geometries with Azure Synapse Analytics pools

implement data redundancy

implement distributions

implement data archiving

Implement logical data structures

build a temporal data solution

build a slowly changing dimension

build a logical folder structure

build external tables

implement file and folder structures for efficient querying and data pruning

Implement the serving layer

deliver data in a relational star schema

deliver data in Parquet files

maintain metadata

implement a dimensional hierarchy

Design and Develop Data Processing (25-30%)

Ingest and transform data

transform data by using Apache Spark

transform data by using Transact-SQL

transform data by using Data Factory

transform data by using Azure Synapse Pipelines

transform data by using Stream Analytics

cleanse data

split data

shred JSON

encode and decode data

configure error handling for the transformation

normalize and denormalize values

transform data by using Scala

perform data exploratory analysis

Design and develop a batch processing solution

develop batch processing solutions by using Data Factory, Data Lake, Spark, Azure Synapse Pipelines, PolyBase, and Azure Databricks

create data pipelines

design and implement incremental data loads

design and develop slowly changing dimensions

handle security and compliance requirements

scale resources

configure the batch size

design and create tests for data pipelines

integrate Jupyter/IPython notebooks into a data pipeline

handle duplicate data

handle missing data

handle late-arriving data

upsert data

regress to a previous state

design and configure exception handling

configure batch retention

design a batch processing solution

debug Spark jobs by using the Spark UI

Design and develop a stream processing solution

develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs

process data by using Spark structured streaming

monitor for performance and functional regressions

design and create windowed aggregates

handle schema drift

process time series data

process across partitions

process within one partition

configure checkpoints/watermarking during processing

scale resources

design and create tests for data pipelines

optimize pipelines for analytical or transactional purposes

handle interruptions

design and configure exception handling

upsert data

replay archived stream data

design a stream processing solution

Manage batches and pipelines

trigger batches

handle failed batch loads

validate batch loads

manage data pipelines in Data Factory/Synapse Pipelines

schedule data pipelines in Data Factory/Synapse Pipelines

implement version control for pipeline artifacts

manage Spark jobs in a pipeline

Design and Implement Data Security (10-15%)

Design security for data policies and standards

design data encryption for data at rest and in transit

design a data auditing strategy

design a data masking strategy

design for data privacy

design a data retention policy

design to purge data based on business requirements

design Azure role-based access control (Azure RBAC) and POSIX-like Access Control List (ACL) for Data Lake Storage Gen2

design row-level and column-level security

Implement data security

implement data masking

encrypt data at rest and in motion

implement row-level and column-level security

implement Azure RBAC

implement POSIX-like ACLs for Data Lake Storage Gen2

implement a data retention policy

implement a data auditing strategy

manage identities, keys, and secrets across different data platform technologies

implement secure endpoints (private and public)

implement resource tokens in Azure Databricks

load a DataFrame with sensitive information

write encrypted data to tables or Parquet files

manage sensitive information

Monitor and Optimize Data Storage and Data Processing (10-15%)

Monitor data storage and data processing

implement logging used by Azure Monitor

configure monitoring services

measure performance of data movement

monitor and update statistics about data across a system

monitor data pipeline performance

measure query performance

monitor cluster performance

understand custom logging options

schedule and monitor pipeline tests

interpret Azure Monitor metrics and logs

interpret a Spark directed acyclic graph (DAG)

Optimize and troubleshoot data storage and data processing

compact small files

rewrite user-defined functions (UDFs)

handle skew in data

handle data spill

tune shuffle partitions

find shuffling in a pipeline

optimize resource management

tune queries by using indexers

tune queries by using cache

optimize pipelines for analytical or transactional purposes

optimize pipeline for descriptive versus analytical workloads

troubleshoot a failed Spark job

troubleshoot a failed pipeline run

The exam is available in the following languages: English

Who this course is for:

Microsoft Azure professionals who want to be Microsoft DP-203 certified
