Core Data Concepts
DP-900 covers fundamental data literacy concepts.
Data formats: structured data has a fixed schema of rows and columns (relational databases); semi-structured data has a flexible, self-describing schema (JSON, XML, key-value pairs); unstructured data has no predefined schema (images, videos, documents).
Data workloads: OLTP (Online Transaction Processing) handles many concurrent small reads and writes and is optimised for low-latency transactions (Azure SQL Database, Azure Cosmos DB). OLAP (Online Analytical Processing) runs large, complex queries over historical data and is optimised for aggregation (Azure Synapse Analytics, Azure Analysis Services).
Data roles: a Database Administrator manages and maintains databases (performance, availability, security); a Data Engineer builds data pipelines and storage infrastructure (ETL, data lake design); a Data Analyst queries and visualises data to support business decisions (Power BI, SQL queries); a Data Scientist builds predictive models (machine learning, statistics).
ETL (Extract, Transform, Load): moves data from a source to a target, applying transformations along the way. Azure Data Factory orchestrates ETL pipelines.
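The ETL flow and the OLAP-style aggregation described above can be sketched locally. This is a minimal illustration, assuming Python's built-in `json` and `sqlite3` modules as stand-ins for Azure services (it does not use Azure Data Factory); the payload, table and function names are hypothetical. It extracts semi-structured JSON in which one field is optional, transforms it into a fixed schema, loads it into a relational table, and then runs an aggregate query over all rows.

```python
import json
import sqlite3

# Hypothetical semi-structured source: "region" is missing from one record.
RAW_JSON = """
[
  {"order_id": 1, "amount": 120.0, "region": "EU"},
  {"order_id": 2, "amount": 80.5},
  {"order_id": 3, "amount": 42.0, "region": "US"},
  {"order_id": 4, "amount": 10.0, "region": "EU"}
]
"""


def extract(raw: str) -> list[dict]:
    """Extract: parse the semi-structured JSON payload into Python dicts."""
    return json.loads(raw)


def transform(records: list[dict]) -> list[tuple[int, float, str]]:
    """Transform: impose a fixed schema, defaulting the optional field."""
    return [(r["order_id"], float(r["amount"]), r.get("region", "UNKNOWN")) for r in records]


def load(conn: sqlite3.Connection, rows: list[tuple[int, float, str]]) -> None:
    """Load: write the rows into a structured (relational) table in one transaction."""
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)")
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """OLAP-style query: aggregate across many rows instead of reading one record."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_JSON)))
print(revenue_by_region(conn))  # [('EU', 130.0), ('UNKNOWN', 80.5), ('US', 42.0)]
```

In a real pipeline each stage would typically be a separate activity (for example, copy then transform) rather than three function calls in one process.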
Azure Relational and Non-Relational Data Services
Relational databases: Azure SQL Database (managed SQL Server engine, PaaS, with elastic scale and built-in high availability), Azure SQL Managed Instance (near-100% compatibility with on-premises SQL Server, suited to complex migrations), and Azure Database for PostgreSQL and Azure Database for MySQL (managed open-source relational databases).
Relational concepts: normalisation removes redundancy by splitting data into related tables (1NF, 2NF, 3NF); a primary key uniquely identifies each row, and foreign keys enforce referential integrity between tables; ACID transactions (Atomicity, Consistency, Isolation, Durability) guarantee data integrity.
Non-relational (NoSQL) databases: Azure Cosmos DB (globally distributed, with multiple APIs: NoSQL for documents, MongoDB, Cassandra, Gremlin, Table), Azure Cache for Redis (in-memory key-value store), Azure Table Storage (simple NoSQL key-value store).
NoSQL trade-offs: flexible schema, horizontal scale and high availability, often at the cost of weaker consistency guarantees (eventual consistency in distributed scenarios). Cosmos DB lets you choose among five consistency levels, from strong to eventual.
Storage: Azure Blob Storage holds unstructured data (images, videos, documents, backups); Azure Data Lake Storage Gen2 adds a hierarchical namespace on top of Blob Storage for big data analytics workloads.
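Primary keys, foreign keys and transaction atomicity can be seen together in a small example. This sketch assumes SQLite (via Python's `sqlite3`) as a local stand-in for Azure SQL Database; the tables and helper functions are hypothetical. A batch of inserts runs inside one transaction, so when one row violates the foreign key, the whole batch is rolled back (atomicity) and no orphaned order is stored (referential integrity).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
        " amount REAL NOT NULL)"
    )
    with conn:
        conn.execute("INSERT INTO customers VALUES (1, 'Contoso')")
    return conn


def place_orders(conn: sqlite3.Connection, orders: list[tuple[int, int, float]]) -> bool:
    """Insert several orders atomically: either every row commits or none do."""
    try:
        with conn:  # one transaction; rolled back if any statement raises
            conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
        return True
    except sqlite3.IntegrityError:
        return False


def order_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


conn = make_db()
print(place_orders(conn, [(1, 1, 50.0), (2, 1, 25.0)]))  # True
print(place_orders(conn, [(3, 1, 10.0), (4, 99, 5.0)]))  # False: customer 99 does not exist
print(order_count(conn))  # 2, because order 3 was rolled back along with order 4
```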
Analytics and Visualisation on Azure
Azure analytics services for DP-900: Azure Synapse Analytics (unified analytics platform combining SQL, Spark, pipelines and Power BI integration in one workspace), Azure Databricks (Apache Spark-based analytics and machine learning, with collaborative notebooks and MLflow for ML lifecycle management), Power BI (business intelligence and data visualisation: datasets, reports and dashboards, published to the Power BI Service).
Power BI components: Power BI Desktop (report authoring), Power BI Service (cloud publishing and sharing), Power BI Mobile (viewing on mobile devices).
Report vs dashboard: a report is a multi-page interactive document; a dashboard is a single page of tiles pinned from reports, giving a high-level view. Dashboards are created in the Power BI Service, not in Power BI Desktop.
Real-time analytics: Event Hubs ingests streaming data, Stream Analytics processes data in flight with SQL-like queries, and Power BI real-time streaming datasets display live data.
Batch analytics: Azure Data Factory orchestrates data movement, Synapse Analytics queries the data, and Power BI reports the results.
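A common in-flight computation in the real-time path is a windowed aggregate. The sketch below counts events per key in fixed, non-overlapping (tumbling) windows, roughly what a Stream Analytics query with `GROUP BY TumblingWindow(second, 10)` computes. It is a plain-Python illustration, not the Stream Analytics engine: the event tuples and function name are hypothetical, windows here are half-open `[start, end)`, and the real service's boundary and late-arrival handling may differ.

```python
from collections import defaultdict
from collections.abc import Iterable


def tumbling_window_counts(
    events: Iterable[tuple[float, str]], window_s: float
) -> dict[float, dict[str, int]]:
    """Count events per key in non-overlapping windows of window_s seconds.

    Each window is half-open [start, end) and is keyed by its end time.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # floor(ts / window_s) picks the window; every event lands in exactly one.
        window_end = (ts // window_s + 1) * window_s
        windows[window_end][key] += 1
    return {end: dict(counts) for end, counts in sorted(windows.items())}


# (timestamp in seconds, device id) pairs, as if read from an event stream
events = [(1.0, "sensorA"), (4.0, "sensorB"), (9.9, "sensorA"), (10.0, "sensorA"), (15.0, "sensorB")]
print(tumbling_window_counts(events, 10.0))
# {10.0: {'sensorA': 2, 'sensorB': 1}, 20.0: {'sensorA': 1, 'sensorB': 1}}
```

Because tumbling windows never overlap, each event contributes to exactly one output row, which keeps live dashboard totals from double-counting.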