Microsoft Modern Datawarehouse Architecture #Meetup - Les gentils développeurs


Microsoft Modern Datawarehouse Architecture
#Meetup - Les gentils développeurs Data Platform
Sauget Charles-Henri - MVP Data Platform
Ben Zahra Anouar - MVP AI
28/12/2019

SAUGET Charles-Henri
Consultant Data Platform since 2009
MAIL: [email protected]
TWITTER: @SaugetCh
GITHUB: https://github.com/chsauget
BLOG: www.sauget-ch.fr
WWW.INSIDERS.COOP

Anouar BEN ZAHRA
Consultant Data Platform
MAIL: [email protected]
TWITTER: @Anouarbenzahra
GITHUB: https://github.com/AnouarBenZahra
NUGET PACKAGE: https://www.nuget.org/profiles/Anouar
WWW.INSIDERS.COOP

Data Evolution

Data Abundance
Variety: support for many kinds of data
  Structured (tables)
  Semi-structured (JSON)
  Unstructured (images)
Volume
  Small data (under 10 GB)
  Medium-range data (10 GB to 1 TB)
  Big data (1 TB to hundreds of petabytes)
Velocity
  Capacity to handle a large throughput (Gb/s?)
  Elasticity of the architectures

Structured vs unstructured Data

On-Premises vs Cloud
Computing Environment
Licensing Model
Maintainability
Scalability
Availability

Data Warehousing evolution

Data Engineering job responsibilities
New skills for new platforms
The technology changed, and so did the job!
Changing loading approaches
From implementing to provisioning

Modern Data Warehouse Architecture
Separation of Storage & Compute
https://azure.microsoft.com/fr-fr/solutions/architecture/modern-data-warehouse/

DEMO - Modern DWH
Diagram components:
Lake gen2: Comments.xml (20 GB), Users.xml (3 GB)
Lake gen2: Parquet files
Azure Synapse
Azure AS
Power BI
Azure Data Factory
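Files the size of Comments.xml cannot comfortably be loaded into memory in one go, so the XML has to be read incrementally before it is written out in a columnar format. The following is a minimal sketch of that first step using only the Python standard library. It assumes, hypothetically, that each record is a `<row>` element whose fields are stored as attributes (the slides do not show the actual file layout), and the name `iter_rows` is invented for illustration. Writing the rows to Parquet is left out because it needs a third-party library.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def iter_rows(source, tag: str = "row") -> Iterator[Dict[str, str]]:
    """Yield the attributes of each matching element, one at a time.

    Assumption: records are <row .../> elements with attribute fields.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield dict(elem.attrib)  # copy the fields before freeing the element
            root.clear()  # drop already-processed children to keep memory flat


# Tiny in-memory stand-in for a multi-gigabyte file (made-up content).
sample = io.BytesIO(
    b'<comments>'
    b'<row Id="1" UserId="7" Text="Nice"/>'
    b'<row Id="2" UserId="9" Text="Thanks"/>'
    b'</comments>'
)
rows = list(iter_rows(sample))
```

In a real pipeline the generator would be consumed in batches rather than collected with `list`, so that only one batch is held in memory at a time.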

Our demo architecture

When you need a low cost, high throughput data store.
When you need to store No-SQL data.
When you do not need to query the data directly. No ad hoc query support.
Suits the storage of archive or relatively static data.
Suits acting as an HDInsight Hadoop data store.

When you need a low cost, high throughput data store.
Unlimited storage for No-SQL data.
When you do not need to query the data directly. No ad hoc query support.
Suits the storage of archive or relatively static data.
Suits acting as a Databricks, HDInsight and IoT data store.

Eases the deployment of a Spark based cluster.
Enables the fastest processing of Machine Learning solutions.
Enables collaboration between data engineers and data scientists.
Provides tight enterprise security integration with Azure Active Directory.
Integration with other Azure Services and Power BI.

When you require a relational data store.
When you need to manage transactional workloads.
When you need to manage a high volume of inserts and reads.
When you need a service that requires high concurrency.
When you require a solution that can scale elastically.

When you require a relational data store.
When you need to manage analytical workloads.
When you need low cost storage.
When you require the ability to pause and restart the compute.
When you require a solution that can scale elastically.
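The criteria about low cost storage and the ability to pause and restart the compute follow from billing storage and compute separately. The sketch below illustrates the arithmetic only. The function name `monthly_cost` and every price in it are made-up placeholders, not real Azure pricing.

```python
def monthly_cost(storage_tb: int, storage_price_per_tb: int,
                 compute_price_per_hour: int, compute_hours: int) -> int:
    """Storage is billed for the whole month, compute only for the hours it runs."""
    return storage_tb * storage_price_per_tb + compute_price_per_hour * compute_hours


# Placeholder figures for illustration only (arbitrary currency units).
STORAGE_TB = 2
STORAGE_PRICE_PER_TB = 25
COMPUTE_PRICE_PER_HOUR = 3

# Compute left running all month (about 730 hours).
always_on = monthly_cost(STORAGE_TB, STORAGE_PRICE_PER_TB, COMPUTE_PRICE_PER_HOUR, 730)

# Compute paused outside working hours: 8 hours a day, 22 days a month.
paused = monthly_cost(STORAGE_TB, STORAGE_PRICE_PER_TB, COMPUTE_PRICE_PER_HOUR, 8 * 22)
```

With these placeholder numbers the storage part of the bill is identical in both cases, and only the compute part shrinks when the service is paused.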

Our demo architecture

When you require a fully managed event processing engine.
When you require temporal analysis of streaming data.
Support for analyzing IoT streaming data.
Support for analyzing application data through Event Hubs.
Ease of use with a Stream Analytics Query Language.

When you want to orchestrate the batch movement of data.
When you want to connect to a wide range of data platforms.
When you want to transform or enrich the data in movement.
When you want to integrate with SSIS packages.
Enables verbose logging of data processing activities.

When you require documentation of your data stores.
When you require a multi user approach to documentation.
When you need to annotate data sources with descriptive metadata.
A fully managed cloud service whose users can discover the data sources.
When you require a solution that can help business users understand their data.

DEMO - Streaming
Diagram components:
Lake gen2: Comments.xml (20 GB), Users.xml (3 GB)
Lake gen2: Parquet files, Avro files
Azure Synapse
Azure AS
Power BI
Twitter data (#Azure, #SynapseAnalytics)
Event Hub
Stream Analytics
Power BI Stream
Azure Data Factory

DEMO - Streaming
Diagram components:
Lake gen2: Comments.xml (20 GB), Users.xml (3 GB)
Lake gen2: Parquet files, Avro files
Azure Synapse
Azure AS
Power BI
Power BI Direct Query
Twitter data (#Azure, #SynapseAnalytics)
Event Hub
Stream Analytics
Power BI Stream
Azure Data Factory
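The temporal analysis performed on the Twitter stream can be pictured as a tumbling window: fixed-size, non-overlapping time buckets in which events are counted. The sketch below reproduces that idea in plain Python to show the logic. It is not the Stream Analytics query used in the demo, and the function name `tumbling_counts` and the sample events are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[datetime, str]],
                    window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, hashtag) in non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, hashtag in events:
        epoch = int(timestamp.timestamp())
        window_start = epoch - epoch % window_seconds  # align to the window boundary
        counts[(window_start, hashtag)] += 1
    return dict(counts)


# Made-up sample events standing in for the tweet stream.
base = datetime(2019, 12, 28, 10, 0, 0, tzinfo=timezone.utc)
events = [
    (base, "#Azure"),
    (base + timedelta(seconds=20), "#Azure"),
    (base + timedelta(seconds=45), "#SynapseAnalytics"),
    (base + timedelta(seconds=70), "#Azure"),  # falls into the next window
]
per_minute = tumbling_counts(events, window_seconds=60)
```

Each event belongs to exactly one window, which is what distinguishes a tumbling window from sliding or hopping windows, where an event can be counted more than once.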

Power BI Premium vs Azure AS?
https://powerbi.microsoft.com/en-us/blog/power-bi-premium-and-azure-analysis-services/

Synapse future
Data, Develop, Orchestrate, Monitor

SQL 2019 - Big Data Cluster
The same capabilities as the previous architecture, but on-premises!

SQL 2019 - Big Data Cluster
Runs on Kubernetes (K8s) infrastructure!

CI/CD
