Lecture Administration and visualization: Chapter 3.3 - Data lake

This lecture provides students with content on the traditional business analytics process, the architecture for a data lake, and its software components. Please refer to the detailed content of the lecture.

Chapter 3: Data lake

Outline
- Definition
- Architecture for data lake
- Software components

Traditional business analytics process
1. Start with end-user requirements to identify the desired reports and analyses.
2. Define the corresponding database schema and queries.
3. Identify the required data sources.
4. Create an Extract-Transform-Load (ETL) pipeline to extract the required data (curation) and transform it to the target schema (schema-on-write).
5. Create reports and analyze the data.

[Figure: LOB applications → ETL pipeline built with dedicated ETL tools (e.g. SSIS) → relational store with a defined schema → queries → results.] All data not immediately required is discarded or archived.

Two approaches to information management for analytics: top-down and bottom-up

[Figure: Analytics moving from information toward optimization: descriptive analytics (What happened?), diagnostic analytics (Why did it happen?), predictive analytics (What will happen?), prescriptive analytics (How can we make it happen?). Top-down (deductive): theory → hypothesis → observation → confirmation. Bottom-up (inductive): observation → pattern → hypothesis → theory.]

Data warehousing uses a top-down approach
1. Understand the corporate strategy.
2. Gather requirements: business requirements and technical requirements.
3. Implement the data warehouse. The work covers data sources, dimension modeling, physical design, ETL design and development, reporting and analytics design and development, setting up the infrastructure, and installing and tuning.

The data lake uses a bottom-up approach
1. Ingest all data (e.g. from devices), regardless of requirements.
2. Store all data in its native format, without a schema definition.
3. Do analysis using analytic engines like Hadoop: batch queries, interactive queries, real-time analytics, machine learning, and feeding a data warehouse.

New big data thinking: all data has value
- All data has potential value.
- Data hoarding: data is stored indefinitely rather than discarded.
- No defined schema: data is stored in its native format.
- Schema is imposed and transformations are done at query time (schema-on-read).
- Apps and users interpret the data as they see fit.
Cycle: gather data from all sources → store indefinitely → analyze → see results → iterate.

Defining the Data Lake
A
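The lecture contrasts schema-on-write (traditional ETL into a defined schema, discarding what is not needed) with schema-on-read (store everything in native format and impose a schema at query time). The following is a minimal, illustrative Python sketch of that contrast. It is not taken from the lecture: the sample records, `TARGET_SCHEMA`, and all function names are hypothetical, and a plain Python list stands in for both the relational store and the data lake storage.

```python
import json

# Hypothetical raw records from LOB applications and devices (illustrative data only).
RAW_RECORDS = [
    '{"order_id": 1, "amount": "19.99", "region": "EU", "clickstream": [1, 2]}',
    '{"order_id": 2, "amount": "5.00", "region": "US", "device": "sensor-7"}',
]

# Schema-on-write: the target schema is defined before any data is loaded.
# Hypothetical schema: column name -> type used to cast the raw value.
TARGET_SCHEMA = {"order_id": int, "amount": float, "region": str}


def etl_schema_on_write(raw_lines):
    """Extract each record, transform it to TARGET_SCHEMA, and load it.
    Fields that are not in the schema (e.g. 'clickstream', 'device') are discarded."""
    table = []
    for line in raw_lines:
        record = json.loads(line)  # extract
        row = {col: cast(record[col]) for col, cast in TARGET_SCHEMA.items()}  # transform
        table.append(row)  # load
    return table


def ingest_native(raw_lines):
    """Schema-on-read, step 1: keep every record as-is, in its native (raw JSON) format."""
    return list(raw_lines)


def query_schema_on_read(lake, fields):
    """Schema-on-read, step 2: impose a schema only at query time by parsing each
    raw record and projecting the requested fields; missing fields become None."""
    return [{f: json.loads(line).get(f) for f in fields} for line in lake]


if __name__ == "__main__":
    warehouse = etl_schema_on_write(RAW_RECORDS)
    lake = ingest_native(RAW_RECORDS)
    print(warehouse)  # only order_id, amount, region survive the ETL step
    # A new question ("which device sent this order?") can still be answered from the lake.
    print(query_schema_on_read(lake, ["order_id", "device"]))
```

The design choice illustrated here: the warehouse path can only answer questions its schema anticipated, while the lake path keeps the raw data, so a later query can ask for a field (such as `device`) that no schema was defined for.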
