Lecture Administration and visualization: Chapter 5.1 - Exploratory data analysis

Lecture "Administration and visualization: Chapter 5.1 - Exploratory data analysis" covers the data science process, the focus of exploratory data analysis (EDA), a definition of EDA, and the questions EDA commonly addresses.

Exploratory Data Analysis

Learning outcomes
- Understand the key elements of exploratory data analysis (EDA).
- Explain and use common summary statistics for EDA.
- Plot and explain common graphs and charts for EDA.

Motivation
Before making inferences from data, it is essential to examine all of your variables. Why? To understand your data and to listen to it:
- to catch mistakes;
- to see patterns in the data;
- to find violations of statistical assumptions;
- to generate hypotheses;
- and because if you don't, you will have trouble later.

Data science process
1. Formulate a question
2. Gather data
3. Analyze data
4. Product
Source: Foundational Methodology for Data Science, IBM, 2015.

Exploratory data analysis (EDA) focus
The focus is on the data: its structure, its outliers, and the models suggested by the data. The EDA approach makes use of, and shows, all of the available data; in this sense, there is no corresponding loss of information. EDA draws on:
- summary statistics;
- visualization;
- clustering and anomaly detection;
- dimensionality reduction.

EDA definition
Strictly speaking, EDA is not a set of techniques but an attitude, or philosophy, about how a data analysis should be carried out.
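As a minimal sketch of the "summary statistics" tool listed under the EDA focus, the Python function below computes the usual one-variable summary: count, mean, median, sample standard deviation, minimum, maximum, and quartiles. It assumes the variable is a plain list of numbers and uses only the standard-library `statistics` module; the function name `summarize` and the sample values are hypothetical. Quartiles come from `statistics.quantiles` with its default "exclusive" method, so they may differ slightly from tools that use another interpolation rule.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: common EDA summary statistics for one numeric variable."""
    if len(values) < 2:
        # The sample standard deviation needs at least two data points
        raise ValueError("need at least two values")
    # Quartile cut points Q1, Q2 (the median), Q3
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
    for name, value in summarize(sample).items():
        print(f"{name:>6}: {value:.3f}")
```

For this sample the mean (5) sits slightly above the median (4.5), and comparing the quartiles with the minimum and maximum gives a first look at skew and spread before any plotting.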
EDA helps to select the right tool for preprocessing or analysis, and it makes use of humans' ability to recognize patterns in data.

EDA common questions
- What is a typical value?
- What is the uncertainty for a typical value?
- What is a good distributional fit for a set of numbers?
- Does an engineering modification have an effect?
- Does a factor have an effect?
- What are the most important factors?
- Are measurements coming from different laboratories equivalent?
- What is the best function for relating a response variable to a set of factor variables?
- What are the best settings for factors?
- Can we separate signal from noise in time-dependent data?
- Can we extract any structure from multivariate data?
- Does the data have outliers?

EDA is an iterative process
Repeat:
- Identify and prioritize relevant questions in decreasing order of importance.
- Ask questions.
- Construct graphics to …
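Two of the common questions, "What is a typical value?" (with its uncertainty) and "Does the data have outliers?", can be sketched in a few lines of Python. This is one conventional choice, not necessarily the lecture's: the sample mean with its standard error s/√n as the typical value and its uncertainty, and Tukey's fences (values more than 1.5 × IQR beyond the quartiles) as the outlier rule. The function names `typical_value` and `tukey_outliers` and the data are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def typical_value(values: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: sample mean and its standard error s / sqrt(n)."""
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, std_err


def tukey_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Hypothetical helper: values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # interquartile range
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    sample = [10, 12, 11, 13, 12, 11, 12, 40]  # hypothetical data with one suspicious value
    mean, se = typical_value(sample)
    print(f"mean = {mean:.3f}, standard error = {se:.3f}")
    print("median =", statistics.median(sample))
    print("outliers:", tukey_outliers(sample))  # -> outliers: [40]
```

Here the single value 40 pulls the mean up to 15.125 while the median stays at 12, which is why it pays to check for outliers before treating the mean as the typical value.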
