Lecture Administration and visualization: Chapter 5.2 - Feature engineering

Lecture "Administration and visualization: Chapter - Feature engineering" provides students with content about: Feature engineering toolbox; Variable data types; Number variables; Quantization or binning. Please refer to the detailed content of the lecture!

Feature engineering
"Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data." (Jason Brownlee)
"Coming up with features is difficult, time-consuming, requires expert knowledge. Applied machine learning is basically feature engineering." (Andrew Ng)

The dream
[Figure: pipeline diagram with the labels Raw data, Dataset, Model, Task]

The reality
[Figure: pipeline diagram with the labels Raw data, Features, ML-ready dataset, Model, Task]

Feature engineering toolbox
Just kidding.

Variable data types

Number variables

Binarization
Counts can quickly accumulate without bound. Convert them into binary values (0, 1) to indicate presence.

Quantization or binning
Group the counts into bins. This maps a continuous number to a discrete one. The bin size can be chosen in two ways: fixed-width binning (e.g. 0-12 years old, 12-17 years old, 18-24 years old, 25-34 years old) or adaptive-width binning.

Equal-width binning
Divides the continuous variable into several categories whose bins (ranges) have the same width.
Pros: easy to compute.
Cons: if there are large gaps in the counts, there will be many empty bins with no data.

Adaptive-width binning
Equal-frequency binning uses quantiles: values that divide the data into equal portions (continuous intervals with equal probabilities). Some q-quantiles have special names. The only 2-quantile is called the median; the 4-quantiles are called quartiles (Q); the 6-quantiles are called sextiles (S); the 8-quantiles are called octiles; the 10-quantiles are called deciles (D).

Example: quartiles
[Figure: quartiles example]

Log transformation
Original number: x
Transformed number: x' = log10(x)
Back-transformed number: 10^(x')

Box-Cox transformation

Feature scaling or normalization
Models that are smooth functions of the input, such as linear regression and logistic regression, are affected by the scale of the input. Feature scaling (or normalization) changes the scale of the features.

Min-max scaling
Squeezes (or stretches) all values into the range [0, 1] ...
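Binarization and fixed-width binning, as described in the slides, can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code: the function names `binarize` and `fixed_width_bin`, the sample data, and the choice of a bin width of 10 years are all assumptions made for the example.

```python
def binarize(counts):
    """Map each count to 1 if the item is present (count > 0), else 0."""
    return [1 if c > 0 else 0 for c in counts]


def fixed_width_bin(values, width):
    """Assign each non-negative value to an equal-width bin using floor division."""
    return [int(v // width) for v in values]


# Hypothetical data, invented for illustration only.
counts = [0, 3, 120, 0, 1]        # e.g. how often a user played each song
ages = [3, 15, 22, 31, 47, 68]    # ages in years

binary = binarize(counts)             # presence/absence instead of raw counts
decades = fixed_width_bin(ages, 10)   # bin index = age decade

print(binary)
print(decades)
```

Note that no age in the sample falls into bin 5 (50-59 years), which illustrates the drawback listed for equal-width bins: gaps in the data leave some bins empty.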
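Equal-frequency (quantile) binning can be sketched with the standard library alone. The example below is one possible approach under stated assumptions: `quantile_bin` is a hypothetical helper, the data are invented, and the cut points come from `statistics.quantiles` with `method="inclusive"`, which is only one of several common quantile definitions.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bin(values, q=4):
    """Return (cut points, bin index per value) for q equal-frequency bins."""
    # quantiles() returns the q - 1 cut points that split the data into q groups.
    cuts = quantiles(values, n=q, method="inclusive")
    # A value's bin index is the number of cut points less than or equal to it.
    return cuts, [bisect_right(cuts, v) for v in values]


# Hypothetical, heavily skewed data: fixed-width bins would leave most bins empty.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100, 1000]

cuts, bins = quantile_bin(data, q=4)
print(cuts)  # the three quartile cut points
print(bins)  # quartile index (0 to 3) for each value
```

Because the cut points adapt to the data, the two outliers (100 and 1000) simply land in the top quartile instead of stretching the bin range.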
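The log transformation and min-max scaling above can be illustrated as follows. This is a sketch under assumptions: the function names and sample values are invented, and min-max scaling is written with the standard formula (x - min) / (max - min), which the truncated slide text does not spell out.

```python
import math


def log10_transform(values):
    """Transform each strictly positive x into log10(x)."""
    return [math.log10(v) for v in values]


def back_transform(values):
    """Invert the log10 transform: x' -> 10 ** x'."""
    return [10 ** v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical data, invented for illustration only.
counts = [1, 10, 100, 1000]
logged = log10_transform(counts)        # compresses the large values
restored = back_transform(logged)       # approximately recovers the originals
scaled = min_max_scale([18, 25, 40, 62])

print(logged)
print(restored)
print(scaled)
```

The log transform requires strictly positive inputs; how to handle zero counts (for example by adding a constant before taking the log) is a separate design decision not covered here.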
