Near optimal step size and momentum in gradient descent for quadratic functions


Turk J Math (2017) 41: 110–121
© TÜBİTAK
doi:10.3906/mat-1411-51
Research Article

Engin TAŞ (1,*), Memmedağa MEMMEDLİ (2)
1 Department of Statistics, Faculty of Science and Literature, Afyon Kocatepe University, Afyonkarahisar, Turkey
2 Department of Statistics, Faculty of Science, T.C. Anadolu University, Eskişehir, Turkey

Received: 22.11.2014 • Accepted/Published Online: 18.03.2016 • Final Version: 16.01.2017

Abstract: Many problems in statistical estimation, classification, and regression can be cast as optimization problems. Gradient descent, one of the simplest and easiest-to-implement multivariate optimization techniques, lies at the heart of many powerful classes of optimization methods. However, its major disadvantage is a slower rate of convergence compared with more sophisticated algorithms. To improve the convergence speed of gradient descent, we simultaneously determine a near-optimal scalar step size and momentum factor for gradient descent on a deterministic quadratic bowl from the largest and smallest eigenvalues of the Hessian. The resulting algorithm is demonstrated on specific and randomly generated test problems, and it converges faster than previous batch gradient descent methods.

Key words: Gradient descent, step size, momentum, convergence speed, stability

1. Introduction
In domains like statistics, finance, bioinformatics, information retrieval, collaborative filtering, and social network analysis, learning tasks such as regression, classification, and ranking start with a loss function that measures the error between the prediction of the model and the actual output value. An empirical risk function is then defined over the training data to estimate this loss. Consider, for example, least-squares regression; we seek the plane that …
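To make the idea in the abstract concrete, the following is a minimal Python sketch of gradient descent with momentum on a quadratic bowl, with both parameters computed from the largest and smallest eigenvalues of the Hessian. It assumes the classical heavy-ball tuning due to Polyak rather than the paper's own near-optimal formulas, which may differ; the function name and the random test problem are illustrative only.

import numpy as np

def heavy_ball_quadratic(A, b, x0, n_iter=500):
    # Gradient descent with momentum on f(x) = 0.5 * x^T A x - b^T x.
    # Step size alpha and momentum beta are set from the extreme
    # eigenvalues of the Hessian A, following the classical heavy-ball
    # tuning (an assumption; the paper derives its own near-optimal values).
    eigvals = np.linalg.eigvalsh(A)          # ascending; A symmetric positive definite
    mu, L = eigvals[0], eigvals[-1]
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        grad = A @ x - b                     # gradient of the quadratic
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
    return x

# Randomly generated quadratic test problem (hypothetical example).
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 0.5 * np.eye(50)               # symmetric positive definite Hessian
b = rng.standard_normal(50)
x = heavy_ball_quadratic(A, b, np.zeros(50))
print(np.linalg.norm(A @ x - b))             # residual should be near zero

With this tuning the iterates contract at roughly the rate (sqrt(κ) − 1)/(sqrt(κ) + 1), where κ = λmax/λmin is the condition number of the Hessian, which is why choosing the step size and momentum jointly from the two extreme eigenvalues pays off over plain gradient descent.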
