Bài giảng "Tối ưu hóa nâng cao - Chương 5: Gradient descent" cung cấp cho người học các kiến thức: Gradient descent, gradient descent interpretation, fixed step size, backtracking line search,. . | Bài giảng Tối ưu hóa nâng cao: Chương 5 - Hoàng Nam Dũng Gradient Descent Hoàng Nam Dũng Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội Gradient descent Consider unconstrained, smooth convex optimization min f (x) x with convex and differentiable function f : Rn → R. Denote the optimal value by f ∗ = minx f (x) and a solution by x ∗ . 1 Gradient descent Consider unconstrained, smooth convex optimization min f (x) x with convex and differentiable function f : Rn → R. Denote the optimal value by f ∗ = minx f (x) and a solution by x ∗ . Gradient descent: choose initial point x (0) ∈ Rn , repeat: x (k) = x (k−1) − tk · ∇f (x (k−1) ), k = 1, 2, 3, . . . Stop at some point. 1 ● ● ● ● ● 4 2 ● ● ● ●● 53 Gradient descent interpretation At each iteration, consider the expansion 1 2 f (y ) ≈ f (x) + ∇f (x)T (y − x) + ky − xk2 . 2t Quadratic approximation, replacing usual Hessian ∇2 f (x) by 1t I . f (x) + ∇f (x)T (y − x) linear approximation to f 1 2 2t ky − xk 2 proximity term to x, with weight 1/2t Choose next point y = x + to minimize quadratic approximation x + = x − t∇f (x). 4 Gradient descent interpretation ● ● Blue point Blue pointisisx,x, redred point is is point ∗ T x = argminy f (x) + ∇f (x) (y − x) + 1 ky −1 xk22 + T ky − xk22 2t x = argmin f (x) + ∇f (x) (y − x) + y 2t 5 Outline I How to choose step sizes I Convergence analysis I Nonconvex functions I Gradient boosting 6 Fixed step size Fixed step size Simply take ttkk ==t tfor Simply take forallallk k==1,1, 3, .3,. .,. .can 2, 2, diverge ., can if t is diverge if too t is big. too big. 2 2 2 2 Consider f (x) = (10x Consider f (x) = (10x + x )/2, gradient descent after 1 1 +2 x2 )/2, gradient descent after 8 steps: 8 steps: 20 ● 10 ● ● * 0 −10 −20 −20 −10 0 10 20 7 9 Fixed step size Can be Can