Many machine learning problems can be cast as optimization problems, and this lecture introduces the basics of optimization. By the end, you should know: the definitions of the gradient and the Hessian; the gradient descent algorithm; Newton's algorithm; stochastic gradient descent (SGD) for online learning; and popular variants such as AdaGrad and asynchronous SGD.
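As a taste of what follows, here is a minimal sketch of gradient descent (the first algorithm listed above) applied to a simple one-dimensional quadratic. The function names, step size, and iteration count are illustrative choices, not part of the lecture itself.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, n_steps=100):
    """Minimize a function via fixed-step gradient descent, given its gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - lr * grad(x)  # step opposite the gradient (steepest descent)
    return x

# Example: f(x) = (x - 3)^2 has gradient 2(x - 3) and minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3.0), x0=[0.0])
print(x_min)  # converges close to [3.]
```

Newton's method and SGD, covered later, refine this basic loop: the former rescales the step by the inverse Hessian, and the latter replaces the exact gradient with a cheap stochastic estimate.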