On 1st September, Gilbert Strang from MIT gave a presentation on ‘Shallow Thoughts about Deep Learning’:
By adding more layers (more depth), machine learning has taken an enormous leap forward in its expressive power. The results are easier to believe than to understand. We hope to use the TensorFlow website to illustrate the approximation power of deep learning. A frequently used algorithm for choosing coefficients to approximate training data is stochastic gradient descent. Its major cost is the computation of the gradient (with many, many variables). Backpropagation is a successful reinvention of reverse AD (automatic differentiation using the adjoint operator). This talk is purely expository, about important questions that are waiting for answers.
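As a toy illustration of the two ideas mentioned in the abstract (not from the talk itself), the sketch below fits a line to data with stochastic gradient descent; the gradients are derived by hand here, which is exactly the bookkeeping that backpropagation (reverse-mode AD) automates for deep networks. All names and parameter values are illustrative assumptions.

```python
import random

# Toy problem: recover w and b in y = w*x + b from noiseless samples,
# using stochastic gradient descent on the squared-error loss.
random.seed(0)
true_w, true_b = 2.0, -1.0
data = [(x, true_w * x + true_b) for x in [i / 10 for i in range(-20, 21)]]

w, b = 0.0, 0.0   # initial guess for the coefficients
lr = 0.05         # learning rate (step size)

for epoch in range(200):
    random.shuffle(data)          # "stochastic": visit samples in random order
    for x, y in data:
        pred = w * x + b
        err = pred - y            # loss for this sample is err**2
        # Gradients of err**2 with respect to w and b, derived by hand.
        # Backpropagation computes exactly these adjoints automatically
        # when the model has millions of variables instead of two.
        w -= lr * 2 * err * x
        b -= lr * 2 * err

print(w, b)  # converges toward the true coefficients 2.0 and -1.0
```

The "major cost" the abstract refers to is precisely the gradient computation inside the inner loop: reverse-mode AD evaluates it for all variables at once at roughly the cost of one extra pass through the model.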
The talk was given as part of a larger seminar celebrating the 70th birthday of the Cambridge mathematician Arieh Iserles: https://sites.google.com/view/ai70/home