hillbig: A theoretical analysis of deep learning by Dr. Taiji Suzuki, covering representation capability, generalization capability, and optimization theory. It covers a wide range of important topics, including recent ones such as the Neural Tangent Kernel and the dual effect. I don't think there is anything as comprehensive as this in English.
I get an error when I access the original SlideShare, but I can see it on X/Twitter. A cache, maybe?
- Kolmogorov's superposition theorem (see the formula sketch below)
- universal approximation
- Ridgelet transform
- Representation power and the number of layers
As an easy-to-understand concrete example: for a function whose value is determined only by the distance from the origin, a four-layer network needs a number of units that is only polynomial in the input dimension (honestly, I think it is even linear).
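To make the first two bullets above concrete, here is my own summary of the two classical statements (not taken from the slides):

```latex
% Kolmogorov's superposition theorem: every continuous f on [0,1]^n can be written
% with continuous one-variable functions \Phi_q and \phi_{q,p}
% (the inner \phi_{q,p} can even be chosen independently of f):
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right).

% Universal approximation: a one-hidden-layer network with a non-polynomial activation \sigma,
\hat f(x) \;=\; \sum_{j=1}^{N} c_j \, \sigma\!\left( w_j^\top x + b_j \right),
% can approximate any continuous function on a compact set to arbitrary accuracy
% once N is allowed to be large enough.
```

The later bullets then ask how large N has to be: for the radial example above, depth buys a polynomial (possibly linear) dependence on the dimension, whereas depth-separation results of this kind show that two-layer networks can require exponentially many units for such radial targets.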
Reproducing kernel Hilbert space: he re-describes kernel ridge regression in terms of a reproducing kernel Hilbert space, but I will skip that part. Deep learning can be interpreted as learning the kernel function itself from the data. … (a small kernel ridge regression sketch is included below)
- skipping this part
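Since kernel ridge regression is the baseline against which deep learning is compared throughout, here is a minimal sketch of it (my own illustration, not code from the talk), assuming a Gaussian RBF kernel: the fitted function is a weighted sum of kernels centered at the training points, with weights alpha = (K + lambda I)^(-1) y.

```python
# Minimal kernel ridge regression sketch (my own illustration, assuming an RBF kernel).
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2).
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq_dists)

def fit(X, y, lam=1e-2, gamma=1.0):
    # alpha = (K + lam * I)^{-1} y
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    # f_hat(x) = sum_i alpha_i * k(x_i, x)
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy usage on a 1-D regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=100)
alpha = fit(X, y)
print(predict(X, alpha, np.array([[0.0], [0.5]])))
```

The contrast drawn in the slides is that the kernel k here is fixed before seeing the data, whereas deep learning can be read as learning the kernel (the feature map) itself from the data.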
Approximation performance by function class
- kernel ridge regression
- adaptive method
- deep learning
- sparse estimation
- I guess if you have too many things to prepare in advance, it becomes impractical.
The various function classes mentioned so far are special cases of the [Besov space]
→Sparsity.
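As a reference point for this comparison (my addition, not a formula from the slides): for targets with smoothness s in d input dimensions, the classical minimax estimation rate is

```latex
\inf_{\hat f} \sup_{f \in \mathcal{F}_s} \mathbb{E}\,\lVert \hat f - f \rVert_{L^2}^2 \;\asymp\; n^{-\frac{2s}{2s+d}},
```

which deteriorates rapidly as d grows (the curse of dimensionality); adaptive methods, sparse estimation, and deep learning on inhomogeneous Besov-type classes all try to replace d by a much smaller effective dimension or a larger effective smoothness.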
Deep learning is superior when spatial smoothness is non-uniform
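A hypothetical one-dimensional illustration of what "spatially non-uniform smoothness" means (my own toy example, not from the slides): a target that is very smooth everywhere except for a kink at one point. A kernel method with a single global bandwidth has to compromise between the two regions, while a deep network can in principle allocate its resolution adaptively.

```python
# Toy target with spatially non-uniform smoothness (hypothetical example):
# a smooth sinusoid plus a sqrt-type kink at x = 0.5.
import numpy as np

def target(x):
    return np.sin(2 * np.pi * x) + np.sqrt(np.abs(x - 0.5))

x = np.linspace(0.0, 1.0, 9)
print(np.round(target(x), 3))
```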
The non-stochastic gradient method can take exponential time to escape a saddle point.
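A toy version of the phenomenon (my own illustration; the actual result concerns worst-case deterministic gradient descent): plain gradient descent on the saddle f(x, y) = (x^2 - y^2)/2, started almost exactly on the stable manifold y = 0, drifts away from the saddle only very slowly, because the escaping coordinate grows geometrically from a tiny initial value; stochastic noise would kick it out much faster.

```python
# Gradient descent near the saddle of f(x, y) = (x**2 - y**2) / 2 (toy illustration).
def steps_to_escape(y0, lr=0.1, threshold=1.0):
    x, y = 1.0, y0
    steps = 0
    while abs(y) < threshold:
        grad_x, grad_y = x, -y      # gradient of (x^2 - y^2) / 2
        x -= lr * grad_x
        y -= lr * grad_y            # |y| grows by a factor (1 + lr) per step
        steps += 1
    return steps

print(steps_to_escape(1e-12))  # started very close to the stable manifold: many steps
print(steps_to_escape(1e-3))   # started further away: far fewer steps
```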
Neural Tangent Kernel, mean field
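For reference (my own one-line summary, not the slides' definition): the Neural Tangent Kernel of a network f(x; θ) at its initialization θ_0 is

```latex
K_{\mathrm{NTK}}(x, x') \;=\; \bigl\langle \nabla_\theta f(x; \theta_0),\; \nabla_\theta f(x'; \theta_0) \bigr\rangle ,
```

and in the infinite-width NTK regime, gradient-descent training behaves like regression with this fixed kernel, whereas the mean-field regime keeps the features moving during training, so the two regimes sit on opposite sides of the "is deep learning just a kernel method?" question.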
This page is auto-translated from /nishio/鈴木大慈-深層学習の数理 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.