# It All Started With AI …and Here We Are

**WhatsApp shared by Ghulam Mustafa, Ph.D., USA:** It started with AI: I began working on ML about 10 years ago as a natural extension of my interest in probability and statistical inference. I learned of Occam's razor, the principle that simpler models are preferred over complex ones, which led me to the bias-variance trade-off and to Kullback-Leibler divergence and variational inference.
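Here is a minimal Python sketch of the Kullback-Leibler divergence between two discrete distributions, to make that idea concrete. The function name `kl_divergence` and the example distributions are illustrative assumptions and do not come from any particular library. Variational inference chooses, from a tractable family, the distribution q that minimizes a divergence of this kind, KL(q || p), to the target posterior p.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) between two discrete distributions.

    p and q are equal-length lists of probabilities. Terms with p_i == 0
    contribute nothing by convention; if q_i == 0 where p_i > 0 the
    divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf  # p puts mass where q puts none
        total += p_i * math.log(p_i / q_i)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # ln(5/3), about 0.511
print(kl_divergence(q, p))  # about 0.368: KL is not symmetric
print(kl_divergence(p, p))  # 0.0
```

Because KL is not symmetric, the direction you minimize matters. This is one reason variational approximations tend to under- or over-spread relative to the true posterior.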

In deep learning, I learned that the cost (loss) function plays the role of energy. We want to minimize the cost, and hence the energy, via backpropagation and gradient descent. That led me to Lagrangian optimization, which grows directly out of the Euler-Lagrange formulation of mechanics.
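The sketch below shows a cost being minimized by gradient descent. It fits a straight line to four hypothetical data points by repeatedly stepping against the gradient of the mean squared error. It assumes a two-parameter model whose gradient can be written by hand. In a neural network, backpropagation is the chain-rule procedure that would compute these same partial derivatives automatically. The learning rate and step count are illustrative choices, not tuned values.

```python
def mse_cost(a, b, xs, ys):
    """Mean squared error of the line y = a*x + b on the data (the 'energy')."""
    n = len(xs)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def mse_gradient(a, b, xs, ys):
    """Hand-derived partial derivatives of mse_cost with respect to a and b."""
    n = len(xs)
    residuals = [a * x + b - y for x, y in zip(xs, ys)]
    grad_a = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    grad_b = 2.0 * sum(residuals) / n
    return grad_a, grad_b


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Start at (0, 0) and repeatedly step downhill along the negative gradient."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a, grad_b = mse_gradient(a, b, xs, ys)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical data generated by y = 2x + 1
a, b = gradient_descent(xs, ys)
print(a, b, mse_cost(a, b, xs, ys))  # a close to 2, b close to 1, cost close to 0
```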

Most energy-based models in AI rely on these concepts and make extensive use of the partition function to compute the Gibbs measure. All of these are founded in statistical mechanics. Of particular interest are spin models such as the Ising model and the closely related Sherrington-Kirkpatrick spin glass, which show how local interactions give rise to large-scale behavior. This led to mean-field theory for capturing aggregate behavior. All of this is done in the Hamiltonian setting, because the Newtonian formalism runs out of gas at this point (which is why I have a bone to pick with Isaac).
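The partition function and Gibbs measure can be seen directly on a tiny system. This Python sketch assumes a one-dimensional Ising ring with coupling `J`, external field `h`, and inverse temperature `beta` (illustrative names). It enumerates every spin configuration to compute Z exactly. It then solves the standard mean-field self-consistency equation m = tanh(beta(zJm + h)) by fixed-point iteration. Brute-force enumeration only works for a handful of spins, since the number of configurations grows as 2^n. That blow-up is exactly why approximations such as mean field are needed.

```python
import itertools
import math


def ising_energy(spins, J=1.0, h=0.0):
    """Energy of a 1D Ising ring: E = -J * sum_i s_i s_{i+1} - h * sum_i s_i."""
    n = len(spins)
    bonds = sum(spins[i] * spins[(i + 1) % n] for i in range(n))  # periodic boundary
    return -J * bonds - h * sum(spins)


def gibbs_measure(n, beta, J=1.0, h=0.0):
    """Enumerate all 2**n spin configurations; return (Z, {config: probability})."""
    configs = list(itertools.product((-1, 1), repeat=n))
    weights = [math.exp(-beta * ising_energy(c, J, h)) for c in configs]
    Z = sum(weights)  # the partition function
    return Z, {c: w / Z for c, w in zip(configs, weights)}


def mean_field_magnetization(beta, J=1.0, h=0.0, z=2, iters=1000):
    """Solve the mean-field self-consistency m = tanh(beta * (z*J*m + h)).

    z is the number of nearest neighbours (2 on a ring). Plain fixed-point
    iteration, started from m = 0.5.
    """
    m = 0.5
    for _ in range(iters):
        m = math.tanh(beta * (z * J * m + h))
    return m


Z, probs = gibbs_measure(n=4, beta=0.5)
print(Z)
print(probs[(1, 1, 1, 1)], probs[(1, -1, 1, -1)])  # aligned spins are far more likely
print(mean_field_magnetization(0.25), mean_field_magnetization(1.0))
```

For h = 0, the exact ring result is Z = (2 cosh(beta J))^n + (2 sinh(beta J))^n, and the enumeration reproduces it. Mean field is only an approximation. On this 1D ring it predicts spontaneous magnetization for beta above 1/(zJ) = 0.5, whereas the exact 1D Ising model has no phase transition at any finite temperature.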

All of this relates to thermodynamic equilibrium (equal a priori probabilities) and the ideal-gas models of Maxwell and Boltzmann, with Gibbs providing the statistical framework. That framework, I learned, is a slippery slope, because it did not satisfactorily explain Poincaré recurrence. Recurrence says that, given enough time, an isolated system confined to a bounded region of phase space returns arbitrarily close to its initial configuration, for all initial states except a set of measure zero.
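Poincaré recurrence is easy to watch in a toy system. The sketch below uses an irrational rotation of the circle, T(x) = x + alpha mod 1, in place of a gas of particles. This map preserves length on a bounded space. The rotation number, starting point, and tolerances are illustrative assumptions. The code measures how long an orbit takes to return within `eps` of its starting point.

```python
import math


def circle_distance(a, b):
    """Shortest distance between two points on the circle [0, 1)."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def rotate(x0, alpha, n):
    """n-th iterate of the rotation T(x) = x + alpha (mod 1)."""
    return (x0 + n * alpha) % 1.0


def first_return_time(x0, alpha, eps, max_steps=10**6):
    """Smallest n >= 1 with T^n(x0) within eps of x0, or None if not found."""
    for n in range(1, max_steps + 1):
        if circle_distance(rotate(x0, alpha, n), x0) < eps:
            return n
    return None


alpha = (math.sqrt(5) - 1) / 2  # irrational rotation number (golden ratio conjugate)
for eps in (0.1, 0.01, 0.001):
    print(eps, first_return_time(0.2, alpha, eps))  # smaller eps -> longer wait
```

The waiting time grows as `eps` shrinks: 5, 55, and 610 steps here, all Fibonacci numbers for this choice of alpha. That growth hints at why recurrence times for macroscopic systems are astronomically long. For a rotation every point recurs, but the theorem in general only guarantees recurrence for almost every initial state.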

Birkhoff and von Neumann proved ergodic theorems that gave Boltzmann's ergodic hypothesis a rigorous mathematical form: for an ergodic system in equilibrium, the long-time average of an observable equals its average over phase space. Phase space here comprises the positions and their conjugate momenta. Whether a particular physical system is actually ergodic remains a separate and much harder question. If only Boltzmann had not hanged himself in 1906, he might have lived to see his embattled hypothesis, which many believe contributed to his untimely death, vindicated in this precise form.
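The content of the ergodic theorem, that time average equals space average, can be checked numerically on a toy system. This sketch uses an irrational circle rotation as a stand-in for a physical flow. It uses the observable f(x) = x², whose space average over [0, 1) is exactly 1/3. Rotation by 1/2 is included as a non-ergodic counterexample: the orbit visits only two points, so its time average misses the space average.

```python
import math


def time_average(f, x0, alpha, n_steps):
    """Average of f along the orbit x_k = x0 + k*alpha (mod 1), k = 0..n_steps-1."""
    return sum(f((x0 + k * alpha) % 1.0) for k in range(n_steps)) / n_steps


def space_average(f, n_cells=100_000):
    """Midpoint-rule approximation of the integral of f over [0, 1)."""
    return sum(f((i + 0.5) / n_cells) for i in range(n_cells)) / n_cells


def f(x):
    """Observable; its exact space average over [0, 1) is 1/3."""
    return x * x


golden = (math.sqrt(5) - 1) / 2
print(time_average(f, 0.0, golden, 100_000))  # close to 1/3: irrational rotation is ergodic
print(space_average(f))                        # close to 1/3
print(time_average(f, 0.0, 0.5, 1000))         # 0.125: rotation by 1/2 is not ergodic
```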

Birkhoff combined his ergodic theorems with Poincaré's qualitative, topological approach to differential equations to lay the foundations of dynamical systems theory: deterministic mechanical systems can exhibit random-like behavior. Anosov characterized a class of systems (many of them Hamiltonian) in which this happens. Smale constructed a model, the horseshoe map, that made symbolic dynamics a central tool. Symbolic dynamics describes the response of a complex system as a random-looking sequence of symbols (coin tosses such as 0 and 1), the kind of classes that AI tries to learn. That completes the learning circle for me, for now.
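Symbolic dynamics can be sketched with a one-dimensional stand-in. Smale's horseshoe itself is a two-dimensional map. The sketch below instead uses the doubling map T(x) = 2x mod 1, a simpler chaotic system with the same coin-toss coding. Each point is labeled 0 or 1 according to which half of [0, 1) it lies in, and the resulting symbol sequence is just the binary expansion of the starting point. The function names are hypothetical. Exact `Fraction` arithmetic is used because repeated doubling in floating point typically drives an orbit to 0 after about 53 steps.

```python
from fractions import Fraction


def doubling_map(x):
    """T(x) = 2x mod 1 on [0, 1), computed exactly with Fractions."""
    y = 2 * x
    return y - 1 if y >= 1 else y


def itinerary(x, n):
    """First n symbols of the orbit of x: 0 if the point is in [0, 1/2), else 1."""
    symbols = []
    for _ in range(n):
        symbols.append(0 if x < Fraction(1, 2) else 1)
        x = doubling_map(x)
    return symbols


def point_from_symbols(symbols):
    """The point 0.s1 s2 s3 ... (base 2), whose itinerary starts with these symbols."""
    return sum(Fraction(s, 2 ** (i + 1)) for i, s in enumerate(symbols))


print(itinerary(Fraction(1, 3), 8))  # [0, 1, 0, 1, 0, 1, 0, 1]
coin_tosses = [1, 1, 0, 1, 0, 0, 1, 0]
x = point_from_symbols(coin_tosses)
print(itinerary(x, len(coin_tosses)) == coin_tosses)  # True
```

`point_from_symbols` can build a starting point for any finite sequence of 0s and 1s. So every run of coin tosses is realized by some deterministic orbit, which is the sense in which a deterministic system can look random.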