BY KATHARINE SHILCUTT

In the seminal 1961 science fiction novel “Stranger in a Strange Land,” Robert Heinlein coined a word with a meaning that’s expanded far beyond anything Heinlein could have imagined at the time: “grok,” or to understand a thing so innately, so intuitively, that you essentially merge with that thing to create an entirely new form greater than the sum of its parts.
Grokking made the leap from science fiction to computer science culture in the early 1980s, when programmers began using the word to describe being so deeply immersed in code that they had essentially entered a new worldview, transforming their entire understanding of programming.
And now, a discovery at Rice has led to a similarly radical shift in how we understand neural networks and how they, too, are able to grok. This research suggests neural networks — machine learning models that use a network of interconnected nodes to process data in a way that mimics the human brain — can learn and generalize better than previously thought. It’s a phenomenon with the potential to drastically improve artificial intelligence (AI) training efficiency by reducing the long hours and massive amounts of computing power required to train such models.
“When we got those grokking results showing that this is so widespread, it was hard for us to believe,” said Rice Ph.D. student Imtiaz Humayun, who co-authored the 2024 paper “Deep Networks Always Grok and Here Is Why” with Rice Ph.D. alumnus Randall Balestriero and Rice professor Richard Baraniuk. “We quadruple-checked everything because it completely changes the way we understand how neural networks learn.”
Their research provided the first evidence that grokking, or delayed generalization, in which a deep neural network (DNN) keeps improving long after it reaches near-zero training error, is a general feature of deep learning. Previous studies had reported grokking only in specific controlled settings, such as DNNs initialized with large-norm parameters or transformers trained on algorithmic datasets, but the Rice researchers demonstrated that it is actually much more widespread.
This discovery runs counter to the previous belief that if you overtrain a neural network, it will overfit, or get worse at a task over time. However, Humayun, who is also a student researcher for Google, and his team found that you simply need to give the AI more time to learn. At a certain point, a phase shift happens: the neural network suddenly groks. Even more fascinating is that this happens across all kinds of neural networks: large language models, facial recognition models — Humayun and his team tested everything they could.
“And we saw that for all the combinations we tried, neural networks — when you keep on training them — change their mind, and they start grokking and generalizing instead of just overfitting,” he said. “The way it learns internally undergoes a phase change during training, for some reason that we are yet to completely understand, but that ends up giving us a much more robust network toward the end.”
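For readers who want to see the phenomenon for themselves, the controlled setting where grokking was first reported, and which the Rice results generalize far beyond, fits in a few dozen lines of code. The sketch below is purely illustrative and is not the team’s implementation: it trains a small PyTorch network on modular addition, one of the algorithmic datasets mentioned above, and prints train and test accuracy so the delayed jump in generalization can be watched as it happens. The architecture, optimizer settings, and training length are assumptions chosen for brevity.

```python
# Illustrative sketch only (not the Rice team's code): modular addition,
# one of the algorithmic datasets where grokking was first reported.
import torch
import torch.nn as nn

torch.manual_seed(0)
P = 97  # predict (a + b) mod P from the pair (a, b)

pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # all P*P pairs
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2                      # train on half the pairs, test on the rest
train_idx, test_idx = perm[:split], perm[split:]

model = nn.Sequential(
    nn.Embedding(P, 128),   # embed each operand, then flatten the pair
    nn.Flatten(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, P),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(dim=-1) == labels[idx]).float().mean().item()

for step in range(1, 50_001):                # grokking can take tens of thousands of steps
    opt.zero_grad()
    loss_fn(model(pairs[train_idx]), labels[train_idx]).backward()
    opt.step()
    if step % 1000 == 0:
        # Training accuracy saturates early; test accuracy typically lags far
        # behind and then climbs abruptly, the delayed generalization described above.
        print(f"step {step}: train {accuracy(train_idx):.2f}  test {accuracy(test_idx):.2f}")
```

Whether and how sharply a run like this groks depends heavily on the weight decay, the train/test split and the random seed, so the step counts here should be read as placeholders rather than predictions.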
This is not the first dramatic discovery Humayun has made alongside Baraniuk, the C. Sidney Burrus Professor of Electrical and Computer Engineering, professor of computer science and director of OpenStax. Last year, Humayun was also co-author of a paper with Baraniuk; Rice Ph.D. students Sina Alemohammad, Josue Casco-Rodriguez and Hossein Babaei; Rice Ph.D. alumnus Lorenzo Luzi; Rice Ph.D. alumnus and current Stanford postdoctoral student Daniel LeJeune; and Simons Postdoctoral Fellow Ali Siahkoohi that demonstrated the previously unimagined negative consequences of training AI systems on synthetic data — work that quickly made headlines across the world.
“I’ve learned so much from Rich B,” said Humayun, referring to Baraniuk with the affectionate nickname used by all of Baraniuk’s students. “He does what he calls wish-driven research, asking us to start from somewhere and look for bizarre connections. It’s harder to go out of the box rather than starting somewhere far outside of the box.”
During his undergraduate years in Bangladesh, Humayun became fascinated with AI after dabbling in it for robotics competitions. He realized his fellow students needed Bengali datasets if they wanted to train AIs in their native language, so Humayun established a nonprofit organization dedicated to creating AI datasets in Bengali and open sourcing them through competitions. Instead of just a few students working on Bengali technologies, soon there were over 10,000, all competing for bigger and better prizes. Last year, Humayun hosted a competition on Kaggle with a $53,000 prize, supported by Google, which resulted in the current state-of-the-art Bengali speech recognition AI model.
“That’s one thing that drew me to Rice,” said Humayun. “Rich B has OpenStax, this nonprofit venture that’s changing the world and has a really high impact — and I have my nonprofit — plus his research was something that I was really interested in, and I was like, ‘That’s where I want to be.’”
Now in his sixth year at Rice, Humayun is eager to see what else he can accomplish alongside Baraniuk before he graduates. He wants to understand the phase change that enables neural networks to grok so researchers can make it happen earlier — “so we don’t need to have megatons of carbon emissions before we end up grokking,” Humayun said. He’s also been excited to see his Ph.D. research, which was once largely theoretical, become more practical by the day. And he’s enthusiastic about seeing a Nobel Prize in physics finally awarded to AI researchers (John Hopfield and Geoffrey Hinton, for their foundational discoveries in machine learning).
“This is such a positive thing, because the world will be using exponentially more AI soon,” Humayun said. “It’s inevitable. And through these Nobel Prizes, it’s just being made explicit how much AI is meaningful to humanity in general.”