This is the thirty-seventh article in a series dedicated to the various aspects of machine learning (ML). Today’s article will introduce computational learning theory, a school of thought dedicated to answering or even just pondering some of the most profound, troubling, and profoundly troubling questions in the machine learning field.
Back in grade school, you were just one of twenty or thirty (or more, or fewer) students in a classroom who may or may not have been paying attention to whatever your teachers were lecturing about day in, day out.
Maybe you did well on tests. Maybe you didn’t. Maybe you shouldn’t have done well on those tests but cheated and ended up getting an undeserved good score. Regardless, if you are reading this article on machine learning, we assume that you graduated, or that you at least have arrived at a personal standpoint where you actually value education.
Whatever your experience was, you might agree that your level of enthusiasm relating to education could hardly be measured by your test score. Some people got good grades because they wanted to go to a good college, and others got bad grades yet still enjoyed learning.
The big challenge for your teachers and school administrators was measuring just how well students were actually learning. Test scores tell one story, while a student’s personal educational development is a whole other story. How can one tell whether a student is “actually” learning biology, and not just regurgitating onto the midterm the vocabulary words they crammed over the previous day or two?
In a 45-minute class period with a dozen or more students to teach, this can be a thorny problem with no clear solution.
The issues we have with effectively teaching students are paralleled in the machine learning world, where the question of whether an AI agent is actually learning or not is always on the table. The field dedicated to navigating these problems is called computational learning theory, and it is the subject of today’s article.
Computational Learning Theory
Computational learning theory, or CLT, is concerned with questions that spring up in development and training, and may persist beyond these stages. One example: “How badly does an agent need to screw up in order to learn the correct way to accomplish a task?”
A question like that is incredibly important, especially when dealing with agents like R2, Domino’s automated delivery bot. If it is not programmed and trained correctly to recognize the signs of a living, pain-feeling being, R2 could run over a previously unseen animal or other obscure life form.
Many studies in CLT revolve around how well an agent can learn a particular action over an unknown amount of time and an unknown number of mistakes. Though these answers can be discovered for most agents through training, training is costly, so working out the broader details of how AI agents learn can be beneficial in the long run.
There are a few core questions of computational learning theory, which we will enumerate below.
Mistakes
We mentioned this above, but developers have a strong interest in knowing how many mistakes a machine learning agent will make before forming a “correct” hypothesis about its environment and actions.
This is so important because, as mentioned, training is expensive. This is especially true when it comes to learning methods like reinforcement learning, where the agent is basically engaged in a trial-and-error method of learning. How many pizzas does R2 need to destroy before it learns to not hustle over speed bumps?
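To make the idea of counting mistakes concrete, here is a minimal sketch of one textbook approach, the halving algorithm. It is not how R2 or any real agent is trained. It assumes a finite set of candidate hypotheses that is known in advance and that contains the correct one. The learner predicts by majority vote among the hypotheses that are still consistent with everything seen so far, so each mistake eliminates at least half of them, and the total number of mistakes cannot exceed log2 of the number of hypotheses. The names `halving_learner` and `make_threshold`, and the toy threshold task, are our own illustrations.

```python
import math


def halving_learner(hypotheses, examples):
    """Run the halving algorithm; return (mistake count, surviving hypotheses)."""
    version_space = list(hypotheses)  # hypotheses consistent with the data so far
    mistakes = 0
    for x, label in examples:
        # Predict by majority vote of the surviving hypotheses (ties go to True).
        votes_for_true = sum(h(x) for h in version_space)
        prediction = 2 * votes_for_true >= len(version_space)
        if prediction != label:
            mistakes += 1
        # Discard every hypothesis that got this example wrong.
        version_space = [h for h in version_space if h(x) == label]
    return mistakes, version_space


def make_threshold(t):
    """Hypothetical toy hypothesis: answer True when x is at least t."""
    return lambda x: x >= t


# Assumption: 16 candidate thresholds, one of which (t = 11) is the true rule.
hypotheses = [make_threshold(t) for t in range(16)]
target = make_threshold(11)
examples = [(x, target(x)) for x in [3, 14, 8, 12, 10, 11, 0, 15, 9, 13]]

mistakes, survivors = halving_learner(hypotheses, examples)
mistake_bound = math.log2(len(hypotheses))  # at most log2(16) = 4 mistakes
print(f"mistakes made: {mistakes}, guaranteed upper bound: {mistake_bound:.0f}")
```

The point is that the bound holds no matter what order the examples arrive in, which is exactly the kind of guarantee this branch of theory looks for before any expensive training is run.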
Computation
Memory usage is one of the big “costs” of running an AI agent. The more memory and other computational resources an ML algorithm needs, the higher the real-life costs, since more (expensive) high-tech hardware will be needed to keep the agent running without a hitch.
So, how high-tech does an agent need to be in order to arrive at the correct hypothesis?
Sampling and Training Time
The basic question here is: how much time does an agent need to spend in training in order to figure out the right hypotheses? And how diverse does the training data need to be to ensure that the agent can come out the other end ready to tackle unpredictable real-world problems?
For example, R2 is likely trained on more than just one pizza delivery route, which we know is key to avoiding overfitting, where an agent becomes so closely fitted to its training data that it can’t function in real-world settings. But the bigger question is how many different routes it needs, and how many times it needs to run them, to have a solid understanding of what it takes to deliver a pizza on time, undamaged, and without harming anyone or anything on the way there.
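One standard way to put a number on “how many examples are enough” is the classic sample-complexity bound for a finite hypothesis class. The sketch below assumes a learner that always outputs a hypothesis consistent with its training data, and that the correct hypothesis is among the candidates. Under those assumptions, m >= (ln|H| + ln(1/delta)) / epsilon examples are enough for the learned hypothesis to have error at most epsilon with probability at least 1 - delta. The function name `pac_sample_bound` and the numbers plugged in are ours, chosen only for illustration; nothing here describes how R2 is actually trained.

```python
import math


def pac_sample_bound(num_hypotheses, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite hypothesis class.

    Implements m >= (ln|H| + ln(1/delta)) / epsilon, rounded up.
    Assumes the true hypothesis is one of the num_hypotheses candidates.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must be strictly between 0 and 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)


# Illustrative numbers: 16 candidate hypotheses, at most 10% error,
# and a 95% chance that the guarantee holds.
needed = pac_sample_bound(num_hypotheses=16, epsilon=0.1, delta=0.05)
print(f"examples sufficient under these assumptions: {needed}")
```

Notice that the bound grows only logarithmically with the number of hypotheses but in proportion to 1/epsilon, so demanding a more accurate agent costs far more training data than considering a larger set of candidate hypotheses.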
Summary
We posed a lot of questions in this article, and we did so to lay the groundwork for a wide-ranging and important field in machine learning: computational learning theory. Questions related to training, computational effort, and mistakes all inform this field, and in the next article we hope to make some headway on how these questions are answered.