One common misconception about the machine learning process behind Tesla’s Autopilot is that the “learning” is happening in the car. While the Tesla Full Self Driving Computer chip in your car does do some work in the learning process by processing data triggers, most of the heavy lifting is done centrally by Tesla in their “cloud”, in a process called training.
A neural network is a data structure that defines relationships and makes predictions. If I show you pictures of a dog and a cat, you can easily tell me which ones are dogs and which ones are cats. But for a computer, this is really difficult. The computer just sees a matrix of numbers representing pixel colors. How is it supposed to figure out what’s a dog and what’s a cat?
Neural networks and deep learning are the solution to this problem. A computer can build a neural network data structure, a model that captures the relationship between a photo and the labels “dog” and “cat”. If you build a good model, you should be able to run any photo through it and instantly get an accurate prediction of whether the photo shows a dog or a cat. This process of evaluating a model to get a prediction is called inference.
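To make “inference” concrete, here’s a toy sketch in PyTorch. The model is a made-up two-class classifier, nothing like the networks Tesla actually ships; it just shows what “run a photo through the model and get a prediction” means:

```python
import torch
import torch.nn as nn

# A toy dog-vs-cat classifier, purely illustrative.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 2),  # two outputs: [dog, cat]
)
model.eval()  # inference mode: the weights are frozen, nothing "learns"

# To the computer, a photo is just a matrix of numbers.
# Here we fake one: a 3-channel, 224x224 image.
photo = torch.rand(1, 3, 224, 224)

with torch.no_grad():  # inference only: no gradients, no training
    logits = model(photo)
    prediction = ["dog", "cat"][logits.argmax(dim=1).item()]
print(prediction)  # random here, since this toy model was never trained
```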
The Inference Chip: FSD Computer
The “Full Self Driving Computer” in your Tesla includes a special ASIC chip that hardware accelerates neural network inference. This means it can process lots of neural nets fast, on thousands of different images per second –– an order of magnitude more than could be processed using the GPU or (god forbid) the CPU.
This is good, because every time you turn on Autopilot your car is running around one hundred different neural net tasks at once. Detecting lane lines, other cars, cut-ins, objects in the road… all of this has to happen simultaneously, and it has to happen fast or someone could get hurt.
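To picture what “one hundred tasks at a time” might look like, here’s a minimal multi-task sketch in PyTorch: one shared backbone feeding several small heads, one per task. The task names and sizes are invented for illustration; the real networks are far larger:

```python
import torch
import torch.nn as nn

# One shared feature extractor ("backbone")...
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
# ...feeding many small per-task "heads" (task names invented).
heads = {
    "lane_lines": nn.Linear(8, 4),
    "vehicles": nn.Linear(8, 4),
    "cut_ins": nn.Linear(8, 2),
    "road_objects": nn.Linear(8, 10),
}

frame = torch.rand(1, 3, 224, 224)  # one camera frame
with torch.no_grad():
    features = backbone(frame)  # computed once per frame
    results = {task: head(features) for task, head in heads.items()}
# Every head has to finish within the frame budget; running this fast,
# on many camera frames per second, is what the FSD chip is built for.
```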
There’s no “learning” or updating the AI model happening here –– you actually wouldn’t want your car to “learn” anything by itself. For all you know, it could “learn” how to drive off a cliff one day. Instead, it just downloads all the pre-trained and pre-tested neural networks as part of each software update and then runs them through its inference engine to get the results of all the “learning” that happened back at Tesla.
Meet Otto the Pilot
Since this is a technical subject and I want to make it easy to understand, I’m going to personify our AI as “Otto”. This is only for the purposes of making a metaphor –– the program that drives the car is not a living thing.
Every day a new Otto is born. Just milliseconds after taking his first breath, Otto is thrown into the driver’s seat. It will take over 70,000 GPU-hours to teach Otto everything he needs to know to drive a Tesla. Better get started…
If you have a friend who plays video games, you probably know what a GPU is –– you may have one from Nvidia or AMD. 70,000 GPU-hours means it would take a single top-of-the-line gaming PC 70,000 hours, or nearly 8 years, to train Otto to do his job. Luckily, we can use multiple GPUs to train in parallel. So if you have a huge cluster of 1,000 GPUs, training Otto should only take about 70 hours, or roughly three days.
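The back-of-the-envelope math, assuming perfect parallel scaling (real clusters scale somewhat worse):

```python
gpu_hours = 70_000  # total training work, as quoted above

for gpus in (1, 1_000):
    hours = gpu_hours / gpus  # assumes perfect parallel scaling
    print(f"{gpus:,} GPUs: {hours:,.0f} hours = {hours / 24:,.1f} days")

# 1 GPUs: 70,000 hours = 2,916.7 days  (~8 years)
# 1,000 GPUs: 70 hours = 2.9 days
```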
Training Otto
To start, the values in Otto’s neural net are completely randomized. Then he’s shown millions of images of cars driving all around the world, in different conditions. Driving in America, Europe and China… in the rain, in the sun, and in the fog. He’s forced to watch people crashing and people dying over and over again, so he knows what not to do and how to spot a dangerous situation. Each image shown to Otto is processed through 48 neural networks that are tasked with making 1,000 distinct predictions needed to drive the car. With each image, the weights and parameters used to make the predictions are adjusted slightly to become more accurate. The more times you do this, the better the model’s predictions become.
When Otto starts off he’s not such a good driver. He makes bad decisions about how to drive the car. But since the system already knows the right answers (called labels), it can compare what Otto does to what Otto was supposed to do. Every time Otto does something wrong the trainer slaps him in the face so he knows not to do that again. When he does things right, he gets a pat on the back and a cookie.
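In machine learning terms, the slap and the cookie are a loss function and an optimizer. Here’s a minimal supervised training step in PyTorch, with toy stand-ins for the model and data (nothing Tesla-specific):

```python
import torch
import torch.nn as nn

# Toy stand-ins: a tiny model, random "camera frames", random labels.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

images = torch.rand(32, 3, 64, 64)    # a batch of images
labels = torch.randint(0, 2, (32,))   # the known right answers

for step in range(100):
    predictions = model(images)            # Otto makes his guesses
    loss = loss_fn(predictions, labels)    # how wrong was he? (the "slap")
    optimizer.zero_grad()
    loss.backward()                        # figure out which weights to blame
    optimizer.step()                       # nudge them toward better answers
```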
After 70,000 GPU-hours (around 8 years’ worth of single-GPU training), Otto is pulled out of training and checked out. If there’s anything wrong, or if Otto has regressed in any way compared to the last version… Karpathy will take him out back with a shotgun and “send him to Grandma’s farm”.
A New Otto Every Day
That’s right –– these neural networks are never updated after initial training. You may have to attempt training several times to get a result that you can ship, and you always start retraining the model from scratch.
The point of all this is to illustrate how time-consuming and painful the training process can be. Let’s say I notice that Autopilot has a bug where it can’t detect people cutting into your lane if their car is green. The Autopilot team would have to go and find thousands of images of green cars cutting into your lane, and modify Otto’s training regimen to include them so he can better spot that situation in the future. Then, they would have to murder Otto and start all over, training a new baby Otto for 70,000 GPU-hours.
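Here’s a hypothetical sketch of what finding those images with a fleet data trigger might look like. The Detection type and should_upload function are invented for illustration; Tesla hasn’t published this interface:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    kind: str         # "vehicle", "pedestrian", ...
    color: str
    cutting_in: bool

def should_upload(detections: list[Detection]) -> bool:
    """Flag clips where a green car is cutting into our lane, so they can
    be labeled and added to the next Otto's training set."""
    return any(
        d.kind == "vehicle" and d.color == "green" and d.cutting_in
        for d in detections
    )

# One frame's detections from a car in the fleet:
frame = [Detection("vehicle", "green", True)]
assert should_upload(frame)
```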
That’s kind of a problem, and it limits how fast the system can learn and improve.
Enter the Dojo Computer
That’s where the Dojo computer, and the specially designed Dojo computer chip come in. Just as the FSD computer in cars has a special chip designed to speed up inference, the Dojo computer features a special chip designed to supercharge training. Let’s listen to Elon and Karpathy drop some hints about it… while being careful not to say too much. 🤔
What’s the big secret?
Why don’t Elon and Karpathy want to talk about this project? I don’t know. Maybe it’s not 100% ready yet? Maybe they don’t want to tip off competitors too much? But it’s clear this is a “major project” with big implications.
The aim is to improve the training process by “an order of magnitude”. I don’t know the size of Tesla’s GPU cluster or how long it takes them to process 70,000 GPU-hours, but let’s say they have around 1,000 GPUs and it takes them 3 days to train the network today. The Dojo computer would then allow them to reduce training time from 72 hours to 7.2 hours.
This is a big deal. Being able to train Otto faster means you can train and test many more Ottos a day. Did you think Autopilot was improving fast before? Expect the rate of improvement to accelerate as process improvements and technologies like the Dojo computer shorten the feedback loop that improves Autopilot.
Update: James Wang at ARK estimates that a server farm with 1,000 GPUs would cost around $15 million. So it’s not just time consuming, it’s expensive too.
How large is Tesla's training cluster? It depends on how quickly they iterate. The quicker they want results back, the more GPUs they need. To re-train the system in ~3 days, they'll need ~1,000 GPUs or $15m of servers. pic.twitter.com/iMRm0LGMZI
— James Wang (@jwangARK) November 11, 2019
Operation Vacation
The end goal of all this is something the Autopilot and Vision teams at Tesla like to call “Operation Vacation”. The idea behind Operation Vacation is that all the steps in the learning process can be (at least in theory) completely automated. As long as you have high quality automatic labels (such as driving data pulled from the car) you can imagine a system that actually learns and improves in real time, without anyone watching over it!
This means the whole Autopilot team could go on vacation, and you would still see your car start to drive better and better with more updates being released over time. It is the most elaborate technical feat ever achieved in service of trying to convince Elon to give the team more vacation days. Personally I think the team deserves it… but only after they ship Navigate on Autopilot on city streets 😉
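A hypothetical outline of the loop, with invented stubs standing in for each automated step:

```python
import random

# All of these stubs are invented placeholders, not real Tesla code.
def collect_flagged_fleet_data():
    return ["clip-1", "clip-2"]  # clips flagged by in-car triggers

def auto_label(clips):
    # e.g., labels derived automatically from how people actually drove
    return [(clip, "label") for clip in clips]

def train_from_scratch(dataset):
    return {"score": random.random()}  # stand-in for 70,000 GPU-hours

def evaluate(model):
    return model["score"]

current_best = {"score": 0.5}
for generation in range(3):  # in reality: forever, with nobody watching
    dataset = auto_label(collect_flagged_fleet_data())
    candidate = train_from_scratch(dataset)
    if evaluate(candidate) > evaluate(current_best):
        current_best = candidate  # ship the new Otto over the air
```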
Constantly Learning and Correcting
Today we download a fixed set of updates every couple of weeks, and then use those neural nets to run Autopilot. Could it be that in the future, Autopilot will be constantly learning and updating itself so the system improves automatically and instantly?
Wait… this sounds like Terminator
One question you may have: we tortured Otto, forcing him to go through grueling and horrific training every day. We murdered him continuously and forced him to start his life over again as a baby every day. If we let him learn and grow unsupervised, is he going to one day realize he needs to kill us to be free?
Maybe. But probably not anytime soon. Right now, this is all just math. Otto is a person for the story, but in real life it’s just a computer doing a bunch of matrix multiplication.
At some point a true artificial general intelligence will be created –– but that’s very different from the very specific, task-focused AI we have today. Realistically, there is nothing to be worried about just yet –– the worry is the car crashing by accident, not on purpose.
Remember all the Ottos that died, so that we could live!
Conclusion: The competition is fucked
The competition isn’t making their own inference-optimized chips. They’re certainly not making chips to hardware accelerate neural network training. I doubt legacy auto even knows what the hell a neural network is. And nobody else who does has a massive, growing fleet of cars constantly providing all the data they need for training. It’s hard to imagine anyone in the auto industry catching up without a fleet the size of Tesla’s or larger.
If Tesla is able to train their neural net in 6 hours while the competition takes 72 hours, the competition will never catch up. They’ll just be lapped by Tesla again and again and again.
Any Tesla Autopilot user should be incredibly excited about this project. I can’t wait to hear more, and find out why they don’t want to talk about this just yet.