Author: Ashwin Nair
Tesla made quite a buzz on the AI scene last week with its Tesla AI Day presentation.
Autonomous vehicles must make split-second decisions using neural networks and camera vision. If the neural network modules are the brains of a self-driving car, the cameras and sensors are its eyes.
The presentation displayed amazing innovations in the development of autonomous vehicles. It was not dialled down in terms of technicality, which is commendable: going in depth without dumbing down the details offers glimpses of how the brightest minds in a field are trying to solve some of the toughest problems of our generation.
Andrej Karpathy, Tesla’s Director of AI and Autopilot Vision, showcased the development of Tesla’s computer vision systems.
The new vector space, visualized on the right, compared with Tesla’s earlier environment recognition output on the left
Predicting a vector space instead of an image space may prove to be a huge leap forward for autonomous driving. Computer vision usually deals with two-dimensional images, but the world unfolds in three dimensions, with time as a fourth factor. To navigate its surroundings effectively, an autonomous vehicle needs cameras and sensors; Tesla’s Autopilot uses eight cameras for object detection.
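To make the image-space versus vector-space distinction concrete, here is a minimal Python sketch of what the two kinds of output could look like. The types and field names are my own illustration, not Tesla’s actual data structures:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ImageSpaceDetection:
    """A detection in one camera's 2D pixel frame."""
    camera_id: int                          # which of the 8 cameras saw it
    box: Tuple[int, int, int, int]          # (x, y, width, height) in pixels

@dataclass
class VectorSpaceObject:
    """An object in the shared, metric, top-down frame around the car."""
    kind: str                               # "car", "pedestrian", ...
    position_m: Tuple[float, float, float]  # (x, y, z) in metres
    velocity_mps: Tuple[float, float]       # estimated (vx, vy)
```

The difference matters downstream: eight lists of per-camera pixel boxes still have to be reconciled with each other, whereas a single list of metric object states can be consumed directly by planning and control.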
Another excellent idea is fusing the data from multiple camera sensors before detecting what surrounds the vehicle. It may sound simple, but it is an extremely difficult engineering feat to run detection and machine learning on all the sensors at once, as opposed to letting each sensor make its own decisions and only combining those decisions afterwards. To make navigation as seamless and accurate as possible, the eight camera inputs are combined into a single virtual prediction of the environment that gives the car’s computers an aerial view, as sketched below.
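Here is a minimal PyTorch sketch of that “fuse first, detect once” idea. Everything here, the layer sizes, the shared backbone, the occupancy-style output, is an assumption for illustration; it shows the pattern of early fusion, not Tesla’s actual architecture:

```python
import torch
import torch.nn as nn

class EarlyFusionBEV(nn.Module):
    """Toy early-fusion model: one shared backbone per camera image,
    features from all cameras concatenated, then a single head predicts
    a top-down (bird's-eye-view) grid for the whole scene."""

    def __init__(self, num_cameras: int = 8, feat: int = 32):
        super().__init__()
        # Small shared CNN applied to every camera image independently
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Fusion + head see all cameras at once (early fusion)
        self.head = nn.Sequential(
            nn.Conv2d(num_cameras * feat, feat, 1), nn.ReLU(),
            nn.Conv2d(feat, 1, 1),  # e.g. a per-cell occupancy logit
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, num_cameras, 3, H, W)
        b, n, c, h, w = images.shape
        feats = self.backbone(images.view(b * n, c, h, w))
        # Stack every camera's features into one tensor before detecting
        feats = feats.view(b, -1, feats.shape[-2], feats.shape[-1])
        return self.head(feats)  # one fused prediction, not 8 separate ones

# Eight 64x64 RGB views in, one 16x16 top-down grid out
model = EarlyFusionBEV()
grid = model(torch.randn(2, 8, 3, 64, 64))  # shape: (2, 1, 16, 16)
```

The key design point is that the head never sees a per-camera decision; it only ever sees the joint feature tensor, so an object straddling two cameras’ fields of view can still be detected once, in one place.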
You can’t make a decent prediction without decent data
One of the interesting concepts presented was the use of “clips” of data, bundling GPS/IMU, odometry, and video from multiple vehicles passing the same location at around the same time, covering both static and moving objects. Initially, Tesla employed an estimated 1,000 people to label the car’s surroundings manually; in addition, it uses auto-labelling on data acquired by Tesla vehicles on the road.
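As a rough mental model, a clip might bundle synchronised sensor streams with whatever labels exist so far. This schema is purely hypothetical, the field names and types are mine, but it captures the idea of a self-contained, labellable unit of driving data whose combined annotation gets denser as more vehicles pass the same spot:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SensorClip:
    """Hypothetical schema for one clip of driving data; the fields
    are illustrative, not Tesla's actual format."""
    vehicle_id: str
    start_time: float                  # UNIX timestamp, seconds
    gps_imu: List[Dict]                # position/orientation samples
    odometry: List[Dict]               # ego-motion / wheel-speed samples
    frames: List[bytes]                # encoded frames from the 8 cameras
    labels: List[Dict] = field(default_factory=list)  # human or auto labels

def merge_labels(clips: List[SensorClip]) -> List[Dict]:
    """Pool the labels from clips recorded at the same place: the more
    vehicles pass a location, the denser the combined annotation."""
    merged: List[Dict] = []
    for clip in clips:
        merged.extend(clip.labels)
    return merged
```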
This sort of combined annotation of a certain time and place will improve as the fleet of Tesla vehicles grows. The same data is also used to produce simulations for annotating complex scenarios where accurate labelling of real-world data is impossible, such as a dense crowd of pedestrians!
Aside from the autonomous vehicle work, more details about the Dojo supercomputer were unveiled. It is built around a chip designed specifically for machine-learning training, and it is still under development. Dojo will be used extensively for training the models behind Tesla vehicles, boosting both the speed and the computing power available for autonomous-driving decision making, and helping the cars handle unexpected driving scenarios more rationally.
Benefits of the Dojo Computer
In the distant past we did machine learning with just the CPU (central processing unit). Then we moved on to GPUs (graphics processing units). Currently we use TPUs (tensor processing units), developed by Google and available on Google Colab for TensorFlow workloads. And now we have a glimpse of the future: possibly the DPU, the Dojo Processing Unit. It’s fascinating to view the history and the accelerating improvements in technology. The future knows no bounds.