Visual Perception - The Eyes of Self-Driving Cars

Posted 2 years ago

Autonomous cars, or self-driving cars, have gradually moved into the public eye from what was once seen as futuristic "black technology". By degree of intelligence, autonomous driving is divided into five levels, L1 to L5: L1 is assisted driving, L2 is partial autonomous driving, L3 is conditional autonomous driving, L4 is highly autonomous driving, and L5 is fully autonomous driving, a truly driverless vehicle.

The concept of "unmanned driving", which is increasingly visible to the public, usually refers to autonomous driving at L3 and above. Current L4 pilot programs are highly automated driving. As every driver knows, driving depends above all on the eyes, the hands, and the mind working together, so how does a driverless car manage the same?

The core technology system of unmanned driving can be divided into three levels: perception, decision-making, and execution.

The perception system is equivalent to human eyes and ears. It is responsible for perceiving the surrounding environment and for collecting and processing both environmental and in-vehicle information. It mainly includes vehicle cameras, lidar, millimeter-wave radar, ultrasonic radar, and related technologies.

The decision-making system is equivalent to the human brain, responsible for data integration, path planning, navigation and decision-making, mainly including high-precision maps, Internet of Vehicles and other core technologies.

The execution system is equivalent to the human cerebellum and limbs. It is responsible for acceleration, braking, steering, and other driving actions, and mainly includes core technologies such as the drive-by-wire chassis.

Among these, the visual perception system applies neural-network-based deep learning vision technology to unmanned driving. It is mainly divided into four modules: Dynamic Object Detection (DOD), Free Space detection (FS, also called passage space), Lane Detection (LD), and Static Object Detection (SOD).

DOD: Dynamic Object Detection

The purpose of dynamic object detection is to identify dynamic objects such as vehicles (cars, trucks, electric vehicles, bicycles) and pedestrians.

Detection is difficult for several reasons. There are many categories to detect, multiple targets to track, and distances to measure accurately. The external environment is complex, with frequent occlusion and objects in many orientations. Pedestrians and vehicles come in so many types that training data struggles to cover them all, which makes false detections likely. Tracking adds further challenges, such as pedestrian identity switching.
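To make the tracking and identity-switching problem concrete, here is a minimal sketch of frame-to-frame association by bounding-box overlap (IoU). It is an illustration only, not the method of any particular system: the function names, the `(x1, y1, x2, y2)` box format, and the `min_iou` threshold are all assumptions. Greedy IoU matching like this is exactly the kind of simple scheme that swaps identities when two pedestrians cross or one is occluded.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily match existing track IDs to current detections.

    tracks: dict of track_id -> last known box (hypothetical structure)
    detections: list of boxes found in the current frame
    Returns (matches, unmatched), where matches maps track_id to a
    detection index and unmatched lists detections with no track.
    """
    # Score every track/detection pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        key=lambda p: p[0],
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in used_tracks or j in used_dets:
            continue
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used_dets]
    return matches, unmatched
```

Unmatched detections would typically start new tracks. Practical trackers usually add motion prediction and appearance features on top of overlap to reduce identity switches.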

FS: Free Space

Free space detection delineates the vehicle's safe boundary (the drivable area). The boundaries it mainly deals with are those formed by other vehicles, ordinary road edges, curb edges, boundaries with no visible obstacle, and unknown boundaries.

The main difficulty is that scenes are complex and boundary shapes are highly diverse, which makes generalization hard.

Other detection tasks have a clear, single type of target (such as vehicles, pedestrians, or traffic lights). Free space detection instead has to mark out the safe driving area accurately, along with the boundaries of any obstacle that blocks the vehicle's forward movement. However, when the vehicle accelerates or decelerates, drives over bumps, or goes up or down a slope, the pitch angle of the camera changes and the original camera calibration parameters are no longer accurate. After projection into the world coordinate system, this produces a large ranging error, and the free space boundary can appear to shrink or open up.

Free space detection cares most about the edge, so burrs and jitter along the edge need to be filtered out to make it smoother. Points on the side boundary of an obstacle are also easily projected into the world coordinate system incorrectly, so that a passable lane next to the vehicle ahead is marked as impassable. This makes the boundary point strategy and post-processing particularly difficult.
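The effect of pitch on ranging can be seen with a simple flat-ground pinhole model. The sketch below is an illustration under stated assumptions, not the calibration procedure of any real system: the road is assumed perfectly flat, `pitch_rad` is positive when the camera tilts downward, and the function name and parameters (`cam_height`, `fy`, `cy`) are hypothetical.

```python
import math


def ground_distance(v, cam_height, pitch_rad, fy, cy):
    """Distance along flat ground to the point imaged at pixel row v.

    cam_height: camera height above the road in metres
    pitch_rad:  downward tilt of the optical axis in radians
    fy, cy:     vertical focal length and principal point in pixels
    """
    # Angle of the viewing ray below the horizontal.
    angle = pitch_rad + math.atan((v - cy) / fy)
    if angle <= 0:
        return math.inf  # ray points at or above the horizon
    return cam_height / math.tan(angle)


# Calibration assumes zero pitch; braking tilts the camera down by 0.5 degrees.
assumed = ground_distance(390, 1.5, 0.0, 1000.0, 360.0)
actual = ground_distance(390, 1.5, math.radians(0.5), 1000.0, 360.0)
print(f"assumed {assumed:.1f} m, actual {actual:.1f} m")
```

With these illustrative numbers, the calibrated model places the boundary point at 50 m, while the same pixel row actually corresponds to roughly 38.7 m once the camera has pitched down by half a degree. The error grows quickly for points near the horizon, which is why online pitch estimation or compensation is commonly used.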
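One simple way to filter burrs and jitter is sketched below, assuming the boundary is stored as one distance value per image column (a hypothetical representation). A median filter removes isolated spikes within a frame, and an exponential moving average damps frame-to-frame jitter. The window size and `alpha` are illustrative values, not tuned settings.

```python
from statistics import median


def median_filter(boundary, window=5):
    """Remove isolated spikes (burrs) from a per-column boundary."""
    half = window // 2
    n = len(boundary)
    # The window is clipped at both ends so the output has the same length.
    return [
        median(boundary[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def temporal_smooth(previous, current, alpha=0.3):
    """Blend the current boundary with the previous smoothed one.

    alpha close to 1 follows the new frame quickly; alpha close to 0
    suppresses jitter more strongly but reacts more slowly.
    """
    if previous is None:
        return list(current)
    return [alpha * c + (1 - alpha) * p for p, c in zip(previous, current)]
```

Heavier smoothing trades responsiveness for stability, so a real system has to balance a calm boundary against reacting promptly to an obstacle that genuinely appears.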

LD: Lane Detection

The purpose of lane detection is to detect the various kinds of lane lines (one-sided or two-sided, solid, dashed, double), the color of each line (white, yellow, blue), and special lane lines such as bus lane lines and deceleration lines.

The difficulties of lane detection include the variety of line types and the difficulty of detecting lines on irregular road surfaces. With standing water, worn or invalid markings, road repairs, and shadows, lane lines are easily missed or detected falsely. When the vehicle goes up or down a slope, drives on a bumpy road, or starts and stops, the fitted lane lines tend to splay into a trapezoid or inverted trapezoid shape. Curved lane lines, distant lane lines, and roundabout lane lines are harder still to fit, and the detection results are often ambiguous.
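As an illustration of what "fitting" a lane line involves, the sketch below fits a second-order polynomial x = a*y^2 + b*y + c to lane points by least squares, where y is the forward distance and x the lateral offset in a top-down view. The choice of a quadratic model and the function name are assumptions made for this example; the article does not say which curve model is used.

```python
def fit_quadratic(ys, xs):
    """Least-squares fit of x = a*y**2 + b*y + c; returns (a, b, c)."""
    # Sums of powers of y and of x*y**k form the 3x3 normal equations.
    s = [sum(y ** k for y in ys) for k in range(5)]
    t = [sum(x * y ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("points do not determine a quadratic")
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))
```

A single low-order polynomial works reasonably for gentle curves but cannot follow a roundabout, and distant points with large projection errors can pull the fit away from the true line. That is one reason the cases listed above are hard.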

SOD: Static Object Detection

Static object detection is the detection and recognition of static objects such as traffic lights and traffic signs.

The main difficulty is that traffic lights and traffic signs are small objects. They occupy a very small proportion of the pixels in the image, especially at distant intersections, which makes them harder to identify.

Under strong light, traffic lights can be hard even for the human eye to make out, yet a car stopped at the zebra crossing of an intersection must identify them correctly before deciding what to do next. There are many types of traffic signs, and the collected data tends to be unevenly distributed across them, which leaves the detection model inadequately trained. Traffic lights are easily affected by lighting, and their colors (red vs. yellow) can be hard to tell apart under different lighting conditions. At night, red lights are similar in color to street lights and shop lights, which can easily lead to false detections.

Deep learning models depend heavily on data. One of the main hindrances to applying visual perception systems in autonomous driving is the lack of large amounts of data from autonomous driving scenarios. Magic Data, the world's leading provider of AI data solutions and the company that develops the MagicHub open-source community, provides professional data services and solutions tailored to the specific needs of self-driving car manufacturers. For more information, visit
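A quick pinhole-camera calculation shows how small these objects become with distance. The numbers below (a lamp housing about 0.3 m wide, a focal length of 1000 pixels) and the function name are illustrative assumptions, not measurements from any specific camera.

```python
def projected_size_px(object_size_m, distance_m, focal_px):
    """Approximate image size in pixels of an object facing the camera."""
    return focal_px * object_size_m / distance_m


for distance in (20, 50, 100):
    width = projected_size_px(0.3, distance, 1000.0)
    print(f"{distance:>3} m: about {width:.0f} px wide")
```

Under these assumptions the lamp is about 15 pixels wide at 20 m but only about 3 pixels wide at 100 m, which leaves very little detail for a detector to work with.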
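One common way to soften the effect of unevenly distributed sign classes is to weight each class inversely to its frequency during training. The sketch below is a generic illustration of that idea, not a description of how any particular model was trained; the function name and class labels are made up for the example.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class frequency.

    The weights are scaled so that a perfectly balanced dataset
    would give every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical dataset: a common sign class and a rare one.
labels = ["speed_limit"] * 90 + ["yield"] * 10
print(inverse_frequency_weights(labels))
```

Here the rare class gets a weight of 5.0 and the common one about 0.56, so mistakes on rare signs count for more in the loss. Reweighting does not replace collecting more examples of the rare classes; it only reduces the bias toward the frequent ones.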
