Intelligent Vision Tech Express 2020 - Huawei
CONTENTS

Preface
Embrace the Intelligent Vision, Build an Intelligent World

01 5G
Discussion on the Impact of 5G on Intelligent Vision
5G-enabled Image Encoding and Transmission Technologies
Products and Solutions Catalog

02 AI
Image, Algorithm, and Storage Trends Led by AI
Discussion on Frontend Intelligence Trends
Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
Chip Evolution and Development
Algorithm Repository Technology
SuperColor Technology
Video Codec Technology
Storage EC Technology
Multi-Lens Synergy Technology
Products and Solutions Catalog

03 Cloud Service
Discussion on Video Cloud Service Trends
P2P Technology
Products and Solutions Catalog

04 Ecosystem
Discussion on Intelligent Vision Ecosystem Trends
Products and Solutions Catalog

05 Appendix
Abbreviations
Legal Statement
Product Portfolio
Embrace the Intelligent Vision, Build an Intelligent World
President of Huawei Intelligent Vision Domain
In the past 120 years, three industrial revolutions have made breakthroughs in fields such as electricity and information technologies, dramatically improving productivity and our daily life. Today, the fourth industrial revolution, driven by AI and ICT technologies, is ushering in an intelligent era where all things are sensing, interconnected, and intelligent. Vision, the core of biological evolution, will serve as a significant enabler in this era. The combination of AI and vision systems will enable machines to perceive information and respond intelligently, revolutionizing people's work and everyday life and improving productivity and security.

Today, we are delighted to see that new ICT technologies, such as 5G, AI, and machine vision, are being put into commercial use and playing a significant role in the video surveillance industry. 2020 marks the first year of 5G commercialization as well as a turning point in AI development. Additionally, machine vision now surpasses human vision in the amount of information it can obtain in specific scenarios. The three technologies are interwoven, fueling the development of intelligent vision.

Huawei remains steady in its commitment to embed 5G technologies into intelligent vision, which opens up opportunities by providing high bandwidth, low latency, and broad connection capabilities.

Huawei is developing intelligent cameras the way it develops smartphones, by revolutionizing the technical architecture, ecosystem, and industry chain. Huawei embeds an innovative operating system (OS) into software-defined cameras (SDCs) to enable remote loading of intelligent algorithms anytime, anywhere. The HoloSens Store allows users to download and install algorithms on cameras depending on their needs.

Huawei adheres to the "platform + ecosystem" strategy to build a future-proof intelligent vision ecosystem and empower more industries. Huawei is committed to providing platforms and opening algorithms and applications to benefit vendors and customers across industries.

Huawei develops cloud-edge-device synergy to maximize data value. Huawei will give full play to the technical advantages of the device-edge-cloud industry chain, develop devices based on cloud technologies, and empower the cloud through interconnection with various devices, thereby advancing the digital transformation of all industries.

Intelligent vision serves as the eyes of the intelligent world, the core of worldwide sensory connections, and a key enabler for the digital transformation of industries. Huawei Intelligent Vision looks forward to working with partners across industries to drive industry development and the intelligent transformation of cities, production, and people's lives with the power of technology, and to build an intelligent world where all things can sense.
01 5G
Discussion on the Impact of 5G on Intelligent Vision
5G-enabled Image Encoding and Transmission Technologies
Products and Solutions Catalog
Discussion on the Impact of 5G on Intelligent Vision
Niu Liyang, Liu Zhen

1. 5G Development

New 5G infrastructure is driving the expansion of the global digital economy, and each country's information capability is represented by the state of its 5G networks. 5G is even revolutionizing the whole industry chain, from electronic devices to base station devices to mobile phones. Therefore, major economies around the world are accelerating their application of 5G and actively exploring upstream and downstream industries to seize the strategic high ground. According to TeleGeography, a prominent telecommunications market research company, the number of global 5G networks in commercial use had reached 82 by June 2020 and is expected to double by the end of 2020.

2. Features of 5G Networks

With their high bandwidth, low latency, and massive connectivity, 5G networks contribute to the building of a fully connected world. They have three major application scenarios: Enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency Communications (URLLC), and Massive Machine Type Communications (mMTC). Users can select the 5G devices they require according to different scenarios, and developers can select development scenarios based on the types of applications they want to create.

[Figure: 5G application scenarios across eMBB, mMTC, and URLLC. Labeled examples: fast transmission at Gbit/s, 3D video and UHD video, cloud-based office/gaming, augmented reality (AR), smart home, intelligent video surveillance, voice intercom, smart city, industrial automation, high-reliability applications such as mobile healthcare, and self-driving cars. Source: International Telecommunication Union (ITU), partly updated.]

Comparison between 5G and 4G

                         4G           5G
Latency                  10 ms        1 ms
Uplink service rate      1 Mbit/s     200 Mbit/s
Downlink service rate    10 Mbit/s    2 Gbit/s

3. Impact of 5G on Intelligent Vision

Extending the breadth of intelligent vision

In the 4G era, video services were limited to the consumer field because of the low bandwidth and high latency of 4G networks. Compared with 4G, 5G improves the service rate by about 100-fold and reduces latency by about 10-fold. This enriches video application scenarios, which now extend to remote areas with complex terrain, to mines, factories, and harbors where cabling is difficult, and to places that require security for major events.
[Figure: A 5G camera installed atop Mount Qomolangma, and a video image from a 5G camera. Labels: 5G camera, Rongbuk Monastery.]

5G increases the peak transmission rate limit, laying a solid foundation for the internet of everything. It will play an important role in communications among machines and drive innovation across a range of emerging industries. Because of its high mobility and low power consumption, 5G is capable of supporting a wide array of frontend devices, such as vehicle-mounted devices, drones, wearables, and industrial robots, which will serve as significant carriers for video awareness. It is estimated that by 2023, the number of connected short-distance Internet of Things (IoT) terminals will reach 15.7 billion.

In addition, the 5G network can be sliced into multiple subnets to meet the differing requirements of terminals in terms of latency, bandwidth, number of connections, and security. This will further enrich the application scenarios of 5G.

[Figure: Diverse 5G terminals become enablers of intelligent vision (vehicle-mounted device, drone, wearable, industrial robot).]

[Figure: Network slicing enriches 5G application scenarios. One 5G network is sliced into a harbor private network, a bus private network, and an emergency assurance private network.]

Typical application case

Optical fibers deployed at harbors are prone to corrosion, and those on gantry cranes can easily become entangled during operations. To solve this problem, HD cameras are connected to 5G networks to monitor gantry cranes, so that operators can remotely check lifting and hoisting operations in real time and promptly identify anomalies. In addition, powered by 5G and artificial intelligence (AI), most container hoisting operations can be completed by machines, greatly improving efficiency.

When 5G is applied in a harbor, the transfer efficiency of the harbor is doubled, and the deployment and maintenance costs of optical fibers are reduced by about CNY100,000 each year. Additionally, operators no longer need to work at heights, greatly improving their work efficiency and ensuring safety.
[Figure: 5G harbor application. On-site operation: optical fibers on existing gantry cranes easily become entangled, and cabling is subject to sea tide impact. 5G networks enable HD cameras to obtain full coverage: 50 gantry cranes, each fitted with 10 to 18 cameras; 18 HD cameras are required for precise control. Remote operation in the central control room: with remote detection and a remote control joystick, operators can remotely operate two or three gantry cranes at the same time.]

5G ushers in the AI era

5G is revolutionizing the way we think about AI. AI is now deeply rooted in the video surveillance industry, which in turn poses increasingly high requirements on video and image quality. 4K video encoded in the H.265 format requires an average transmission bandwidth of 10 Mbit/s to 20 Mbit/s. However, when intelligent services are enabled, the immediate peak transmission rate will soar to over 100 Mbit/s, far higher than that provided by 4G networks. Once they are connected to 5G networks, cameras can utilize the high bandwidth to quickly deliver detailed, high-quality video images, thereby improving intelligent analysis performance.

[Figure: 4G network with a 720p camera, bandwidth 1 Mbit/s: low-definition video, which cannot be used for intelligent services. 5G network with a 4K camera, bandwidth 200 Mbit/s: high-quality video, meeting the requirements of intelligent services.]

With its low latency, 5G serves as the supporting system for AI. During the Industrial Revolutions, people increased their productivity by mastering mechanical energy. At present, we are experiencing an AI revolution, in which people are improving the intelligent capabilities of machines by harnessing computing power. As the cost of computing power drops, the cloud, edges, and devices are coming to possess ample computing power, which they can use to perform video-based analysis using intelligent algorithms, and generate massive amounts of valuable data. This data can only be fully utilized when it is quickly transferred among the cloud, edges, and devices.
Intelligent capabilities are like electric power. Electric power possesses great potential, but cannot be directly applied in industries unless a power transmission network is built. 5G, in essence, serves as the transmission network for computing power and intelligent data. It enables the full implementation of intelligent capabilities, and by doing so, is promoting the intelligent transformation of industries and people's everyday life.

[Figure: Intelligent data transmission on the devices, edges, and cloud. AI-enabled edge nodes connect to the AI cloud over 5G.]

Typical application case

Major economies around the globe are seeking to digitally transform their manufacturing sectors. Aircraft manufacturing is the most valuable sector of the manufacturing industry. Aircraft manufacturers adopt 5G and AI technologies for quality assurance, reducing the time required for carbon fiber stitching gap checks from 40 minutes to 2 minutes. In addition, 5G cameras provide a wide range of intelligent applications in factories, including safety helmet detection, workwear detection, and perimeter intrusion detection.

[Figure: Aircraft manufacturing plant.]

4. Application Bottlenecks of 5G in Intelligent Vision

The high bandwidth and low latency of 5G enable wireless video transmission, extending the boundary of intelligent vision applications. When powered by 5G, cameras can connect to massive sensors to implement multi-dimensional awareness. Additionally, as 5G develops, it is enabling the creation of various innovative kinds of devices, fueling the digital transformation of all industries.
Every technology encounters difficulties when it is applied, and 5G in intelligent vision is no exception.

The 5G uplink and downlink bandwidths are unbalanced, and the total 5G uplink bandwidth of a single base station is limited to around 300 Mbit/s. Most of the time, cameras upload P-frames, which contain only the changes from the previous frame, and they periodically upload I-frames, which contain the full image. As a result, bandwidth usage can fluctuate dramatically. The instantaneous transmission rate of a single 4K camera can reach 60 Mbit/s. If five 4K cameras connected to a single 5G base station peak at the same time, they require 5 × 60 Mbit/s = 300 Mbit/s, which exhausts the uplink bandwidth of the base station. Therefore, video encoding needs to be optimized so cameras can adapt to the limited uplink bandwidth of 5G networks. In addition, packet loss and bit errors during wireless transmission may cause image quality issues such as artifacts and video stuttering, which call for more reliable transmission modes.

[Figure: Limited uplink bandwidth. Packet loss and bit errors frequently occur during wireless transmission. Artifacts and video stuttering may occur due to wireless network transmission limitations.]

A 5G network uses short wavelengths for transmission, which results in fast signal attenuation. The network bandwidth decreases rapidly as the distance increases. Therefore, the number of cameras that can be connected to a single 5G base station is limited. In addition, carriers tend to build 5G base stations based on their actual requirements in terms of construction costs and benefits, and 5G coverage is limited in the short term. Therefore, it is important to properly and efficiently use 5G base station resources and improve the coverage and access capability of a single base station.
[Figure: Bandwidth attenuation of a 5G base station. Bandwidth by distance from the base station: 210 Mbit/s at 100 m, 140 Mbit/s at 200 m, 90 Mbit/s at 300 m, and 60 Mbit/s at 400 m.]

To solve these problems, 5G cameras should not simply be combinations of cameras and 5G modules. Instead, they should provide efficient video/image encoding capabilities to reduce the bandwidth required for transmission. Additionally, reliable transmission technologies are needed to prevent the packet loss and bit errors which occur during wireless transmission. In this way, 5G base station resources can be utilized properly.

[Figure: A 5G camera combines a built-in 5G module with more efficient encoding and more reliable transmission.]
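The uplink bottleneck described in this section comes down to simple budget arithmetic. The sketch below is only an illustration: it uses the figures quoted in the text (about 300 Mbit/s of uplink per base station, a 60 Mbit/s instantaneous peak and up to 20 Mbit/s average for a 4K camera, and the 210/140/90/60 Mbit/s values from the attenuation figure), and the helper name `max_cameras` is hypothetical.

```python
# Figures quoted in the text; the helper name is hypothetical.
UPLINK_MBITS = 300      # approximate total uplink of one 5G base station
PEAK_4K_MBITS = 60      # instantaneous peak rate of one 4K camera
AVG_4K_MBITS = 20       # upper end of the average 4K H.265 bit rate

# Bandwidth by distance (m -> Mbit/s), as read from the attenuation figure.
BANDWIDTH_BY_DISTANCE = {100: 210, 200: 140, 300: 90, 400: 60}


def max_cameras(budget_mbits: float, per_camera_mbits: float) -> int:
    """Number of cameras whose combined rate fits within the budget."""
    return int(budget_mbits // per_camera_mbits)


# Sizing by peak rate is the safe choice when I-frames may coincide;
# sizing by average rate only works if the peaks are smoothed away.
by_peak = max_cameras(UPLINK_MBITS, PEAK_4K_MBITS)
by_average = max_cameras(UPLINK_MBITS, AVG_4K_MBITS)
print(f"4K cameras per base station: {by_peak} by peak, {by_average} by average")

# Cameras that fit at each distance when sized by the 60 Mbit/s peak.
by_distance = {
    dist: max_cameras(bw, PEAK_4K_MBITS)
    for dist, bw in BANDWIDTH_BY_DISTANCE.items()
}
print(by_distance)
```

The gap between the peak-based and average-based counts is the headroom that better encoding and smoother transmission aim to recover.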
5G-enabled Image Encoding and Transmission Technologies
Chen Yun, Liu Zhen

5G expands the scope of intelligent vision, and embeds artificial intelligence (AI) into a wide range of industries. However, due to the limitations of 5G New Radio (NR), wireless 5G networks feature limited uplink bandwidth, and have high requirements for network stability. Technical innovations have sought to overcome these challenges for utilizing 5G in intelligent vision applications.

1. Challenges to Video and Image Transmission on 5G Networks

Video and image transmission requires high uplink bandwidth and stable wireless networks

5G networks adopt a time-division transmission mode. Under typical configurations, they spend 80% of the time transmitting downlink data and 20% of the time transmitting uplink data. Generally, the uplink bandwidth of a single 5G base station accounts for only 20% of the total bandwidth, and can reach 300 Mbit/s. However, in the intelligent vision industry, video and image transmission requires far higher uplink bandwidth than that provided by 5G networks.

[Figure: Wired transmission vs. typical wireless time-division transmission.
Wired transmission (pins): 1 RX+ (positive end for receiving data), 2 RX- (negative end for receiving data), 3 TX+ (positive end for transmitting data), 4 not used, 5 not used, 6 TX- (negative end for transmitting data), 7 not used, 8 not used. Wired transmission works in full-duplex mode and can receive and send data packets anytime.
Typical wireless time-division transmission: 4:1 subframe configuration D D D S U D D D S U; 8:2 subframe configuration D D D D D D D S U U. A time segment labeled D is used for data downlink, one labeled U is used for data uplink, and one labeled S can be configured. Uplink transmission occupies only 20% of the total time, and uplink data packets can be sent only during the specific time.]

In addition, during video and image transmission, an I-frame containing the full image information is sent first, after which P-frames containing changes in the image from previous frames are sent, followed by another I-frame. I-frames are larger than P-frames. As a result, image data occupies network bandwidth unevenly during the 10 ms time window. Sending P-frames does not require a lot of bandwidth, but sending I-frames does. For example, the average bit rate of 4K video streams is 12 Mbit/s to 20 Mbit/s, and the peak bit rate during I-frame transmission can reach 60 Mbit/s. This is known as an I-frame burst, and it places great strain on the data transmission time window on 5G networks.

[Figure: Bandwidth usage in a 10 ms time window, with each column indicating the size of a file. Axes: file size over time; tall I-frame columns are separated by runs of short P-frame columns.]
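The difference between the average bit rate and the I-frame burst can be made concrete with a small calculation. The sketch below is illustrative only: the frame rate, GOP length, and frame sizes are assumed values chosen so that the result lands in the ranges quoted above (an average within 12 to 20 Mbit/s and a 60 Mbit/s peak), and the function names are hypothetical.

```python
FPS = 25                 # assumed frame rate
GOP = 25                 # assumed GOP length: one I-frame per second
I_FRAME_MBIT = 2.4       # assumed I-frame size (illustrative)
P_FRAME_MBIT = 0.5       # assumed P-frame size (illustrative)


def frame_sizes(num_frames: int) -> list[float]:
    """Frame sizes in Mbit: an I-frame opens each GOP, P-frames follow."""
    return [
        I_FRAME_MBIT if i % GOP == 0 else P_FRAME_MBIT
        for i in range(num_frames)
    ]


def average_rate(sizes: list[float]) -> float:
    """Average bit rate in Mbit/s over the whole sequence."""
    return sum(sizes) * FPS / len(sizes)


def peak_rate(sizes: list[float]) -> float:
    """Rate in Mbit/s needed if every frame is sent within its own
    frame interval (1 / FPS seconds), which is where the burst appears."""
    return max(sizes) * FPS


sizes = frame_sizes(FPS * 4)          # four seconds of video
avg = average_rate(sizes)
peak = peak_rate(sizes)
print(f"average {avg:.1f} Mbit/s, peak {peak:.1f} Mbit/s")
```

Under these assumptions the link must briefly carry roughly four times the average rate each time an I-frame is sent, which is why provisioning by average bit rate alone leads to congestion.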
In actual applications, a 5G base station always connects to multiple cameras at the same time. In this case, I-frame bursts may occur simultaneously for multiple cameras, resulting in I-frame collision and further intensifying the pressure on 5G NR bandwidth. According to tests, the probability of I-frame collision is close to 100% when more than seven cameras using traditional encoding algorithms are connected to a single 5G base station.

[Figure: I-frames of camera 1, camera 2, and camera 3 over time. Data packets of three cameras are scattered within 5 seconds, preventing I-frame collision.]

[Figure: Probability that I-frames of all cameras do not collide with each other. X-axis: number of cameras (1 to 13); y-axis: probability (0% to 100%); series: GOP-25, GOP-30, and GOP-60, each at 25 frames per second.]

Furthermore, 5G networks are challenged by unstable transmission. Compared with wired network transmission, 5G wireless network transmission is subject to packet loss and bit errors, especially during network congestion. This results in video quality issues, such as image delays, artifacts, and video stuttering, which in turn affect backend intelligent applications.

Efficiently utilizing 5G base station resources to promote the large-scale commercial use of 5G in intelligent vision

In addition to limited uplink bandwidth and network transmission reliability, 5G signals attenuate quickly, which restricts the coverage of a single base station. This also affects the commercial use of 5G in intelligent vision.

5G transmission is mainly conducted on the millimeter wave and sub-6 GHz (centimeter-level wavelength) bands. These two bands feature short wavelengths, resulting in limited transmission range, poor penetration and diffraction performance, and faster 5G network attenuation. Therefore, the coverage of a single 5G base station is far smaller than that of a 4G base station. In addition, unlike 4G base stations, which cover almost all areas, 5G base stations are built by carriers based on actual project requirements, with construction costs and benefits taken into consideration. Therefore, efficiently utilizing 5G base station resources is essential to improving the coverage and access capabilities of a single base station, and to achieving the large-scale commercial use of 5G in intelligent vision.

[Figure: Total uplink bandwidth of 5G networks decreases as the coverage radius increases. Outdoor macrocell; rate (Mbit/s) vs. coverage radius (m): 210 at 100 m, 140 at 200 m, 90 at 300 m, 60 at 400 m. Supports 6–8 access channels for 40% of areas and 2–3 access channels for 60% of areas.]
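The I-frame collision effect discussed earlier in this section can be approximated with a birthday-problem style model. This is a simplified sketch under stated assumptions, not the test method behind the figures above: each camera's I-frame is assumed to fall, independently and uniformly, into one of the frame slots of the GOP, and a collision means two cameras share a slot. Real I-frames can occupy the channel for longer than one slot, so measured collision rates can be higher than this model predicts. The function name is hypothetical.

```python
from math import prod


def no_collision_probability(num_cameras: int, gop_frames: int) -> float:
    """Probability that no two cameras place their I-frame in the same
    frame slot, assuming independent, uniformly random slots."""
    if num_cameras > gop_frames:
        return 0.0  # more cameras than slots: a collision is certain
    return prod((gop_frames - k) / gop_frames for k in range(num_cameras))


for gop in (25, 30, 60):
    row = [round(no_collision_probability(n, gop), 2) for n in (1, 4, 7, 10, 13)]
    print(f"GOP-{gop}: {row}")
```

Even in this optimistic model, the chance that all I-frames stay apart falls quickly as cameras are added, and a longer GOP only slows the decline. That is the motivation for deliberately staggering or smoothing I-frames rather than relying on chance.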
2. Key Technologies

The biggest challenge for large-scale commercial use of 5G in intelligent vision is efficiently utilizing 5G uplink bandwidth, and preventing packet loss and bit errors. As a remedy, the industry at large has sought to optimize image encoding and transmission.

Image encoding optimization

Image encoding optimization is designed to eliminate I-frame bursts and reduce the bandwidth required for video and image transmission. Region of interest (ROI)-based encoding technology is used to compress image backgrounds, which reduces the overall bandwidth required. In addition, stream smoothing technology is adopted to optimize I-frames, thereby reducing the peak bandwidth required and preventing network congestion.

ROI-based encoding technology, reducing the average bandwidth required for video transmission

In the intelligent vision industry, the bandwidth required for video transmission has soared as image resolution has continually increased. On top of that, high-quality person and vehicle images are captured and transmitted for intelligent analysis, which requires even higher bandwidth than video transmission. However, in real-world applications, people tend to focus only on key information in video and images, such as pedestrians and vehicles, and have little need for high-definition image backgrounds. ROI-based encoding technology was developed with this understanding in mind. It automatically distinguishes the image foreground from the background, keeping high quality in the ROI while compressing the background more heavily, which reduces the overall bandwidth required for transmission. This technology reduces the size of video streams and snapshots, lowering the average bit rate by 30% in complex scenarios and by 60% in simple scenarios.
[Figure: ROI-based encoding pipeline. Original video/image streams are processed by AI algorithms and passed to the encoder, which outputs the encoding stream. The background is encoded with higher compression, reducing the bit rate; the foreground is encoded normally, ensuring high image quality.]

[Figure: ROI-based video encoding vs. traditional encoding method. Average bit rate of 1080p video (Mbit/s), standard H.265 encoding vs. ROI-based encoding: reduced by 30% in complex scenarios, by 50% in common scenarios, and by 60% in simple scenarios.]
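The ROI idea above can be sketched as a per-block quantization map: blocks flagged as foreground keep a low quantization parameter (QP), and background blocks get a higher one. This is a minimal illustration, not the encoder described in this article. The mask is assumed to come from a detector, the QP values are arbitrary, the function names are hypothetical, and the bit estimate relies on the rough rule of thumb that the bit rate of H.26x-style encoders about halves for every 6-step increase in QP.

```python
FG_QP = 26   # assumed QP for foreground (ROI) blocks
BG_QP = 38   # assumed QP for background blocks (coarser quantization)


def qp_map(foreground_mask: list[list[int]]) -> list[list[int]]:
    """Assign a QP to every block: low QP inside the ROI, high QP outside."""
    return [
        [FG_QP if cell else BG_QP for cell in row]
        for row in foreground_mask
    ]


def relative_bits(qps: list[list[int]], reference_qp: int = FG_QP) -> float:
    """Estimated bits relative to encoding every block at reference_qp,
    using the rule of thumb 'bits halve per +6 QP' (an approximation)."""
    blocks = [qp for row in qps for qp in row]
    return sum(2 ** (-(qp - reference_qp) / 6) for qp in blocks) / len(blocks)


# Hypothetical 4x4 block mask: 1 marks blocks containing a person or vehicle.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
qps = qp_map(mask)
estimate = relative_bits(qps)
print(qps)
print(f"estimated bits vs. uniform encoding: {estimate:.2%}")
```

The estimate depends entirely on the assumed QP gap and on how much of the frame is foreground, which is consistent with the article's observation that savings are larger in simple scenes than in complex ones.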
I-frame optimization, reducing peak bandwidth required for transmission

The peak bit rate during I-frame bursts is extremely high, which can lead to network congestion. To address this, the industry has adopted stream smoothing technology, which adjusts encoder parameters to control the size and frequency of I-frames, reducing the peak bandwidth required for video transmission during I-frame bursts.

[Figure: File size over time, before and after I-frame optimization. The peak bit rate of I-frames is reduced by 40% after stream smoothing, reducing the network congestion caused by I-frame bursts.]

Transmission optimization

Transmission optimization technology mainly focuses on intelligent flow control and network transmission reliability. Intelligent flow control detects the network transmission status in real time and adjusts data packet sending parameters accordingly, to improve overall network bandwidth usage. Network transmission reliability can be enhanced via automatic repeat request (ARQ) and forward error correction (FEC) technologies, which help prevent packet loss and bit errors.

Intelligent flow controls

In wireless transmission, if data is continuously sent while the network is congested, transmission capabilities will deteriorate sharply. Intelligent flow control technology makes use of flow control units to detect the length of data queues in real time, and to adjust the data packet sending parameters accordingly. This allows more data to be sent during off-peak hours and prevents data stacking during peak hours, optimizing network bandwidth usage.

[Figure: No flow control. The encoder sends data directly to the channel, causing network congestion and packet loss; the receiver sees video delay and stuttering.]

[Figure: Intelligent flow control. A flow control unit between the encoder and the channel monitors network status in real time and adjusts the encoder and data packet sending parameters based on the length of data queues. This prevents data stacking, improves network usage, prevents network congestion, and delivers smooth, clear video images to the receiver.]
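A queue-driven flow control unit of the kind described above can be sketched in a few lines. This is an illustration under assumptions rather than the actual mechanism: the class name, the watermarks, and the rate limits are hypothetical, and the linear mapping from queue length to target rate is just one simple policy.

```python
class QueueFlowController:
    """Maps the current send-queue length to an encoder target bit rate."""

    def __init__(self, low_water: int = 10, high_water: int = 50,
                 min_rate: float = 2.0, max_rate: float = 20.0) -> None:
        if low_water >= high_water or min_rate >= max_rate:
            raise ValueError("watermarks and rates must be strictly ordered")
        self.low_water = low_water      # queue length treated as "idle"
        self.high_water = high_water    # queue length treated as "congested"
        self.min_rate = min_rate        # Mbit/s floor under congestion
        self.max_rate = max_rate        # Mbit/s ceiling on an idle link

    def target_rate(self, queue_len: int) -> float:
        """Short queue: send more. Long queue: back off to avoid stacking."""
        if queue_len <= self.low_water:
            return self.max_rate
        if queue_len >= self.high_water:
            return self.min_rate
        span = self.high_water - self.low_water
        fraction = (queue_len - self.low_water) / span
        return self.max_rate - fraction * (self.max_rate - self.min_rate)


controller = QueueFlowController()
for queue_len in (0, 10, 30, 50, 80):
    print(queue_len, controller.target_rate(queue_len))
```

In a real camera the returned value would feed both the encoder's rate control and the packet pacing interval; the point of the sketch is only that the sender reacts to its own queue depth instead of pushing data into a congested channel.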
Enhanced transmission reliability to prevent packet loss and bit errors

Video transmission through the Transmission Control Protocol (TCP) features low efficiency, particularly when packet loss occurs on wireless networks. On 5G networks, video and images are therefore transmitted through the User Datagram Protocol (UDP), with reliability added in two ways: an acknowledgment and retransmission mechanism based on ARQ, and error correction based on FEC.

ARQ adds a verification and retransmission mechanism on top of conventional UDP-based transmission. If the receiver detects that a transmitted data packet is incorrect, the receiver requests that the transmitter retransmit the data packet.

FEC reserves verification and error correction bits during data transmission. When the receiver detects an error in the data, it uses the error correction bits to perform the exclusive or (XOR) operation, in order to restore the data.

These transmission optimization technologies can ensure smooth video transmission even when the packet loss rate approaches 10%. However, transmission reliability improvement mechanisms need to be deployed on both the peripheral units (PUs) and backend platforms.

[Figure: ARQ. The sender transmits data to the receiver; when the receiver reports "NOT OK!" for packet D3, the sender retransmits D3.]

[Figure: FEC. The redundant coding matrix A, a 4 × 4 identity matrix extended with a redundancy row (R11, R12, R13, R14), is applied to the original data B = (D1, D2, D3, D4) to produce the sent data (D1, D2, D3, D4, C1), where C1 is the redundancy block. Data D2 lost during data transmission can be restored using the received data and the redundancy coding matrix (A^B=C). Data lost in matrix B can also be restored (C^A).]

3. Camera Bit Rate and Base Station Coverage After Optimization

These innovations have helped facilitate the commercial use of 5G in intelligent vision. More specifically, ROI-based encoding and I-frame optimization reduce both the average bit rate at the encoding end and the peak bit rate, so that 5G uplink bandwidth can be utilized more efficiently. Intelligent flow control and transmission reliability improvement technologies enable cameras to actively monitor data sending queues. This helps prevent network congestion and improve 5G bandwidth usage. In addition, advancements in encoding and transmission technologies allow a single 5G base station to connect to more cameras and increase its coverage range.

[Figure: Peak bandwidth required for video transmission (unit: Mbit/s), for 1080p video and 4K video, before and after optimization.]

[Figure: Number of cameras that can be connected to a single 5G base station within 400 m (uplink bandwidth: 300 Mbit/s), for 1080p cameras and 4K cameras, before and after optimization.]
Products and Solutions Catalog
Tan Shenquan, Liu Zhen

Huawei 5G Cameras

Huawei has leveraged its accumulated expertise in 5G and network communications to release a series of patented innovations that resolve longstanding 5G transmission challenges, such as the limited coverage of individual 5G base stations, low uplink bandwidth, and packet loss. Huawei has also launched a series of related products, such as 5G cameras, that can be applied across a wide range of industries, including intelligent harbors and manufacturing.

Intelligent encoding and I-frame optimization, improving resource utilization of 5G base stations

5G networks feature limited uplink bandwidth, resulting in network congestion when I-frame bursts occur during video transmission. To resolve this problem, Huawei has proposed a region of interest (ROI)-based encoding technology to increase the compression ratio of image backgrounds. This helps reduce the average bit rate of video streams. Furthermore, the I-frame optimization technology helps reduce the bandwidth required for video transmission during peak hours, to prevent network congestion. After the optimization, the maximum number of cameras that can be connected to a single 5G base station has increased by two to three times, and 5G base station coverage has increased by two to three times as well, significantly improving the resource utilization of 5G base stations.

User Datagram Protocol (UDP)-based reliable transmission, ensuring smooth, efficient video transmission

To prevent packet loss and bit errors during wireless transmission, Huawei has adopted UDP together with a dynamic optimization policy, to ensure smooth video transmission even when packet loss occurs.
[Figure: With a packet loss rate within 10%, video remains clear and smooth. Image encoding and transmission optimization technologies ensure smooth video transmission even when the packet loss rate reaches 10%.]

Huawei 5G Camera Models

Models: M2281-10-QLI-W5, M6781-10-GZ40-W5, X7341-10-HMI-W5

Flexible deployment: Supports n78, n79, and n41 frequency bands and standalone (SA)/non-standalone (NSA) hybrid networking

Large-scale access: Built-in integrated antenna, intelligent encoding and transmission optimization for 5G New Radio (NR), ensuring large-scale access of 5G cameras

AI-powered innovation: Professional-grade artificial intelligence (AI) chips and dedicated software-defined camera (SDC) operating system (OS), supporting a wide range of intelligent functions such as person analysis, crowd flow analysis, and vehicle analysis; support for long-tail algorithms
02 AI
Image, Algorithm, and Storage Trends Led by AI
Discussion on Frontend Intelligence Trends
Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
Chip Evolution and Development
Algorithm Repository Technology
SuperColor Technology
Video Codec Technology
Storage EC Technology
Multi-Lens Synergy Technology
Products and Solutions Catalog
Image, Algorithm, and Storage Trends Led by AI
Ge Xinyu, Zhang Yingjun

1. AI+Video Future Prospects

The rapid development of AI is driving considerable growth within the global video analysis industry

In recent years, the fast development of deep learning technology has driven rapid growth across the video analysis industry. According to statistics, the video analysis product market is predicted to grow at a compound annual growth rate (CAGR) of 37.1% from 2018 to 2023. Additionally, the proportion of intelligent cameras powered by deep learning is expected to increase from 5% to 66%.

[Figure: Video analysis applications (2018 global revenue, 2018-2023 CAGR of 37.1%, YOY revenue growth) and the proportion of intelligent cameras shipped with deep learning analytics versus rules-based analytics, 2018-2023. Data source: IHS Markit, 2019]

AI has become a core enabler of digital transformation across industries

As artificial intelligence (AI) technology matures and an intelligent society develops, AI is being used in a wide range of industries. Currently, the transportation industry is using AI+video to make traffic management more effective. In the future, AI+video will gradually be embedded in more sectors, such as government, finance, energy, and education.

Transportation: Transport networks can use AI to recognize key people and vehicles, thereby improving traffic safety governance in urban areas, and to manage urban traffic at a finer granularity and optimize traffic flow based on precise data.
Government: Governments can use AI to improve their administrative efficiency by informatizing infrastructure, improve the intelligence of various application systems, and enhance information awareness, analysis, and processing capabilities by analyzing massive video data.
Finance: Banks can use AI to turn their focus from improving service efficiency to enhancing marketing, improve the intelligence of unstaffed bank branches, and accelerate the reconstruction of smart branches.
Energy: Energy companies can use AI to realize visualized exploration and development, and to construct intelligent pipelines and gas stations.
Education: Educational institutions can use AI to establish uniform systems across countries/regions, promote intelligent education, establish intelligent education demonstration areas, and drive education networking.
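As a quick check on what the 37.1% CAGR quoted in this section implies, compound growth over n years multiplies the starting value by (1 + r)^n. The sketch below applies that standard formula to the 2018-2023 window. The function name is illustrative, and the result is a growth multiple relative to 2018 rather than a revenue figure taken from the report.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return the factor by which a value grows at a constant annual rate."""
    return (1.0 + cagr) ** years


CAGR_2018_2023 = 0.371  # rate quoted in the text
YEARS = 5               # 2018 -> 2023

multiple = growth_multiple(CAGR_2018_2023, YEARS)
print(f"Market size in 2023 relative to 2018: {multiple:.2f}x")
```

Under this rate, the market in 2023 would be a little under five times its 2018 size.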
2. To Achieve AI Development, an Image Quality Assessment Standard is Needed for Intelligent Cameras

Why is it necessary to have an image quality assessment standard?

The rapid development of AI in recent years has revolutionized the public safety industry. In the past, video had to be watched by people. Now, machines also play an important role in viewing and analyzing video. However, the current technical standards do not reflect the true capabilities of today's video surveillance technologies.

Machines are capable of conducting a wide range of recognition tasks, including recognizing objects such as pedestrians, cyclists, and vehicles. To improve the recognition accuracy of AI algorithms, high-quality video is needed.

[Figure: Recognition targets: pedestrians, cyclists, vehicles, and others]

All-scenario and all-weather coverage: New intelligent applications pose higher requirements on full-color imaging in low light conditions, and this is now a trend within the industry. For example, person re-identification (ReID) requires cameras to accurately capture the color of the surroundings and the gait details of people. Against this backdrop, infrared multi-spectral light compensation technology has been proposed, which enables cameras to perform better in low light conditions, and to do so in an environmentally friendly way.

[Figure: ReID technology; full-color imaging in low light conditions]

AI and image enhancement technologies have developed rapidly. Technologies such as AI noise reduction use global and local optimization methods to improve image quality. They focus on optimizing image quality for targets such as license plates, which greatly enhances the accuracy of image recognition. However, the industry still lacks a complete and objective image assessment standard.
The status quo of image quality assessment standards

The current Chinese national standard GA/T 1127-2013, General technical requirements for cameras used in security video surveillance, mainly lists requirements for camera network access and manual video viewing. According to the traditional assessment method, experienced workers grade images subjectively, but this method cannot be used in machine assessment. Now that AI is enabling image assessment to become increasingly objective, an objective image assessment standard needs to be formulated.
[Figure: Timeline of image and video quality assessment standards and projects, 1997 to 2019]

1997: Recommendation ITU-R BT.500-7, Methodology for the subjective assessment of the quality of television pictures
1998: GY/T 134, The method for the subjective assessment of the quality of digital television picture
2000: FRTV Phase I, Full Reference (FR) objective video quality models that predict the quality of standard definition television
2002: Recommendation ITU-R BT.500-11, Methodology for the subjective assessment of the quality of television pictures
2003: FRTV Phase II, Full Reference (FR) objective video quality models that predict the quality of standard definition television
2007: Recommendation ITU-R BT.1788, Methodology for the subjective assessment of video quality in multimedia applications
2009: Recommendation ITU-R BT.500-12, Methodology for the subjective assessment of the quality of television pictures
2009: RRNR-TV, Reduced Reference (RR) and No Reference (NR) objective video quality models that predict the quality of standard definition television
2010: HDTV Phase I, Full Reference (FR) and Reduced Reference (RR) objective video quality models that predict the quality of high definition television
2010: QART (Quality Assessment for Recognition Tasks)
2011: Recommendation ITU-T J.341, Objective perceptual multimedia video quality measurement of HDTV for digital cable television in the presence of a full reference
2011: Recommendation ITU-T J.342, Objective multimedia video quality measurement of HDTV for digital cable television in the presence of a reduced reference signal
2011: GB 50198-2011, Technical code for project of civil closed circuit monitoring television system
2012: Recommendation ITU-R BT.500-13, Methodology for the subjective assessment of the quality of television pictures
2012 to now: Audiovisual HD Quality (AVHD)
2013: GA/T 1127-2013, General technical requirements for cameras used in security video surveillance
2017 to now: No Reference Metric (NORM)
2018: GA/T 1356-2018, Specifications for compliance tests with national standard GB/T 25724-2017

Key issues relating to the formulation of a new standard

There are five key issues to consider when developing an image quality assessment system for intelligent cameras.

Objectivity of camera imaging quality assessment: When humans judge imaging quality using their eyes, their assessment is subjective. An objective imaging quality assessment model would be based on existing full-reference, semi-reference, or no-reference models within the industry.
Consistency of assessment result and subjective perception: The assessment result arrived at by intelligent vision must be consistent with subjective perception. This is a key factor that any standard system must promote and recognize.
Identity of assessment scenario and real environment: Currently, the image quality indicators of cameras are mainly evaluated using test cards and software or by manual judgment. This is different from the actual scenarios where these cameras would be used, which involve moving objects like people and vehicles. In addition, infrared multi-spectral light compensation technology is widely used in actual scenarios. Therefore, the spectral characteristics of the target must be consistent.
Concordance of assessment indicators and actual effect: Currently, the image quality indicators of cameras are tested separately, and the relationship and weight of indicators for different intelligent tasks are not considered.
Repeatability of assessment methods: Different assessors should get the same result regardless of time or place.

Thoughts and suggestions on the design of a standard system

The assessment indicators should be associated with user scenarios and reflect the practicability of the service.
The assessment dimensions should include the user task type, the user scenario type, and the basic image assessment factors.
Score weighting should be decided based on each user task and scenario to calculate the overall score.
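The layered weighting suggested above can be sketched as nested weighted averages: factor scores roll up into a scenario score, scenario scores roll up into a task score, and task scores roll up into the overall score. The source only names a calculation function f(x) at each level, so the choice of a weighted mean is an assumption, and every name, weight, and score below is hypothetical.

```python
from typing import Dict


def weighted_mean(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of scores; weights need not sum to 1."""
    total = sum(weights[k] for k in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[k] * weights[k] for k in scores) / total


# Hypothetical basic-factor scores (0-100) measured in two scenarios.
factor_scores = {
    "daytime_even_light": {"definition": 90.0, "noise": 85.0, "frame_rate": 95.0},
    "night_low_light": {"definition": 70.0, "noise": 60.0, "frame_rate": 90.0},
}
# Hypothetical weights; a fuller model would vary factor weights per task.
factor_weights = {"definition": 0.5, "noise": 0.3, "frame_rate": 0.2}
scenario_weights = {
    "plate_recognition": {"daytime_even_light": 0.4, "night_low_light": 0.6},
    "person_recognition": {"daytime_even_light": 0.7, "night_low_light": 0.3},
}
task_weights = {"plate_recognition": 0.5, "person_recognition": 0.5}

# Level 1: basic factors -> scenario score.
scenario_scores = {
    name: weighted_mean(scores, factor_weights)
    for name, scores in factor_scores.items()
}
# Level 2: scenario scores -> task score, using each task's scenario weights.
task_scores = {
    task: weighted_mean(scenario_scores, weights)
    for task, weights in scenario_weights.items()
}
# Level 3: task scores -> overall score.
overall = weighted_mean(task_scores, task_weights)
print(f"Overall score: {overall:.2f}")
```

Keeping the weights in plain tables makes it easy to publish them alongside a standard, which supports the repeatability requirement: two assessors using the same tables and the same factor measurements arrive at the same overall score.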
Indicator system for the image quality assessment for intelligent cameras

[Figure: Three-level indicator system. A calculation function f(x) is applied at each level.
Overall score: aggregates the scores of recognition tasks 1, 2, 3, and so on by user task weight.
Task score: aggregates scenario scores by user scenario weight. Daytime scenarios: even illumination, light raking, backlight. Nighttime scenarios: low light, low light with glare. Weather scenarios: rain and fog, rain and snow.
Basic image indicator factors: objective quality factors of a single frame in the spatial domain (definition, texture detail, noise, contrast, geometric distortion, color reproduction, color sensitivity, color saturation, exposure quality) and objective quality factors in the temporal domain (stability, frame rate).]

3. Service Development Requirements for AI Algorithms and Future Evolution

Evolution from traditional single-object analysis to multi-object associative recognition

The traditional single-object recognition method cannot accurately recognize or analyze occluded objects. Instead, multiple algorithms must be integrated to improve recognition efficiency, which has become a key service requirement and future direction for algorithm evolution.

[Figure: Multi-algorithm integration: person recognition, behavior recognition, gait recognition, license plate recognition, and others]
Evolution from traditional service closed-loop in a single area to comprehensive security protection

Social and transportation development facilitates provincial and national population mobility. Therefore, the traditional service, with a closed loop in a single area, cannot meet the requirements of comprehensive security protection, which is gradually developing towards cross-region intelligent management.

[Figure: Airports; railway/subway stations; bus stations/bus stops; pedestrian zones/areas]

Comprehensive intelligence across all scenarios: Implement closed-loop video surveillance for key areas such as city entrances, railway stations, subway stations, bus stations, airports, pedestrian zones, urban-rural intersections, street communities, and agricultural trade markets.
Full awareness of people and vehicles within a residential community: Collect and update data for people and vehicles entering and leaving residential communities every day in real time, and recognize objects quickly and accurately.
Multi-dimensional data collision and analysis: Align vast quantities of video and image data with multi-dimensional social data such as travel data, to better analyze people.

4. Storage Requirements of AI Development

The status quo of video and image storage

To improve recognition accuracy, AI algorithms pose higher requirements on the image quality of cameras (including definition and resolution). In smart cities and intelligent transportation systems, HD cameras are widely deployed, and this requires considerable storage space for video and images. As storage duration and coverage areas also increase, a range of problems can follow, such as a limited equipment room footprint, high power consumption, and maintenance difficulties.
[Figure: Storage growth in a medium-sized city.
Video resolution: 1080p to 4K. Storage duration: 30 days to 90 days. Coverage area: key areas to all areas.
Limited equipment room footprint: 40+ cabinets; line reconstruction.
High power consumption: 440+ kW.
Maintenance difficulties: component/node/site faults.]

Customers' primary concern is how to improve storage space utilization and reduce equipment room footprint, storage deployment costs, power consumption, and total cost of ownership (TCO).
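To make the storage pressure concrete, raw capacity for continuous recording is simply bit rate multiplied by time and by the number of cameras. The sketch below runs that arithmetic for 30-day and 90-day retention. The two bit rates are the ones this section quotes for its region of interest (ROI) compression example (2642 kbit/s before and 551 kbit/s after). The fleet size of 10,000 cameras, the function name, and the decimal units (1 kbit = 1000 bits, 1 TB = 10^12 bytes) are assumptions made for illustration, and redundancy overhead is ignored.

```python
def storage_terabytes(bitrate_kbit_s: float, days: int, cameras: int = 1) -> float:
    """Raw storage for continuous recording, in decimal terabytes."""
    seconds = days * 24 * 3600
    total_bytes = bitrate_kbit_s * 1000 / 8 * seconds * cameras
    return total_bytes / 1e12


CAMERAS = 10_000  # hypothetical fleet size, not a figure from the source

for label, kbit_s in (("before ROI compression", 2642), ("after ROI compression", 551)):
    for days in (30, 90):
        tb = storage_terabytes(kbit_s, days, CAMERAS)
        print(f"{label}, {days} days: {tb:,.1f} TB")
```

With these inputs, the 30-day footprint for the hypothetical fleet falls from about 8,560 TB to about 1,785 TB, in line with the roughly 79% drop in bit rate. Tripling retention to 90 days triples either figure, which is why retention policy and compression have to be planned together.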
Future trends

High-density storage: more storage media per unit.
Video compression: Deep video compression enables better utilization of storage space. For example, region of interest (ROI) compression technology separates and extracts ROIs from the background to reduce video bit rate and storage space without decreasing the ROI detection rate.

[Figure: Pixel-level image segmentation of motor vehicles. Bit rate before compression: 2642 kbit/s. Bit rate after compression: 551 kbit/s.]

In smart cities and intelligent transportation systems, video streams are mainly used to conduct AI analysis of people and vehicles. A balance needs to be struck between lowering storage costs and ensuring the accuracy of this analysis.

5. Trends

The core objective of AI is to turn the physical world into metadata for analysis. However, in actual applications, a single piece of metadata is generally useless. This requires frontend devices to go from uni-dimensional data collection to multi-dimensional data awareness, and backend platforms to evolve from relying on image intelligence to data intelligence. In this way, data can be fully associated and utilized for analysis and prediction.

Frontend devices: from uni-dimensional data collection to multi-dimensional data awareness

[Figure: Siloed systems where data is isolated (Department A, Department B, Department C) compared with an aggregated data lake where data has converged. Multi-dimensional data awareness (+time/space/multi-modal) with diversified awareness dimensions and an integrated device form: person, phone, accommodation, vehicle, relationship, travel.]
Backend platform: from image intelligence to data intelligence

[Figure: Image intelligence (unforeseeable) compared with data intelligence (foreseeable), which also draws on Internet of things (IoT) data, Internet data, and other sources]
Discussion on Frontend Intelligence Trends
Xu Tongjing

The aim of artificial intelligence (AI) is to train computers to see, hear, and read like human beings. Current AI technologies are mainly used to recognize images, speech, and text. Renowned experimental psychologist D. G. Treichler proposes that 83% of the information we obtain from the world around us comes through our vision. Accordingly, over 50% of AI applications nowadays are related to intelligent vision, and around 65% of industry digitalization information comes from intelligent vision. In addition, to bridge the physical and digital worlds, all things must be sensing. The type, quantity, and quality of data collected by frontend sensing devices determine the intelligence level.

[Figure: Share of information obtained through each sense: 83%, 11%, 3.5%, 1.5%, 1%]

1. Five Advantages of Frontend Intelligence

Superior imaging quality with ultimate computing power

Intelligent cameras, as sensing devices in the intelligent vision sector, were introduced around five years ago. Unlike traditional IP cameras (IPCs), intelligent cameras can adapt to challenging environments and collect video data of a higher quality. However, due to immature algorithms and chips, intelligent cameras cannot provide sharp, HD-quality images in harsh weather conditions such as rain, sandstorms, and overcast days. In addition, factors such as a poor installation angle, occlusion, low light, and low resolution may also lead to inaccurate object recognition. If the imaging quality cannot be guaranteed, intelligence will remain an unachievable mirage.

Intelligent image quality adjustment

With AI algorithms, intelligent cameras can automatically adjust image signal processing (ISP) parameters such as shutter speed, aperture, and exposure according to the ambient lighting and object speed, deliver optimal images for further detection and recognition, and associate face images with personal data.
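One way to picture the trade-off behind such adjustment is that faster objects call for a shorter shutter time to limit motion blur, while the light lost to the shorter shutter has to be recovered with gain. The sketch below is a toy heuristic under those assumptions, not a description of any vendor's ISP tuning. The function, the parameter names, the blur budget, and the normalized brightness scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExposureSettings:
    shutter_s: float  # exposure time in seconds
    gain: float       # sensor gain multiplier


def choose_exposure(object_speed_px_s: float,
                    scene_brightness: float,
                    max_blur_px: float = 2.0,
                    max_shutter_s: float = 1 / 25,
                    max_gain: float = 64.0) -> ExposureSettings:
    """Pick the longest shutter that keeps motion blur within max_blur_px,
    then raise gain to make up for the light lost.

    scene_brightness is a hypothetical normalized value: 1.0 means the scene
    is correctly exposed at max_shutter_s with a gain of 1.0.
    """
    if scene_brightness <= 0:
        raise ValueError("scene_brightness must be positive")
    if object_speed_px_s > 0:
        # Blur in pixels is roughly speed * exposure time.
        shutter = min(max_shutter_s, max_blur_px / object_speed_px_s)
    else:
        shutter = max_shutter_s
    # Keep shutter * gain * brightness near the reference exposure.
    gain = max_shutter_s / (shutter * scene_brightness)
    gain = min(max(gain, 1.0), max_gain)
    return ExposureSettings(shutter_s=shutter, gain=gain)


print(choose_exposure(object_speed_px_s=0, scene_brightness=1.0))
print(choose_exposure(object_speed_px_s=500, scene_brightness=1.0))
print(choose_exposure(object_speed_px_s=500, scene_brightness=0.01))
```

The third call shows the limit of this approach: in a very dark scene with fast objects, gain saturates at its ceiling, which is the situation where supplementary lighting or better low-light imaging becomes necessary.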
Applicable to varied scenarios

Intelligent vision systems are increasingly expected to serve different industries and run different intelligent applications at different times and in different scenarios. For example, cameras must be able to detect vehicle queue length and accidents in the daytime and detect parking violations at night, or load different algorithms at different preset positions. Thanks to frontend intelligence, customers can load their desired algorithms on intelligent cameras to satisfy their personalized or scenario-specific requirements. This also helps reduce risk exposure in the delivery of diversified algorithms. In addition, lightweight container technology is used to construct an integrated multi-algorithm framework. This enables each algorithm to operate independently, ensuring service continuity during algorithm upgrade and switchover. Customers can also flexibly choose their desired intelligent capabilities to adapt to specific application scenarios.

[Figure: Gantry-mounted radars and intelligent cameras performing vehicle capture and vehicle feature extraction]

Optimal computing efficiency

Video plays an essential role in some key industries such as social governance and transportation. However, the traditional video surveillance market is approaching saturation and cannot satisfy digital transformation across industries. Thanks to ultimate computing power, a lot of intelligent applications are now possible. Compared with backend intelligence, frontend intelligence improves computing efficiency by 30% to 60%. With frontend intelligence, each camera processes only one video channel at the frontend, which poses lower requirements on computing power, and directly obtains raw data for analysis, further reducing computational requirements and enhancing processing efficiency.
Frontend intelligence also enables cameras to deliver high-quality images to the backend, so the backend platform can focus on intelligent analysis while focusing less on secondary image decoding. With the same computing power, image analysis is roughly 10 times more efficient than video analysis. Moving intelligence to the frontend can maximize the value of intelligent applications for customers with limited resources.

[Figure: Computing efficiency of backend intelligence compared with frontend intelligence]

System linkage within milliseconds

In many industries, such as transportation and emergency response, fast response and closed-loop management are the basic and also the most critical requirements of services. Frontend intelligence enables cameras to analyze video in real time and to immediately link related service systems upon detecting objects that trigger behavior analysis rules, in locations such as airports and high-speed rail stations. In road traffic scenarios, cameras need to link external devices such as illuminators, radar detectors, and traffic signal detectors within milliseconds. For example, cameras need to work with illuminators to provide enhanced lighting for specific areas at the right moment, or periodically synchronize with traffic signal detectors to accurately detect traffic incidents. In other linkage scenarios, for example, linkage between radar detectors and PTZ dome cameras or between barrier gates/swing gates and cameras, frontend intelligence can dramatically improve the system response efficiency and ensure quick service closure.

[Figure: Intelligent camera linked with millimeter-wave radar. Collision warning upon lane change. Motor vehicles, non-motorized vehicles, and pedestrians appear simultaneously.]
Improved engineering efficiency

To apply intelligent applications on a large scale, engineering issues must be considered. A top concern for engineering vendors is upgrading and reconstructing the live network using existing investments and at the lowest cost. The prevalence of intelligent cameras (including common cameras with inclusive AI computing power), where intelligent algorithms can be dynamically loaded, can dramatically improve the frontend data collection quality, enhance the intelligent analysis efficiency by 10-fold and intelligent application availability by several-fold, and lower the total cost of ownership (TCO) by over 50%.

[Figure: Backend intelligence compared with frontend intelligence: intelligent analysis efficiency improved by 10-fold; intelligent application availability improved by several-fold; TCO reduced by over 50%]

In addition, frontend intelligence enables a camera to run multiple algorithms concurrently. For example, an intelligent camera can simultaneously load multiple algorithms such as traffic violation detection, vehicle capture and recognition, and traffic flow statistics, while multiple devices were required to support these functions in the past. This sharply lowers the engineering implementation difficulty and improves the engineering efficiency.

2. Key Factors for Implementing Frontend Intelligence

In terms of product technologies, intelligent cameras must be equipped with AI main control chips and intelligent operating systems to implement frontend intelligence.

The most basic functionality of a camera is to shoot HD video around the clock, and HD and sharp images are the most basic requirements for computer vision. Computing power is required to optimize images to improve the intelligent recognition rate.
In scenarios where intelligent services require high real-time performance, ultimate computing power is required to meet real-time data awareness, computing, and response requirements.
Computing power is the foundation of intelligent capabilities, while professional AI chips give a huge boost to computing power. Accelerated by dedicated hardware, these AI chips support tera-scale computing and visual processing based on deep learning on a neural network. To support frontend intelligence, cameras must be equipped with professional AI chips.

Customers require cameras with different hardware forms and software with different capabilities depending on the usage scenario. Currently, most cameras are designed for specific scenarios, but their software and hardware are closely coupled. If software can be decoupled from hardware, users can install desired algorithms on cameras just like installing apps on smartphones. This maximizes the value of hardware, saves overall costs, and improves user experience.

To decouple software from hardware, an open and intelligent operating system is required. With the intelligent operating system, differences between bottom-layer hardware are no longer obstacles. After the computing and orchestration capabilities of bottom-layer hardware devices are invoked, they are uniformly encapsulated by the operating system. This significantly simplifies development and allows developers to focus solely on the software's functional capabilities. In addition, the lightweight container is used to construct an integrated multi-algorithm framework, where each algorithm runs independently in a virtual space, allowing independent loading and online upgrading. In summary, an intelligent camera operating system is the basis of frontend intelligence.

[Figure: Intelligent operating system]

From the perspective of application ecosystems, frontend intelligence requires a future-proof algorithm and hardware ecosystem to boost industry digital transformation.

In the mobile Internet sector, the app market provides an overwhelming number of apps. Users can download and install desired apps on their smartphones.
In the intelligent video sector, the burning question is: How can we aggregate excellent ecosystem partners to provide superior algorithms and applications to meet customers' fragmented and long-tail requirements? To address this issue, the intelligent algorithm platform was developed, which aggregates ecosystem partners in the intelligent vision sector to provide intelligent video/image applications for a range of industries. The platform protects developers' rights and interests through license files and verification mechanisms and also allows users to easily choose from a range of reliable intelligent algorithms.

In addition, intelligent cameras can connect to a range of hardware sensors in wired or wireless mode to help build a multi-dimensional awareness ecosystem. With a rich ecosystem, a large number of long-tail algorithms dedicated to specific industries can be quickly released to meet the requirements of various scenarios.

The industry has reached a consensus on frontend intelligence and related standards. Mainstream vendors and users in the industry are actively embracing frontend intelligence. Vendors in the industry have launched products such as software-defined cameras and scenario-specific intelligent cameras. The industry ecosystem is thriving. Intelligent awareness can help collect multi-dimensional data, dramatically improve the data collection quality, and unleash the value of mass video data while reducing the computing power required for backend data processing and the overall TCO. In addition, distributed processing significantly improves system reliability.
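The multi-algorithm framework described in this article, where each algorithm runs independently and can be loaded or upgraded online, can be pictured with a small in-process analogy. The sketch below is not container technology and does not reflect any camera operating system's API. The class, the method names, and the sample algorithms are hypothetical and only illustrate the isolation and per-algorithm upgrade behavior.

```python
from typing import Callable, Dict, Tuple

Frame = dict  # hypothetical stand-in for a decoded video frame
Algorithm = Callable[[Frame], dict]


class AlgorithmHost:
    """Toy registry: each algorithm lives in its own slot and can be
    loaded, upgraded, or removed without touching the others."""

    def __init__(self) -> None:
        self._slots: Dict[str, Tuple[str, Algorithm]] = {}

    def load(self, name: str, version: str, fn: Algorithm) -> None:
        # Loading under an existing name upgrades that slot only.
        self._slots[name] = (version, fn)

    def unload(self, name: str) -> None:
        self._slots.pop(name, None)

    def versions(self) -> Dict[str, str]:
        return {name: version for name, (version, _) in self._slots.items()}

    def process(self, frame: Frame) -> Dict[str, dict]:
        results: Dict[str, dict] = {}
        for name, (_, fn) in list(self._slots.items()):
            try:
                results[name] = fn(frame)
            except Exception as exc:
                # A failing algorithm is reported but does not stop the rest.
                results[name] = {"error": str(exc)}
        return results


host = AlgorithmHost()
host.load("vehicle_capture", "1.0", lambda f: {"vehicles": len(f.get("vehicles", []))})
host.load("traffic_flow", "1.0", lambda f: {"count": f.get("flow", 0)})
# Upgrade one algorithm while the other stays loaded and unchanged.
host.load("vehicle_capture", "1.1",
          lambda f: {"vehicles": len(f.get("vehicles", [])), "plates": True})

print(host.versions())
print(host.process({"vehicles": ["a", "b"], "flow": 7}))
```

A real framework would add resource limits, license verification, and process-level isolation per algorithm, none of which this sketch attempts.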