Machine vision has become essential for industrial automation applications. The Renesas RZ/A2M provides high-performance graphics and image processing capabilities through Renesas’ unique DRP (Dynamically Reconfigurable Processor) technology. DRP offers both flexibility and excellent image processing performance: it executes computing logic at the hardware level, while software can modify that logic in real time. In this article, we focus on the advantages of the RZ/A2M in industrial automation applications, including higher image processing performance and ultra-low power consumption.
The robotic arm in the video can recognize and grasp the target object so flexibly because the DRP module integrated in the RZ/A2M accelerates the image processing for machine vision. The external monitor shows that the entire image processing pipeline (including Bayer to RGB, shadow correction & white balance, RGB to binary image, contour search, image noise reduction, Bayer to grayscale conversion for display, etc.) takes less than 3ms and achieves 60fps at VGA (640x480) resolution.
What is DRP?
DRP stands for Dynamically Reconfigurable Processor. It is a unique architecture developed by Renesas that can dynamically change the logic circuits of its hardware arithmetic units to implement various arithmetic functions.
DRP has 6 independent units called "Tiles". They can load multiple sets of configuration data (that is, algorithm libraries) and execute them in parallel. These algorithm libraries are stored in system memory, and the CPU issues instructions to load them into each Tile when needed.
The algorithm library in each Tile can be modified at any time, and the operation of other Tiles will not be affected during the modification process.
With dynamic loading, DRP provides high-speed processing for the different image processing algorithms an application requires while using minimal hardware resources.
Why can DRP offer excellent performance?
DRP is a hardware resource that implements all computing logic at the hardware level. Each Tile is an independent computing unit with limited hardware resources, so two or more Tiles need to work together when an algorithm library requires more resources than a single Tile provides. The following are the hardware resources owned by DRP.
During operation, DRP automatically combines hardware resources according to the complexity of the algorithm library. For example, two 16-bit multipliers in a Tile can be combined into one 32-bit multiplier, or a 16-bit multiplier can be used together with a counter to further expand the computing power of DRP.
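To illustrate how narrower multipliers can be combined into a wider one, the following Python sketch builds a 32-bit by 16-bit product from two 16-bit multiplications. It models only the arithmetic. The function names are hypothetical, and the sketch is not a description of the actual DRP circuitry.

```python
MASK16 = 0xFFFF


def mul16(a: int, b: int) -> int:
    """Model of one 16-bit x 16-bit multiplier (product fits in 32 bits)."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32x16(a: int, b: int) -> int:
    """Multiply a 32-bit value by a 16-bit value using two 16-bit multipliers."""
    assert 0 <= a <= 0xFFFFFFFF and 0 <= b <= MASK16
    a_lo = a & MASK16   # lower 16 bits of the wide operand
    a_hi = a >> 16      # upper 16 bits of the wide operand
    # The upper partial product is weighted by 2**16 before the two are summed.
    return (mul16(a_hi, b) << 16) + mul16(a_lo, b)
```

Each call to `mul16` stands for one narrow multiplier; the shift and the addition are the extra logic needed to merge the two partial products.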
The computing power of dedicated hardware logic is well known. For example, the computing power of the FPGA, which is widely used at present, is in a different order of magnitude from that of a CPU. However, the limitation of the FPGA is also obvious: the computing scale is directly tied to the number of logic gates. Once the number of gates required by the algorithm exceeds that of the FPGA selected at the beginning of the project, the FPGA must be replaced with a larger device, which is very inconvenient.
In this case, the flexibility of DRP allows not only switching among different libraries, but also dynamically changing the connections among arithmetic circuits within the same library from one clock cycle to the next. This allows for a variety of calculation methods. Through this time-division multiplexing, DRP maximizes computing performance and supports a wide range of algorithms in a small footprint.
Dynamic Reconfiguration can change the combination of arithmetic circuits on every clock cycle, while Dynamic Loading can load an entirely new algorithm library within 1ms.
DRP can even run the same algorithm library on multiple Tiles to increase processing speed. For example, dividing a picture into 6 equal parts and handing them to 6 Tiles for processing can improve performance by up to 6 times!
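A minimal Python sketch of this band-splitting idea, assuming a simple per-pixel kernel (binarization) as a stand-in for a DRP library. The function names are hypothetical, and the threads only model six Tiles working side by side; they are not expected to reproduce the DRP speedup.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_TILES = 6


def split_rows(height, parts=NUM_TILES):
    """Return (start_row, end_row) per band; band sizes differ by at most one row."""
    base, extra = divmod(height, parts)
    bands, start = [], 0
    for i in range(parts):
        rows = base + (1 if i < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


def binarize_rows(image, start, end, threshold=128):
    """Stand-in kernel: threshold rows [start, end) of a grayscale image."""
    return [[255 if p >= threshold else 0 for p in row] for row in image[start:end]]


def process_in_bands(image, parts=NUM_TILES):
    """Process each band independently, then stitch the bands back in order."""
    bands = split_rows(len(image), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() yields results in the same order as the input bands
        results = list(pool.map(lambda b: binarize_rows(image, b[0], b[1]), bands))
    out = []
    for band in results:
        out.extend(band)
    return out
```

For a 480-row VGA frame, `split_rows(480)` gives six bands of 80 rows each. A kernel that reads neighboring rows would additionally need overlapping rows at the band borders, which this sketch leaves out.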
In general, performance improvements are accompanied by an increase in power consumption. However, DRP provides a unique approach to improving image processing performance that consumes much less energy than a CPU.
How to use DRP?
I have introduced many advantages of DRP so far, and you may be concerned that DRP is hard to get started with. However, Renesas provides a complete solution that allows you to jump into development with ease. At present, we have developed over 50 algorithm libraries. Most of them have functions and interfaces similar to those of the OpenCV library, which makes it convenient for developers to use DRP in an ordinary project.
Switch between CV Library and DRP Library
Let us take the Bayer to RGB library as an example to see what needs to be done when using DRP.
First, let us consider the function interface. The parameters that need to be provided include the input and output addresses, the image width and height, and whether binning is required (the image size can be reduced during the conversion).
Inside the function, you first need to load the DRP library into the DRP hardware. The DRP library is a compiled binary stored in the array g_drp_lib_bayer_binning2rgb. To use 6 Tiles to process a picture in parallel, you need to load this library into all 6 Tiles.
Next, pass the calculation parameters to each Tile and then start it when loading is complete. Since one complete picture is cut into 6 sections, each Tile is only responsible for 1/6 of a picture, so here you need to calculate the starting position and output position of each Tile separately. After the calculation is completed, the Start command is issued to start the Tile operation.
Finally, wait for the computation to complete on all 6 Tiles.
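The load, start, and wait sequence described above can be sketched in Python as follows. `FakeDrp` and every function and parameter name here are hypothetical stand-ins rather than the Renesas driver API. The address arithmetic assumes 1 byte per Bayer input pixel, 3 bytes per RGB output pixel, binning disabled, and an image height that divides evenly by 6.

```python
NUM_TILES = 6


class FakeDrp:
    """In-memory stand-in for the DRP hardware (not the Renesas driver API)."""

    def __init__(self):
        self.loaded = [None] * NUM_TILES
        self.params = [None] * NUM_TILES
        self.done = [False] * NUM_TILES

    def load(self, tile, library):
        self.loaded[tile] = library

    def start(self, tile, params):
        if self.loaded[tile] is None:
            raise RuntimeError("tile has no library loaded")
        self.params[tile] = params
        self.done[tile] = True  # a real Tile would finish asynchronously

    def wait_all(self):
        return all(self.done)


def tile_params(src, dst, width, height, tile, in_bpp=1, out_bpp=3):
    """Input and output start addresses for the band handled by one Tile."""
    rows = height // NUM_TILES
    first_row = tile * rows
    return {
        "src": src + first_row * width * in_bpp,
        "dst": dst + first_row * width * out_bpp,
        "width": width,
        "height": rows,
    }


def bayer_to_rgb(drp, library, src, dst, width, height):
    if height % NUM_TILES:
        raise ValueError("height must be a multiple of the Tile count")
    for tile in range(NUM_TILES):      # 1. load the library into every Tile
        drp.load(tile, library)
    for tile in range(NUM_TILES):      # 2. pass per-Tile parameters and start
        drp.start(tile, tile_params(src, dst, width, height, tile))
    return drp.wait_all()              # 3. wait for all Tiles to finish
```

With a 640x480 frame, each Tile handles 80 rows, so consecutive Tiles start 51,200 bytes apart in the input buffer and 153,600 bytes apart in the output buffer under the assumptions above.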
After the robot arm is powered on and initialized, the target is initially not found, so it transitions to object detection mode. The workflow of this mode is as follows:
After the camera captures a frame, the following steps are performed:
- A Bayer to RGB conversion is done by DRP. This conversion also compresses the width and height of the image to 1/4 of the original, which speeds up the subsequent steps without losing accuracy. This library occupies only 1 Tile, so the picture can be split into 6 parts that are processed on all 6 Tiles at the same time, which takes 0.4ms.
- The shadow correction & white balance library is loaded into DRP to correct the result of the previous step. This library uses more resources and occupies two Tiles, so the image is split into 3 parts for processing, which takes 0.8ms.
- Convert the RGB image to an HSV image and extract the V (value) channel, which simplifies the subsequent moving object detection and contour extraction. This takes 0.2ms.
- Use a "weighted moving average method" to extract moving objects. This algorithm takes 0.6ms.
- Find the contour and center point from the object information obtained in the previous step. (Done by CPU)
- If the target is found, switch to the object tracking process; otherwise, repeat the process.
- Perform other display-related processing.
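The V extraction and moving object steps in the list above can be sketched in Python as follows. The article does not give the exact formula of its "weighted moving average method", so this sketch assumes a common formulation: an exponentially weighted running background compared against the current frame. The function names, `alpha`, and `threshold` are illustrative assumptions.

```python
def value_channel(rgb_image):
    """V of HSV is the largest of the R, G, B components of each pixel."""
    return [[max(px) for px in row] for row in rgb_image]


def update_background(background, frame, alpha=0.1):
    """Weighted moving average: bg = (1 - alpha) * bg + alpha * frame."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def moving_mask(background, frame, threshold=30):
    """Mark pixels that differ from the running background by more than threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]
```

A larger `alpha` makes the background follow scene changes faster, at the cost of absorbing slow-moving objects into the background sooner.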
If the target is found in the above process, it will enter the process of object tracking. The workflow of the object tracking mode is as follows:
- The first two steps are the same as in detection mode: Bayer to RGB plus shadow correction & white balance take 1.2ms in total.
- Since it has already been determined in this mode that an object is in the frame, the image is binarized directly by calling the RGB to Binary library in DRP, which takes 0.8ms.
- The CPU calculates the target's coordinates and angle and its deviation from the center of the screen, and adjusts the control amount of each motor according to these values, which takes 0.7ms.
- Since the image size was reduced in an earlier step, the coordinates calculated in the previous step are not the actual coordinates, so a coordinate conversion is required.
- The Bayer to grayscale library is used for display, together with other display-related processing, which takes 0.3ms.
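The coordinate conversion mentioned in the list above can be sketched in Python as follows. The reduction factor depends on the binning setting used in the Bayer to RGB step, so `scale=2` is only an assumed example value, and both function names are hypothetical.

```python
def to_sensor_coords(x, y, scale=2):
    """Map a point found in the reduced image back to full-resolution pixels."""
    return x * scale, y * scale


def offset_from_center(x, y, width=640, height=480):
    """Deviation of a full-resolution point from the center of a VGA frame."""
    return x - width // 2, y - height // 2
```

For instance, under the assumed factor of 2, a center point found at (100, 60) in the reduced image maps to (200, 120) in the original frame, which is 120 pixels left of and 120 pixels above the frame center.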
At present, the more than 50 libraries we provide achieve performance improvements ranging from 2 to 80 times compared with the 528MHz Cortex-A9 CPU in the RZ/A2M. The improvement for a DRP library is generally between 10 and 20 times. Libraries with an improvement below 10 times are mainly those whose algorithm is too simple to leave much room for optimization. Some of the existing libraries and a comparison of their computing performance are listed in the table below for your reference.
Of course, we can also develop new libraries based on customer needs.
Alternatively, if a customer needs to implement their own confidential algorithms on DRP, we can provide training on DRP library development.