The accelerator-type DRP (STP: Stream Transpose®) accelerates processing when the CPU alone lacks sufficient processing capability.
Data is transferred efficiently because a DMA controller is connected directly to the DRP core. Because the processing is defined by firmware that can be replaced at any moment, a system can host a practically unlimited number of functions. The IP core is currently available for 16 nm, 28 nm, and 40 nm processes.
The Structure of the High-Performance Accelerator-Type DRP
The accelerator-type DRP combines a programmable operation device array with a DMA controller.
Data transfer (DMA) is separated from processing (the programmable operation device array), and the DMA controller is dedicated to data transfer. This improves performance and area efficiency, reduces the CPU load, and raises overall system performance.
The programmable operation device array consists of operation devices and memories, laid out so that memories and multipliers surround an array of 8-bit or 16-bit operation devices (ALUs). The accelerator-type DRP achieves higher performance by processing in parallel across many operation devices and memories.
Efficient Data Transfer
Data transfers between memories are performed efficiently by the enhanced DMA controller. Running the DMA controller and the programmable operation device array at the same time reduces data transfer overhead.
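The benefit of running both units at once can be estimated with a simple timing model. The sketch below is illustrative only: it assumes every block takes the same number of cycles to transfer and to process, and the function names and cycle counts are hypothetical, not part of any DRP tool or API.

```c
/* Assumed model: each block needs `xfer` cycles of DMA transfer and
   `compute` cycles of processing. All names here are hypothetical. */

/* Cycles when each block is transferred and then processed, one after another. */
unsigned long serial_cycles(unsigned long blocks,
                            unsigned long xfer,
                            unsigned long compute)
{
    return blocks * (xfer + compute);
}

/* Cycles when the DMA fetches block i+1 while the array processes block i.
   The first transfer and the last computation cannot be overlapped;
   in between, each step costs the longer of the two stages. */
unsigned long overlapped_cycles(unsigned long blocks,
                                unsigned long xfer,
                                unsigned long compute)
{
    if (blocks == 0) {
        return 0;
    }
    unsigned long stage = (xfer > compute) ? xfer : compute;
    return xfer + (blocks - 1) * stage + compute;
}
```

For example, with 8 blocks, 100 cycles per transfer, and 150 cycles per computation, the serial model gives 2,000 cycles and the overlapped model gives 1,300 cycles, because every transfer except the first is hidden behind computation.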
A verification model of the accelerator-type DRP is built into the design tool. The access instructions issued to the DMA controller and the algorithm executed on the programmable operation device array can be described in the same C program, which makes optimization easy.
In addition, the division of functions between the accelerator-type DRP and the CPU can be changed in software, so a flexible system can be built to meet the required specifications.
The Hardware Switching Capability of the DRP
1. Dynamic Reconfiguration
Up to 64 sets of data path (DP) information can be stored (for the 40 nm core for ASICs). The circuit implemented in the limited area of the programmable operation device array is dynamically reconfigured by time sharing, which expands the effective logic area. Processing continues while the DP is switched in 0 clocks (less than 1 ns).
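The time-sharing idea can be modeled in plain C as a table of stored data paths with an index that selects the active one. This is a conceptual sketch, not the DRP programming interface: the type and function names (`drp_model`, `drp_store`, `drp_switch`, `drp_run`) and the two example data paths are hypothetical, and only the limit of 64 stored data paths comes from the figure quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Per the text: up to 64 stored data paths (40 nm core for ASICs). */
#define MAX_CONTEXTS 64

/* A "data path" is modeled as a function applied to one input word. */
typedef int32_t (*datapath_fn)(int32_t);

/* Hypothetical software model of the stored data paths. */
typedef struct {
    datapath_fn contexts[MAX_CONTEXTS];
    size_t count;   /* number of data paths stored so far */
    size_t active;  /* index of the currently selected data path */
} drp_model;

/* Store a data path; returns its index, or -1 if all slots are used. */
int drp_store(drp_model *m, datapath_fn fn)
{
    if (m->count >= MAX_CONTEXTS) {
        return -1;
    }
    m->contexts[m->count] = fn;
    return (int)m->count++;
}

/* Select which stored data path occupies the array; 0 on success, -1 if
   the index does not refer to a stored data path. */
int drp_switch(drp_model *m, size_t index)
{
    if (index >= m->count) {
        return -1;
    }
    m->active = index;
    return 0;
}

/* Run the active data path on one input; passes the input through
   unchanged if nothing has been stored yet. */
int32_t drp_run(const drp_model *m, int32_t x)
{
    if (m->count == 0) {
        return x;
    }
    return m->contexts[m->active](x);
}

/* Two example data paths (assumed for illustration). */
int32_t dp_scale(int32_t x)  { return x * 3; }
int32_t dp_offset(int32_t x) { return x + 7; }
```

Storing `dp_scale` and `dp_offset` and switching between them lets the same model run two different circuits in sequence. In hardware, the switch changes the implemented circuit rather than a function pointer.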
2. Dynamic Loading
During execution, different firmware is additionally read from external memory, changing the hardware to one that implements entirely different functions. A large application that cannot be loaded onto the chip all at once is executed by time sharing. The switching time is on the order of several hundred μs (for a 40 nm core).
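Whether dynamic loading pays off depends on how long each loaded function runs before the next one is loaded. The following sketch shows that arithmetic. The 500 μs load time used in the example is an assumed value within the "several hundred μs" range above, and the function names are hypothetical.

```c
/* Fraction of total time spent loading firmware, given the load time and
   the time the loaded function runs before the next load (both in us).
   Returns 0.0 if both times are zero or negative. */
double load_overhead_fraction(double load_us, double run_us)
{
    if (load_us + run_us <= 0.0) {
        return 0.0;
    }
    return load_us / (load_us + run_us);
}

/* Minimum run time (us) per load that keeps the loading overhead at or
   below target_fraction. Derived from load / (load + run) <= target.
   Returns -1.0 if target_fraction is not strictly between 0 and 1. */
double min_run_time_us(double load_us, double target_fraction)
{
    if (target_fraction <= 0.0 || target_fraction >= 1.0) {
        return -1.0;
    }
    return load_us * (1.0 - target_fraction) / target_fraction;
}
```

Under these assumptions, a function that runs for 1,500 μs after a 500 μs load spends 25% of its time loading, and keeping the overhead at or below 10% requires each loaded function to run for at least about 4,500 μs.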