I got really stuck recently, but fortunately I finally found the bug.
I wrote a small GPU kernel that scales images with bilinear interpolation, but it produced weird results: the output image was randomly shifted left or right by a few pixels for no apparent reason.
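For reference, here is a minimal CPU-side sketch of the bilinear sampling step that such a kernel performs per output pixel. It is not my actual kernel: the function name `sampleBilinear`, the single-channel 8-bit layout, and the clamping at the borders are all assumptions made for illustration. Note that rows are addressed through a separate `step` (bytes per row) rather than through `width`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the original kernel): bilinearly sample a
// single-channel 8-bit image at the fractional position (x, y).
// `step` is the number of bytes per row and may be larger than `width`.
inline float sampleBilinear(const std::uint8_t* src, int width, int height,
                            std::size_t step, float x, float y)
{
    assert(src != nullptr && width > 0 && height > 0);
    assert(step >= static_cast<std::size_t>(width));

    // Clamp the sample position to the image (assumed border policy).
    x = std::min(std::max(x, 0.0f), static_cast<float>(width - 1));
    y = std::min(std::max(y, 0.0f), static_cast<float>(height - 1));

    // Integer corners surrounding the sample position.
    const int x0 = static_cast<int>(x);
    const int y0 = static_cast<int>(y);
    const int x1 = std::min(x0 + 1, width - 1);
    const int y1 = std::min(y0 + 1, height - 1);

    // Fractional weights along each axis.
    const float fx = x - static_cast<float>(x0);
    const float fy = y - static_cast<float>(y0);

    // Row pointers are computed from `step`, never from `width`.
    const std::uint8_t* row0 = src + static_cast<std::size_t>(y0) * step;
    const std::uint8_t* row1 = src + static_cast<std::size_t>(y1) * step;

    // Interpolate horizontally on both rows, then vertically between them.
    const float top    = row0[x0] * (1.0f - fx) + row0[x1] * fx;
    const float bottom = row1[x0] * (1.0f - fx) + row1[x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

In a GPU version, each thread would evaluate this for one output pixel, mapping its output coordinates back into the source image by the scale factor.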
At first, I thought the mistake was in the kernel itself. Then, based on some results that seemed correct, I suspected a data type conversion or precision problem, but that was the wrong direction.
The problem was actually OpenCV's image data alignment. Other kernels had previously appeared to run perfectly because their input and output were both misaligned in the same way. Anyway...
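To illustrate this kind of alignment bug, here is a small sketch. It assumes the cause is row padding: each image row is padded so that its length in bytes (the `widthStep` field of an `IplImage`) is a multiple of 4, which makes it larger than `width` for some image sizes. The helper names `alignedStep` and `packRows` are made up for this example, and the 4-byte alignment is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: round a row length in bytes up to a multiple of
// `alignment` (4 bytes is assumed here).
inline std::size_t alignedStep(std::size_t rowBytes, std::size_t alignment = 4)
{
    assert(alignment > 0);
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: copy a padded single-channel 8-bit image into a
// tightly packed buffer, one row at a time, dropping the padding bytes.
inline std::vector<std::uint8_t> packRows(const std::uint8_t* src, int width,
                                          int height, std::size_t step)
{
    assert(src != nullptr && width > 0 && height > 0);
    assert(step >= static_cast<std::size_t>(width));

    const std::size_t w = static_cast<std::size_t>(width);
    std::vector<std::uint8_t> packed(w * static_cast<std::size_t>(height));
    for (int y = 0; y < height; ++y) {
        // Source rows start every `step` bytes; packed rows every `w` bytes.
        std::memcpy(packed.data() + static_cast<std::size_t>(y) * w,
                    src + static_cast<std::size_t>(y) * step, w);
    }
    return packed;
}
```

With a 3-pixel-wide image and a step of 4, indexing the padded buffer as `data[y * width + x]` reads a padding byte for pixel (0, 1), and the error grows with every row. That looks like the image drifting sideways, and it vanishes whenever `width` happens to be a multiple of the alignment. On the GPU side, `cudaMemcpy2D` takes separate source and destination pitches, so the padded host image can be copied without packing it first.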
The performance looks good so far. My scaling kernel (running on my low-end NVS 3100M) took 0.28 milliseconds, while cvResize from OpenCV took 6.7 milliseconds for a single operation. That's a speedup of more than 20x (6.7 / 0.28 ≈ 24).
Memory transfer latency is not taken into account, and cvResize is a higher-level function that is slow (see performance comparison on read from CFILE, C++ fileread API, FILE from C). Even so, there are many better GPUs that would deliver much stronger performance.
Gradient and Bins