cv::parallel_for_ gives little improvement

Question

I am testing the cv::ParallelLoopBody class for image-processing code.

I started by implementing a normalization in which every pixel is divided by a specific value for each channel, which is simple code that should parallelize nicely.
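To make the operation concrete, here is a minimal sketch of the per-channel division written without OpenCV. The helper name normalizeRows and the interleaved float buffer layout are assumptions for illustration, not taken from the post. It processes a half-open range of rows, which is the same shape of work that cv::parallel_for_ hands to operator() through a cv::Range.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the post): divides every pixel in rows
// [rowBegin, rowEnd) by a per-channel divisor. The image is assumed to be a
// contiguous, interleaved 3-channel float buffer (c0,c1,c2,c0,c1,c2,...).
void normalizeRows(std::vector<float>& pixels, int cols,
                   int rowBegin, int rowEnd,
                   const std::array<float, 3>& divisors)
{
    assert(cols > 0 && rowBegin >= 0 && rowBegin <= rowEnd);
    assert(static_cast<std::size_t>(rowEnd) * cols * 3 <= pixels.size());

    for (int r = rowBegin; r < rowEnd; ++r) {
        // Pointer to the first sample of row r.
        float* row = pixels.data() + static_cast<std::size_t>(r) * cols * 3;
        for (int c = 0; c < cols; ++c) {
            row[3 * c + 0] /= divisors[0];
            row[3 * c + 1] /= divisors[1];
            row[3 * c + 2] /= divisors[2];
        }
    }
}
```

Because each row range touches disjoint memory, different ranges can be processed by different threads without synchronization.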

However, when I test it I see no difference.

Am I doing something wrong here?

This is my class:

class Parallel_process : public cv::ParallelLoopBody
{
private:
    cv::Mat img; // my image to normalize
    std::vector<int> A;
    int diff;

public:
    Parallel_process(cv::Mat inputImage, std::vector<int> AA, int diffVal)
        : img(inputImage), A(AA), diff(diffVal) {}

    virtual void operator()(const cv::Range& range) const
    {
        for (int i = range.start; i < range.end; i++)
        {
            // in is a patch of my original image
            cv::Mat in(img, cv::Rect(0, (img.rows / diff) * i, img.cols, img.rows / diff));
            std::vector<int> AAA(A);
            in.forEach<cv::Vec3f>
            (
                [&AAA](cv::Vec3f& pixel, const int* po) -> void
                {
                    pixel[0] /= AAA[0];
                    pixel[1] /= AAA[1];
                    pixel[2] /= AAA[2];
                }
            );
        }
    }
};

And in main() I invoke it like this:

cv::parallel_for_(cv::Range(0, 91), Parallel_process(img, AA, 91)); //my image is 1288*728 size so 728/91=8
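The arithmetic in that comment can be checked with a small sketch. The helpers stripeRows and uncoveredRows are hypothetical names introduced here. They use the same integer division as (img.rows/diff)*i in the class above and show which rows each stripe covers, and how many rows would be skipped if the height were not a multiple of the stripe count.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: rows [first, second) covered by stripe i when `rows`
// is split into `stripes` pieces with truncating integer division.
std::pair<int, int> stripeRows(int rows, int stripes, int i)
{
    assert(stripes > 0 && i >= 0 && i < stripes);
    const int height = rows / stripes; // integer division truncates
    return {height * i, height * (i + 1)};
}

// Rows left unprocessed at the bottom of the image when `rows` is not an
// exact multiple of `stripes`.
int uncoveredRows(int rows, int stripes)
{
    assert(rows >= 0 && stripes > 0);
    return rows % stripes;
}
```

For the 728-row image with 91 stripes, every stripe is 8 rows high and nothing is left over. With a different image height the last few rows would silently stay unnormalized.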

EDIT

This is my OpenCV configuration:

General configuration for OpenCV 3.3.1 =====================================
Version control:               unknown

Extra modules:
Location (extra):            /home/jrsros/opencv_contrib-3.3.1/modules
Version control (extra):     unknown

Platform:
Timestamp:                   2017-12-14T13:05:47Z
Host:                        Linux 4.10.0-40-generic x86_64
CMake:                       3.5.1
CMake generator:             Unix Makefiles
CMake build tool:            /usr/bin/make
Configuration:               Release

CPU/HW features:
Baseline:                    SSE SSE2 SSE3
requested:                 SSE3
Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2
requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2
SSE4_1 (3 files):          + SSSE3 SSE4_1
SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (5 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (8 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

C/C++:
Built as dynamic libs?:      YES
C++11:                       YES
C++ Compiler:                /usr/bin/c++  (ver 7.2.0)
C++ flags (Release):         -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wno-implicit-fallthrough -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffast-math -ffunction-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -O3 -DNDEBUG  -DNDEBUG
C++ flags (Debug):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -Wno-implicit-fallthrough -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffast-math -ffunction-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -fopenmp -g  -O0 -DDEBUG -D_DEBUG
C Compiler:                  /usr/bin/gcc-5
C flags (Release):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffast-math -ffunction-sections  -msse -msse2 -msse3 -fvisibility=hidden -fopenmp -O3 -DNDEBUG  -DNDEBUG
C flags (Debug):             -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffast-math -ffunction-sections  -msse -msse2 -msse3 -fvisibility=hidden -fopenmp -g  -O0 -DDEBUG -D_DEBUG
Linker flags (Release):
Linker flags (Debug):
ccache:                      NO
Precompiled headers:         NO
Extra dependencies:          dl m pthread rt /usr/lib/x86_64-linux-gnu/libGLU.so /usr/lib/x86_64-linux-gnu/libGL.so /usr/lib/x86_64-linux-gnu/libtbb.so cudart nppc nppial nppicc nppicom nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cufft -L/usr/local/cuda-8.0/lib64
3rdparty dependencies:

OpenCV modules:
To be built:                 cudev core cudaarithm flann hdf imgproc ml objdetect phase_unwrapping plot reg surface_matching video viz xphoto bgsegm cudabgsegm cudafilters cudaimgproc cudawarping dnn face freetype fuzzy img_hash imgcodecs photo shape videoio xobjdetect cudacodec highgui bioinspired dpm features2d line_descriptor saliency text calib3d ccalib cudafeatures2d cudalegacy cudaobjdetect cudaoptflow cudastereo datasets rgbd stereo structured_light superres tracking videostab xfeatures2d ximgproc aruco optflow stitching python2
Disabled:                    js world contrib_world
Disabled by dependency:      -
Unavailable:                 java python3 ts cnn_3dobj cvv dnn_modern matlab sfm

GUI:
QT:                          NO
GTK+ 2.x:                    YES (ver 2.24.30)
GThread :                    YES (ver 2.48.2)
GtkGlExt:                    YES (ver 1.2.0)
OpenGL support:              YES (/usr/lib/x86_64-linux-gnu/libGLU.so /usr/lib/x86_64-linux-gnu/libGL.so)
VTK support:                 YES (ver 6.2.0)

Media I/O:
ZLib:                        /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.8)
JPEG:                        /usr/lib/x86_64-linux-gnu/libjpeg.so (ver )
WEBP:                        /usr/lib/x86_64-linux-gnu/libwebp.so (ver encoder: 0x0202)
PNG:                         /usr/lib/x86_64-linux-gnu/libpng.so (ver 1.2.54)
TIFF:                        /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 - 4.0.6)
JPEG 2000:                   /usr/lib/x86_64-linux-gnu/libjasper.so (ver 1.900.1)
OpenEXR:                     build (ver 1.7.1)
GDAL:                        NO
GDCM:                        NO

Video I/O:
DC1394 1.x:                  NO
DC1394 2.x:                  NO
FFMPEG:                      YES
avcodec:                   YES (ver 56.60.100)
avformat:                  YES (ver 56.40.101)
avutil:                    YES (ver 54.31.100)
swscale:                   YES (ver 3.1.101)
avresample:                NO
GStreamer:
base:                      YES (ver 1.8.3)
video:                     YES (ver 1.8.3)
app:                       YES (ver 1.8.3)
riff:                      YES (ver 1.8.3)
pbutils:                   YES (ver 1.8.3)
OpenNI:                      NO
OpenNI PrimeSensor Modules:  NO
OpenNI2:                     NO
PvAPI:                       NO
GigEVisionSDK:               NO
Aravis SDK:                  NO
UniCap:                      NO
UniCap ucil:                 NO
V4L/V4L2:                    NO/YES
XIMEA:                       NO
Xine:                        NO
Intel Media SDK:             NO
gPhoto2:                     NO

Parallel framework:            TBB (ver 4.4 interface 9002)

Trace:                         YES (with Intel ITT)

Other third-party libraries:
Use Intel IPP:               2017.0.3 [2017.0.3]
at:               /home/jrsros/opencv-3.3.1/build/3rdparty/ippicv/ippicv_lnx
Use Intel IPP IW:            sources (2017.0.3)
at:            /home/jrsros/opencv-3.3.1/build/3rdparty/ippicv/ippiw_lnx
Use VA:                      NO
Use Intel VA-API/OpenCL:     NO
Use Lapack:                  NO
Use Eigen:                   YES (ver 3.2.92)
Use Cuda:                    YES (ver 8.0)
Use OpenCL:                  YES
Use OpenVX:                  NO
Use custom HAL:              NO

NVIDIA CUDA
Use CUFFT:                   YES
Use CUBLAS:                  YES
USE NVCUVID:                 NO
NVIDIA GPU arch:             20 30 35 37 50 52 60 61
NVIDIA PTX archs:
Use fast math:               YES

OpenCL:                        <Dynamic loading of OpenCL library>
Include path:                /home/jrsros/opencv-3.3.1/3rdparty/include/opencl/1.2
Use AMDFFT:                  NO
Use AMDBLAS:                 NO

Python 2:
Interpreter:                 /usr/bin/python2.7 (ver 2.7.12)
Libraries:                   /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.12)
numpy:                       /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.11.0)
packages path:               lib/python2.7/dist-packages

Python 3:
Interpreter:                 /usr/bin/python3 (ver 3.5.2)

Python (for build):            /usr/bin/python2.7

Java:
ant:                         NO
JNI:                         NO
Java wrappers:               NO
Java tests:                  NO

Matlab:
mex:                         /usr/local/MATLAB/R2017b/bin/mex
Compiler/generator:          Not working (bindings will not be generated)

Documentation:
Doxygen:                     NO

Tests and samples:
Tests:                       NO
Performance tests:           NO
C/C++ Examples:              NO

Install path:                  /usr/local

cvconfig.h is in:              /home/jrsros/opencv-3.3.1/build
-----------------------------------------------------------------

The sample code I use to test the parallelism:

#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/core/utility.hpp>

#include <iostream>
#include <vector>
#include <sys/time.h>

// Wall-clock time in seconds. The post calls tic() without showing its
// definition; this gettimeofday-based version is an assumption.
static double tic()
{
    struct timeval t;
    gettimeofday(&t, NULL);
    return t.tv_sec + t.tv_usec * 1e-6;
}

class Parallel_process : public cv::ParallelLoopBody
{
private:
    cv::Mat img;
    std::vector<int> A;
    int diff;

public:
    Parallel_process(cv::Mat inputImage, std::vector<int> AA, int diffVal)
        : img(inputImage), A(AA), diff(diffVal) {}

    virtual void operator()(const cv::Range& range) const
    {
        for (int i = range.start; i < range.end; i++)
        {
            cv::Mat in(img, cv::Rect(0, (img.rows / diff) * i, img.cols, img.rows / diff));
            std::vector<int> AAA(A);
            in.forEach<cv::Vec3f>
            (
                [&AAA](cv::Vec3f& pixel, const int* po) -> void
                {
                    pixel[0] /= AAA[0];
                    pixel[1] /= AAA[1];
                    pixel[2] /= AAA[2];
                }
            );
        }
    }
};

int main(int argc, char* argv[])
{
    cv::Mat src = cv::imread(argv[1]);
    // imread returns 8-bit data by default; forEach<cv::Vec3f> needs float pixels.
    src.convertTo(src, CV_32F);

    // One divisor per channel. The post declared std::vector<std::vector<int> >,
    // which does not match the constructor parameter.
    std::vector<int> AA;

    //compute AA here

    double timeStart = tic(), timeEnd = 0;

    // Normalize the image by AA
    cv::parallel_for_(cv::Range(0, 91), Parallel_process(src, AA, 91));

    timeEnd = tic() - timeStart;
    std::cout << "ALL " << 1 / timeEnd << std::endl << std::endl; // FPS

    return 0;
}

I compile my code on Ubuntu Linux, on a Core i7-2630QM at 2 GHz with 8 threads (4 cores):

g++ -std=c++1z -Wall -Weffc++ -Ofast -march=native test4.cpp -o test4 `pkg-config --cflags --libs opencv`
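A single timed run is noisy, so it may help to repeat the measurement and report statistics in the same shape as the timing tables further down (mean, min, max in microseconds). This is a minimal sketch using std::chrono instead of the post's tic(). The names TimingStats and measure are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical result type: all values are in microseconds.
struct TimingStats {
    double meanUs;
    double minUs;
    double maxUs;
};

// Runs `work` `runs` times and records mean, minimum, and maximum duration.
TimingStats measure(const std::function<void()>& work, int runs)
{
    assert(runs > 0);
    using Clock = std::chrono::steady_clock; // monotonic, unaffected by clock changes

    double sum = 0.0;
    double lo = std::numeric_limits<double>::max();
    double hi = 0.0;
    for (int i = 0; i < runs; ++i) {
        const auto start = Clock::now();
        work();
        const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
        const double us = elapsed.count();
        sum += us;
        lo = std::min(lo, us);
        hi = std::max(hi, us);
    }
    return {sum / runs, lo, hi};
}
```

In the test program, `work` could be a lambda wrapping the cv::parallel_for_ call, with cv::setNumThreads(n) called beforehand to vary the thread count between measurements.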

EDIT2
In htop I can see that it ends up using all threads.

Variant 1
threads,mean(us),min(us),max(us)
1,18798.2,18106,22780
2,9762.84,9427,10397
3,6813.55,6765,9296
4,5317.75,5088,7433
5,5067.11,4931,7552
6,4925.41,4780,9473
7,4797.74,4641,9492
8,4798.18,4504,27244

Variant 2
threads,mean(us),min(us),max(us)
1,18512.8,17780,20084
2,9788.73,9302,11338
3,6850.47,6671,9765
4,5209.64,5022,8831
5,5052.46,4881,7041
6,6851.99,4762,11422
7,5077.32,4624,9886
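The tables are easier to read as speedup and parallel efficiency. Here is a small sketch with hypothetical helper names. Using the Variant 1 means, it gives roughly 3.5x at 4 threads and 3.9x at 8 threads, so efficiency falls to about 0.49 at 8 threads, which would be consistent with a CPU that has 4 physical cores.

```cpp
#include <cassert>

// Speedup relative to the single-thread time (both in the same unit).
double speedup(double singleThreadUs, double multiThreadUs)
{
    assert(singleThreadUs > 0.0 && multiThreadUs > 0.0);
    return singleThreadUs / multiThreadUs;
}

// Parallel efficiency: speedup divided by the number of threads used.
// A value of 1.0 would mean perfect linear scaling.
double efficiency(double singleThreadUs, double multiThreadUs, int threads)
{
    assert(threads > 0);
    return speedup(singleThreadUs, multiThreadUs) / threads;
}
```

For example, speedup(18798.2, 5317.75) compares the 1-thread and 4-thread mean times from Variant 1.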

c++ lambda opencv parallel-processing parallelism-amdahl

Solution

The problem has not been solved yet.

Other solutions

No other solutions yet …