I have written some C++ code that is structured as follows:
double kernel(params)
{
//code
}
void optimize(params)
{
//some code
double x = kernel();
//some more code
}
int main()
{
//some code
optimize();
//some more code
}
I tried to profile it with callgrind, using the following commands:
g++ -O3 -g sgd.cpp
valgrind --tool=callgrind ./a.out commandline_args
callgrind_annotate callgrind.out.XXXX
I get the following output:
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
12,916,968,785 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
5,862,783,191 /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/memcpy-ssse3.S:__memmove_ssse3 [/lib/i386-linux-gnu/libc-2.15.so]
2,847,653,393 /build/buildd/eglibc-2.15/malloc/malloc.c:_int_malloc [/lib/i386-linux-gnu/libc-2.15.so]
1,327,109,692 /build/buildd/eglibc-2.15/malloc/malloc.c:_int_free [/lib/i386-linux-gnu/libc-2.15.so]
847,560,182 sgd.cpp:main [a.out]
503,022,767 /build/buildd/eglibc-2.15/malloc/malloc.c:malloc [/lib/i386-linux-gnu/libc-2.15.so]
235,458,068 /build/buildd/eglibc-2.15/malloc/malloc.c:free [/lib/i386-linux-gnu/libc-2.15.so]
213,580,120 /build/buildd/eglibc-2.15/math/../sysdeps/i386/fpu/e_exp.S:__ieee754_exp [/lib/i386-linux-gnu/libm-2.15.so]
203,349,602 ???:operator new(unsigned int) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
192,222,108 /build/buildd/eglibc-2.15/math/../sysdeps/ieee754/dbl-64/w_exp.c:exp [/lib/i386-linux-gnu/libm-2.15.so]
128,438,068 /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/strcat.S:0x0012ac73 [/lib/i386-linux-gnu/libc-2.15.so]
128,431,176 ???:operator delete(void*) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
128,358,564 /usr/include/c++/4.6/ext/new_allocator.h:main
117,645,255 /usr/include/c++/4.6/bits/stl_vector.h:main
112,167,083 /usr/include/c++/4.6/bits/stl_algobase.h:main
Apart from main(), the output does not show which parts of my source code take most of the time. I know for certain that most of the time is spent in optimize(), and that a significant share of that time is in turn spent in kernel(), but I cannot see this in the output.
How can I get more detail so that I can speed up my code?
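One plausible reason optimize() and kernel() are missing from the -O3 profile is inlining: once the compiler folds them into main(), their cost is attributed to main. Below is a minimal sketch of keeping them as separate functions while profiling. It assumes GCC or Clang (the noinline attribute is a compiler extension), and the function names, parameters, and bodies are hypothetical stand-ins, since the real ones are not shown here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for kernel(); the real body is not shown in the post.
// noinline asks GCC/Clang to keep this as a separate function even at -O3,
// so the profiler can attribute cost to it instead of to its caller.
__attribute__((noinline))
double kernel_sketch(const std::vector<double>& w, double scale)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i)
        sum += std::exp(-scale * w[i] * w[i]);
    return sum;
}

// Hypothetical stand-in for optimize(); it calls the kernel in a loop.
__attribute__((noinline))
double optimize_sketch(const std::vector<double>& w, int iterations)
{
    double acc = 0.0;
    for (int i = 0; i < iterations; ++i)
        acc += kernel_sketch(w, 0.5);
    return acc;
}
```

Forcing noinline can change the timings slightly, so this is best treated as a profiling aid rather than a permanent change. Running callgrind_annotate with --inclusive=yes may also help, because it reports each function's cost together with the cost of everything it calls.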
In case it helps, I use std::vector extensively in the code. Some time ago I implemented similar code using arrays, and callgrind seemed to work fine then. Could this be the problem?
If I turn off the -O3 flag, I get the following output:
--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
19,026,610,083 PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
5,233,252,577 /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/memcpy-ssse3.S:__memmove_ssse3 [/lib/i386-linux-gnu/libc-2.15.so]
2,542,000,057 /build/buildd/eglibc-2.15/malloc/malloc.c:_int_malloc [/lib/i386-linux-gnu/libc-2.15.so]
1,184,626,252 /build/buildd/eglibc-2.15/malloc/malloc.c:_int_free [/lib/i386-linux-gnu/libc-2.15.so]
983,472,430 sgd.cpp:optimize(std::vector<double, std::allocator<double> >, std::vector<int, std::allocator<int> >, std::vector<double, std::allocator<double> >) [a.out]
781,018,740 ???:std::vector<double, std::allocator<double> >::operator[](unsigned int) [a.out]
772,117,839 sgd.cpp:kernel(std::vector<double, std::allocator<double> >, int, int, double) [a.out]
476,616,742 ???:std::vector<double, std::allocator<double> >::vector(std::vector<double, std::allocator<double> > const&) [a.out]
449,016,969 /build/buildd/eglibc-2.15/malloc/malloc.c:malloc [/lib/i386-linux-gnu/libc-2.15.so]
324,200,916 ???:std::vector<double, std::allocator<double> >::size() const [a.out]
305,705,504 ???:std::_Vector_base<double, std::allocator<double> >::_Vector_base(unsigned int, std::allocator<double> const&) [a.out]
267,492,204 ???:std::_Vector_base<double, std::allocator<double> >::~_Vector_base() [a.out]
238,309,873 /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<double>(double const*, double const*, double*) [a.out]
238,308,370 /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
228,776,040 /usr/include/c++/4.6/bits/stl_algobase.h:std::_Miter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >::iterator_type std::__miter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
228,776,038 /usr/include/c++/4.6/bits/stl_algobase.h:double* std::copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
210,178,748 /build/buildd/eglibc-2.15/malloc/malloc.c:free [/lib/i386-linux-gnu/libc-2.15.so]
210,172,446 ???:std::vector<double, std::allocator<double> >::~vector() [a.out]
209,711,018 sgd.cpp:square(double) [a.out]
190,646,380 /build/buildd/eglibc-2.15/math/../sysdeps/i386/fpu/e_exp.S:__ieee754_exp [/lib/i386-linux-gnu/libm-2.15.so]
181,517,469 ???:operator new(unsigned int) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
171,582,030 /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, true>::_S_base(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
171,581,742 /build/buildd/eglibc-2.15/math/../sysdeps/ieee754/dbl-64/w_exp.c:exp [/lib/i386-linux-gnu/libm-2.15.so]
152,853,344 ???:__gnu_cxx::new_allocator<double>::allocate(unsigned int, void const*) [a.out]
152,852,752 ???:std::_Vector_base<double, std::allocator<double> >::_Vector_impl::_Vector_impl(std::allocator<double> const&) [a.out]
152,517,360 /usr/include/c++/4.6/bits/stl_algobase.h:std::_Niter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >::iterator_type std::__niter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > > >(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
152,517,360 /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, false>::_S_base(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >) [a.out]
152,517,360 /usr/include/c++/4.6/bits/stl_iterator.h:__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >::__normal_iterator(double const* const&) [a.out]
133,746,571 ???:std::_Vector_base<double, std::allocator<double> >::_M_deallocate(double*, unsigned int) [a.out]
133,452,690 ???:std::vector<double, std::allocator<double> >::end() const [a.out]
133,452,690 ???:std::vector<double, std::allocator<double> >::begin() const [a.out]
131,134,604 sgd.cpp:sign(double) [a.out]
123,920,353 /usr/include/c++/4.6/bits/stl_algobase.h:double* std::__copy_move_a<false, double const*, double*>(double const*, double const*, double*) [a.out]
121,192,848 ???:std::vector<int, std::allocator<int> >::operator[](unsigned int) [a.out]
114,649,360 /build/buildd/eglibc-2.15/string/../sysdeps/i386/i686/multiarch/strcat.S:0x0012ac73 [/lib/i386-linux-gnu/libc-2.15.so]
114,642,456 ???:operator delete(void*) [/usr/lib/i386-linux-gnu/libstdc++.so.6.0.16]
114,388,018 /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::__uninitialized_copy<true>::__uninit_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
114,388,018 /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::uninitialized_copy<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*) [a.out]
114,388,018 /usr/include/c++/4.6/bits/stl_uninitialized.h:double* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*, double>(__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >, double*, std::allocator<double>&) [a.out]
105,086,674 /usr/include/c++/4.6/bits/stl_vector.h:std::_Vector_base<double, std::allocator<double> >::_M_allocate(unsigned int) [a.out]
95,533,505 ???:std::_Vector_base<double, std::allocator<double> >::_M_get_Tp_allocator() [a.out]
95,533,300 /usr/include/c++/4.6/bits/stl_construct.h:void std::_Destroy<double*>(double*, double*) [a.out]
95,533,300 /usr/include/c++/4.6/bits/stl_construct.h:void std::_Destroy<double*, double>(double*, double*, std::allocator<double>&) [a.out]
95,532,970 /usr/include/c++/4.6/bits/allocator.h:std::allocator<double>::allocator(std::allocator<double> const&) [a.out]
95,323,350 /usr/include/c++/4.6/bits/stl_iterator.h:__gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >::base() const [a.out]
76,594,040 /usr/include/c++/4.6/bits/allocator.h:std::allocator<double>::~allocator() [a.out]
76,428,152 /usr/include/c++/4.6/bits/stl_algobase.h:std::_Niter_base<double*>::iterator_type std::__niter_base<double*>(double*) [a.out]
76,426,584 /usr/include/c++/4.6/ext/new_allocator.h:__gnu_cxx::new_allocator<double>::deallocate(double*, unsigned int) [a.out]
76,426,344 ???:std::_Vector_base<double, std::allocator<double> >::_Vector_impl::~_Vector_impl() [a.out]
75,798,592 /usr/include/c++/4.6/bits/stl_algobase.h:__gnu_cxx::__enable_if<std::__is_scalar<double>::__value, double*>::__type std::__fill_n_a<double*, unsigned int, double>(double*, unsigned int, double const&) [a.out]
47,768,335 /usr/include/c++/4.6/bits/stl_iterator_base_types.h:std::_Iter_base<double*, false>::_S_base(double*) [a.out]
47,767,040 ???:__gnu_cxx::new_allocator<double>::max_size() const [a.out]
47,662,045 ???:std::_Vector_base<double, std::allocator<double> >::_M_get_Tp_allocator() const [a.out]
38,297,020 /usr/include/c++/4.6/ext/new_allocator.h:__gnu_cxx::new_allocator<double>::~new_allocator() [a.out]
This contains more information than the previous output, but two problems remain. First, the output for unoptimized code does not help me make the optimized code faster. Second, most of the time (~50%) is taken by libc functions that I do not call directly in my code. How do I find out which parts of my code these calls correspond to?
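One possible reading of the unoptimized output: the signatures of optimize() and kernel() show std::vector parameters taken by value, and the vector copy constructor appears high in the list. Each such copy allocates a new buffer (malloc), copies the elements (memmove), and frees the buffer afterwards (free), which may account for much of the libc time. The sketch below only illustrates the difference between the two ways of passing a vector; the function names are hypothetical and this is not the actual code from sgd.cpp.

```cpp
#include <cstddef>
#include <vector>

// By value: calling this with an existing vector copy-constructs v,
// so the function works on a freshly allocated buffer.
bool same_buffer_by_value(std::vector<double> v, const double* caller_data)
{
    return v.data() == caller_data;
}

// By const reference: no copy is made, so v refers to the caller's buffer.
bool same_buffer_by_ref(const std::vector<double>& v, const double* caller_data)
{
    return v.data() == caller_data;
}

// The same contrast on a function that does real work: identical results,
// but only the by-value version pays for an allocation and a copy per call.
double sum_by_value(std::vector<double> v)
{
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

double sum_by_ref(const std::vector<double>& v)
{
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}
```

If kernel() is called in a tight loop with a large vector, switching the parameter to a const reference removes one allocation and one copy per call. To check which callers the libc time belongs to, callgrind_annotate --tree=caller lists the callers of each function.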