THIS IS AN OLD PAGE. GO TO kcachegrind.github.io FOR THE CURRENT VERSION.





Profiling as Part of Application Development

When you develop a program, one of the last steps is usually to make it as fast as possible while keeping it correct. You don't want to waste your time optimizing functions that are rarely used, so you need to know in which parts of your program most of the time is spent.

This is done with a technique called profiling. The program is run under the control of a profiling tool, which reports how the run time is distributed among the executed functions. After examining the program's profile, you will probably know where to optimize; afterwards, you verify that the optimization succeeded with another profiling run.
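As a minimal sketch of what "time distribution among the executed functions" means, the following Python snippet turns per-function cost measurements into percentages of the total and sorts them so the hot spots come first. The function name `time_distribution`, the profiled function names and the numbers are hypothetical; a real profiler collects these values for you.

```python
from typing import Dict, List, Tuple


def time_distribution(costs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (function, percent of total) pairs, most expensive first.

    `costs` maps a function name to the time (or event count) measured
    for it in one profiling run. Names and values are illustrative only.
    """
    total = sum(costs.values())
    if total == 0:
        # Nothing measured: report every function with a 0% share.
        return [(name, 0.0) for name in sorted(costs)]
    shares = [(name, 100.0 * cost / total) for name, cost in costs.items()]
    # Highest share first; ties broken by name for deterministic output.
    shares.sort(key=lambda item: (-item[1], item[0]))
    return shares


if __name__ == "__main__":
    # Hypothetical measurements from one run (e.g. milliseconds).
    run = {"parse_input": 30.0, "compute": 600.0, "write_output": 70.0, "main": 300.0}
    for name, percent in time_distribution(run):
        print(f"{percent:5.1f}%  {name}")
```

Here `compute` accounts for 60% of the run, so it is the obvious optimization candidate; feeding the measurements of a second run into the same function shows whether its share actually shrank.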

Profiling Tools

The best-known profiling tool is GProf from the GNU toolchain: you compile your program with GCC's option "-pg"; running the program then generates a file "gmon.out", which the command-line tool "gprof" transforms into human-readable form. The disadvantage is the extra compilation step needed to produce a prepared executable, which has to be statically linked.

Another profiling tool is Cachegrind, part of Valgrind. It uses Valgrind's processor emulation to run the executable and catches all memory accesses for the trace. The program does not need to be recompiled; it can use shared libraries and plugins, and the measurement itself doesn't influence the trace results. The trace includes the number of instruction and data memory accesses and of first- and second-level cache misses, and relates them to the source lines and functions of the profiled program. The disadvantage is the slowdown caused by the processor emulation: programs run around 50 times slower.
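To illustrate in principle how cache misses can be counted from a stream of memory accesses, here is a minimal sketch of a single set-associative cache with LRU replacement. It is not Cachegrind's code: Cachegrind models several cache levels, while this sketch simulates only one. The class name `CacheSim`, its parameters and the address traces are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, List


class CacheSim:
    """Hypothetical set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, size: int, line_size: int, assoc: int) -> None:
        # size and line_size are in bytes; sets = size / (line_size * assoc).
        self.line_size = line_size
        self.assoc = assoc
        self.num_sets = size // (line_size * assoc)
        # One OrderedDict per set: keys are tags, insertion order tracks
        # recency (first = least recently used, last = most recently used).
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Simulate one memory access; return True on a hit."""
        self.accesses += 1
        line = address // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(cache_set) >= self.assoc:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = True
        return False

    def run(self, trace: Iterable[int]) -> None:
        for address in trace:
            self.access(address)


if __name__ == "__main__":
    # 1 KiB cache, 64-byte lines, 2-way associative: 8 sets.
    cache = CacheSim(size=1024, line_size=64, assoc=2)
    # Read a 4 KiB array twice, 8 bytes at a time: it does not fit in the cache.
    cache.run(list(range(0, 4096, 8)) * 2)
    print(f"{cache.misses} misses / {cache.accesses} accesses")  # 128 misses / 1024 accesses
```

Because the program's accesses are only simulated, the counts depend solely on the address trace and not on how slowly the simulator runs, which is why the measurement does not distort the results.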

Cachegrind can only deliver a flat profile: it does not store any call relationships among the functions of an application. Thus, inclusive costs, i.e. the costs of a function including the costs of all functions called from it, cannot be calculated. Calltree extends Cachegrind by also recording call relationships and the exact event counts spent while executing a call.
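The difference between a flat profile and one with call relationships can be sketched as follows. Assume, hypothetically, that a profiler records each function's self cost and, for each call arc (caller, callee), the events spent inside those calls including everything the callee calls in turn. The inclusive cost of a function is then its self cost plus the cost of all calls it makes; with only the flat self costs, this sum cannot be formed. The function name `inclusive_costs` and the example data are made up, and the sketch assumes no recursion.

```python
from typing import Dict, Tuple


def inclusive_costs(
    self_cost: Dict[str, int],
    call_cost: Dict[Tuple[str, str], int],
) -> Dict[str, int]:
    """Inclusive cost = self cost + events spent in calls made from the function.

    `call_cost[(caller, callee)]` holds the events spent while executing
    calls from `caller` to `callee`, including everything the callee calls.
    Assumes no recursion: recursive cycles would be counted more than once.
    """
    result = dict(self_cost)
    for (caller, _callee), cost in call_cost.items():
        result[caller] = result.get(caller, 0) + cost
    return result


if __name__ == "__main__":
    # Hypothetical flat profile (self costs only).
    self_cost = {"main": 10, "parse": 30, "compute": 100, "helper": 60}
    # Hypothetical call-arc costs, as a call-graph profiler could record them.
    call_cost = {("main", "parse"): 30, ("main", "compute"): 160, ("compute", "helper"): 60}
    print(inclusive_costs(self_cost, call_cost))
    # {'main': 200, 'parse': 30, 'compute': 160, 'helper': 60}
```

A quick sanity check: `main` is the root of the run, so its inclusive cost equals the sum of all self costs (200).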

Because Calltree is based on simulation, any slowdown caused by preprocessing events during collection does not influence the results.

See Similar Tools for profiling tools that are similar to Callgrind and KCachegrind but also use hardware performance counters.

Visualisation of Profiling Data

KCachegrind is a visualization tool for the profiling data generated by Cachegrind and Calltree (their profile data file format is upwards compatible). But as most of KCachegrind's visualization features depend on call relationships, you get much more out of it if you are using Calltree as the profiling tool.