Private, to be used in future projects
Platform agnostic, only tested on Windows
~10 hours
KoreTech Profiler Source (zip | 10KB), KoreTech Profiler Sample (VS2010, zip | 14KB)
Even though it was eventually integrated into the KoreTech framework (and hence also ColorIt), this simple profiler I wrote can be seen as a project of its own. I rolled it around half a year earlier, when I felt the urge to have a look at how some of the stuff I played around with performed. It’s a simple scope-based profiler (meaning it puts a tiny object on the stack for the sole purpose of recording the time that has passed between its creation and its going out of scope) for single-threaded use that lets users hook up their own custom functions for memory allocation and deallocation as well as getting timestamps and printing. Everything the profiler does is wrapped in preprocessor macros so profiling can instantly be disabled if so desired. It requires only two files, but it kind of depends on KORE_ASSERT as I was too lazy to rip out all those handy assertions just for presentation or publishing purposes. Actually doing that, if need be, is left as an exercise to the user.
It comes (just for once…) with fairly thorough documentation by way of comments, so if you really want to use it and are more of a hands-on guy, don’t sweat reading all of the following.
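To make the scope-based idea concrete, here is a minimal sketch of a stack object that measures its own lifetime, wrapped in a macro that can be compiled out. It is not the actual KoreTech code: ScopeTimer, SKETCH_PROFILE_SCOPE, SKETCH_PROFILING_ENABLED, now_ticks and time_busy_loop are all made-up names, and std::chrono::steady_clock stands in for whatever timestamp function a user would hook up.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical names throughout; this is not the KoreTech profiler API.
#define SKETCH_PROFILING_ENABLED 1

namespace sketch {

// Stand-in timestamp source: monotonic ticks in nanoseconds.
inline std::uint64_t now_ticks() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Tiny stack object: remembers the start tick on construction and writes
// the elapsed ticks to *out when it goes out of scope.
class ScopeTimer {
public:
    explicit ScopeTimer(std::uint64_t* out) : out_(out), start_(now_ticks()) {}
    ~ScopeTimer() { *out_ = now_ticks() - start_; }
    ScopeTimer(const ScopeTimer&) = delete;
    ScopeTimer& operator=(const ScopeTimer&) = delete;

private:
    std::uint64_t* out_;
    std::uint64_t start_;
};

} // namespace sketch

// Wrapping the object in a macro lets profiling be switched off at compile time.
#if SKETCH_PROFILING_ENABLED
#define SKETCH_PROFILE_SCOPE(name, out) sketch::ScopeTimer name(out)
#else
#define SKETCH_PROFILE_SCOPE(name, out) ((void)(out))
#endif

// Example: time a small busy loop and return the elapsed ticks.
inline std::uint64_t time_busy_loop(int iterations) {
    std::uint64_t elapsed = 0;
    {
        SKETCH_PROFILE_SCOPE(loop_timer, &elapsed);
        volatile std::uint64_t sink = 0;
        for (int i = 0; i < iterations; ++i) {
            sink = sink + static_cast<std::uint64_t>(i);
        }
    } // loop_timer is destroyed here and fills in `elapsed`
    return elapsed;
}
```

Setting SKETCH_PROFILING_ENABLED to 0 turns the macro into a no-op, so the timed code stays in place but no timer object is ever created.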
The module must be initialized by calling
PROFILE_INITIALIZE(ALLOCATE, GET_CYCLES_FUNCTION, CYCLES_TO_SECONDS_FUNCTION, PRINT_FUNCTION, PRINT_OSD_FUNCTION)
(or kore::profile::profile_initialize(…), if you hate macros) and passing function pointers for allocation and printing. The Alloc function pointer requires a malloc-style signature and the Print function pointer a printf-style signature. Users must also pass pointers to two functions: one that returns the current number of cycles passed since system start (or something like that, it’s only about time increments anyway) and one that converts that number into seconds. The rationale behind this is that I didn’t want to muck around with conversion from cycles to seconds for every value the profiler records but rather store all values as uints of sufficient size and only do the conversion when statistics are evaluated. Once everything is initialized, users can either use PROFILE_SCOPE_ONCE, providing a unique name for the scoped object and a message to be printed upon entering and leaving the scope, or PROFILE_SCOPE_STATISTICS, providing again a unique name for the scoped object and a profiling identifier as added to the enumeration in profile.h. PROFILE_SCOPE_ONCE will instantly attempt to print the duration the created object was alive in both cycles and seconds, while PROFILE_SCOPE_STATISTICS just records the duration for the specified category this turn and that’s that. Every hundred frames or so, users can call
to have a brief summary printed using the supplied print function. That summary contains for each profile identifier the minimum/maximum and average duration. Users can adjust the number of samples that should be taken for the running average by modifying NUM_SAMPLES_MODIFIER in profile.h. Note that the number of samples must always be a power of two, so that modifier actually results in
When all is said and done, just kill the damn thing by calling
where DEALLOCATE is the pointer to a deallocation function with free-signature and be done with it.
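The statistics side described above (raw cycle counts stored as integers, a power-of-two number of samples, conversion to seconds only at evaluation time) could look roughly like the following. This is a sketch under assumptions, not the code from profile.h: Stats, kNumSamplesModifier, the 1 << modifier derivation, the window-only average and the 1000-ticks-per-second converter are all invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

namespace sketch {

// Assumption: the sample count is derived from a modifier as a power of two.
constexpr unsigned kNumSamplesModifier = 3;
constexpr std::size_t kNumSamples = std::size_t(1) << kNumSamplesModifier; // 8
constexpr std::size_t kSampleMask = kNumSamples - 1;

struct Stats {
    std::uint64_t samples[kNumSamples] = {};
    std::size_t count = 0; // total number of samples ever recorded
    std::uint64_t min_cycles = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t max_cycles = 0;

    // Store the raw cycle count; no conversion to seconds happens here.
    void record(std::uint64_t cycles) {
        samples[count & kSampleMask] = cycles; // power of two: wrap with a mask
        ++count;
        if (cycles < min_cycles) min_cycles = cycles;
        if (cycles > max_cycles) max_cycles = cycles;
    }

    // Average over the samples currently in the window, still in cycles.
    std::uint64_t average_cycles() const {
        const std::size_t n = count < kNumSamples ? count : kNumSamples;
        if (n == 0) return 0;
        std::uint64_t sum = 0;
        for (std::size_t i = 0; i < n; ++i) sum += samples[i];
        return sum / n;
    }
};

// User-supplied conversion hook, only called when statistics are evaluated.
using CyclesToSecondsFn = double (*)(std::uint64_t);

inline double average_seconds(const Stats& stats, CyclesToSecondsFn to_seconds) {
    return to_seconds(stats.average_cycles());
}

// Hypothetical converter for a counter that ticks 1000 times per second.
inline double thousand_hz_to_seconds(std::uint64_t cycles) {
    return static_cast<double>(cycles) / 1000.0;
}

// Records 100, 200, ..., 1000 cycles; the window keeps the last 8 samples.
inline Stats demo_stats() {
    Stats stats;
    for (std::uint64_t c = 1; c <= 10; ++c) stats.record(c * 100);
    return stats;
}

} // namespace sketch
```

Because the window size is a power of two, wrapping the write index is a single bitwise AND, which is presumably why the real profiler insists on that constraint. In this sketch the minimum and maximum cover every sample ever recorded, while the average only covers the current window.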
First of all, yes, it’s very simple and has almost no functionality. But sometimes you don’t need much and settle for the tool that will just about do the job. I’ll definitely bundle the (now global) profiler state into an object so we can instantiate profilers (the whole thing, not ProfileScopes) in thread-local memory in order to be able to profile multi-threaded stuff. This will have to wait until I get around to implementing the job-system for KoreTech, though.
Even though it is not part of the profiler itself, it should be noted that the kore::timer-thingy in the source folder is just a wrapper around QueryPerformanceCounter/QueryPerformanceFrequency on Windows and thus suffers from the same limitations. Most notably, your process could be migrated between cores and readings could then be incorrect, but only IF (and I haven’t seen this yet on any of the few PCs I have around) your mainboard vendor has not properly implemented this in their drivers. Most halfway decent vendors seem to use dedicated hardware for getting timestamps (that’s often not clocked as fast as the actual CPU, at least on my Intel P58) and that also does not seem to suffer from thread migration issues. But as always, you can’t count on that.
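For reference, a pair of timestamp hooks in the spirit of that timer wrapper might look like this. It is only a sketch: get_ticks, ticks_per_second and ticks_to_seconds are hypothetical names, not the kore::timer interface, and the non-Windows branch using std::chrono::steady_clock is my own addition so the example builds elsewhere.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#else
#include <chrono>
#endif

namespace sketch {

// Raw tick count; only differences between two readings are meaningful.
inline std::uint64_t get_ticks() {
#ifdef _WIN32
    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);
    return static_cast<std::uint64_t>(counter.QuadPart);
#else
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
#endif
}

// Number of ticks per second for the counter used by get_ticks().
inline std::uint64_t ticks_per_second() {
#ifdef _WIN32
    LARGE_INTEGER frequency;
    QueryPerformanceFrequency(&frequency);
    return static_cast<std::uint64_t>(frequency.QuadPart);
#else
    return 1000000000ull; // the fallback above counts nanoseconds
#endif
}

// Conversion hook: turns a tick count (or tick difference) into seconds.
inline double ticks_to_seconds(std::uint64_t ticks) {
    return static_cast<double>(ticks) / static_cast<double>(ticks_per_second());
}

} // namespace sketch
```

Functions shaped like get_ticks and ticks_to_seconds are the kind of thing that would be passed as GET_CYCLES_FUNCTION and CYCLES_TO_SECONDS_FUNCTION, assuming the real hooks take and return comparable types.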
There’s some other stuff, like the PrintScreenFunction pointer that can be provided upon PROFILE_INITIALIZE(…), which I use to print a less verbose version of the statistics onto the screen in KoreTech, or the possibility to feed it an Allocator object instead of an allocate function, which I also use in KoreTech. Omitting that stuff upon init won’t do anything bad. The profiler will just print a message telling the user that it will fall back to printing statistics via PrintFunction rather than PrintScreenFunction.