Block/Loop Benchmarking

Type : Benchmarking Tool
References : Posted by arguru[AT]smartelectronix[DOT]com
Notes :
Requires a CPU with RDTSC support. The code uses MSVC-style inline assembly (32-bit x86).
Code :
// Block-process benchmarking code using RDTSC
// useful for measuring DSP block processing
// (based on Intel papers)
// 64-bit precision
// VeryUglyCode(tm) by Arguru

// globals
UINT time,time_low,time_high;

// call this just before entering your loop or whatever
void bpb_start()
{
    // read the time stamp counter into EDX:EAX
    __asm rdtsc;
    __asm mov time_low,eax;
    __asm mov time_high,edx;
}

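// call the following function just after your loop
// returns average cycles spent per sample
UINT bpb_finish(UINT const num_samples)
{
    __asm rdtsc;
    __asm sub eax,time_low;
    // sbb (subtract with borrow) carries the borrow out of the low word;
    // the original used a plain sub here
    __asm sbb edx,time_high;
    // divides EDX:EAX by num_samples; num_samples must be non-zero
    // and the quotient must fit in 32 bits
    __asm div num_samples;
    __asm mov time,eax;
    return time;
}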
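
For compilers without MSVC-style inline assembly, the same start/finish measurement can be sketched with the `__rdtsc()` compiler intrinsic and plain 64-bit arithmetic. This is only a sketch and not the original author's code: the names `BlockBenchmark`, `bpb_ticks` and `BPB_HAVE_RDTSC` are made up for illustration, and the `std::chrono` fallback for non-x86 targets is an assumption that returns clock ticks rather than CPU cycles.

```cpp
#include <chrono>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define BPB_HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#define BPB_HAVE_RDTSC 1
#else
#define BPB_HAVE_RDTSC 0
#endif

// Hypothetical helper: read an increasing 64-bit tick count.
static inline std::uint64_t bpb_ticks()
{
#if BPB_HAVE_RDTSC
    return __rdtsc();  // full 64-bit time stamp counter
#else
    // Fallback (assumption): steady_clock ticks, not CPU cycles.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical replacement for the bpb_start / bpb_finish pair.
struct BlockBenchmark {
    std::uint64_t start_ticks = 0;

    // call just before the loop
    void start() { start_ticks = bpb_ticks(); }

    // call just after the loop; returns average ticks per sample,
    // or 0 when num_samples is 0
    std::uint64_t finish(std::uint64_t num_samples) const
    {
        const std::uint64_t elapsed = bpb_ticks() - start_ticks;
        return num_samples ? elapsed / num_samples : 0;
    }
};
```

Because the subtraction and division are done on 64-bit integers, no manual borrow handling is needed and the result is not limited to 32 bits.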

Comments
from : pete[AT]bannister25[DOT]plus[DOT]com
comment : If running Windows on a multiprocessor system, it is apparently worth calling SetThreadAffinityMask(GetCurrentThread(), 1); to reduce artefacts. (see http://msdn.microsoft.com/visualc/vctoolkit2003/default.aspx?pull=/library/en-us/dv_vstechart/html/optimization.asp)
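
A minimal sketch of that call, wrapped so the result can be checked. The helper name `bpb_pin_to_first_cpu` is hypothetical, and the non-Windows branch is an assumption that simply does nothing so the snippet still compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: pin the calling thread to logical CPU 0.
// Returns true if the affinity was changed, false otherwise
// (including on non-Windows builds, where this sketch does nothing).
bool bpb_pin_to_first_cpu()
{
#ifdef _WIN32
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), 1) != 0;
#else
    return false;
#endif
}
```

It would be called once, before bpb_start(), so that both RDTSC reads happen on the same core.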

from : guillaume[DOT]mureu[AT]free[DOT]fr
comment : __asm sub eax,time_low; __asm sub edx,time_high; should be __asm sub eax,time_low __asm SBB edx,time_high // subtract with borrow
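
To see why the borrow matters, the two variants can be modelled in plain C++ on a 64-bit value split into 32-bit halves. This is an illustrative sketch only; `Halves`, `sub_without_borrow`, `sub_with_borrow` and `join` are hypothetical names, not part of the original code.

```cpp
#include <cstdint>

// A 64-bit counter value split the way RDTSC returns it (EAX = low, EDX = high).
struct Halves {
    std::uint32_t low;
    std::uint32_t high;
};

// Subtract each half independently, as two plain `sub` instructions do.
Halves sub_without_borrow(Halves end, Halves start)
{
    return { static_cast<std::uint32_t>(end.low - start.low),
             static_cast<std::uint32_t>(end.high - start.high) };
}

// Subtract the low halves, then the high halves minus the borrow,
// as `sub` followed by `sbb` does.
Halves sub_with_borrow(Halves end, Halves start)
{
    const std::uint32_t borrow = end.low < start.low ? 1u : 0u;
    return { static_cast<std::uint32_t>(end.low - start.low),
             static_cast<std::uint32_t>(end.high - start.high - borrow) };
}

// Reassemble the halves into one 64-bit value.
std::uint64_t join(Halves h)
{
    return (static_cast<std::uint64_t>(h.high) << 32) | h.low;
}
```

With a start value of 0x00000001FFFFFFF0 and an end value of 0x0000000200000010 the true difference is 0x20 cycles. sub_with_borrow gives 0x20, while sub_without_borrow gives 0x100000020 because the borrow out of the low word is lost.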