Dear Yuasa san,
I tried some runs of the sample under pzcKernelProfiling on es1fe.kek.jp (Suiren2) and on blue-fe.kek.jp. I added calls to clock_gettime(CLOCK_REALTIME, &tt) before and after the call to ComputeAdd_CPU to get the elapsed time on the CPU, in order to compare it with the time obtained from the GetPerformance function. Using the Add1 and Add2 kernels I am getting the times below on Suiren2, and I am wondering whether these are as expected. Should the CPU elapsed time be compared with the Perf time in sec? If not, how to get the elapsed parallel time?
Another question I have is whether type 'double' can be used in the kernels? So far I have seen 'float', although the documentation includes a 'float2' type. What about higher precision?
If you want to forward my questions to Ishikawa san or Daisaka san that is fine.
Thanks much,
--elise
Sorry for sending this without any explanation.
Elise Doncker-san has sent us an inquiry, so please take care of it.
Ishikawa (石川 正)
Hello elise-san,
>Should the CPU elapsed time be compared with the Perf time in sec?
>If not, how to get the elapsed parallel time?
The following pseudocode shows one way to measure the elapsed time from the CPU.
Without clFinish(), the elapsed time may not be measured correctly.
clFinish(command_queue);
start_time=clock_time(...);
clEnqueueNDRangeKernel(...);
clFinish(command_queue);
end_time=clock_time(...);
Although some overhead from the kernel call and clFinish() is included in the CPU elapsed time,
it can be compared with the Perf time measured in the kernel.
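The timing pattern above can be sketched in plain C as follows. This is only an illustration under stated assumptions: elapsed_sec, busy_loop and time_call are hypothetical names, busy_loop is a stand-in for the real work (in the OpenCL case it would be the clEnqueueNDRangeKernel() call followed by clFinish(), with a clFinish() issued beforehand), and CLOCK_MONOTONIC is used here instead of CLOCK_REALTIME because it is not affected by wall-clock adjustments.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std=c99/c11 */
#endif
#include <time.h>

/* Difference between two clock readings, in seconds. */
double elapsed_sec(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
}

/* Hypothetical stand-in workload: sums the integers below *(int *)arg. */
void busy_loop(void *arg)
{
    volatile long sum = 0;
    for (int i = 0; i < *(int *)arg; ++i)
        sum += i;
}

/* Time one blocking call. `work` must not return until the job is done;
   for a kernel launch that means it ends with clFinish(). */
double time_call(void (*work)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_sec(t0, t1);
}
```

For example, `int n = 1000000; double sec = time_call(busy_loop, &n);` returns the wall-clock seconds spent in busy_loop. Because the nanosecond field is subtracted as a signed value, elapsed_sec also handles the case where the end reading has a smaller tv_nsec than the start reading.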
By the way, I found a bug in main.cpp of pzcKernelProfiling.
The clock for PEZY-SC2 is not '733e6' but '700e6', so main.cpp should be fixed as follows:
#if defined(DEVICE_SC1)
const double clock = 733e6;
#elif defined(DEVICE_SC2)
const double clock = 700e6;
#endif
Sorry for the inconvenience...
>Another question I have is whether type 'double' can be used in the kernels?
>So far I have seen 'float', although the documentation includes a 'float2' type. What about higher precision?
Yes, 'double' can be used in the kernels.
'double2' is also available. However, it is executed as 2 'double' scalars, and not in a SIMD fashion.
The built-in types and functions are described in /opt/pzsdk.ver4.0/doc/html/index.html.
Best regards,
Hitoshi Ishikawa
I modified the pzcKernelProfiling sample as follows:
( https://gitlab.portal.pezy.jp/pezy/samples/blob/master/pzcKernelProfiling/main.cpp)
cl_uint core_clock = 0;
clGetDeviceInfo(device_id, CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(core_clock), &core_clock, NULL);
printf("clock=%uMHz\n", core_clock);
double sec = perf_count / (double)(core_clock * 1e6);
Now we no longer need to hard-code the clock frequency in the code.
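The conversion from a cycle count to seconds can be isolated in a small helper, sketched below. The name cycles_to_sec and its parameters are hypothetical and not part of the sample; the clock is taken in MHz, as reported by CL_DEVICE_MAX_CLOCK_FREQUENCY, and is assumed to be nonzero.

```c
#include <stdint.h>

/* Convert a cycle count to seconds, given the core clock in MHz.
   Assumes clock_mhz is nonzero. */
double cycles_to_sec(uint64_t cycles, unsigned clock_mhz)
{
    return (double)cycles / ((double)clock_mhz * 1e6);
}
```

With 700,000,000 cycles, cycles_to_sec returns 1.0 s at 700 MHz, while dividing the same count by 733 MHz gives about 0.955 s. Using the wrong clock for PEZY-SC2 therefore under-reports the time by roughly 4.5%.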
Best Regards,
Hitoshi Ishikawa