This is probably not strictly about the application itself, but rather about the AP application versus driver versions.
I read the outcome from NVIDIA as: modify your application, we will not change the drivers.
At least it is interesting to see where the problem lies, and the application fix does not look complicated.
https://devtalk.nvidia.com/default/topi ... 6/#5303366
Topic: GeForce Drivers 4xx.xx drop more than 2/3 in OpenCL Performance from the 3xx.xx Drivers
There is no point in testing newer drivers; I don't expect any changes in this respect. Changes are required in the application if they want to restore performance with the newer drivers.
Current Scenario in ap26 app:
1. App queries CL_KERNEL_WORK_GROUP_SIZE in order to decide on a local work group size of either 1024 (seems optimal) or 64 (sub-optimal). If the query returns a value below 1024, the app reduces the local work group size to 64, assuming the device doesn't support 1024.
2. Nvidia OpenCL Driver changed return value for CL_KERNEL_WORK_GROUP_SIZE from 1024 to 256.
3. App is not using CL_KERNEL_WORK_GROUP_SIZE returned by driver as is, but just choosing a non-optimal local work-group size (64) based on this query.
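The three steps above can be sketched as a small pure function. This is a hypothetical reconstruction from the description only, not code from the ap26 source: the name `legacy_local_size` is made up, and its argument stands for whatever `clGetKernelWorkGroupInfo` reports for `CL_KERNEL_WORK_GROUP_SIZE`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction of the decision described above; not taken
 * from the ap26 source. kernel_wg_size stands for the value the driver
 * reports for CL_KERNEL_WORK_GROUP_SIZE. */
size_t legacy_local_size(size_t kernel_wg_size)
{
    /* Assumption: a driver never reports a work group size of zero. */
    assert(kernel_wg_size >= 1);

    /* Anything below 1024 is read as "1024 is unsupported", so the
     * function drops straight to 64 and ignores the reported value. */
    return (kernel_wg_size < 1024) ? 64 : 1024;
}
```

With a reported value of 1024 this returns 1024; with a reported value of 256 it returns 64, which matches the drop described in the scenario.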
What should developers do:
• Query CL_KERNEL_WORK_GROUP_SIZE only to get a hint about the work group size from the driver, and use it to launch the kernel with that specific value. It need not be optimal for all kernels.
• App is free to choose any value from range [1 , CL_DEVICE_MAX_WORK_GROUP_SIZE] to get best possible work group size for different kernels, irrespective of CL_KERNEL_WORK_GROUP_SIZE returned by driver.
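One way to act on the second point is to enumerate candidate sizes in the allowed range and time each kernel with them. The sketch below only builds the candidate list. The helper name `candidate_local_sizes` is hypothetical, and limiting the candidates to powers of two is my own assumption to keep the list short, not something the NVIDIA reply requires. `device_max` stands for the value returned by `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the powers of two in [1, device_max] into
 * out[], storing at most cap entries, and returns how many were written. */
size_t candidate_local_sizes(size_t device_max, size_t *out, size_t cap)
{
    assert(device_max >= 1);

    size_t n = 0;
    for (size_t s = 1; n < cap; s *= 2) {
        out[n++] = s;
        /* Stop before doubling would exceed device_max (also avoids overflow). */
        if (s > device_max / 2) {
            break;
        }
    }
    return n;
}
```

Each candidate would then be passed as the local work size to `clEnqueueNDRangeKernel` and timed per kernel. This takes the quoted advice at face value; a launch that the driver rejects would simply be skipped in such a tuning loop.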
Suggestions specific to ap26:
• App can query CL_DEVICE_MAX_WORK_GROUP_SIZE and set work group size accordingly instead of using CL_KERNEL_WORK_GROUP_SIZE.
• Simplest solution for ap26 would be to use 1024 work group size directly if it comes in range [1 , CL_DEVICE_MAX_WORK_GROUP_SIZE].
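The ap26 suggestions boil down to a range check against the device limit. A minimal sketch, assuming a hypothetical helper `pick_local_size` and assuming that falling back to the device maximum is acceptable when the preferred size is out of range (the reply does not say what to do in that case):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the ap26 source.
 * preferred:  the size the app wants, e.g. 1024.
 * device_max: value of CL_DEVICE_MAX_WORK_GROUP_SIZE from clGetDeviceInfo. */
size_t pick_local_size(size_t preferred, size_t device_max)
{
    assert(device_max >= 1);

    /* Use the preferred size when it lies in [1, device_max]. */
    if (preferred >= 1 && preferred <= device_max) {
        return preferred;
    }
    /* Assumed fallback: the largest size the device allows. */
    return device_max;
}
```

For a device that reports a maximum of 1024, `pick_local_size(1024, 1024)` returns 1024 regardless of what `CL_KERNEL_WORK_GROUP_SIZE` says, which is the behaviour the reply recommends.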
I don't know how best to communicate the above information to the developers. If there is a good way to do that, please advise.