Dissecting OpenCL code
In my previous post, I got a sample OpenCL matrix multiplication kernel running inside Python on Windows. That was one of the first times I worked with OpenCL. The code is mostly self-explanatory, but it is interesting to compare it with CUDA and to see how the kernel gets invoked through Python.

import pyopencl as cl
import numpy as np
import os

os.environ['PYOPENCL_CTX']='0'

(n, m, p) = (3, 4, 5)
a = np.random.randn(n, m).astype(np.float32)
b = np.random.randn(m, p).astype(np.float32)
c = np.zeros((n*p), dtype=np.float32)

context = cl.create_some_context()
queue = cl.CommandQueue(context)

mf = cl.mem_flags
a_buf = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
c_buf = cl.Buffer(context, mf.WRITE_ONLY, c.nbytes)

prg = cl.Program(context, """
__kernel void multiply(ushort n, ushort m, ushort p, __global float *a, __global fl...
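The kernel body is cut off above, so what follows is only a sketch under stated assumptions. It emulates on the CPU, with NumPy, what a straightforward `multiply` kernel would typically do per work-item. It assumes a 2-D global size of (n, p), where `get_global_id(0)` selects the row and `get_global_id(1)` selects the column. The function name `multiply_emulated` is hypothetical. The sketch shows why `c` is allocated as a flat array of `n*p` floats: each work-item computes one dot product and writes it to `c[i*p + j]`.

```python
import numpy as np


def multiply_emulated(n, m, p, a_flat, b_flat):
    """Hypothetical CPU emulation of a simple OpenCL matrix-multiply kernel.

    Assumes row-major flat buffers: a is n x m, b is m x p, and the
    output c is n x p, stored flat with n*p elements.
    """
    c_flat = np.zeros(n * p, dtype=np.float32)
    # Each (i, j) pair stands in for (get_global_id(0), get_global_id(1))
    # of one work-item in an assumed 2-D NDRange of shape (n, p).
    for i in range(n):
        for j in range(p):
            acc = np.float32(0.0)
            for k in range(m):
                # Row i of a times column j of b, using flat row-major indexing.
                acc += a_flat[i * m + k] * b_flat[k * p + j]
            # One work-item writes exactly one element of the flat output.
            c_flat[i * p + j] = acc
    return c_flat


(n, m, p) = (3, 4, 5)
a = np.random.randn(n, m).astype(np.float32)
b = np.random.randn(m, p).astype(np.float32)

c = multiply_emulated(n, m, p, a.ravel(), b.ravel())
# Reshaping the flat result should reproduce NumPy's n x p product.
print(np.allclose(c.reshape(n, p), a @ b, atol=1e-5))
```

`COPY_HOST_PTR` copies the bytes of the C-contiguous NumPy arrays, so on the device `a` and `b` appear as the same flat row-major arrays used here. In CUDA, a kernel derives its coordinates from `blockIdx`, `blockDim`, and `threadIdx`. In OpenCL, the runtime hands each work-item its global coordinates directly through `get_global_id`.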