This is my post for the F# Advent Calendar 2016.
As a user of F# I love the language, but today I’d like to talk about the point I’ve struggled with most: how to run F# code on a GPU. Fsharp.org has a nice list of libraries that can run F# on a GPU, but some of them are very old and, in particular, have not been updated for a long time. Another problem is that there is little documentation. For this reason I decided to write a simple example of how to use FSCL (Fsharp to OpenCL), one of the ways to run F# code on a GPU.
FSCL can be installed from NuGet. Since I prefer to work with scripts, the first thing to do is to reference all the library files and open FSCL.
The library contains a number of functions for listing your OpenCL devices. Before executing any code it is useful to check that the OpenCL drivers are installed and that OpenCL is ready to run, so before starting we ask the system how many OpenCL devices we have.
Once FSCL is loaded and we know that our device is compatible with OpenCL, we can start defining constants and functions, which look very similar to standard F# ones.
As you can see, a function ready to run in OpenCL with FSCL is very similar to one that you would normally write, with two additions:
- Attributes: ReflectedDefinition and Kernel. ReflectedDefinition must be specified on every FSCL function, and Kernel marks a function as a kernel. The Kernel attribute is not always necessary, since nested functions also work.
- The WorkItemInfo parameter, which must be passed to every FSCL kernel and indicates the global and local size of the kernel.
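The two points above can be sketched without the library installed. In this minimal sketch, `KernelAttribute` and `WorkItemInfo` are hypothetical stand-ins written only so that the script runs on its own; they are not FSCL's real types, and the `GlobalID` member is an assumed name. Only `ReflectedDefinition` is the standard F# attribute. The function names and the vector-addition body are also illustrative.

```fsharp
// Hypothetical stand-ins so this script runs without FSCL installed.
// The real Kernel attribute and WorkItemInfo type come from the library.
type KernelAttribute() =
    inherit System.Attribute()

type WorkItemInfo(globalId: int) =
    // Assumed member name: index of the current work item in one dimension.
    member this.GlobalID(dimension: int) = globalId

// Ordinary F# version: one call processes the whole array.
let vectorAddCpu (a: float32[]) (b: float32[]) (c: float32[]) =
    for i in 0 .. a.Length - 1 do
        c.[i] <- a.[i] + b.[i]

// Kernel-style version: one call processes a single element,
// selected by the work item's global index.
[<ReflectedDefinition; Kernel>]
let vectorAddKernel (a: float32[], b: float32[], c: float32[], wi: WorkItemInfo) =
    let gid = wi.GlobalID(0)
    c.[gid] <- a.[gid] + b.[gid]

// Emulate the device on the CPU: launch one "work item" per element.
let a = Array.init 8 float32
let b = Array.create 8 1.0f
let cCpu = Array.zeroCreate<float32> 8
let cKernel = Array.zeroCreate<float32> 8

vectorAddCpu a b cCpu
for gid in 0 .. a.Length - 1 do
    vectorAddKernel (a, b, cKernel, WorkItemInfo gid)

printfn "same output: %b" (cCpu = cKernel)
```

The loop at the end only imitates what an OpenCL device would do in parallel; it is there to show that the per-element kernel and the whole-array function compute the same values.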
In this case both functions, used properly, will produce the same output. Now we need to define some arrays to operate on and decide the size of our kernels.
The last step is to run the kernel and compare its speed with the equivalent F# function, iterating many times to measure the performance.
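As a rough illustration of this step, the sketch below builds input and output arrays in plain F# and derives a global and a local size from them. The names `n`, `localSize`, and `roundUp` are illustrative and not taken from the original script, and the local size of 64 is an arbitrary choice. The rounding reflects the classic OpenCL rule that the global size is a multiple of the local size; how the two values are then handed to FSCL is not shown here.

```fsharp
// Illustrative sizes, not taken from the original script.
let n = 1000
let localSize = 64L   // work items per work-group (arbitrary choice)

// Smallest multiple of `multiple` that is greater than or equal to `value`.
let roundUp (value: int64) (multiple: int64) =
    ((value + multiple - 1L) / multiple) * multiple

let globalSize = roundUp (int64 n) localSize

// Inputs and an output buffer for an element-wise operation.
let a = Array.init n (fun i -> float32 i)
let b = Array.create n 2.0f
let c = Array.zeroCreate<float32> n

printfn "elements: %d, global size: %d, local size: %d" n globalSize localSize
```

When the rounded global size is larger than the array, as it is here, the kernel body needs a bounds check so that the extra work items do nothing.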
Here I show the serial CPU code, but it can also be run in parallel using Array.Parallel.iter(). The results on my laptop (i7 + GeForce GTX 960M) are:
- GPU (FSCL): Real: 00:00:11.043, CPU: 00:00:08.406
- CPU, serial: Real: 00:03:21.406, CPU: 00:03:21.812
- CPU, parallel: Real: 00:00:55.536, CPU: 00:04:54.671
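The serial and parallel CPU variants mentioned above can be sketched as follows. This is not the benchmark that produced the timings: the workload, the array size, and the names `work` and `time` are assumptions made for illustration. It uses `Array.Parallel.iteri`, the indexed sibling of `Array.Parallel.iter`, so that each element can be written to its own output slot.

```fsharp
open System.Diagnostics

// Illustrative workload, not the one measured in the post.
let n = 1000000
let input = Array.init n (fun i -> float32 i)
let serialOut = Array.zeroCreate<float32> n
let parallelOut = Array.zeroCreate<float32> n

// Hypothetical per-element operation.
let work (x: float32) = sqrt (x * x + 1.0f)

// Run an action once and return the elapsed wall-clock milliseconds.
let time (action: unit -> unit) =
    let sw = Stopwatch.StartNew()
    action ()
    sw.Stop()
    sw.ElapsedMilliseconds

// Serial version: a single thread walks the array.
let serialMs =
    time (fun () -> input |> Array.iteri (fun i x -> serialOut.[i] <- work x))

// Parallel version: the same body, spread over the available cores.
let parallelMs =
    time (fun () -> input |> Array.Parallel.iteri (fun i x -> parallelOut.[i] <- work x))

printfn "serial: %d ms, parallel: %d ms, same result: %b"
    serialMs parallelMs (serialOut = parallelOut)
```

Each index is written by exactly one iteration, so the parallel version needs no locking. For a workload this small the parallel run is not guaranteed to be faster.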
So by using the GPU with FSCL it was possible to accelerate the code about 5x compared with the parallel CPU version (roughly 11 s versus 55 s of real time). It is also important to remember that, since the code is converted to OpenCL, it is not restricted to GPUs and can run on many devices; for example, it also runs well on Intel’s Xeon Phi.
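The speed-up figure can be checked from the real (wall-clock) times quoted above. The small sketch below assumes that the 11 s run is the GPU, the 3 min 21 s run is the serial CPU version, and the 55 s run is the parallel CPU version; the helper `seconds` is an illustrative name.

```fsharp
open System
open System.Globalization

// Wall-clock ("Real") times quoted above; the labels are an assumption.
let seconds (s: string) =
    TimeSpan.Parse(s, CultureInfo.InvariantCulture).TotalSeconds

let gpu = seconds "00:00:11.043"
let cpuSerial = seconds "00:03:21.406"
let cpuParallel = seconds "00:00:55.536"

printfn "GPU vs parallel CPU: x%.1f" (cpuParallel / gpu)
printfn "GPU vs serial CPU:   x%.1f" (cpuSerial / gpu)
```

Under that assumption the ratios come out at roughly 5.0 against the parallel CPU run and roughly 18.2 against the serial one.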