gstlal-inspiral  0.4.2
Profiling of gstlal_inspiral on NVIDIA K4000

This page outlines a study to test gstlal_inspiral on a commodity GPU, the NVIDIA K4000.

In previous benchmarks, audio resampling was identified as a hot spot, taking approximately 30% of the time on Haswell chips:

CPU: Intel Haswell microarchitecture, speed 3591.53 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name               symbol name
6977399  25.5787  libgstaudioresample.so   resampler_basic_direct_single
901533    3.3050  libgstaudioresample.so   resample_float_resampler_process_interleaved_float
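
As a quick check on the "approximately 30%" figure, the two profiler rows above can be summed per image. The snippet below is a minimal sketch: the helper name `percent_by_image` is hypothetical, and it assumes whitespace-separated rows with exactly four fields (samples, percent, image name, symbol name).

```python
# Rows copied from the profile output above: samples, percent, image, symbol.
PROFILE_ROWS = """\
6977399  25.5787  libgstaudioresample.so   resampler_basic_direct_single
901533    3.3050  libgstaudioresample.so   resample_float_resampler_process_interleaved_float
"""


def percent_by_image(report_text):
    """Sum the percent column for each image name (hypothetical helper)."""
    totals = {}
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and anything that is not a data row
        _samples, percent, image, _symbol = fields
        totals[image] = totals.get(image, 0.0) + float(percent)
    return totals


totals = percent_by_image(PROFILE_ROWS)
resample_percent = totals["libgstaudioresample.so"]
print(f"libgstaudioresample.so: {resample_percent:.2f}% of sampled cycles")
```

The two symbols shown account for a little under 29% of the sampled clock cycles, consistent with the rounded figure quoted above.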

This study attempts to determine whether the resampling portion of the analysis can be sped up with commodity GPUs. It involved developing a new resample element that is more easily isolated for running on a GPU. Understanding how this portion of the code scales also provides insight into other portions of the filtering algorithm, which fundamentally work on similar data sizes and perform similar linear algebra operations. The hope is to ascertain whether porting more of the code to the GPU is promising.
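
To illustrate why resampling resembles the other linear algebra operations in the pipeline, here is a generic sketch of direct-form FIR decimation in pure Python. It is not the new resample element from this study, nor the code in libgstaudioresample.so; the function names, the Hann-windowed sinc kernel, and the `half_width` parameter are all assumptions chosen for illustration. Each output sample is one inner product of a fixed kernel with a window of the input, so a block of outputs amounts to a matrix-vector product.

```python
import math


def windowed_sinc_kernel(factor, half_width=8):
    """Low-pass FIR taps for decimating by `factor` (illustrative only)."""
    n_taps = 2 * half_width * factor + 1
    center = n_taps // 2
    taps = []
    for i in range(n_taps):
        t = (i - center) / factor
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        # Hann window tapering the sinc toward the kernel edges.
        hann = 0.5 * (1.0 + math.cos(math.pi * (i - center) / (center + 1)))
        taps.append(sinc * hann)
    gain = sum(taps)
    return [tap / gain for tap in taps]  # normalize to unit gain at DC


def decimate(samples, factor, kernel):
    """Keep one output per `factor` inputs; each output is an inner product."""
    out = []
    for start in range(0, len(samples) - len(kernel) + 1, factor):
        window = samples[start:start + len(kernel)]
        out.append(sum(k * x for k, x in zip(kernel, window)))
    return out


kernel = windowed_sinc_kernel(factor=2)
constant_input = [1.0] * 256
downsampled = decimate(constant_input, 2, kernel)
print(len(kernel), "taps,", len(downsampled), "output samples")
```

Because every output is an independent inner product over the same kernel, the work parallelizes naturally, which is the property that makes this kind of operation a candidate for a GPU.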

Setup:

Caveats:

  1. The new resampler passes basic sanity checks but has not been developed to production quality and has not produced vetted scientific results. Nevertheless, it is probably working well enough to give a sense of the computational cost.
  2. I was unable to test this configuration with the standard benchmark: the graphics card had insufficient memory to support the normal load of 8 processes. This benchmark therefore uses taskset to restrict a single multithreaded process to 1 virtual CPU core (HT is enabled) along with the GPU. For comparison, the CPU-only version is run in the same configuration.
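
The single-core restriction in caveat 2 can also be applied from inside a Python process. The snippet below is a minimal sketch, not the command used in this study: the helper name `pin_to_single_cpu` is hypothetical, and it relies on `os.sched_setaffinity`, which is only available on Linux. It has roughly the effect of launching the process under `taskset -c N`.

```python
import os


def pin_to_single_cpu(cpu=None):
    """Restrict the calling thread to one logical CPU (hypothetical helper).

    Returns the chosen CPU index, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not Linux: affinity calls are not exposed
    allowed = os.sched_getaffinity(0)
    if cpu is None:
        cpu = min(allowed)  # pick the lowest-numbered CPU we may run on
    elif cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return cpu


chosen = pin_to_single_cpu()
print("pinned to logical CPU:", chosen)
```

Threads created after this call inherit the affinity mask, so it should run before any worker threads start. With hyperthreading enabled, one logical CPU is one hardware thread rather than one full physical core.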

Some points to consider:

Results:

Figure (GPU_FFT_profile.png): GPU vs CPU throughput timing results