Study of basic vector operations on Intel Xeon Phi and NVIDIA Tesla using OpenCL

  1. Coronado, Edoardo (1)
  2. Indalecio, Guillermo (1)
  3. Garcia-Loureiro, Antonio (1)

  (1) Centro Singular de Investigación en Tecnoloxías da Información (CiTIUS), Universidad de Santiago de Compostela

Journal:
Annals of Multicore and GPU Programming: AMGP

ISSN: 2341-3158

Year of publication: 2015

Volume: 2

Issue: 1

Pages: 66-80

Type: Article


Abstract

This work analyzes the performance of the basic vector operations AXPY, DOT and SpMV implemented in OpenCL. The code was tested on an NVIDIA Tesla S2050 GPU and an Intel Xeon Phi 3120A coprocessor. Due to the nature of the AXPY function, only two versions were implemented: a routine executed by the CPU and a kernel executed on the two devices mentioned above. Their performance was studied for different vector sizes. The results show that the NVIDIA architecture is better suited to smaller vector sizes and the Intel architecture to larger ones. For the DOT and SpMV functions, three versions were implemented: a CPU routine, an OpenCL kernel that uses local memory, and an OpenCL kernel that uses only global memory. The kernels that use local memory were tested by varying the work-group size; the kernels that use only global memory were tested by varying the array size. For the former, the results show the optimum work-group size and that the NVIDIA architecture benefits from the use of local memory. For the latter, the results show that larger computational loads benefit the Intel architecture.
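
For readers unfamiliar with the three operations, the following is a minimal sequential sketch in C of what AXPY (y = a*x + y), DOT (the inner product of two vectors) and SpMV (sparse matrix-vector product) compute. It is not the authors' code. The function names, the use of double precision, and the CSR (compressed sparse row) storage format for the sparse matrix are assumptions made for illustration; the abstract does not state which storage format or precision was used.

```c
#include <assert.h>
#include <stddef.h>

/* AXPY: y[i] = a * x[i] + y[i] for every element (illustrative name). */
void axpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* DOT: returns the sum of x[i] * y[i] (illustrative name). */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/*
 * SpMV: y = A * x, with A stored in CSR format (an assumption here).
 * row_ptr has n_rows + 1 entries; the nonzeros of row r are
 * values[row_ptr[r]] .. values[row_ptr[r + 1] - 1], and col_idx gives
 * the column of each nonzero.
 */
void spmv_csr(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
              const double *values, const double *x, double *y)
{
    assert(row_ptr[0] == 0); /* CSR convention: first row starts at 0 */
    for (size_t r = 0; r < n_rows; ++r) {
        double sum = 0.0;
        for (size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
            sum += values[k] * x[col_idx[k]];
        }
        y[r] = sum;
    }
}
```

In AXPY every output element is independent, which is consistent with the abstract's remark that only one kernel version was needed. DOT and the per-row sums in SpMV are reductions, which is where a kernel can either combine partial results in work-group local memory or work from global memory only.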
