Publications by the researcher in collaboration with FRANCISCO MANUEL FERNANDEZ RIVERA (22)

2016

  1. Power and Energy Implications of the Number of Threads Used on the Intel Xeon Phi

    Annals of Multicore and GPU Programming: AMGP, Vol. 3, No. 1, pp. 55-65

2015

  1. Power and energy implications of the number of threads used on the Intel Xeon Phi

    Annals of Multicore and GPU Programming: AMGP, Vol. 2, No. 1, pp. 55-65

2014

  1. 3DyRM: a dynamic roofline model including memory latency information

    Journal of Supercomputing, Vol. 70, No. 2, pp. 696-708

  2. A hardware counter-based toolkit for the analysis of memory accesses in SMPs

    Concurrency and Computation: Practice and Experience, Vol. 26, No. 6, pp. 1328-1341

  3. Multiobjective optimization technique based on monitoring information to increase the performance of thread migration on multicores

    2014 IEEE International Conference on Cluster Computing, CLUSTER 2014

  4. Using an extended Roofline Model to understand data and thread affinities on NUMA systems

    Annals of Multicore and GPU Programming: AMGP, Vol. 1, No. 1, pp. 56-67

  5. Using sampled information: Is it enough for the sparse matrix-vector product locality optimization?

    Concurrency and Computation: Practice and Experience, Vol. 26, No. 1, pp. 98-117

2013

  1. A flexible and dynamic page migration infrastructure based on hardware counters

    Journal of Supercomputing, Vol. 65, No. 2, pp. 930-948

  2. Sparse matrix-vector multiplication on the Single-Chip Cloud Computer many-core processor

    Journal of Parallel and Distributed Computing, Vol. 73, No. 12, pp. 1539-1550

2012

  1. A graphical tool for performance analysis of multicore systems based on the Roofline Model

    Proceedings of the 2012 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2012

  2. Experiences with the sparse matrix-vector multiplication on a many-core processor

    Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012

  3. Hardware counters based analysis of memory accesses in SMPs

    Proceedings of the 2012 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2012

  4. Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs

    Microprocessors and Microsystems, Vol. 36, No. 2, pp. 65-77

2010

  1. Increasing the locality of iterative methods and its application to the simulation of semiconductor devices

    International Journal of High Performance Computing Applications, Vol. 24, No. 2, pp. 136-153

  2. Lessons learnt porting parallelisation techniques for irregular codes to NUMA systems

    Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2010

2009

  1. Increasing data reuse of sparse algebra codes on simultaneous multithreading architectures

    Concurrency and Computation: Practice and Experience, Vol. 21, No. 15, pp. 1838-1856

  2. On the influence of thread allocation for irregular codes in NUMA systems

    Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings

2006

  1. Image segmentation based on merging of sub-optimal segmentations

    Pattern Recognition Letters, Vol. 27, No. 10, pp. 1105-1116

2005

  1. A new technique to reduce false sharing in parallel irregular codes based on distance functions

    Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN

  2. Performance optimization of irregular codes based on the combination of reordering and blocking techniques

    Parallel Computing, Vol. 31, No. 8-9, pp. 858-876