  • D. Zivanovic, M. Radulovic, G. Llort, D. Zaragoza, J. Strassburg, P. Carpenter, P. Radojkovic, E. Ayguadé Parra:
    “Best Paper Award at MEMSYS 2016”: Large-Memory Nodes for Energy Efficient High-Performance Computing, in Proceedings of MEMSYS 2016.
    [OpenAIRE] [UPCommons]
  • A. Drebes, A. Pop, K. Heydemann, A. Cohen, N. Drach:
    “Best Paper Award at PACT 2016”: Scalable Task Parallelism for NUMA – A Uniform Abstraction for Coordinated Scheduling and Memory Management, in Proceedings of PACT 2016.
    [OpenAIRE] [HAL-Inria]
  • A. Drebes, J.-B. Bréjon, A. Pop, K. Heydemann, A. Cohen:
    Language-Centric Performance Analysis of OpenMP Programs with Aftermath, in Proceedings of IWOMP 2016 (LNCS 9903).
    [OpenAIRE] [HAL-UPMC]
  • A. Drebes, A. Pop, K. Heydemann, A. Cohen:
    Interactive visualization of cross-layer performance anomalies in dynamic task-parallel applications and systems, in Proceedings of ISPASS 2016.
    [OpenAIRE] [HAL-Inria]
  • Asifuzzaman, Kazi; Pavlovic, Milan; Radulovic, Milan; Zaragoza, David; Kwon, Ohseong; Ryoo, Kyung-Chang; Radojkovic, Petar:
    Performance Impact of a Slower Main Memory – A case study of STT-MRAM in HPC, in Proceedings of MEMSYS 2016.
    [OpenAIRE] [UPCommons]
  • Daniele Bortolotti; Simone Tinti; Piero Altoé; Andrea Bartolini:
    User-space APIs for dynamic power management in many-core ARMv8 computing nodes. International Conference on High Performance Computing & Simulation (HPCS)
  • A. Pullini, F. Conti, D. Rossi, I. Loi, M. Gautschi, L. Benini:
    A Heterogeneous Multicore System on Chip for Energy Efficient Brain Inspired Computing. IEEE International Symposium on Circuits and Systems (ISCAS).


  • D. Zivanovic, M. Pavlovic, M. Radulovic, H. Shin, J. Son, S. McKee, P. Carpenter, P. Radojkovic, E. Ayguade:
    Main memory in HPC: do we need more, or could we live with less? ACM Transactions on Architecture and Code Optimization (TACO) vol 14 issue
    [OpenAIRE] [UPCommons]
  • K. Asifuzzaman, R. Sánchez Verdejo, P. Radojkovic:
    Enabling a Reliable STT-MRAM Main Memory Simulation, in Proceedings of MEMSYS 2017.
    [OpenAIRE] [UPCommons]
  • A. Rigo, C. Pinto, K. Pouget, D. Raho, D. Dutoit, P.-Y. Martinez, C. Doran, L. Benini, I. Mavroidis, M. Marazakis, V. Bartsch, G. Lonsdale, A. Pop, J. Goodacre, A. Colliot, P. Carpenter, P. Radojkovic, D. Pleiter, D. Drouin, B. Dupont de Dinechin:
    Paving the way towards a highly energy-efficient and highly integrated compute node for the Exascale revolution: the ExaNoDe approach, in Proceedings of EUROMICRO DSD 2017.
    [OpenAIRE] [UPCommons]
  • F. Conti, R. Schilling, P. D. Schiavone, A. Pullini, D. Rossi, F. K. Gürkaynak, M. Muehlberghuber, M. Gautschi, I. Loi, G. Haugou, S. Mangard, L. Benini:
    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics. IEEE Transactions on Circuits and Systems I: Regular Papers (Vol 64, Issue 9, Sept. 2017)
  • R. Neill, A. Drebes, A. Pop:
    Fuse: Accurate multiplexing of hardware performance counters across executions. ACM Transactions on Architecture and Code Optimization (TACO), 14(4), p.43.
    [DOI] [OpenAIRE]
  • Alfio Di Mauro, Davide Rossi, Antonio Pullini, Philippe Flatresse, Luca Benini:
    Temperature and process-aware performance monitoring and compensation for an ULP multi-core cluster in 28nm UTBB FD-SOI technology. 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS)
  • F. Conti, D. Palossi, R. Andri, M. Magno, L. Benini:
    Accelerated Visual Context Classification on a Low-Power Smartwatch. IEEE Transactions on Human-Machine Systems (Vol 47, Issue 1, Feb. 2017)


  • Milan Radulovic, Kazi Asifuzzaman, Paul Carpenter, Petar Radojković, Eduard Ayguadé:
    HPC benchmarking: scaling right and looking beyond the average, in Proceedings of Euro-Par 2018
    [OpenAIRE] [UPCommons]
  • Milan Radulovic, Kazi Asifuzzaman, Darko Zivanovic, Nikola Rajovic, Guillaume Colin de Verdière, Dirk Pleiter, Manolis Marazakis:
    Mainstream vs. emerging HPC: metrics, trade-offs and lessons learnt.
    [To appear in Proceedings of SBAC-PAD 2018]
  • F. Conti, L. Cavigelli, G. Paulin, I. Susmelj, L. Benini:
    Chipmunk: A systolically scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW accelerator for near-sensor recurrent neural network inference, in Proceedings of 2018 IEEE Custom Integrated Circuits Conference (CICC)
  • Panagiota Fatourou, Nikolaos D. Kallimanis:
    Lock Oscillation: Boosting the Performance of Concurrent Data Structures, in Proceedings of OPODIS 2017.
    [DOI] [Presentation (PDF)]
  • Przemyslaw Mroszczyk, Vasilis F. Pavlidis:
    Mismatch Compensation Technique for Inverter-Based CMOS Circuits, in Proceedings of 2018 IEEE International Symposium on Circuits and Systems (ISCAS 2018)
    [OpenAIRE] [U. Manchester Repository]
  • Przemyslaw Mroszczyk, Vasilis F. Pavlidis:
    Ultra-Low Swing CMOS Transceiver for 2.5-D Integrated Systems, in Proceedings of 2018 International Symposium on Quality Electronic Design (ISQED 2018)
    [OpenAIRE] [U. Manchester Repository]
  • Panagiota Fatourou, Nikolaos D. Kallimanis, Thomas Ropars:
    An Efficient Wait-free Resizable Hash Table.
    [OpenAIRE] [Author’s copy]
  • Kyunghun Kim et al:
    Toward Developing a Unimem OFI Provider for MPI Support, in EuroMPI 2018 (Poster)
    [Author’s copy]
  • O.S. Simsek, A. Drebes, A. Pop:
    Leveraging Data-Flow Task Parallelism for Locality-Aware Dynamic Scheduling on Heterogeneous Platforms. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (pp. 540-549)
  • R. Neill, A. Drebes, A. Pop:
    Automated Analysis of Task-Parallel Execution Behavior Via Artificial Neural Networks. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (pp. 647-656)
  • Daniele Cesarini, Andrea Bartolini, Pietro Bonfà, Carlo Cavazzoni, Luca Benini:
    COUNTDOWN – three, two, one, low power! A Run-time Library for Energy Saving in MPI Communication Primitives. CoRR
  • A. Bartolini, A. Borghesi, A. Libri, F. Beneventi, D. Gregori, S. Tinti, C. Gianfreda, P. Altoè:
    The D.A.V.I.D.E. big-data-powered fine-grain power and performance monitoring support. Proceedings of the 15th ACM International Conference on Computing Frontiers, pp 303-308.


  • Milan Radulovic, Rommel Sanchez Verdejo, Paul Carpenter, Petar Radojković, Bruce Jacob, Eduard Ayguadé:
    PROFET: Modeling System Performance and Energy Without Simulating the CPU. ACM SIGMETRICS 2019
    [Author’s copy]
  • Yann Beilliard, Maxime Godard, Aggelos Ioannou, Astrinos Damianakis, Michael Ligerakis, Iakovos Mavroidis, Pierre-Yves Martinez, David Danovitch, Julien Sylvestre, Dominique Drouin:
    FPGA-based Multi-Chip Module for High-Performance Computing. HiPEAC 2019 / CoRR
  • P. Mroszczyk, J. Goodacre, and V. F. Pavlidis:
    Energy Efficient Flash ADC with PVT Variability Compensation through Advanced Body Biasing. IEEE Transactions on Circuits and Systems II: Express Briefs
    [OpenAIRE] [U. Manchester Repository]

PhD Theses

  • Darko Zivanovic:
    Memory Systems for High-Performance Computing: The Capacity and Reliability Implications. Universitat Politecnica de Catalunya, 2018. Supervisor: Petar Radojković
    [OpenAIRE] [UPC]
  • Milan Radulovic:
    Memory Bandwidth and Latency in HPC: System Requirements and Performance Impact. Universitat Politecnica de Catalunya, 2019. Supervisor: Petar Radojković
    [To appear]
  • Kazi Asifuzzaman:
    Performance evaluation of STT-MRAM main memory in HPC and real-time systems with WCET implications. Universitat Politecnica de Catalunya, 2019. Supervisor: Petar Radojković
    [To appear]

Public deliverables

  • D1.4: Data Management Plan for pilot on Open Research Data [PDF]
  • D2.1: Report on the ExaNoDe miniapplications [PDF]
  • D2.2: Report on the ExaNoDe architecture design guidelines [PDF]
  • D2.3: Report and best practices on porting of the mini-applications to the ExaNoDe architecture [PDF]
  • D2.4: ExaNoDe Infrastructure Requirements [PDF]
  • D2.5: Report on the performance bottlenecks of the state-of-the-art HPC platforms [PDF]
  • D3.1: Runtime systems (OmpSs, OpenStream) and communication libraries (GPI, MPI): Analysis of the hardware system characteristics and design of a preliminary software implementation [PDF]
  • D3.2: Runtime systems (OmpSs, OpenStream) and communication libraries (GPI, MPI): Advanced implementation customized for ExaNoDe architecture, interconnect, operating system [PDF]
  • D3.6: Design of the ExaNoDe Firmware (report and initial prototype) [PDF]
  • D3.7: Operating System Support for ExaNoDe [PDF]
  • D5.2: HW-SW integration and tuning [PDF]
  • D6.1: Project External Website, project flyer and social media presence [PDF]
  • D6.2: Dissemination Strategy Document [PDF]
  • D6.3: Initial Project Press Release [PDF]

Selected Presentations

Recent Overview Presentations

  • Manolis Katevenis:
    I/O, today, is Remote (block) Load/Store, and must not be slower than Compute, any more. Invited talk, PER 2018 workshop (PERspectives on the Future of Computing), within the HiPEAC Computing Systems Week (CSW), Gothenburg, Sweden, 23.5.2018.
    [PDF Document] [YouTube Video]
  • Denis Dutoit:
    Silicon interposer integration combined with novel system architecture for energy-efficient and heterogeneous compute node: the ExaNoDe solution. Workshop on “Post Moore Interconnects” at ISC 2018, Frankfurt (M), Germany, 28.6.2018.
    [PDF Document]

Presentations from the workshop “ExascaleHPC: the ExaNoDe, ExaNeSt, EcoScale, and EuroEXA projects” in January 2018

  • N. Kallimanis, M. Marazakis, E. Skordalaki:
    Use-cases for Remote Memory in the UNIMEM Architecture
  • K. Pouget, A. Mouzakitis, R. Dimitrov, A. Rigo, D. Raho:
    Virtualization for HPC
  • P.Y. Martinez, D. Dutoit, A. Philippe, P. Vivet, D. Drouin, Y. Beilliard:
    3D-IC Design Solution for Modular Integration of Chiplet over Silicon Interposer
  • P. Mroszczyk, V. Pavlidis:
    Ultra-Low Swing Transceiver for Energy Efficient Communication in 2.5-D Integrated Systems
  • B. Chalios:
    ExaNoDe Programming Environment to Exploit ARM, UNIMEM and FPGAs

Presentations from the workshop “Towards Exascale HPC systems: Co-design and Technology development within the EuroEXA, ExaNeSt, ExaNoDe and EcoScale projects” in May 2018

  • R. Dimitrov, K. Pouget:
    Virtualization technologies in modern HPC systems
  • N. Kallimanis:
    A Flexible & Efficient Shared Memory Abstraction with Minimal HW Assistance


ExaNoDe Open Source Strategy

ExaNoDe endorsed the EU Commission Open Source strategy and made the contributions described below.

BSC released cluster support for its OmpSs-2 parallel programming model as part of the Nanos6 runtime. This support was publicly released under GNU GPLv3 in version 19.06 of OmpSs-2. The repository is available at this link.
A second open source release concerns PROFET, an analytical model that predicts how an application’s performance and energy consumption change when it is executed on different memory systems. The repository exists at this address, but the code has not been uploaded yet; it will be available soon.

ETHZ opened many of the hardware intellectual properties (IPs) developed in the project as part of the PULP platform, under the liberal, Apache-derived SolderPad license. These include, among others, many IPs used to build the internal interconnects within and between the ExaConv clusters (APB, AXI bridges that can be found here and here), DMAs, and interrupt controllers. The IPs are distributed as synthesizable HDL code in SystemVerilog, C code, and related documentation. The software-only components have been released under the Apache 2 license.

Forschungszentrum Jülich has released the mini-applications (namely HydroC, miniFE, miniKKR and BQCD) at the following address under the GNU GPLv2, GNU GPLv3, LGPL, CeCILL and BSD licenses. These are self-contained and based on real-life applications that have been developed and ported to the architecture via the programming models and communication APIs.

Virtual Open Systems developed a QEMU extension for periodic checkpointing of virtual machines. A repository including all the changes is available at this address. The code is released under GNU GPLv2. Virtual Open Systems is working on a companion page on its website that explains how to compile the extension and reproduce the periodic checkpointing of an ARMv8 virtual machine. The page will be reachable from this address.

Links of interest: