Intro
Preface
Organization
Keynote Talks
SpMV: An Embarrassing Kernel for Modern Compute Devices
Low-level Fun with Parallel Runtime Systems
Contents
Energy Efficiency
Energy Efficient Frequency Scaling on GPUs in Heterogeneous HPC Systems
1 Motivation, Problem Statement and Key Contributions
1.1 Motivation
1.2 Problem Statement and Key Contributions
2 Related Work and Background
2.1 Performance and Energy Measurement Tools
2.2 Benchmarks
2.3 Energy Efficiency on Graphics Processing Units
3 Methodology
4 Results
4.1 Minimum Interval Length Between Measurements
4.2 Frequency Scaling
4.3 Frequency vs. Total Energy Consumption
5 Summary, Future Research and Conclusion
References
Dual-IS: Instruction Set Modality for Efficient Instruction Level Parallelism
1 Introduction
2 Related Work
3 Transport Triggered Architectures
4 Dual-IS Processor
4.1 Instruction Translation
4.2 Micro-operation Sequencing
4.3 Control and Data Hazards
4.4 Mode Switching
5 Evaluation
5.1 Evaluated Designs
5.2 Synthesis Results
5.3 Performance
5.4 Energy Efficiency
5.5 Discussion
6 Conclusions
References
Pasithea-1: An Energy-Efficient Self-contained CGRA with RISC-Like ISA
1 Introduction
1.1 Reconfigurable Computing
1.2 Related Work
1.3 This Work
2 Instruction Set Architecture
2.1 Fragment Instances
2.2 Local Interaction with Target Instruction Pointers (TIPs)
2.3 Global Interaction of Fragment Instances
2.4 What's the RISC?
3 Programming
3.1 Local Programming
3.2 Global Programming
4 Microarchitecture
4.1 Fragment Instance Management
4.2 Tiles and PEs: Fragment Instances on Fabric
4.3 Dormant Fragment Instances
4.4 Memory Subsystem
5 Evaluation Methodology
6 Results
7 Discussion
References
Applied Machine Learning
Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning
1 Introduction
2 Related Work
3 Motivation, Problem, and Solution Overview
3.1 Motivation: Technology Trends
3.2 Problem Definition
3.3 Solution Overview
4 Modeling and Optimization
4.1 Slowdown Estimation for a Given Job Set and Hardware Setup
4.2 Hardware Setup Optimization for a Given Job Set
4.3 Job Sets Selection
5 Evaluation
5.1 Evaluation Setup
5.2 Experimental Results
6 Conclusion
References
FPGA-Based Dynamic Deep Learning Acceleration for Real-Time Video Analytics
1 Introduction
2 Overview of the Proposed System
2.1 Neural Network Architecture Search
2.2 Neural Network Model Compilation
2.3 Software and Hardware Run-Time Management
3 DNN Model Optimisation
3.1 Brief Introduction of OFA
3.2 Model Generation and Optimisation
4 System Hardware/Software Co-design
4.1 Hardware Architecture
4.2 Software Implementation
