Intro
Preface
Organization
Contents
Compiler-Assisted Correctness Checking and Performance Optimization for HPC
Preface to the Third Workshop on Compiler-Assisted Correctness Checking and Performance Optimization for HPC (C3PO'22)
1 Introduction
2 Organization
2.1 Organizing Committee
2.2 Program Committee
3 Program
3.1 Invited Talk
3.2 Research Papers
Compiler-Assisted Instrumentation Selection for Large-Scale C++ Codes
1 Introduction
2 Related Work
3 Tailored Instrumentation for OpenFOAM
3.1 Design and Limitations of InstRO
4 The CaPI Instrumentation Toolchain
4.1 Instrumentation Workflow
4.2 Implementation
4.3 Score-P Integration
5 Evaluation on OpenFOAM
6 Usability and Validation Impediments
7 Discussion
8 Conclusion and Future Work
References
Lightweight Array Contraction by Trace-Based Polyhedral Analysis
1 Introduction
2 Background
2.1 Polyhedral Model
2.2 Array Contraction
3 Related Work
4 Our Approach
4.1 Overview of the Approach
4.2 Generating Input Parameter Instances
4.3 Inferring a Mapping on a Trace
4.4 Interpolation
5 Experimental Results
5.1 Experimental Setup
5.2 Results
6 Conclusion
References
Detecting Scale-Induced Overflow Bugs in Production HPC Codes
1 Introduction
2 Tracing Algorithm Extension
2.1 Fortran Support
3 Evaluation
4 Related Work
5 Conclusion
References
HPC on Heterogeneous Hardware (H3)
AI Benchmarking for Science: Efforts from the MLCommons Science Working Group
1 Introduction
2 MLCommons Science Working Group
2.1 About the Working Group
2.2 Science Benchmarking
2.3 Policies for Benchmarking
3 Benchmarks for the First Release
3.1 Cloud Masking (cloud-mask)
3.2 STEMDL (stemdl)
3.3 CANDLE-UNO (candle-uno)
3.4 Time Series Evolution Operator (tevelop)
4 Results from Initial Evaluations
4.1 Results for the cloud-mask Benchmark
4.2 Results for the stemdl Benchmark
4.3 Results for the candle-uno Benchmark
4.4 Results for the tevelop Benchmark
5 Conclusions
References
Performance Analysis of Matrix Multiplication for Deep Learning on the Edge
1 Introduction
2 Blocked Algorithms for GEMM
2.1 The Baseline Algorithm for GEMM
2.2 A Family of Algorithms for GEMM
3 A Performance Simulator for GEMM Algorithms
3.1 IoT Architecture Model
3.2 Validation
4 Performance Analysis
5 Discussion and Future Work
References
Strategies for Efficient Execution of Pipelined Conjugate Gradient Method on GPU Systems
1 Introduction
2 Related Work
3 Background
4 Methodology
4.1 Hybrid-PIPECG-1 Method
4.2 Hybrid-PIPECG-2 Method
4.3 Hybrid-PIPECG-3 Method
5 Experiments and Results
6 Conclusion and Future Work
References
A Multi-Level Platform-Independent GPU API for High-Level Programming Models
1 Introduction
