Intro
Preface
Organization
Contents
A Comparative Evaluation of Parallel Programming Python Tools for Particle-in-Cell on Symmetric Multiprocessors
1 Introduction
2 Background
2.1 Particle-in-Cell
2.2 Python Parallel Programming
2.3 Related Work
3 Implementation
3.1 Profiling
3.2 Code Transformation
4 Experimental Results
4.1 Setup
4.2 Experiments
5 Discussion
6 Final Remarks
References
Accelerating GNN Training on CPU+Multi-FPGA Heterogeneous Platform
1 Introduction
2 Background
2.1 GNN Models
2.2 Mini-Batch GNN Training
2.3 Related Work
3 GNN Training on CPU+Multi-FPGA Platform
4 Optimizations
4.1 Graph Partitioning and Workload Balancing
4.2 Optimized GNN Kernels
5 Experiments
5.1 Experimental Setup
5.2 Hardware Parameter Selection and Resource Utilization
5.3 Performance Metrics
5.4 Comparison with Multi-GPU Platform
5.5 Scalability
5.6 Impact of Optimizations
6 Conclusion
References
Implementing a GPU-Portable Field Line Tracing Application with OpenMP Offload
1 Introduction
2 Background
2.1 Directive-Based Programming for Accelerators with OpenMP
2.2 Simulating Plasma Confinement in Stellarator Devices
2.3 Related Work
3 Directive-Based GPU Offloading Implementation
3.1 Breakdown of the Execution Flow
3.2 Data Management for Offloading
3.3 Parallelism Implementation
4 Results
4.1 Experimental Setup
4.2 Baseline Comparison: Single CPU Node Versus Single GPU
4.3 Multi-GPU Scalability
4.4 Economic Analysis
5 Conclusions
References
Quantitative Characterization of Scientific Computing Clusters
1 Introduction
2 Related Work
3 Background
3.1 Cluster Overhead and Coupling
3.2 Cluster Performance Profile
4 Performance Evaluation
4.1 Experimental Setup
4.2 Threats to Validity
4.3 Results
4.4 Clusters Performance Profiles
5 Discussion
6 Conclusion
References
Towards Parameter-Based Profiling for MARE2DEM Performance Modeling
1 Introduction
2 Dataset and Application Background
2.1 CSEM Data
2.2 MARE2DEM
2.3 Refinement Groups
3 Methodology and Experimental Context
4 Results
4.1 Performance Characterization of the Microkernels
4.2 Iterations and Refinement Groups
5 Conclusion
References
Time-Power-Energy Balance of BLAS Kernels in Modern FPGAs
1 Introduction
2 FPGAs and NLA
2.1 BLAS
2.2 FPGAs
3 Evaluated Kernels
3.1 Vitis Libraries
3.2 Matrix-Matrix Multiplication (MMM)
4 Experimental Evaluation
4.1 Setup
4.2 Experimental Results and Discussion
5 Conclusions
References
Improving Boundary Layer Predictions Using Parametric Physics-Aware Neural Networks
1 Introduction
2 Related Work
3 Methodology
3.1 Boundary Layer Problem
3.2 Architecture Design
4 Experimental Results
4.1 First Setting: Reaction-Diffusion Problem
