Linked e-resources


Compilers, Tools and Environments
ALONA: Automatic Loop Nest Approximation with Reconstruction and Space Pruning
Automatic low-overhead load-imbalance detection in MPI applications
Performance and Power Modeling, Prediction and Evaluation
Trace-driven Workload Generation and Execution
Update on the Asymptotic Optimality of LPT
E2EWatch: An End-to-end Anomaly Diagnosis Framework for Production HPC Systems
Scheduling and Load Balancing
Collaborative GPU Preemption via Spatial Multitasking for Efficient GPU Sharing
A Fixed-Parameter Algorithm for Scheduling Unit Dependent Tasks with Unit Communication Delays
Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
Taming Tail Latency in Key-Value Stores: a Scheduling Perspective
A log-linear (2+5/6)-approximation algorithm for parallel machine scheduling with a single orthogonal resource
An MPI-Parallel Algorithm for Mapping Complex Networks onto Hierarchical Architectures
Pipelined Model Parallelism: Complexity Results and Memory Considerations
Data Management, Analytics and Machine Learning
Efficient and Systematic Partitioning of Large and Deep Neural Networks for Parallelization
A GPU Architecture Aware Fine-Grain Pruning Technique for Deep Neural Networks
Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs
Smart Distributed Data Sets for Stream Processing
Cluster, Cloud and Edge Computing
Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
Horizontal Scaling in Cloud using Contextual Bandits
Geo-Distribute Cloud Applications at the Edge
A Fault Tolerant and Deadline Constrained Sequence Alignment Application on Cloud-based Spot GPU Instances
Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach
Theory and Algorithms for Parallel and Distributed Processing
Algorithm design for Tensor Units
A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
TSL Queue: An Efficient Lock-free Design for Priority Queues
G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU
Parallel and Distributed Programming, Interfaces, and Languages
Accelerating Graph Applications Using Phased Transactional Memory
Efficient GPU Computation using Task Graph Parallelism
Towards High Performance Resilience using Performance Portable Abstractions
Enhancing Load-Balancing of MPI Applications with Workshare
Particle-In-Cell Simulation using Asynchronous Tasking
Multicore and Manycore Parallelism
Exploiting co-execution with one API: heterogeneity from a modern perspective
Parallel Numerical Methods and Applications
Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems
Fault-tolerant LU factorization is low cost
Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method
GPU Accelerated Mahalanobis-average Hierarchical Clustering Analysis
High Performance Architectures and Accelerators
PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy
Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware


