Table of Contents
Concurrent Systems: Hybrid Object Implementations and Abortable Objects
Runtime-Aware Architectures
MPI Thread-Level Checking for MPI+OpenMP Applications
Event-Action Mappings for Parallel Tools Infrastructures
Low-Overhead Detection of Memory Access Patterns and Their Time Evolution
Automatic On-line Detection of MPI Application Structure with Event Flow Graphs
Online Automated Reliability Classification of Queueing Models for Streaming Processing Using Support Vector Machines
A Duplicate-Free State-Space Model for Optimal Task Scheduling
On the Heterogeneity Bias of Cost Matrices when Assessing Scheduling Algorithms
Hardware Round-Robin Scheduler for Single-ISA Asymmetric Multi-Core
Moody Scheduling for Speculative Parallelization
Allocating Jobs with Periodic Demands
A Multi-Level Hypergraph Partitioning Algorithm Using Rough Set Clustering
Non-preemptive Throughput Maximization for Speed-Scaling with Power-Down
Scheduling Tasks from Selfish Multi-tasks Agents
Locality and Balance for Communication-Aware Thread Mapping in Multicore Systems
Concurrent Priority Queues Are not Good Priority Schedulers
Load Balancing Prioritized Tasks via Work-Stealing
Optimizing Task Parallelism with Library-Semantics-Aware Compilation
Data Layout Optimization for Portable Performance
Automatic Data Layout Optimizations for GPUs
Performance Impacts with Reliable Parallel File Systems at Exascale Level
Rapid Tomographic Image Reconstruction via Large-Scale Parallelization
Software consolidation as an efficient energy and cost Saving Solution for a SaaS/PaaS Cloud Model
VMPlaceS: A Generic Tool to Investigate and Compare VM Placement Algorithms
A Connectivity Model for Agreement in Dynamic Systems
DFEP: Distributed Funding-based Edge Partitioning
PR-STM: Priority Rule Based Software Transactions on the GPU
Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems
A Practical Transactional Memory Interface
A Multicore Parallelization of Continuous Skyline Queries on Data Streams
A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures
A Composable Deadlock-Free Approach to Object-Based Isolation
Scalable Data-Driven PageRank: Algorithms, System Issues & Lessons Learned
How Many Threads Will Be Too Many? On the Scalability of OpenMP Implementations
Efficient Nested Dissection for Multicore Architectures
Scheduling Trees of Malleable Tasks for Sparse Linear Algebra
Elastic Tasks: Unifying Task Parallelism and SPMD Parallelism with an Adaptive Runtime
Semi-discrete Matrix-Free Formulation of 3D Elastic Full Waveform Inversion Modeling
10,000 Performance Models per Minute
Scalability of the UG4 Simulation Framework
Exploiting Task-Based Parallelism in Bayesian Uncertainty Quantification
Parallelization of an Advection-Diffusion Problem Arising in Edge Plasma Physics Using Hybrid MPI/OpenMP Programming
Behavioral Non-Portability in Scientific Numeric Computing
Fast Parallel Suffix Array on the GPU
Effective Barrier Synchronization on Intel Xeon Phi Coprocessor
High Performance Multi-GPU SpMV for Multi-component PDE-based Applications
Accelerating Lattice Boltzmann Applications with OpenACC
High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters
Improving Performance of Convolutional Neural Networks by Separable Filters on GPU
Iterative Sparse Triangular Solves for Preconditioning
Targeting the Parallella
Systematic Fusion of CUDA Kernels for Iterative Sparse Linear System Solvers
Efficient Execution of Multiple CUDA Applications using Transparent Suspend, Resume and Migration