Table of Contents
Preface
Organization
Contents - Part I
Contents - Part II
Crossed-Time Delay Neural Network for Speaker Recognition
1 Introduction
2 Baseline Models
3 Crossed-Time Delay Neural Network
3.1 Crossed-Time Delay Layer
3.2 Statistical Concatenation
4 Experiments
4.1 Preprocessing
4.2 Model Configuration
4.3 Training Parameters Settings
4.4 Embedding Extraction and Verification
5 Results
5.1 VoxCeleb1
5.2 Vcc2016
6 Conclusion
References
An Asymmetric Two-Sided Penalty Term for CT-GAN
1 Introduction
2 Background
2.1 WGAN
2.2 WGAN-GP
2.3 CT-GAN
3 Our Approach
3.1 Asymmetric Two-Sided Penalty
3.2 WGAN with Asymmetric Two-Sided Penalty
4 Experiments
4.1 Datasets and Evaluation
4.2 Results
5 Conclusion
References
Fast Discrete Matrix Factorization Hashing for Large-Scale Cross-Modal Retrieval
1 Introduction
2 Proposed Method
2.1 Problem Formulation
2.2 Fast Discrete Matrix Factorization Hashing
2.3 Optimization Algorithm
2.4 Out-of-Sample Extension
3 Experiment
3.1 Experiment Settings
3.2 Experimental Results
3.3 Parameter Sensitivity Analysis
3.4 Time Cost Analysis
4 Conclusion
References
Fast Optimal Transport Artistic Style Transfer
1 Introduction
2 Related Work
3 Methodology
3.1 Fast Style Transfer Framework
3.2 Learn to Style Transfer via Optimal Transport
3.3 Optimization Objectives
4 Experiments
4.1 Implementation Details
4.2 Qualitative Analysis
4.3 Quantitative Analysis
4.4 Ablation Study
5 Conclusion
References
Stacked Sparse Autoencoder for Audio Object Coding
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Structure of SSAE-SAOC
3.2 Architecture of Stacked Sparse Autoencoder
4 Experimental Evaluation
4.1 Experiments Conditions
4.2 SSAE Model Training
4.3 Test Results and Data Analysis
5 Conclusions
References
A Collaborative Multi-modal Fusion Method Based on Random Variational Information Bottleneck for Gesture Recognition
1 Introduction
2 Related Work
3 Methodology
3.1 Variational Information Bottleneck
3.2 Random Variational Information Bottleneck
4 Experiment
4.1 Data Processing
4.2 Experimental Analysis
5 Conclusion
References
Frame Aggregation and Multi-modal Fusion Framework for Video-Based Person Recognition
1 Introduction
2 Related Work
3 Our Framework
3.1 Overview
3.2 AttentionVLAD for Frame Aggregation
3.3 MLMA for Multi-modal Fusion
4 Experiments
4.1 Dataset
4.2 Results
4.3 Implementation Details
4.4 Ablation Study
5 Conclusion
References
An Adaptive Face-Iris Multimodal Identification System Based on Quality Assessment Network
1 Introduction
2 Proposed System
2.1 Preprocessing
2.2 Feature Extraction
2.3 Matching
2.4 FaceIrisQANet
2.5 Fusion and Decision