Table of Contents
Intro
Preface
Contents
Categories of Response-Based, Feature-Based, and Relation-Based Knowledge Distillation
1 Categories of Response-Based, Feature-Based, and Relation-Based Knowledge Distillation
1.1 Response-Based Knowledge Distillation
1.2 Feature-Based Knowledge Distillation
1.3 Relation-Based Knowledge Distillation
2 Distillation Schemes
2.1 Offline Knowledge Distillation
2.2 Online Knowledge Distillation
2.3 Self-Knowledge Distillation
2.4 Comprehensive Comparison
3 Distillation Algorithms
3.1 Multi-Teacher Distillation
3.2 Cross-Modal Distillation
3.3 Attention-Based Distillation
3.4 Data-Free Distillation
3.5 Adversarial Distillation
4 Conclusion
References
A Geometric Perspective on Feature-Based Distillation
1 Introduction
2 Prior Art on Feature-Based Knowledge Distillation
2.1 Definitions
2.2 Related Work
3 Geometric Considerations on FKD
3.1 Local Manifolds and FKD
3.2 Manifold-Manifold Distance Functions
3.3 Interpretation of Graph Reordering as a Tool for Measuring Similarity
4 Formulating Geometric FKD Loss Functions
4.1 Neighboring Pattern Loss
4.2 Affinity Contrast Loss
5 Experimental Verification
5.1 Materials and Methods
5.2 Knowledge Distillation from Large Teacher to Small Student Models
5.3 Comparison with Vanilla Knowledge Distillation
5.4 Knowledge Distillation Between Large Models
5.5 Effects of Neighborhood
6 Case Study: Geometric FKD in Data-Free Knowledge Transfer Between Architectures, with an Application in Offline Signature Verification
6.1 Problem Formulation
6.2 Experimental Setup
6.3 Results
7 Discussion
8 Conclusions
References
Knowledge Distillation Across Vision and Language
1 Introduction
2 Vision Language Learning and Contrastive Distillation
2.1 Vision and Language Representation Learning
2.2 Contrastive Learning and Knowledge Distillation
2.3 Contrastive Distillation for Self-Supervised Learning
3 Contrastive Distillation for Vision Language Representation Learning
3.1 DistillVLM
3.2 Attention Distribution Distillation
3.3 Hidden Representation Distillation
3.4 Classification Distillation
4 Experiments
4.1 Datasets
4.2 Implementation Details: Visual Representation
4.3 VL Pre-training and Distillation
4.4 Transferring to Downstream Tasks
4.5 Experimental Results
4.6 Distillation over Different Losses
4.7 Different Distillation Strategies
4.8 Is VL Distillation Data-Efficient?
4.9 Results for Captioning
5 VL Distillation on Unified One-Stage Architecture
5.1 One-Stage VL Architecture
5.2 VL Distillation on One-Stage Architecture
6 Conclusion and Future Work
References
Knowledge Distillation in Granular Fuzzy Models by Solving Fuzzy Relation Equations
1 Introduction
2 Related Works
2.1 Knowledge Granularity in Transfer Learning