001467902 000__ 05472cam\\22006377a\4500
001467902 001__ 1467902
001467902 003__ OCoLC
001467902 005__ 20230707003344.0
001467902 006__ m\\\\\o\\d\\\\\\\\
001467902 007__ cr\un\nnnunnun
001467902 008__ 230515s2023\\\\si\\\\\\o\\\\\101\0\eng\d
001467902 020__ $$a9789819924011$$q(electronic bk.)
001467902 020__ $$a9819924014$$q(electronic bk.)
001467902 020__ $$z9819924006
001467902 020__ $$z9789819924004
001467902 0247_ $$a10.1007/978-981-99-2401-1$$2doi
001467902 035__ $$aSP(OCoLC)1379018201
001467902 040__ $$aYDX$$beng$$cYDX$$dGW5XE
001467902 049__ $$aISEA
001467902 050_4 $$aQA76.9.N38
001467902 08204 $$a006.3/5$$223/eng/20230524
001467902 1112_ $$aNCMMSC (Conference)$$n(17th :$$d2022 :$$cHefei Shi, China)
001467902 24510 $$aMan-machine speech communication :$$b17th national conference, NCMMSC 2022, Hefei, China, December 15-18, 2022 proceedings /$$cLing Zhenhua, Gao Jianqing, Yu Kai, Jia Jia, editors.
001467902 2463_ $$aNCMMSC 2022
001467902 260__ $$aSingapore :$$bSpringer,$$c2023.
001467902 300__ $$a1 online resource
001467902 4901_ $$aCommunications in computer and information science ;$$v1765
001467902 500__ $$aIncludes author index.
001467902 5050_ $$aMCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation -- Baby Cry Recognition Based on Acoustic Segment Model -- A Multi-feature Sets Fusion Strategy with Similar Samples Removal for Snore Sound Classification -- Multi-Hypergraph Neural Networks for Emotion Recognition in Multi-Party Conversations -- Using Emoji as an Emotion Modality in Text-Based Depression Detection -- Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis -- Semantic enhancement framework for robust speech recognition -- Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model -- Predictive AutoEncoders are Context-Aware Unsupervised Anomalous Sound Detectors -- A pipelined framework with serialized output training for overlapping speech recognition -- Adversarial Training Based on Meta-Learning in Unseen Domains for Speaker Verification -- Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement -- Multiple Confidence Gates for Joint Training of SE and ASR -- Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion -- Pre-training Techniques For Improving Text-to-Speech Synthesis By Automatic Speech Recognition Based Data Enhancement -- A Time-Frequency Attention Mechanism with Subsidiary Information for Effective Speech Emotion Recognition -- Interplay between prosody and syntax-semantics: Evidence from the prosodic features of Mandarin tag questions -- Improving Fine-grained Emotion Control and Transfer with Gated Emotion Representations in Speech Synthesis -- Violence Detection through Fusing Visual Information to Auditory Scene -- Mongolian Text-to-Speech Challenge under Low-Resource Scenario for NCMMSC2022 -- VC-AUG: Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification -- Transformer-based potential emotional relation mining network for emotion recognition in conversation -- FastFoley: Non-Autoregressive Foley Sound Generation Based On Visual Semantics -- Structured Hierarchical Dialogue Policy with Graph Neural Networks -- Deep Reinforcement Learning for On-line Dialogue State Tracking -- Dual Learning for Dialogue State Tracking -- Automatic Stress Annotation and Prediction For Expressive Mandarin TTS -- MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset.
001467902 506__ $$aAccess limited to authorized users.
001467902 520__ $$aThis book constitutes the refereed proceedings of the 17th National Conference on Man-Machine Speech Communication, NCMMSC 2022, held in China, in December 2022. The 21 full papers and 7 short papers included in this book were carefully reviewed and selected from 108 submissions. They were organized in topical sections as follows: MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation; Baby Cry Recognition Based on Acoustic Segment Model; MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset.
001467902 588__ $$aOnline resource; title from PDF title page (SpringerLink, viewed May 24, 2023).
001467902 650_0 $$aComputational linguistics$$vCongresses.
001467902 650_0 $$aNatural language processing (Computer science)$$vCongresses.
001467902 650_0 $$aHuman-computer interaction$$vCongresses.
001467902 655_0 $$aElectronic books.
001467902 7001_ $$aZhenhua, Ling.
001467902 7001_ $$aJianqing, Gao.
001467902 7001_ $$aKai, Yu.
001467902 7001_ $$aJia, Jia.
001467902 77608 $$iPrint version: $$z9819924006$$z9789819924004$$w(OCoLC)1374244461
001467902 830_0 $$aCommunications in computer and information science ;$$v1765.
001467902 852__ $$bebk
001467902 85640 $$3Springer Nature$$uhttps://univsouthin.idm.oclc.org/login?url=https://link.springer.com/10.1007/978-981-99-2401-1$$zOnline Access$$91397441.1
001467902 909CO $$ooai:library.usi.edu:1467902$$pGLOBAL_SET
001467902 980__ $$aBIB
001467902 980__ $$aEBOOK
001467902 982__ $$aEbook
001467902 983__ $$aOnline
001467902 994__ $$a92$$bISE