Sound

Authors and titles for recent submissions

Mon, 20 May 2024

[1] arXiv:2405.10502 (cross-list from cs.HC) [pdf, other]: Title: Enhancing DMI Interactions by Integrating Haptic Feedback for Intricate Vibrato Technique

Authors: Ziyue Piao, Christian Frisson, Bavo Van Kerrebroeck, Marcelo M. Wanderley

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD)

Fri, 17 May 2024

[2] arXiv:2405.10211 [pdf, ps, other]: Title: Building a Luganda Text-to-Speech Model From Crowdsourced Data

Authors: Sulaiman Kagumire, Andrew Katumba, Joyce Nakatumba-Nabende, John Quinn

Comments: Presented at the AfricaNLP workshop at ICLR 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[3] arXiv:2405.09901 [pdf, other]: Title: Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models

Authors: Ziyu Wang, Lejun Min, Gus Xia

Comments: Proceedings of the International Conference on Learning Representations (ICLR 2024)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2405.10272 (cross-list from cs.CV) [pdf, other]: Title: Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[5] arXiv:2405.10084 (cross-list from eess.AS) [pdf, other]: Title: Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation

Authors: Manh Luong, Khai Nguyen, Nhat Ho, Reza Haf, Dinh Phung, Lizhen Qu

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[6] arXiv:2405.10025 (cross-list from cs.CL) [pdf, other]: Title: Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models

Authors: Yuchen Hu, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng, Ruizhe Li

Comments: 14 pages, Accepted by ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:2405.10022 (cross-list from eess.AS) [pdf, other]: Title: Monaural speech enhancement on drone via Adapter based transfer learning

Authors: Xingyu Chen, Hanwen Bi, Wei-Ting Lai, Fei Ma

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2405.10018 (cross-list from eess.AS) [pdf, other]: Title: Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

Authors: Florian Schmid, Paul Primus, Toni Heittola, Annamaria Mesaros, Irene Martín-Morató, Khaled Koutini, Gerhard Widmer

Comments: Task Description Page: this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2405.09940 (cross-list from eess.AS) [pdf, other]: Title: Robust Singing Voice Transcription Serves Synthesis

Authors: Ruiqi Li, Yu Zhang, Yongqi Wang, Zhiqing Hong, Rongjie Huang, Zhou Zhao

Comments: ACL 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[10] arXiv:2405.09814 (cross-list from cs.GR) [pdf, other]: Title: Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis

Authors: Zeyi Zhang, Tenglong Ao, Yuyao Zhang, Qingzhe Gao, Chuan Lin, Baoquan Chen, Libin Liu

Comments: SIGGRAPH 2024 (Journal Track); Project page: this https URL

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2405.09768 (cross-list from eess.AS) [pdf, other]: Title: Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model

Authors: Siyang Wang, Éva Székely

Comments: 11 pages, 4 figures. Language Resources and Evaluation Conference (LREC) 2024. demo: this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12] arXiv:2405.09589 (cross-list from cs.LG) [pdf, other]: Title: Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Review

Authors: Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, Aman Chadha

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13] arXiv:2405.09570 (cross-list from eess.SP) [pdf, other]: Title: FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time

Authors: Md Jobayer, Md. Mehedi Hasan Shawon, Md Rakibul Hasan, Shreya Ghosh, Tom Gedeon, Md Zakir Hossain

Comments: 8-page main paper and 4-page supplementary material

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 16 May 2024

[14] arXiv:2405.09470 [pdf, other]: Title: Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

Comments: Accepted to SecTL (AsiaCCS Workshop) 2024

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:2405.09241 [pdf, other]: Title: SMUG-Explain: A Framework for Symbolic Music Graph Explanations

Authors: Emmanouil Karystinaios, Francesco Foscarin, Gerhard Widmer

Comments: In Proceedings of the Sound and Music Computing Conference 2024 (SMC2024), Porto, Portugal

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2405.09224 [pdf, other]: Title: Perception-Inspired Graph Convolution for Music Understanding Tasks

Authors: Emmanouil Karystinaios, Francesco Foscarin, Gerhard Widmer

Comments: Accepted at the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:2405.09171 [pdf, other]: Title: Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

Authors: Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

Comments: Accepted to IEEE ICASSP 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:2405.09062 [pdf, other]: Title: Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

Authors: Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo Akama

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19] arXiv:2405.08838 [pdf, other]: Title: PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset

Authors: Yang Hou, Haitao Fu, Chuankai Chen, Zida Li, Haoyu Zhang, Jianjun Zhao

Comments: 13 pages, 4 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:2405.09266 (cross-list from cs.CV) [pdf, other]: Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

Authors: Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai

Comments: 11 pages, 6 figures, demo page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2405.09142 (cross-list from eess.AS) [pdf, other]: Title: Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization

Authors: Jenthe Thienpondt, Kris Demuynck

Comments: Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Wed, 15 May 2024 (showing first 4 of 10 entries)

[22] arXiv:2405.08679 [pdf, other]: Title: Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning

Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

Comments: Self-supervision in Audio, Speech and Beyond workshop, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[23] arXiv:2405.08596 [src]: Title: EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark

Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao

Comments: This paper needs more modification

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2405.08342 [pdf, ps, other]: Title: Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer

Authors: Whenty Ariyanti, Kai-Chun Liu, Kuan-Yu Chen, Yu Tsao

Comments: Published in 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

Journal-ref: 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (2023) 1-4

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[25] arXiv:2405.08021 [pdf, other]: Title: Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion

Authors: Zhao Ren, Kevin Scheck, Qinhan Hou, Stefano van Gogh, Michael Wand, Tanja Schultz

Comments: Accepted by EMBC 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
