https://arxiv.org/abs/2106.06103
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems. In this work, we present a parallel end-to-end TTS method
vits
https://music-audio-ai.tistory.com/22
[Paper Review] Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (ICML21)
Title: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Authors: Jaehyeon Kim, Jungil Kong, Juhee Son Affiliation: Kakao Enterprise, KAIST Venue: ICML 2021 Paper: https://arxiv.org/abs/2106.06103 Code: https:
VITS paper review
https://arxiv.org/abs/2111.12203
KUIELab-MDX-Net: A Two-Stream Neural Network for Music Demixing
Recently, many methods based on deep learning have been proposed for music source separation. Some state-of-the-art methods have shown that stacking many layers with many skip connections improve the SDR performance. Although such a deep and complex archit
mdx-net
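Quite a few papers on voice conversion have been written by Korean researchers.
RVC is widely used to apply a new voice in AI covers, and the core model of RVC is VITS.
UVR5 is widely used to separate the vocals from the accompaniment of the original song, and its core model is MDX-Net.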
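To make the data flow concrete, here is a rough sketch of how the two tools could fit together in an AI cover: separate the song, convert only the vocals, then mix the result back over the accompaniment. The final remix step is my assumption, since it is implied by "AI cover" rather than stated. `make_ai_cover`, `toy_separate`, and `toy_convert` are hypothetical names. The toy functions are trivial stand-ins so the code runs, not the real UVR5 or RVC interfaces.

```python
from typing import Callable, List, Tuple

Audio = List[float]  # mono samples, assumed to lie in [-1.0, 1.0]


def make_ai_cover(
    mixture: Audio,
    separate: Callable[[Audio], Tuple[Audio, Audio]],
    convert_voice: Callable[[Audio], Audio],
) -> Audio:
    """Separate -> convert vocals -> remix (the remix step is an assumption)."""
    # Stage 1: a separator (the role UVR5 / MDX-Net plays) splits the song.
    vocals, accompaniment = separate(mixture)
    # Stage 2: a voice-conversion model (the role RVC / VITS plays) changes the voice.
    converted = convert_voice(vocals)
    # Stage 3: sum the two stems sample by sample and clip to the valid range.
    # zip() stops at the shorter stem, so unequal lengths do not raise.
    return [max(-1.0, min(1.0, v + a)) for v, a in zip(converted, accompaniment)]


# Toy stand-ins so the sketch runs. They are NOT real models.
def toy_separate(mix: Audio) -> Tuple[Audio, Audio]:
    vocals = [0.5 * s for s in mix]
    accompaniment = [s - v for s, v in zip(mix, vocals)]
    return vocals, accompaniment


def toy_convert(vocals: Audio) -> Audio:
    return [0.8 * s for s in vocals]


cover = make_ai_cover([0.0, 0.5, -0.5, 1.0], toy_separate, toy_convert)
```

In practice each stage would be replaced by a call into the actual tool. Only the order of the stages is the point here.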