Installing and Using IndexTTS2

1. References

github.com/index-tts/index-tts

2. Download the Source Code

git clone https://github.com/index-tts/index-tts.git
cd index-tts
git lfs pull  # download large repository files

If the clone fails, download the zip archive and extract it instead. I ran the commands above on a Windows machine and then used Xftp to transfer the entire index-tts directory to the Linux machine.

The project officially recommends installing with the uv tool, so uv is used here to install dependencies and run the program.

conda create -n index-tts2 python=3.10
conda activate index-tts2
pip install -U uv -i https://mirrors.aliyun.com/pypi/simple
If pip fails, download the installer script from https://astral.sh/uv/install.sh, then make it executable and run it: chmod +x install.sh && ./install.sh

# Install all required dependencies plus every optional dependency group declared in the project config (pyproject.toml etc.), and generate a lock file for a reproducible environment
uv sync --all-extras
If uv sync --all-extras fails, try one of the following two commands instead:
uv sync --all-extras --default-index "https://mirrors.aliyun.com/pypi/simple"
uv sync --all-extras --default-index "https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"

3. Download the Models

# Install the modelscope tool
uv tool install "modelscope"
If the install is slow, use this command instead:
uv tool install "modelscope" --default-index "https://mirrors.aliyun.com/pypi/simple"

The install may print the following warning:
warning: `/root/.local/bin` is not on your PATH. To use installed tools, run `export PATH="/root/.local/bin:$PATH"` or `uv tool update-shell`.

Follow the hint and run:
export PATH="/root/.local/bin:$PATH"
 
# Download the IndexTTS-2 model into the checkpoints directory
modelscope download --model IndexTeam/IndexTTS-2 --local_dir checkpoints
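
The same download can be scripted from Python; a minimal sketch, assuming a modelscope release whose snapshot_download accepts local_dir (the API counterpart of the --local_dir flag above):

# Minimal sketch: download the model via the modelscope Python API
# instead of the CLI; local_dir mirrors --local_dir above.
from modelscope import snapshot_download

snapshot_download('IndexTeam/IndexTTS-2', local_dir='checkpoints')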

Besides the model above, the project also downloads a few small models automatically on first run. If your network reaches HuggingFace slowly, run the following before starting the code:
export HF_ENDPOINT="https://hf-mirror.com"
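
The mirror can also be set from inside a script; a minimal sketch, noting that HF_ENDPOINT must be set before any HuggingFace library is imported, because it is read once at import time:

import os

# Set the mirror before importing anything that pulls in huggingface_hub;
# HF_ENDPOINT is read once at import time.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from indextts.infer_v2 import IndexTTS2  # import only after the mirror is set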

The model download takes quite a while; wait for it to finish.

4. Environment Requirements

CUDA 12.8 or later is required.

About 10 GB of GPU memory.
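
A quick sanity check that the environment meets these requirements, using standard PyTorch introspection (a sketch, not part of IndexTTS2):

import torch

# Report the CUDA/cuDNN versions torch runs against and the available VRAM.
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")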

5. Usage

To run a script, you must use the uv run <file.py> command so the code executes inside your current uv environment. Sometimes you may also need to add the current directory to your PYTHONPATH so the program can find the IndexTTS modules:

cd /data4/index-tts/

PYTHONPATH="$PYTHONPATH:." uv run indextts/infer_v2.py

PYTHONPATH="$PYTHONPATH:." uv run indextts/infer_cctv.py

The first run is fairly slow; be patient, as it downloads additional models, e.g. model.safetensors (2.32 GB).

RTF: 0.6080
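
RTF is the real-time factor: synthesis time divided by the duration of the generated audio, so values below 1.0 mean synthesis runs faster than playback. A minimal sketch of measuring it yourself, reusing the API from the examples below and assuming gen.wav is a standard PCM WAV file:

import time
import wave

from indextts.infer_v2 import IndexTTS2

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", use_fp16=False, use_cuda_kernel=False, use_deepspeed=False)

start = time.perf_counter()
tts.infer(spk_audio_prompt='examples/voice_01.wav', text="Translate for me, what is a surprise!", output_path="gen.wav")
elapsed = time.perf_counter() - start

# Audio duration = number of frames / sample rate.
with wave.open("gen.wav", "rb") as w:
    duration = w.getnframes() / w.getframerate()

print(f"RTF: {elapsed / duration:.4f}")  # < 1.0 means faster than real time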

5.1. Synthesize new speech from a single reference audio file (voice cloning):

from indextts.infer_v2 import IndexTTS2
tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", use_fp16=False, use_cuda_kernel=False, use_deepspeed=False)
text = "Translate for me, what is a surprise!"
tts.infer(spk_audio_prompt='examples/voice_01.wav', text=text, output_path="gen.wav", verbose=True)

5.2. Use a separate, emotion-laden reference audio file to condition the synthesis:

from indextts.infer_v2 import IndexTTS2
tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", use_fp16=False, use_cuda_kernel=False, use_deepspeed=False)
text = "酒楼丧尽天良,开始借机竞拍房间,哎,一群蠢货。"
tts.infer(spk_audio_prompt='examples/voice_07.wav', text=text, output_path="gen.wav", emo_audio_prompt="examples/emo_sad.wav", verbose=True)

5.3. When an emotion reference audio file is specified, you can optionally set the emo_alpha parameter to control how strongly it influences the result. Valid values range from 0.0 to 1.0; the default is 1.0 (i.e. 100%):

from indextts.infer_v2 import IndexTTS2
tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", use_fp16=False, use_cuda_kernel=False, use_deepspeed=False)
text = "酒楼丧尽天良,开始借机竞拍房间,哎,一群蠢货。"
tts.infer(spk_audio_prompt='examples/voice_07.wav', text=text, output_path="gen.wav", emo_audio_prompt="examples/emo_sad.wav", emo_alpha=0.9, verbose=True)

5.4. You can also omit the emotion reference audio and instead provide a list of 8 floats specifying the strength of each emotion, in this order: [happy, angry, sad, afraid, disgusted, melancholic, surprised, calm]. In addition, the use_random parameter introduces randomness during inference; it defaults to False, and setting it to True enables it (see the helper sketch after the note below):

from indextts.infer_v2 import IndexTTS2
tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", use_fp16=False, use_cuda_kernel=False, use_deepspeed=False)
text = "哇塞!这个爆率也太高了!欧皇附体了!"
tts.infer(spk_audio_prompt='examples/voice_10.wav', text=text, output_path="gen.wav", emo_vector=[0, 0, 0, 0, 0, 0, 0.45, 0], use_random=False, verbose=True)

Note: enabling random sampling reduces the voice-cloning fidelity of the synthesized speech.

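If the positional vector is hard to read, you can build it from emotion names instead; make_emo_vector below is a hypothetical convenience helper (not part of the IndexTTS2 API), written against the order documented above:

# Hypothetical helper (not part of the IndexTTS2 API): builds the 8-element
# emotion vector from keyword arguments, in the documented order.
EMOTIONS = ["happy", "angry", "sad", "afraid", "disgusted", "melancholic", "surprised", "calm"]

def make_emo_vector(**weights: float) -> list[float]:
    """E.g. make_emo_vector(surprised=0.45) -> [0, 0, 0, 0, 0, 0, 0.45, 0]."""
    return [float(weights.get(name, 0.0)) for name in EMOTIONS]

emo_vector = make_emo_vector(surprised=0.45)  # same vector as the example above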

5.5. Alternatively, you can enable the use_emo_text parameter to drive the emotion from the text script you provide; the script is then converted into an emotion vector automatically. In text-emotion mode it is recommended to set emo_alpha to roughly 0.6 (or lower) so the speech sounds more natural. Randomness can again be introduced via use_random (default False; True enables it):

from indextts.infer_v2 import IndexTTS2
tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", use_fp16=False, use_cuda_kernel=False, use_deepspeed=False)
text = "快躲起来!是他要来了!他要来抓我们了!"
tts.infer(spk_audio_prompt='examples/voice_12.wav', text=text, output_path="gen.wav", emo_alpha=0.6, use_emo_text=True, use_random=False, verbose=True)

5.6. You can also pass a specific emotion description directly via the emo_text parameter; it is converted into an emotion vector automatically. This lets you control the text script and the emotion description independently:

from indextts.infer_v2 import IndexTTS2
tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", use_fp16=False, use_cuda_kernel=False, use_deepspeed=False)
text = "快躲起来!是他要来了!他要来抓我们了!"
emo_text = "你吓死我了!你是鬼吗?"
tts.infer(spk_audio_prompt='examples/voice_12.wav', text=text, output_path="gen.wav", emo_alpha=0.6, use_emo_text=True, emo_text=emo_text, use_random=False, verbose=True)

Note: IndexTTS2 still supports mixed modeling of Chinese characters and pinyin. When precise pronunciation control is needed, provide text with explicit pinyin annotations to enable pinyin control. Note that pinyin control does not cover every possible initial-final combination; only valid Mandarin pinyin syllables are supported. See checkpoints/pinyin.vocab for the full list of valid entries. For example:

之前你做DE5很好,所以这一次也DEI3做DE2很好才XING2,如果这次目标完成得不错的话,我们就直接打DI1去银行取钱。
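
A minimal sketch passing that annotated script to infer, reusing the call pattern from the examples above (the reference voice file is just a placeholder):

from indextts.infer_v2 import IndexTTS2

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", use_fp16=False, use_cuda_kernel=False, use_deepspeed=False)
# Pinyin tags such as DE5 / DEI3 / XING2 pin the pronunciation of the
# corresponding characters; valid entries are listed in checkpoints/pinyin.vocab.
text = "之前你做DE5很好,所以这一次也DEI3做DE2很好才XING2,如果这次目标完成得不错的话,我们就直接打DI1去银行取钱。"
tts.infer(spk_audio_prompt='examples/voice_01.wav', text=text, output_path="gen.wav", verbose=True)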

6. Reinstalling CUDA and cuDNN

The following error occurred while running inference:

Traceback (most recent call last):
  File "/data4/index-tts/indextts/infer_v2.py", line 842, in <module>
    tts.infer(spk_audio_prompt=prompt_wav, text=text, output_path="gen.wav", verbose=True)
  File "/data4/index-tts/indextts/infer_v2.py", line 372, in infer
    return list(self.infer_generator(
  File "/data4/index-tts/indextts/infer_v2.py", line 444, in infer_generator
    spk_cond_emb = self.get_emb(input_features, attention_mask)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/data4/index-tts/indextts/infer_v2.py", line 219, in get_emb
    vq_emb = self.semantic_model(
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 1027, in forward
    encoder_outputs = self.encoder(
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 533, in forward
    layer_outputs = layer(
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 452, in forward
    hidden_states = self.conv_module(hidden_states, attention_mask=conv_attention_mask)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 208, in forward
    hidden_states = self.pointwise_conv1(hidden_states)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 371, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/data4/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 366, in _conv_forward
    return F.conv1d(
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

6.1. Reinstall CUDA 12.9.1

Check the installed GPU driver version with nvidia-smi:

Driver Version: 570.181

Then install a CUDA version that matches the driver:

The driver/CUDA compatibility table at docs.nvidia.com/cuda/cuda-t… shows which CUDA releases each driver version supports:

(Screenshot: driver_cuda.png, driver/CUDA compatibility table)

So a 12.x release of CUDA is needed.

Then go to developer.nvidia.com/cuda-toolki… and run the installation commands for your platform:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.9.1/local_installers/cuda-repo-ubuntu2204-12-9-local_12.9.1-575.57.08-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-9-local_12.9.1-575.57.08-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-9-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-9

6.2. Reinstall cuDNN

On developer.nvidia.com/cudnn-downl… the final install commands differ depending on the options you select:

(Screenshot: cuDNN_download.png, cuDNN download selector)

wget https://developer.download.nvidia.com/compute/cudnn/9.17.0/local_installers/cudnn-local-repo-ubuntu2204-9.17.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.17.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-9.17.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn

To install the build for CUDA 12, perform the configuration above but install the CUDA 12-specific package:

sudo apt-get -y install cudnn9-cuda-12

After reinstalling CUDA and cuDNN, run the uv sync once more:

uv sync --all-extras --default-index "https://mirrors.aliyun.com/pypi/simple"
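
Afterwards, you can verify that cuDNN initializes correctly by running a small convolution on the GPU; this sanity-check sketch exercises the same F.conv1d path that failed in the traceback above:

import torch
import torch.nn.functional as F

# A tiny conv1d on the GPU: this fails with CUDNN_STATUS_NOT_INITIALIZED when
# cuDNN is broken, and prints a tensor shape when the stack is healthy.
x = torch.randn(1, 4, 16, device="cuda")
w = torch.randn(8, 4, 3, device="cuda")
print(F.conv1d(x, w).shape)  # expected: torch.Size([1, 8, 14])
print("cuDNN version:", torch.backends.cudnn.version())

Save it as, say, check_cudnn.py and run it with uv run check_cudnn.py so it executes inside the project's uv environment.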