Deploying Mini-InternVL-Chat-4B-V1-5 on a 2080 Ti with lmdeploy

The main problem I ran into:

```
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py", line 292, in make_cubin
    subprocess.run(cmd, shell=True, check=True)
  File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/opt/conda/lib/python3.10/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_75 /tmp/tmpp0i8ee3z.ptx -o /tmp/tmpp0i8ee3z.ptx.o 2> /tmp/tmpjr5z50bl.log' returned non-zero exit status 255.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 899, in __task_callback
    task.result()
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 857, in _async_loop_background
    await self._async_step_background(
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 727, in _async_step_background
    output = await self._async_model_forward(inputs,
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/utils.py", line 234, in __tmp
    return (await func(*args, **kwargs))
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 625, in _async_model_forward
    ret = await __forward(inputs)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 602, in __forward
    return await self.model_agent.async_forward(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 256, in async_forward
    output = self._forward_impl(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 239, in _forward_impl
    output = model_forward(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 151, in model_forward
    output = model(**input_dict)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/graph_runner.py", line 141, in __call__
    return self.model(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/models/internvl.py", line 368, in forward
    return self.language_model.forward(input_ids=input_ids,
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/models/phi3.py", line 337, in forward
    hidden_states = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/models/phi3.py", line 282, in forward
    hidden_states, residual = decoder_layer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/models/phi3.py", line 185, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/nn/norm.py", line 78, in forward
    return self.impl.forward(x, self.weight, residual)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/norm.py", line 19, in forward
    x = rms_norm(x, weight, self.eps)
  File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/kernels/cuda/rms_norm.py", line 85, in rms_norm
    rms_norm_kernel[grid](
  File "/opt/conda/lib/python3.10/site-packages/triton/runtime/jit.py", line 345, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/triton/runtime/jit.py", line 662, in run
    kernel = self.compile(
  File "/opt/conda/lib/python3.10/site-packages/triton/compiler/compiler.py", line 282, in compile
    next_module = compile_ir(module, metadata)
  File "/opt/conda/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py", line 320, in <lambda>
    stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
  File "/opt/conda/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py", line 297, in make_cubin
    raise RuntimeError(f'Internal Triton PTX codegen error: \n{log}')
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/tmpp0i8ee3z.ptx, line 153; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 153; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 157; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 157; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 161; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 161; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 165; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 165; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 169; error   : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 169; error   : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
```

The root cause: the 2080 Ti is a Turing card with compute capability sm_75, while the bfloat16 instructions Triton emits require sm_80 or higher, so the model configuration has to be switched to a dtype the card supports.
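A quick way to confirm what your own card supports before touching any config (a minimal check, assuming PyTorch is installed; the nvidia-smi query only exists on newer drivers):

```
# PyTorch reports (major, minor); a 2080 Ti prints (7, 5), i.e. sm_75.
python -c "import torch; print(torch.cuda.get_device_capability(0))"

# On newer NVIDIA drivers, nvidia-smi can report the same thing directly.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```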

Edit Mini-InternVL-Chat-4B-V1-5/config.json: change the three occurrences of "torch_dtype": "bfloat16" to "torch_dtype": "float16", and the two occurrences of "use_bfloat16": true to "use_bfloat16": false.
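If you would rather script the edit than open the file by hand, a sed one-liner covers all five occurrences. This is only a sketch: it assumes the spacing in config.json is exactly "torch_dtype": "bfloat16" and "use_bfloat16": true, so back the file up and verify with grep afterwards:

```
# Rewrite every bfloat16 dtype setting to its float16 equivalent, in place.
sed -i 's/"torch_dtype": "bfloat16"/"torch_dtype": "float16"/g; s/"use_bfloat16": true/"use_bfloat16": false/g' \
  Mini-InternVL-Chat-4B-V1-5/config.json

# Both checks should print nothing once the edits have taken effect.
grep -n '"bfloat16"' Mini-InternVL-Chat-4B-V1-5/config.json
grep -n '"use_bfloat16": true' Mini-InternVL-Chat-4B-V1-5/config.json
```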

The launch command is: lmdeploy serve api_server Mini-InternVL-Chat-4B-V1-5 --server-port 23333 --log-level INFO
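Once the server is up, it exposes lmdeploy's OpenAI-compatible HTTP API, so a quick smoke test looks like this (a sketch; the exact model name to send is whatever GET /v1/models reports, which should match the directory name here):

```
# List the model names the server is serving.
curl http://localhost:23333/v1/models

# Plain-text chat request; swap "model" for the name reported above if it differs.
curl http://localhost:23333/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "Mini-InternVL-Chat-4B-V1-5",
        "messages": [{"role": "user", "content": "Hello, who are you?"}]
      }'
```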

Note that the Mini-InternVL-Chat-4B-V1-5 directory must exist under the directory where you run the command. You can fetch it with:

git lfs install

git clone https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5

Download the model this way rather than letting the huggingface code download it automatically; otherwise it lands under ~/.cache/huggingface, where the repository is stored under a rewritten cache name and config.json becomes awkward to locate.
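That said, if you prefer the Hugging Face tooling over git, recent versions of huggingface_hub can download into a plain local directory, which sidesteps the cache-renaming problem entirely (a sketch assuming a huggingface_hub new enough to ship the download subcommand):

```
pip install -U huggingface_hub
# --local-dir puts real files in ./Mini-InternVL-Chat-4B-V1-5 instead of the hashed cache layout.
huggingface-cli download OpenGVLab/Mini-InternVL-Chat-4B-V1-5 --local-dir Mini-InternVL-Chat-4B-V1-5
```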

Reference:

https://github.com/InternLM/lmdeploy/issues/1266
