Deploying Mini-InternVL-Chat-4B-V1-5 on a 2080 Ti with lmdeploy
The main problem encountered:

```
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py", line 292, in make_cubin
subprocess.run(cmd, shell=True, check=True)
File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/opt/conda/lib/python3.10/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_75 /tmp/tmpp0i8ee3z.ptx -o /tmp/tmpp0i8ee3z.ptx.o 2> /tmp/tmpjr5z50bl.log' returned non-zero exit status 255.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 899, in __task_callback
task.result()
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 857, in _async_loop_background
await self._async_step_background(
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 727, in _async_step_background
output = await self._async_model_forward(inputs,
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/utils.py", line 234, in __tmp
return (await func(*args, **kwargs))
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 625, in _async_model_forward
ret = await __forward(inputs)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 602, in __forward
return await self.model_agent.async_forward(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 256, in async_forward
output = self._forward_impl(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 239, in _forward_impl
output = model_forward(
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 151, in model_forward
output = model(**input_dict)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/graph_runner.py", line 141, in __call__
return self.model(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/models/internvl.py", line 368, in forward
return self.language_model.forward(input_ids=input_ids,
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/models/phi3.py", line 337, in forward
hidden_states = self.model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/models/phi3.py", line 282, in forward
hidden_states, residual = decoder_layer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/models/phi3.py", line 185, in forward
hidden_states = self.input_layernorm(hidden_states)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/nn/norm.py", line 78, in forward
return self.impl.forward(x, self.weight, residual)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/norm.py", line 19, in forward
x = rms_norm(x, weight, self.eps)
File "/opt/conda/lib/python3.10/site-packages/lmdeploy/pytorch/kernels/cuda/rms_norm.py", line 85, in rms_norm
rms_norm_kernel[grid](
File "/opt/conda/lib/python3.10/site-packages/triton/runtime/jit.py", line 345, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/triton/runtime/jit.py", line 662, in run
kernel = self.compile(
File "/opt/conda/lib/python3.10/site-packages/triton/compiler/compiler.py", line 282, in compile
next_module = compile_ir(module, metadata)
File "/opt/conda/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py", line 320, in <lambda>
stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
File "/opt/conda/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py", line 297, in make_cubin
raise RuntimeError(f'Internal Triton PTX codegen error: \n{log}')
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/tmpp0i8ee3z.ptx, line 153; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 153; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 157; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 157; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 161; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 161; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 165; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 165; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 169; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/tmpp0i8ee3z.ptx, line 169; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
```
The root cause: the 2080 Ti's architecture is sm_75, while bfloat16 instructions require sm_80 or higher, so the model configuration has to be changed to use float16.
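A quick way to confirm this on a given machine is to query the GPU's compute capability through PyTorch; a minimal check:

```python
import torch

# ptxas needs sm_80 or newer for native bf16 instructions; see what this GPU has.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: sm_{major}{minor}")  # prints sm_75 on a 2080 Ti
if (major, minor) < (8, 0):
    print("bfloat16 kernels will not compile on this GPU; fall back to float16.")
```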
Edit the file Mini-InternVL-Chat-4B-V1-5/config.json: change the three occurrences of "torch_dtype": "bfloat16" to "torch_dtype": "float16", and the two occurrences of "use_bfloat16": true to "use_bfloat16": false.
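If you prefer not to hunt for the occurrences by hand, a short script can rewrite them; a minimal sketch, assuming the dtype flags may sit in nested sub-configs (as they do in multimodal checkpoints like this one):

```python
import json

CONFIG_PATH = "Mini-InternVL-Chat-4B-V1-5/config.json"

def force_fp16(node):
    """Recursively flip bfloat16 flags to float16 in a (possibly nested) config."""
    if isinstance(node, dict):
        if node.get("torch_dtype") == "bfloat16":
            node["torch_dtype"] = "float16"
        if node.get("use_bfloat16") is True:
            node["use_bfloat16"] = False
        for value in node.values():
            force_fp16(value)
    elif isinstance(node, list):
        for value in node:
            force_fp16(value)

with open(CONFIG_PATH) as f:
    config = json.load(f)
force_fp16(config)
with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)
```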
Launch command: lmdeploy serve api_server Mini-InternVL-Chat-4B-V1-5 --server-port 23333 --log-level INFO
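Once the server is up, it speaks an OpenAI-compatible HTTP API on the chosen port. A quick smoke test with requests; the model name passed below is an assumption, so list the served models first and use whatever /v1/models reports:

```python
import requests

BASE = "http://localhost:23333"

# Confirm the served model name first.
print(requests.get(f"{BASE}/v1/models").json())

resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": "Mini-InternVL-Chat-4B-V1-5",  # assumption: use the name from /v1/models
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```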
Note that the Mini-InternVL-Chat-4B-V1-5 directory must exist in the directory where the launch command is run. You can download it with:
git lfs install
git clone https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5
Do not use the download that the huggingface code performs automatically; otherwise the model lands under ~/.cache/huggingface, where the snapshot directory gets a hashed name, making the config.json file hard to locate.
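If you would rather stay in Python, huggingface_hub's snapshot_download accepts a local_dir argument that downloads into a plainly named directory instead of the hashed cache layout, which keeps config.json easy to find; a sketch, assuming huggingface_hub is installed:

```python
from huggingface_hub import snapshot_download

# Download next to where the server will be launched, under a readable name,
# rather than into the hashed snapshot tree below ~/.cache/huggingface.
snapshot_download(
    repo_id="OpenGVLab/Mini-InternVL-Chat-4B-V1-5",
    local_dir="Mini-InternVL-Chat-4B-V1-5",
)
```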