Running stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ inference with llama.cpp on a DCU BW1000 in the OpenI community (failed)

Study notes on quality articles

06 Apr 2026 — 4 min read

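The OpenI community has newly added the BW1000 compute card to its DCU offerings. It costs no points, so you can use it as much as you like!

However, only one image is provided, which makes it feel cumbersome to work with...

Check the model with llmfit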

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

=== stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ ===

Provider: stelterlab
Parameters: 4.6B
Quantization: Q4_K_M
Best Quant: Q8_0
Context Length: 262144 tokens
Use Case: Code generation and completion
Category: Coding
Released: 2025-07-31
Runtime: llama.cpp (est. ~17.2 tok/s)

Score Breakdown:
Overall Score: 66.7 / 100
Quality: 68 Speed: 43 Fit: 61 Context: 100
Estimated Speed: 17.2 tok/s

Resource Requirements:
Min VRAM: 2.4 GB
Min RAM: 2.6 GB (CPU inference)
Recommended RAM: 4.3 GB

MoE Architecture:
Experts: 8 active / 128 total per token
Active VRAM: 0.5 GB (vs 2.4 GB full model)

Fit Analysis:
Status: 🟡 Good
Run Mode: CPU+GPU
Memory Utilization: 0.6% (2.6 / 405.5 GB)

Notes:
MoE: insufficient VRAM for expert offloading
Spilling entire model to system RAM
Performance will be significantly reduced
Best quantization for hardware: Q8_0 (model default: Q4_K_M)
Estimated speed: 17.2 tok/s
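As a rough cross-check of the figures above, the size of the weights alone can be estimated as parameters × bits per weight / 8. The sketch below is a back-of-the-envelope calculation, not llmfit's method. It assumes that "30B" in the model name means roughly 30e9 parameters and that AWQ stores about 4 bits per weight; scales, zero points, KV cache and runtime overhead are ignored. The helper name `estimate_weight_gib` is made up for this illustration.

```python
def estimate_weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in GiB (no KV cache, no runtime overhead)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: "30B" in the model name means roughly 30e9 parameters,
# and the quantized weights take about 4 bits each.
for bits in (4, 8, 16):
    print(f"{bits:>2}-bit: {estimate_weight_gib(30e9, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights come to about 14 GiB, so the 4.6B parameters and 2.4 GB reported by llmfit are worth double-checking before relying on the fit analysis.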

Installing llama.cpp

Download the llama.cpp source code

git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp

Build llama.cpp

cd llama.cpp
cmake -B build
cmake --build build --config Release

Add the binaries to PATH

export PATH=/root/llama.cpp/build/bin:$PATH

Alternatively, you can install directly with make install

cd build
make install

However, after installation the commands fail with errors

root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-cli
llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-gguf
llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory

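It turned out the build directory had not been added to PATH. After adding it the problem went away, most likely because the binaries in build/bin are then found first and can locate the shared libraries built next to them: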

export PATH=/root/llama.cpp/build/bin:$PATH
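To see which copy of a tool the shell will actually run after changing PATH, a small check like the following can help. It is only a sketch: `resolve_binary` is a helper name invented here, and the idea that the installed copies under the system prefix are the ones missing their libraries is an assumption based on the errors above, not something verified on the BW1000 image.

```python
import os
import shutil


def resolve_binary(name, search_path=None):
    """Return the full path that would be picked for `name`, or None if not found.

    search_path is a PATH-style string; None means the current PATH.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    for tool in ("llama-cli", "llama-gguf"):
        print(tool, "->", resolve_binary(tool))
    # Shared libraries are looked up separately from PATH.
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "(unset)"))
```

If the reported path points into /root/llama.cpp/build/bin, the build-tree binaries are the ones being used.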

Downloading the model

Install modelscope

pip install modelscope

Download

from modelscope import snapshot_download
snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")
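Before pointing an inference tool at the download, it can be worth looking at which file types the snapshot actually contains. The sketch below counts files by extension. It assumes the snapshot ended up under models/tclf90/... as the cache_dir argument suggests, and `count_suffixes` is a helper name made up for this example.

```python
from collections import Counter
from pathlib import Path


def count_suffixes(model_dir):
    """Count regular files under model_dir by extension, e.g. {'.json': 3}."""
    return Counter(
        p.suffix or "(none)" for p in Path(model_dir).rglob("*") if p.is_file()
    )


# Assumption: this is where snapshot_download placed the files.
model_dir = "models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"
if Path(model_dir).is_dir():
    for suffix, n in count_suffixes(model_dir).most_common():
        print(f"{suffix:15} {n}")
```

A snapshot without any .gguf file cannot be loaded by llama-cli as-is.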

Inference

Run inference with llama-cli

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

It fails with:

root@crdnotebook-2027598444851879937-denglf-12859:~# llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
srv load_model: failed to load model, 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'

Failed to load the model

After checking, the model should have been this one: stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

The problem is that ModelScope does not host that model...

Trying inference with transformers
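The "failed to read magic" message refers to the four ASCII bytes `GGUF` that every GGUF file starts with. A quick way to test a path before handing it to llama-cli is sketched below. `looks_like_gguf` is a name invented for this illustration, and the path is the one used in the command above.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path):
    """True if `path` is a regular file that starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        # A directory (such as a snapshot folder) or a missing path is never GGUF.
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


print(looks_like_gguf("models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"))
```

Because the path above is a directory rather than a single .gguf file, this check returns False for it regardless of which repository the weights came from.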

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)

This failed as well

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
     46 def validate_environment(self, **kwargs):
     47     if not is_gptqmodel_available():
---> 48         raise ImportError(
     49             "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
     50         )
     52     if not is_accelerate_available():
     53         raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`
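The traceback shows that transformers checks for gptqmodel and accelerate before loading AWQ weights. The same precondition can be checked up front, without loading the model, using only the standard library. This is a minimal sketch; `missing_modules` is a hypothetical helper, not part of transformers.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# The two packages named in the error messages above.
required = ["gptqmodel", "accelerate"]
print("missing:", missing_modules(required) or "none")
```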

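Summary

I could not get it working, so I am shelving it for now.

llama.cpp failed because ModelScope does not host the stelterlab model, so the downloaded model did not match. Note also that "failed to read magic" means the path passed to -m was not a GGUF file: llama.cpp loads models in GGUF format, while the command above passed the download directory, which presumably holds AWQ weights.

transformers failed because of dependency problems: loading AWQ weights requires gptqmodel, which could not be installed without reinstalling torch and related libraries, so inference failed.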

Debugging

The error is ImportError: Loading an AWQ quantized model requires gptqmodel.

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
     46 def validate_environment(self, **kwargs):
     47     if not is_gptqmodel_available():
---> 48         raise ImportError(
     49             "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
     50         )
     52     if not is_accelerate_available():
     53         raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`

Run the command the message suggests

pip install gptqmodel

The installation failed:

Exception: Unable to detect torch version via uv/pip/conda/importlib. Please install torch >= 2.7.1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel

Try conda

conda install gptqmodel

That failed too.

PackagesNotFoundError: The following packages are not available from current channels:

- gptqmodel
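The pip error asks for torch >= 2.7.1, so it helps to know what version, if any, this environment reports for torch before retrying. The sketch below reads the installed version from package metadata and compares only the leading numeric part. The helper names are invented for this example, and the comparison is deliberately simple (local suffixes such as "+cu126" are ignored), so treat it as a rough check rather than a full version parser.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Leading numeric components of a version string, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets_minimum(installed, minimum):
    """True if the numeric part of `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(dist_name):
    """Version string of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


torch_version = installed_version("torch")
if torch_version is None:
    print("torch is not visible to this Python environment")
else:
    status = "ok" if meets_minimum(torch_version, "2.7.1") else "older than 2.7.1"
    print(torch_version, status)
```

If torch is missing or too old here, the gptqmodel build cannot proceed until torch itself is upgraded, which is the step that was not practical on the provided image.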
