在openi启智社区的dcu bw1000使用llama.cpp推理 stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ(失败)

openi启智社区的dcu新推出 bw1000计算卡,不耗费积分,可以可劲用!

但是提供的镜像只有一个,感觉用起来很麻烦....

用llmfit看看模型情况

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

=== stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ ===

Provider: stelterlab
Parameters: 4.6B
Quantization: Q4_K_M
Best Quant: Q8_0
Context Length: 262144 tokens
Use Case: Code generation and completion
Category: Coding
Released: 2025-07-31
Runtime: llama.cpp (est. ~17.2 tok/s)

Score Breakdown:
  Overall Score: 66.7 / 100
  Quality: 68  Speed: 43  Fit: 61  Context: 100
  Estimated Speed: 17.2 tok/s

Resource Requirements:
  Min VRAM: 2.4 GB
  Min RAM: 2.6 GB (CPU inference)
  Recommended RAM: 4.3 GB

MoE Architecture:
  Experts: 8 active / 128 total per token
  Active VRAM: 0.5 GB (vs 2.4 GB full model)

Fit Analysis:
  Status: 🟡 Good
  Run Mode: CPU+GPU
  Memory Utilization: 0.6% (2.6 / 405.5 GB)

Notes:
  MoE: insufficient VRAM for expert offloading
  Spilling entire model to system RAM
  Performance will be significantly reduced
  Best quantization for hardware: Q8_0 (model default: Q4_K_M)
  Estimated speed: 17.2 tok/s

安装llama.cpp

下载 llama.cpp源代码

git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp

编译llama.cpp

cd llama.cpp cmake -B build cmake --build build --config Release


加入路径
 

export PATH=/root/llama.cpp/build/bin:$PATH

或者也可以直击用make install

cd build make install 

但是安装好后报错

oot@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-cli llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-gguf llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory

原来是没有把路径加入的缘故,加入路径,问题解决:

export PATH=/root/llama.cpp/build/bin:$PATH

模型下载

安装modelscope

pip install modelscope

下载

from modelscope import snapshot_download snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")

推理

用llama-cli推理

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

报错:

root@crdnotebook-2027598444851879937-denglf-12859:~# llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
srv    load_model: failed to load model, 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
 
Failed to load the model

看了一下,应该是这个模型: stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

问题是这个模型魔搭没有....

尝试用transformers推理

from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ" # load the tokenizer and the model tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype="auto", device_map="auto" ) # prepare the model input prompt = "Write a quick sort algorithm." messages = [ {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True, ) model_inputs = tokenizer([text], return_tensors="pt").to(model.device) # conduct text completion generated_ids = model.generate( **model_inputs, max_new_tokens=65536 ) output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() content = tokenizer.decode(output_ids, skip_special_tokens=True) print("content:", content) 

也是失败

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs) 46 def validate_environment(self, **kwargs): 47 if not is_gptqmodel_available(): ---> 48 raise ImportError( 49 "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`" 50 ) 52 if not is_accelerate_available(): 53 raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)") ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`

总结

没调通,先搁置

llama.cpp是因为魔搭没有那个模型,所以模型不匹配

transformers是因为库的问题,需要重新安装torch等库,导致需要的库无法安装上,推理失败。

调试

报错ImportError: Loading an AWQ quantized model requires gptqmodel.

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs) 46 def validate_environment(self, **kwargs): 47 if not is_gptqmodel_available(): ---> 48 raise ImportError( 49 "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`" 50 ) 52 if not is_accelerate_available(): 53 raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)") ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`

安装提示执行

pip install gptqmodel

安装失败,

 Exception: Unable to detect torch version via uv/pip/conda/importlib. Please install torch >= 2.7.1 [end of output] note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel

用conda试试

conda install gptqmodel

也失败了。

PackagesNotFoundError: The following packages are not available from current channels:

  - gptqmodel

Read more

ZeroClaw Reflex UI完整搭建流程——ZeroClaw Gateway + LM Studio + Reflex 本地 AI 管理面板

ZeroClaw Reflex UI完整搭建流程——ZeroClaw Gateway + LM Studio + Reflex 本地 AI 管理面板

🦀 ZeroClaw Reflex UI 完整搭建流程 ZeroClaw Gateway + LM Studio + Reflex 本地 AI 管理面板 2026 年 2 月 相似项目部署参考: 【OpenClaw 本地实战 Ep.1】抛弃 Ollama?转向 LM Studio!Windows 下用 NVIDIA 显卡搭建 OpenClaw 本地极速推理服务 【OpenClaw 本地实战 Ep.2】零代码对接:使用交互式向导快速连接本地 LM Studio 用 CUDA GPU 推理 【OpenClaw 本地实战 Ep.3】突破瓶颈:强制修改

Windows安装原生Codex CLI 让你拥有更强力的AI代码助手!【支持GPT5.4、GPT5.3-codex】

Windows安装原生Codex CLI 让你拥有更强力的AI代码助手!【支持GPT5.4、GPT5.3-codex】

文章目录 * 前言 * 一、基础环境 * 1.操作系统 * 2.工具版本 * 二、部署 * 1.安装PowerShell 7 * 1.1以管理员身份运行 cmd * 1.2安装powershell7 * 2.安装Node.js * 2.1 安装nvm * 2.2 安装Node&npm * 3.安装Codex * 4.安装CC-Switch * 5.配置CC-Switch * 5.1获取公益api key * 5.2选择Codex 点击“添加供应商” * 5.3配置测试 * 三、Codex简单使用 * 1.Powershell7中输入codex回车 * 2.Codex

全网都在刷的 AI Skills 怎么用?别死磕 Claude Code,OpenCode 才是国内首选!

全网都在刷的 AI Skills 怎么用?别死磕 Claude Code,OpenCode 才是国内首选!

最近,“Skills”在AI圈子里太火了! 大家都在用它给 AI 加各种“buff”,让它自动写代码、做表格等等 但很多小伙伴看着 GitHub 上那些 Skills 兴奋不已,真到了本地想玩一把时,使用Claude code有很多不便的地方 之前就有很多小伙伴问我OpenCode,整好借着Skills,来聊聊OpenCode的安装部署和使用 很简单,不管你是想用图形界面还是命令行,这篇保姆级教程都能让你轻松上手! 咱们这就开始,带你入门OpenCode玩转 Skills! 目录: 1. 1. ✅ 如何下载安装OpenCode 2. 2. ✅ 如何安装和配置Skills 3. 3. ✅ 环境变量的设置方法 4. 4. ✅ 常用指令和操作技巧 5. 5. ✅ 遇到问题如何解决 6. 6. ✅ 如何创建自己的Skills  一、下载安装,超级简单 下载地址: https:

斯坦福HAI官网完整版《2025 AI Index Report》全面解读

斯坦福HAI官网完整版《2025 AI Index Report》全面解读

一、这份报告真正想说什么 如果把整份《2025 AI Index Report》压缩成一句话,我会这样概括:AI 已经从“技术突破期”进入“系统扩散期”。它一边继续提升性能,一边迅速降本、普及、商业化、制度化;与此同时,风险事件、治理压力、数据约束、社会信任问题也同步上升。换句话说,2025年的AI不是“更神奇了”这么简单,而是开始变成一种会重塑产业结构、教育体系、监管逻辑和公众心理预期的基础能力。这个判断基本贯穿斯坦福官网总览页的 12 条结论与各章节摘要。(斯坦福人工智能研究所) 斯坦福自己对AI Index的定位也很明确:它不是某家公司的宣传册,也不是对未来的主观想象,而是一个收集、整理、浓缩并可视化 AI 数据趋势的观测框架,目的是为政策制定者、研究者、企业与公众提供更全面、客观的判断基础。也正因为如此,这份报告最重要的价值,