Running stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ with llama.cpp on a DCU BW1000 in the OpenI community (failed)

Study notes on quality articles

11 Apr 2026 — 4 min read

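The OpenI community has added the BW1000 compute card to its DCU offering. It costs no points, so you can use it as much as you like!

However, only one image is provided, which feels quite inconvenient to work with....

Check the model with llmfit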

llmfit info stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

=== stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ ===

Provider: stelterlab
Parameters: 4.6B
Quantization: Q4_K_M
Best Quant: Q8_0
Context Length: 262144 tokens
Use Case: Code generation and completion
Category: Coding
Released: 2025-07-31
Runtime: llama.cpp (est. ~17.2 tok/s)

Score Breakdown:
Overall Score: 66.7 / 100
Quality: 68 Speed: 43 Fit: 61 Context: 100
Estimated Speed: 17.2 tok/s

Resource Requirements:
Min VRAM: 2.4 GB
Min RAM: 2.6 GB (CPU inference)
Recommended RAM: 4.3 GB

MoE Architecture:
Experts: 8 active / 128 total per token
Active VRAM: 0.5 GB (vs 2.4 GB full model)

Fit Analysis:
Status: 🟡 Good
Run Mode: CPU+GPU
Memory Utilization: 0.6% (2.6 / 405.5 GB)

Notes:
MoE: insufficient VRAM for expert offloading
Spilling entire model to system RAM
Performance will be significantly reduced
Best quantization for hardware: Q8_0 (model default: Q4_K_M)
Estimated speed: 17.2 tok/s
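
A back-of-the-envelope check can help when reading these numbers. The sketch below estimates the size of the weights alone as parameters × bits per weight / 8. The 4-bit width and the 30B figure (read off the "30B" in the model name) are assumptions for illustration, not values reported by llmfit.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores quantization scales/zero points, the KV cache and runtime
    buffers, so real memory use will be higher.
    """
    return num_params * bits_per_weight / 8 / 1e9


# 4.6B is the parameter count llmfit printed above.
reported = estimate_weight_gb(4.6e9, 4)
# 30e9 is an assumption taken from the "30B" in the model name.
from_name = estimate_weight_gb(30e9, 4)

print(f"4.6B params at 4 bits: {reported:.1f} GB")
print(f"30B params at 4 bits:  {from_name:.1f} GB")
```

If the 30B reading of the name is right, the 2.4 GB minimum reported above would be closer to the 4.6B figure than to the full model, so the llmfit estimate may be worth double-checking before relying on it.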

Installing llama.cpp

Download the llama.cpp source code

git clone https://gitcode.com/GitHub_Trending/ll/llama.cpp

Build llama.cpp

cd llama.cpp
cmake -B build
cmake --build build --config Release

Add it to PATH

export PATH=/root/llama.cpp/build/bin:$PATH

Alternatively, install it directly with make install

cd build
make install

However, after installing, the tools failed to start

root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-cli
llama-cli: error while loading shared libraries: libmtmd.so.0: cannot open shared object file: No such file or directory
root@crdnotebook-2027598444851879937-denglf-12859:~/llama.cpp/build# llama-gguf
llama-gguf: error while loading shared libraries: libggml-base.so.0: cannot open shared object file: No such file or directory

It turned out the path had not been added. After adding it, the problem went away:

export PATH=/root/llama.cpp/build/bin:$PATH
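
When a "cannot open shared object file" error shows up, it can help to see which copy of the binary the shell actually resolves and which shared libraries sit next to it. The following is a minimal sketch using only the standard library; `locate_binary_and_libs` is a hypothetical helper name, and it only reports what is on disk. Whether the dynamic loader uses those libraries still depends on the binary's RPATH and on LD_LIBRARY_PATH.

```python
import shutil
from pathlib import Path


def locate_binary_and_libs(name, search_path=None):
    """Return (resolved binary path or None, shared libraries beside it).

    search_path follows the PATH format; None means the current PATH.
    """
    resolved = shutil.which(name, path=search_path)
    if resolved is None:
        return None, []
    bin_dir = Path(resolved).resolve().parent
    libs = sorted(p.name for p in bin_dir.glob("lib*.so*"))
    return resolved, libs


if __name__ == "__main__":
    binary, libs = locate_binary_and_libs("llama-cli")
    print("llama-cli resolves to:", binary)
    print("shared libraries beside it:", libs)
```

If the resolved path is not under /root/llama.cpp/build/bin, a different copy (for example the one placed by make install) is being picked up.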

Downloading the model

Install modelscope

pip install modelscope

Download

from modelscope import snapshot_download
snapshot_download('tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ', cache_dir="models")
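
Before handing the downloaded directory to a runtime, it can be useful to see which file types it actually contains. This is a small sketch, not part of modelscope; `count_extensions` is a hypothetical helper that walks the directory and counts files by suffix.

```python
from collections import Counter
from pathlib import Path


def count_extensions(model_dir):
    """Count files under model_dir by lowercase suffix ('' means no suffix)."""
    return Counter(
        p.suffix.lower() for p in Path(model_dir).rglob("*") if p.is_file()
    )
```

Calling `count_extensions("models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ")` and checking whether `.gguf` or `.safetensors` appears in the result gives a quick hint about which runtimes can load the download.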

Inference

Run inference with llama-cli

llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

It reports an error:

root@crdnotebook-2027598444851879937-denglf-12859:~# llama-cli -m models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ

Loading model... |gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'
srv load_model: failed to load model, 'models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ'

Failed to load the model

After looking into it, the model should have been this one: stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ

The problem is that ModelScope does not host this model....
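
The `failed to read magic` line in the log above comes from the GGUF loader, which reads a four-byte header at the start of the file. llama-cli's -m option expects a GGUF file, whereas the path used above is a directory. The sketch below is a minimal check along those lines; `looks_like_gguf` is a hypothetical helper, and it only inspects the magic bytes, not the rest of the file.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path):
    """True if path is a regular file whose first four bytes are b'GGUF'."""
    p = Path(path)
    if not p.is_file():
        # Directories, such as a checkpoint folder, end up here.
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC
```

If this returns False for the path passed to -m, a different repository name alone is unlikely to help; a GGUF build of the model would be needed for llama.cpp.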

Trying inference with transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/root/models/tclf90/Qwen3-Coder-30B-A3B-Instruct-AWQ"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)

This failed as well

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
     46 def validate_environment(self, **kwargs):
     47     if not is_gptqmodel_available():
---> 48         raise ImportError(
     49             "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
     50         )
     52     if not is_accelerate_available():
     53         raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`

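Summary

I could not get it working, so I am shelving it for now.

llama.cpp failed because the intended model is not on ModelScope, so the downloaded model does not match. The `failed to read magic` message also suggests that the path given to -m was not a GGUF file, which is the format llama-cli loads.

transformers failed because of a dependency problem: loading the AWQ model requires gptqmodel, which in turn requires reinstalling torch and other libraries, so the required library could not be installed and inference failed.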

Debugging

The error is ImportError: Loading an AWQ quantized model requires gptqmodel.

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_awq.py:48, in AwqQuantizer.validate_environment(self, **kwargs)
     46 def validate_environment(self, **kwargs):
     47     if not is_gptqmodel_available():
---> 48         raise ImportError(
     49             "Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`"
     50         )
     52     if not is_accelerate_available():
     53         raise ImportError("Loading an AWQ quantized model requires accelerate (`pip install accelerate`)")

ImportError: Loading an AWQ quantized model requires gptqmodel. Please install it with `pip install gptqmodel`
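
The traceback shows transformers checking for gptqmodel and then accelerate before it loads an AWQ model. A rough way to see which of these are missing ahead of time is to ask the import system directly. This is only a sketch: `missing_packages` is a hypothetical helper, and transformers' own availability checks may be stricter than a plain lookup (for example, they may also look at versions).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that the import system cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("missing:", missing_packages(["gptqmodel", "accelerate", "torch"]))
```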

Following the hint, run

pip install gptqmodel

The installation failed:

Exception: Unable to detect torch version via uv/pip/conda/importlib. Please install torch >= 2.7.1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed to build 'gptqmodel' when getting requirements to build wheel
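
The message asks for torch >= 2.7.1, so one quick check is what version the package metadata reports for torch in the current environment. The sketch below assumes a version string of the usual `major.minor.patch` form, possibly followed by a local suffix after `+`; the helper names are hypothetical, and this is not how gptqmodel itself performs its detection.

```python
import re
from importlib import metadata

MIN_TORCH = (2, 7, 1)  # minimum taken from the gptqmodel error message


def parse_release(version):
    """Turn e.g. '2.4.1+local.tag' into (2, 4, 1), dropping any '+' suffix."""
    nums = re.findall(r"\d+", version.split("+", 1)[0])
    return tuple(int(n) for n in nums[:3])


def installed_torch_release():
    """Return torch's release tuple from package metadata, or None if absent."""
    try:
        return parse_release(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_release()
    ok = found is not None and found >= MIN_TORCH
    print("torch release:", found, "meets minimum:", ok)
```

If this prints None, pip's metadata does not list torch at all, which would be consistent with the "Unable to detect torch version" wording; if it prints an older release, the image's torch would have to be upgraded first.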

Try conda instead

conda install gptqmodel

That failed as well.

PackagesNotFoundError: The following packages are not available from current channels:

- gptqmodel
