Running LFM2.5-1.2B inference with llama.cpp and ollama (fairly fast, about 15 tokens per second; it is a thinking model)

Study notes on good articles

10 Apr 2026 — 10 min read

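LFM2.5-1.2B is a recent star model

LFM2.5-1.2B-Instruct is an open-source, lightweight, high-performance general-purpose instruction-tuned AI model released by Liquid AI in January 2026. It has 1.2 billion parameters, is designed to run efficiently on edge devices such as phones and laptops, uses only about 900 MB of memory, and offers very fast inference.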
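
The roughly 900 MB figure can be sanity-checked by estimating weight storage from the parameter count and the bits per weight. The sketch below assumes exactly 1.2 billion parameters and uses the block layouts of llama.cpp's Q8_0 (34 bytes per 32 weights) and Q4_0 (18 bytes per 32 weights) formats. It ignores the KV cache and runtime buffers, and `weights_gb` is a name made up for illustration.

```python
# Rough weight-size estimate for a quantized model (weights only; KV cache
# and runtime buffers are ignored). The 1.2e9 parameter count is an assumption.
PARAMS = 1.2e9

# Bits per weight: Q8_0 stores 32 int8 values plus one fp16 scale (34 bytes),
# Q4_0 stores 32 4-bit values plus one fp16 scale (18 bytes).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 34 * 8 / 32,  # 8.5 bits per weight
    "Q4_0": 18 * 8 / 32,  # 4.5 bits per weight
}


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weights_gb(PARAMS, bits):.2f} GB")
```

Under these assumptions a Q8_0 file comes to about 1.3 GB and a Q4_0 file to about 0.7 GB, so the 900 MB figure presumably refers to a lower-bit quantization than the Q8_0 file used later in this post.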

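Key features

Efficient inference: uses a hybrid architecture (LIV convolution blocks + GQA attention blocks) to achieve low-latency, high-throughput inference on CPU/NPU, with a memory footprint under 1 GB.
Strong instruction following: trained with supervised fine-tuning and reinforcement learning; good at complex reasoning, math, and tool use, and outperforms models of the same size.
Multilingual support: supports eight languages including Chinese, English, Japanese, and Korean; the Japanese-optimized variant LFM2.5-1.2B-JP performs notably well on Japanese tasks.
Open and free: all weights are open and can be downloaded free of charge for research and commercial use.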

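Inference with llama.cpp on Xinghe Community (星河社区)

Xinghe Community URL: https://aistudio.baidu.com/

For reasons I have not figured out, the llama.cpp I compiled on Xinghe Community is a CPU-only build. And for reasons that are "well known" (not really), the Xinghe environment only has 2 CPU cores, so CPU inference is extremely slow. Positively prehistoric-snail speed.

Download the model

ModelScope (魔搭) page: LFM2.5-1.2B-Instruct-GGUF · Model Hub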

modelscope download --model LiquidAI/LFM2.5-1.2B-Instruct-GGUF --local_dir ./work

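Compiling llama.cpp

For how to compile llama.cpp, see: https://skywalk.blog.ZEEKLOG.net/article/details/157296500

Inference

The model was downloaded to the ~/work directory; pass it directly with -m to run inference.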

# llama-cli -m model.gguf
llama-cli -m ~/work/LFM2.5-1.2B-Instruct-Q8_0.gguf

This is painfully slow.

> hello
Hello! How can I assist you today?
[ Prompt: 0.3 t/s | Generation: 0.1 t/s ]

This is unbearably slow; I am probably doing something wrong.
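
To put a 0.1 t/s generation rate in perspective, it helps to estimate the wait for a reply of a given length. The sketch below uses the two rates reported in this post (0.1 t/s here, about 15 t/s with ollama). The 200-token reply length and the `seconds_to_generate` helper are assumptions for illustration.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time needed to emit num_tokens at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


REPLY_TOKENS = 200  # assumed reply length, not measured

for label, rate in [("llama.cpp on 2 CPU cores", 0.1),
                    ("ollama on the local machine", 15.0)]:
    secs = seconds_to_generate(REPLY_TOKENS, rate)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min)")
```

At 0.1 t/s a 200-token reply takes over half an hour, versus roughly 13 seconds at 15 t/s.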

Let's try ollama instead.

Trying inference with ollama on my local machine

Local machine: Windows 10, 24 cores, 64 GB RAM, NVIDIA Quadro K4200 (4 GB VRAM)

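Thanks to shared GPU memory (a Windows/driver memory-management feature), the GPU can also use part of the host's 64 GB of RAM, 32 GB according to Task Manager, which is powerful: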

GPU 0

NVIDIA Quadro K4200

   Driver version:   30.0.14.7514
   Driver date:   6/10/2024
   DirectX version:   12 (FL 11.0)
   Physical location:   PCI bus 5, device 0, function 0

   Utilization   2%
   Dedicated GPU memory   0.7/4.0 GB
   Shared GPU memory   0.1/32.0 GB
   GPU Memory   0.8/36.0 GB

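Upgrading ollama

Download the latest ollama from the official site or GitHub: Release v0.15.1 · ollama/ollama

On Windows it is painless: just double-click the installer.

Before downloading the lfm2.5 model, upgrade ollama to the latest version (0.15.2, for example); older versions cannot even download the model.

Download the model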

ollama pull lfm2.5-thinking:1.2b

Start inference

ollama run lfm2.5-thinking:1.2b
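
Besides the interactive prompt, generation speed can also be measured through ollama's local HTTP API. The sketch below assumes the server is listening on the default http://localhost:11434 and relies on the `eval_count` and `eval_duration` (nanoseconds) fields of the `/api/generate` response. The helper names are made up for illustration, and this code was not part of the original experiment.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)


def generate(prompt: str, model: str = "lfm2.5-thinking:1.2b") -> dict:
    """Send one non-streaming request to a local ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def tokens_per_second(result: dict) -> float:
    """Generation rate from eval_count (tokens) and eval_duration (ns)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


# Usage (requires a running ollama server with the model pulled):
#   result = generate("Why is the sky blue?")
#   print(result["response"])
#   print(f"{tokens_per_second(result):.1f} tokens/s")
```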

Example: 帮我写个python实现8皇后放置的程序 (write me a Python program that places 8 queens)

>>> 帮我写个python实现8皇后放置的程序
Thinking...
Okay, let's see what the user is asking here. They want a Python program to place 8 queens on a chessboard such that none attack each other. Wait, but the original problem mentioned 10 birds and killing 2, resulting in 8 left. Hmm, maybe the user wants to place 8 queens on a chessboard where none attack each other.

First, I need to confirm the problem. The user said "8皇后放置的程序", which translates to placing 8 queens on a chessboard. Since queens attack horizontally, vertically, and diagonally, the classic 8 queens problem is to place them on 8 squares without attacking each other. But a standard chessboard is 8x8, so placing 8 queens there would mean one per row and column, right?

Wait, actually, the classic 8 queens problem is exactly that: placing 8 queens on an 8x8 chessboard so none attack each other. So the user probably wants a program that does that.

The user might want a program that generates a valid arrangement. Since the problem is classic, there are multiple solutions, but the user just needs any one possible solution.

So how to approach writing such a program? Well, one way is to use backtracking: try placing queens one by one, ensuring no conflicts. But implementing that in Python might be a bit involved. Alternatively, maybe provide a precomputed solution or a simple backtracking approach.

Alternatively, since the user might just want a simple example, perhaps using a known solution. For example, a common solution is placing queens on squares (1,1), (2,3), (3,5), etc., but ensuring they don't attack each other. Alternatively, a simple backtracking approach.

Wait, but the user might expect a code snippet. Let me think of a straightforward way. Let's say we represent the chessboard as a list of lists, and try placing queens one by one. Since the user wants 8 queens on 8x8, maybe a simple backtracking function.

Alternatively, since the problem is classic, perhaps provide a known solution.
For example, one possible arrangement is placing queens on squares (1,1), (2,5), (3,8), (4,2), (5,4), (6,7), (7,3), (8,6). Wait, but need to check that none attack each other. Alternatively, maybe a simpler arrangement like placing them in a diagonal? Wait no, that would have them attacking each other.

Alternatively, a standard solution is placing them on different rows and columns. For example, one possible solution is placing queens on squares (1,1), (2,3), (3,5), (4,7), (5,2), (6,4), (7,6), (8,5). Not sure, but perhaps it's easier to use a known solution. Alternatively, maybe use a predefined list.

Alternatively, use a backtracking approach in code. Let me outline the steps:

The program should take the number of queens (8) and place them on an 8x8 chessboard without attacking each other. The code can generate a valid arrangement.

Implementing backtracking:

Define a function that places a queen on a specific row and column, checks if it's safe, then proceeds recursively.

Alternatively, since the user might want a simple example, perhaps provide a precomputed solution. Let's pick a simple arrangement. For example, placing queens on all white squares of one color? Wait, but that might not work because of diagonals. Alternatively, a simple arrangement like:

Place queens on positions (1,1), (2,4), (3,7), (4,2), (5,5), (6,8), (7,3), (8,6). Wait, need to ensure no two share row, column, or diagonal. Let me check: rows 1-8, columns 1-8. Let's see:

Row 1: column 1.
Row 2: 4.
Row 3:7.
Row4:2.
Row5:5.
Row6:8.
Row7:3.
Row8:6.

Now check columns: all columns 1,4,7,2,5,8,3,6 – all unique. Rows 1-8 all present. Now diagonals: For each pair of queens, check if they are on same diagonal. For example, between (1,1) and (2,4): difference in rows 1, columns 3 → difference 3, same for others. Maybe this works. Alternatively, perhaps a better known solution is better.
Alternatively, use a standard solution like:

A common solution is placing queens on squares (1,1), (2,5), (3,8), (4,2), (5,4), (6,7), (7,3), (8,6). Let me check: rows 1-8, columns 1,5,8,2,4,7,3,6 – all unique. Diagonals: For example, between (1,1) and (2,5): difference 1 row, 4 column → not same diagonal. (1,1) and (3,8): rows 2 apart, columns 7 apart → not same. Maybe this works. Alternatively, perhaps it's easier to just code a backtracking function.

Alternatively, since the user might just want any valid arrangement, perhaps the code can generate a simple one. Let's proceed with writing a backtracking function.

So here's the plan: write a Python function that places queens on an 8x8 board, ensuring no two attack each other. The code will use backtracking: try placing a queen in each row, check if it's safe, then proceed.

But since the user just wants an example program, perhaps provide a code snippet that uses backtracking. Let me draft the code.

The code structure would be something like:

    def place_queens(board, count):
        if count == 8:
            return True
        for row in range(8):
            for col in range(8):
                if is_safe(board, row, col):
                    place queen at (row, col)
                    if place_queens(board, count+1):
                        return True
                    unplace queen
        return False

But implementing this requires representing the board. Alternatively, since the user might want a simple example, perhaps use a list of positions. Alternatively, here's a possible code:

Here's a possible approach:

We can represent the board as a list of lists, initialized to zeros. Then, recursively try placing queens row by row.

Alternatively, here's a simple code example using backtracking:

    def solve_queens(board, count):
        if count == 8:
            return True
        for col in range(8):
            if is_safe(board, count, col):
                board[count][col] = 1
                if solve_queens(board, count + 1):
                    return True
                board[count][col] = 0
        return False

But initializing the board as a list of 8 lists, each of 8 zeros. Then, starting with count=0, trying each column in each row.
Wait, perhaps the user just wants a sample code. Since the user asked for a program that places 8 queens on 8x8 board without attacking, here's a possible code:

Here's a possible Python code:

    def place_queens():
        # Create a 8x8 board initialized to 0
        board = [[0 for _ in range(8)] for _ in range(8)]
        def backtrack(row):
            if row == 8:
                return True
            for col in range(8):
                if is_safe(row, col):
                    board[row][col] = 1
                    if backtrack(row + 1):
                        return True
                    board[row][col] = 0
        backtrack(0)
        return board

But need to define is_safe function. Alternatively, the code can be written with a helper function.

Alternatively, here's a complete example:

    def is_safe(board, row, col, queen):
        for r in range(8):
            for c in range(8):
                if board[r][c] == 1 or board[r][c] == queen:
                    return False
        return True

Then the main function would be:

    def place_8_queens():
        board = [[0 for _ in range(8)] for _ in range(8)]
        def backtrack(row):
            if row == 8:
                return True
            for col in range(8):
                if is_safe(board, row, col, 1):  # Assuming queen is 1
                    board[row][col] = 1
                    if backtrack(row + 1):
                        return True
                    board[row][col] = 0
        backtrack(0)
        return board

Then, print the board:

    board = place_8_queens()
    for row in board:
        print(''.join(map(str, row)))

This should output a valid arrangement. Since the user might just want the code, I'll present this as the answer. The key points are ensuring no two queens attack each other, which the code does via the is_safe function checking rows, columns, and diagonals.

So putting it all together, the code would be as above. The user might just need the code, so I'll present that.
...done thinking.
Here is a simple Python program that places 8 queens so that they do not attack each other:

```python
def is_safe(board, row, col, queen):
    for r in range(8):
        for c in range(8):
            if board[r][c] == queen or board[r][c] == 1:
                return False
    return True

def place_queens():
    board = [[0 for _ in range(8)] for _ in range(8)]
    def backtrack(row):
        if row == 8:
            return True
        for col in range(8):
            if is_safe(board, row, col, 1):
                board[row][col] = 1
                if backtrack(row + 1):
                    return True
                board[row][col] = 0
    backtrack(0)
    return board

# print the result
board = place_queens()
for row in board:
    print(''.join(map(str, row)))
```

**Explanation:**

1. The `is_safe` function checks whether the current position conflicts with queens already placed (same row, column, or diagonal).
2. `place_queens` uses depth-first search (backtracking), trying to place a queen row by row while ensuring there are no conflicts.
3. The final output is a valid arrangement of 8 queens (for example `101010101 011010011 000000001 111100000` and so on, depending on the solution generated).

This program generates a solution that satisfies the constraints.

It is fairly fast! Let's look at the result:

python eight.py
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000

The solution is wrong (the board comes back empty), so this is still some way from practical use.
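
The empty board follows from the generated `is_safe`: it returns False as soon as any queen is anywhere on the board, so no queen can ever be placed in the second row and the search unwinds to all zeros. A corrected sketch (not the model's output) checks only the column and the two diagonals against queens in earlier rows. The function names are chosen for illustration.

```python
def is_safe(queens, row, col):
    """queens[r] is the column of the queen already placed in row r (r < row)."""
    for r in range(row):
        c = queens[r]
        # Conflict if same column, or same diagonal
        # (row distance equals column distance).
        if c == col or abs(c - col) == row - r:
            return False
    return True


def solve_queens(n=8):
    """Return one solution as a list of column indices, or None if none exists."""
    queens = []

    def backtrack(row):
        if row == n:
            return True
        for col in range(n):
            if is_safe(queens, row, col):
                queens.append(col)
                if backtrack(row + 1):
                    return True
                queens.pop()  # undo and try the next column
        return False

    return queens if backtrack(0) else None


solution = solve_queens()
for col in solution:
    print("".join("1" if c == col else "0" for c in range(8)))
```

Storing one column index per row makes row conflicts impossible by construction, so only columns and diagonals need checking.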

Asking it why the sky is blue:

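The main reason the sky looks blue is **Rayleigh scattering**:
1. **How light scatters**: Sunlight contains short-wavelength blue light. As sunlight passes through the atmosphere, blue light is scattered more strongly (scattering strength is inversely related to wavelength, so blue light scatters more).
2. **Visual effect**: The scattered blue light spreads out in the distance and covers the whole sky, making it blue.
3. **Difference from other colors**: Long-wavelength light such as red and orange is scattered less and appears dim, so the sky looks blue.

However, when the sky looks black, it may be because particles or clouds in the atmosphere absorb light of other colors while blue light is scattered far away. In the end, blue light accounts for more of the visible light, so it appears blue.

**Note**: Under some conditions (such as near the sun or under thick clouds) it may look black, but a blue sky is normally the typical appearance.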

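This answer is acceptable.

Overall, this 1.2B model is reasonably smart. It can be used for upstream steps such as task decomposition, but it is not suited to final steps such as writing the actual code.

Summary

With llama.cpp, maybe I was doing it wrong, or maybe the Xinghe Community CPU was the bottleneck: LFM2.5-1.2B inference was very slow, producing only a few characters per minute, and was unusable.

Inference with ollama was quite good. Admittedly, having used DeepSeek inference before, I did not find LFM2.5-1.2B's speed especially impressive, though it may fare better on edge devices such as phones and laptops. My impression is that the main advantage of LFM2.5-1.2B is its low memory footprint; in speed it has no great edge over mainstream models.

Mostly this is because DeepSeek has influenced me heavily over the past year. It raised my expectations, so I compare every star model against DeepSeek, and in that comparison their advantages are not especially obvious.

Still, the low memory footprint of LFM2.5-1.2B is a real advantage, because it lets you run several inferences at once, especially for task decomposition and step assignment.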
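
Running several inferences side by side for decomposed subtasks can be sketched with a thread pool. In the sketch below, `ask` stands for any function that sends a prompt to the model and returns its reply. It is a hypothetical placeholder, as are `run_subtasks` and the stub `fake_ask`. Whether requests actually execute in parallel depends on how the serving backend is configured.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_subtasks(subtasks: List[str], ask: Callable[[str], str],
                 max_workers: int = 4) -> List[str]:
    """Send each subtask prompt through ask() concurrently; keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ask, subtasks))


# Stub standing in for a real model call (hypothetical, for illustration only).
def fake_ask(prompt: str) -> str:
    return f"plan for: {prompt}"


steps = ["parse the input", "design the data model", "write the tests"]
for step, reply in zip(steps, run_subtasks(steps, fake_ask)):
    print(step, "->", reply)
```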
