On-Device AI Model Deployment in Practice, Part 6 (Android Multimodal LLM)


The previous article, On-Device AI Model Deployment in Practice, Part 5 (Loading LLMs on Android), implemented loading and running a text-only LLM on an Android phone. This article continues with loading a multimodal model.

AI Prompt: porting Qwen2.5-Omni-3B-Instruct with Windows + llama.cpp + Android Studio

Why Qwen2.5-Omni-3B-Instruct? One reason is that my phone is low-end and this model is small enough to run. The deciding reason is that it lets me validate text, image, audio, and video in a single pass, which saves time.

I. Overall Goal

Compile llama.cpp + Qwen2.5-Omni-3B into .so libraries callable from Android, and build an offline omni-modal AI app in Android Studio.

II. Prerequisites

  1. A Windows PC
  2. Android Studio Hedgehog or newer
  3. The Android NDK installed (25c or 26b)
  4. Model files (the 3B variant)
  • qwen2.5-omni-3b.Q4_K_M.gguf
  • mmproj-Qwen2.5-Omni-3B-f16.gguf

III. Core Steps

1. Get the llama.cpp source

Clone from the official repo on the command line: git clone github.com/ggml-org/ll… then copy the source to a local working directory (e.g. via scp).

If the clone is too slow, use the domestic mirror instead: git clone gitee.com/mirrors/lla… It is much faster.

Note: the llama.cpp codebase has been changing rapidly. Build procedures and JNI-facing APIs differ substantially between versions, and model support also varies, so the llama.cpp version must be matched to the model.

Run git describe --tags to confirm your llama.cpp version; mine is b8648. If you use AI-assisted coding, always state this version in the prompt, otherwise the AI assumes the latest API and produces code riddled with compile errors.

2. Model download and conversion

Pick a multimodal model that the llama.cpp build above (b8648) supports. Before downloading, run the command below to list the models the current llama.cpp can convert, so you don't end up with a model that cannot be quantized/converted:

python convert_hf_to_gguf.py --print-supported-models

Model download sources:

The complete official repositories on Hugging Face: huggingface.co/google

Chinese mirror: hf-mirror.com/ggml-org

ModelScope community: modelscope.cn/models/

All of the sites above offer both the original unquantized models and pre-quantized ones.

Downloading the quantized model:

modelscope download --model ggml-org/Qwen2.5-Omni-3B-GGUF --local_dir D:\workspace\AI\lm_studio_models\Qwen2.5-Omni-3B-GGUF

Downloading the original model:

modelscope download --model Qwen/Qwen2.5-Omni-3B --local_dir D:\workspace\AI\lm_studio_models\Qwen2.5-Omni-3B

After downloading the original model you can quantize it yourself; the quantization procedure was covered in an earlier article in this series.

3. Building and integrating the llama.cpp .so (the key step!)

The goal is to compile llama.cpp's inference engine into .so libraries for the phone and expose JNI interfaces for the APK to call, so the focus is on generating the .so and implementing the JNI layer.

There are two main ways to integrate the .so. Approach 1: prebuild llama.cpp and drop the generated .so files into the project; Part 5 of this series did exactly that, placing them under app/src/main/jniLibs/arm64-v8a/*.so. Approach 2: put the llama.cpp source directly into the project tree and build it as part of the app.

Approach 1: build with the script below

@echo off
:: ==============================================
:: llama.cpp b8648 Android arm64-v8a build script
:: MTMD (multimodal) enabled
:: ==============================================

:: 1. Enter the source root
cd /d D:\workspace\AI\tools\llama.cpp-android\llama.cpp

:: 2. Clean the old build
rmdir /s /q build-android
mkdir build-android
cd build-android

:: 3. NDK path
set ANDROID_NDK=D:\workspace\AI\SDK\ndk\28.2.13676358
chcp 65001 > nul

:: 4. CMake configuration (MTMD enabled)
cmake .. ^
-DCMAKE_TOOLCHAIN_FILE=%ANDROID_NDK%/build/cmake/android.toolchain.cmake ^
-DANDROID_ABI=arm64-v8a ^
-DANDROID_PLATFORM=android-28 ^
-DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_C_FLAGS="-march=armv8-a" ^
-DCMAKE_CXX_FLAGS="-march=armv8-a" ^
-DGGML_OPENMP=OFF ^
-DGGML_NATIVE=OFF ^
-DBUILD_SHARED_LIBS=ON ^
-DLLAMA_ANDROID=ON ^
-DGGML_CUDA=OFF ^
-DGGML_METAL=OFF ^
-DLLAMA_MTMD=ON ^
-DCMAKE_MAKE_PROGRAM=%ANDROID_NDK%/prebuilt/windows-x86_64/bin/make.exe ^
-G "Unix Makefiles"

:: 5. Build
cmake --build . -j4

echo.
echo ==============================
echo Build finished ✅ MTMD enabled
echo Output: build-android/lib/libllama.so
echo ==============================
pause

This article uses Approach 2: the entire llama.cpp source tree is copied into LlamaTest\app\src\main\cpp\llama, with the structure shown below.

Approach 2 is used because this article targets multimodality, whose code lives under llama.cpp\tools\mtmd. The default build configuration does not compile the mtmd sources, which leaves the multimodal JNI layer with many undefined functions. To avoid touching llama.cpp's own CMakeLists.txt, the fix goes into the project's CMakeLists.txt, modified as follows:

LlamaTest\app\src\main\cpp\CMakeLists.txt

cmake_minimum_required(VERSION 3.22.1)
project(llamatest)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Compiler flags (-O0 for debugging, warnings suppressed)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O0 -fno-openmp -w")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O0 -fno-openmp -w")

# Disable unneeded modules
set(LLAMA_BUILD_EXAMPLES OFF CACHE BOOL "" FORCE)
set(LLAMA_BUILD_TESTS OFF CACHE BOOL "" FORCE)

add_subdirectory(llama)

# ==============================================
# 🔥 Compile everything mtmd-related (including the helper)
# ==============================================
add_library(
        mtmd
        STATIC
        llama/tools/mtmd/mtmd.cpp
        llama/tools/mtmd/mtmd-image.cpp
        llama/tools/mtmd/mtmd-helper.cpp    
        llama/tools/mtmd/clip.cpp
        llama/tools/mtmd/mtmd-audio.cpp
        llama/tools/mtmd/models/siglip.cpp
        llama/tools/mtmd/models/cogvlm.cpp
        llama/tools/mtmd/models/conformer.cpp
        llama/tools/mtmd/models/gemma4v.cpp
        llama/tools/mtmd/models/glm4v.cpp
        llama/tools/mtmd/models/internvl.cpp
        llama/tools/mtmd/models/kimivl.cpp
        llama/tools/mtmd/models/kimik25.cpp
        llama/tools/mtmd/models/nemotron-v2-vl.cpp
        llama/tools/mtmd/models/llama4.cpp
        llama/tools/mtmd/models/llava.cpp
        llama/tools/mtmd/models/minicpmv.cpp
        llama/tools/mtmd/models/paddleocr.cpp
        llama/tools/mtmd/models/pixtral.cpp
        llama/tools/mtmd/models/qwen2vl.cpp
        llama/tools/mtmd/models/qwen3vl.cpp
        llama/tools/mtmd/models/whisper-enc.cpp
        llama/tools/mtmd/models/deepseekocr.cpp
        llama/tools/mtmd/models/mobilenetv5.cpp
        llama/tools/mtmd/models/youtuvl.cpp
)

# ==============================================
# 🎯 Key: specify the miniaudio include path directly!
# ==============================================
target_include_directories(
        mtmd PRIVATE
        ${CMAKE_CURRENT_SOURCE_DIR}/llama/include
        ${CMAKE_CURRENT_SOURCE_DIR}/llama/tools/mtmd
        ${CMAKE_CURRENT_SOURCE_DIR}/llama/ggml/src      # ✅ needed
        ${CMAKE_CURRENT_SOURCE_DIR}/llama/ggml/include  # ✅ needed
        ${CMAKE_CURRENT_SOURCE_DIR}/llama/vendor        # ✅ needed (miniaudio lives here)
)

target_link_libraries(mtmd PRIVATE llama)

# JNI
add_library(llama_jni SHARED llama_wrapper.cpp)

target_include_directories(
        llama_jni PRIVATE
        ${CMAKE_CURRENT_SOURCE_DIR}/llama/include
        ${CMAKE_CURRENT_SOURCE_DIR}/llama/tools/mtmd
)

target_link_libraries(
        llama_jni
        PRIVATE
        llama
        mtmd
        log
        android
        jnigraphics
)

If you are coding with AI assistance, you must feed the files under mtmd to the AI so it knows which interfaces actually exist and uses only those. Otherwise it mixes old and new APIs and you go back and forth endlessly; I lost two full days to this.

4. Omni-modal interface (text + image + audio)

Still using the LlamaTest project from before; the main code changes are as follows:

LlamaTest\app\src\main\java\com\example\llamatest\MainActivity.kt

package com.example.llamatest

import android.content.Intent
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.os.Bundle
import android.widget.EditText
import android.widget.TextView
import androidx.appcompat.app.AppCompatActivity
import java.io.File
import java.io.FileOutputStream

class MainActivity : AppCompatActivity() {

    private lateinit var tvResult: TextView
    private var modelPath: String? = null
    private var mmprojPath: String? = null
    private var isModelLoaded = false

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        tvResult = findViewById(R.id.tv_result)
        val btnSelectModel = findViewById<android.widget.Button>(R.id.btn_select_model)
        val btnSelectMmproj = findViewById<android.widget.Button>(R.id.btn_select_mmproj)
        val btnLoadModel = findViewById<android.widget.Button>(R.id.btn_load_model)
        val etPrompt = findViewById<EditText>(R.id.et_prompt)
        val btnSendText = findViewById<android.widget.Button>(R.id.btn_send_text)
        val btnSendImage = findViewById<android.widget.Button>(R.id.btn_send_image)

        logMsg("✅ Started. Pick a model + an mmproj projector file")

        // Pick the model file
        btnSelectModel.setOnClickListener {
            val intent = Intent(Intent.ACTION_OPEN_DOCUMENT).apply {
                addCategory(Intent.CATEGORY_OPENABLE)
                type = "*/*"
            }
            startActivityForResult(intent, 100)
        }

        // Pick the mmproj file
        btnSelectMmproj.setOnClickListener {
            val intent = Intent(Intent.ACTION_OPEN_DOCUMENT).apply {
                addCategory(Intent.CATEGORY_OPENABLE)
                type = "*/*"
            }
            startActivityForResult(intent, 101)
        }

        // Load the multimodal model
        btnLoadModel.setOnClickListener {
            if (modelPath == null || mmprojPath == null) {
                logMsg("❌ Both the model and the mmproj must be selected")
                return@setOnClickListener
            }
            logMsg("⏳ Loading Qwen2.5-Omni...")
            Thread {
                val success = llamaBridge.loadMultimodal(modelPath!!, mmprojPath!!)
                runOnUiThread {
                    isModelLoaded = success
                    logMsg(if (success) "🎉 Model + mmproj loaded" else "❌ Load failed (check the mmproj / build with LLAMA_MTMD=ON)")
                }
            }.start()
        }

        // Text only
        btnSendText.setOnClickListener {
            val input = etPrompt.text.toString().trim()
            if (input.isEmpty() || !isModelLoaded) {
                logMsg(if (input.isEmpty()) "❌ Input cannot be empty" else "❌ Load the model first")
                return@setOnClickListener
            }
            logMsg("\n🧑‍💻: $input")
            logMsg("🤖 Thinking...")
            Thread {
                try {
                    val reply = llamaBridge.streamGenerate(input)
                    runOnUiThread { logMsg("🤖: $reply") }
                } catch (e: Exception) {
                    runOnUiThread { logMsg("❌ Text generation failed: ${e.message}") }
                }
            }.start()
        }

        // Image + text (passes a real Bitmap)
        btnSendImage.setOnClickListener {
            val input = etPrompt.text.toString().trim()
            if (input.isEmpty() || !isModelLoaded) {
                logMsg(if (input.isEmpty()) "❌ Enter a question first" else "❌ Load the model first")
                return@setOnClickListener
            }
            val intent = Intent(Intent.ACTION_OPEN_DOCUMENT).apply { type = "image/*" }
            startActivityForResult(intent, 200)
        }
    }

    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        super.onActivityResult(requestCode, resultCode, data)
        if (resultCode != RESULT_OK || data == null) return
        val uri = data.data ?: return

        // Copy the model / mmproj into app-private storage
        if (requestCode == 100 || requestCode == 101) {
            val isModel = requestCode == 100
            val target = if (isModel) File(filesDir, "model.gguf") else File(filesDir, "mmproj.gguf")
            Thread {
                contentResolver.openInputStream(uri)?.use { input ->
                    FileOutputStream(target).use { output -> input.copyTo(output) }
                }
                runOnUiThread {
                    if (isModel) modelPath = target.absolutePath
                    else mmprojPath = target.absolutePath
                    logMsg("✅ ${if (isModel) "Model" else "mmproj"} copied: ${target.name}")
                }
            }.start()
        }

        // Handle the picked image
        if (requestCode == 200) {
            val prompt = findViewById<EditText>(R.id.et_prompt).text.toString().trim()
            // Scale to 448x448 (fits Qwen's vision input)
            val bitmap = contentResolver.openInputStream(uri)?.use {
                BitmapFactory.decodeStream(it)
            }?.run {
                Bitmap.createScaledBitmap(this, 448, 448, true)
            }

            if (bitmap == null) {
                logMsg("❌ Failed to load the image")
                return
            }

            logMsg("\n🖼️ Understanding image + text: $prompt")
            Thread {
                try {
                    val reply = llamaBridge.chatImage(prompt, bitmap)
                    runOnUiThread {
                        logMsg("🤖: $reply")
                        bitmap.recycle()
                    }
                } catch (e: Exception) {
                    runOnUiThread {
                        logMsg("❌ Image inference failed: ${e.message}")
                        bitmap.recycle()
                    }
                }
            }.start()
        }
    }

    // Append log messages to the UI in real time
    private fun logMsg(msg: String) {
        runOnUiThread { tvResult.append("\n$msg") }
    }

    override fun onDestroy() {
        super.onDestroy()
        llamaBridge.releaseModel()
    }
}

LlamaTest\app\src\main\java\com\example\llamatest\llamaBridge.kt

package com.example.llamatest

object llamaBridge {
    init {
        System.loadLibrary("llama_jni")
    }

    external fun loadMultimodal(modelPath: String, mmprojPath: String): Boolean
    external fun chatImage(prompt: String, bitmap: android.graphics.Bitmap?): String
    external fun streamGenerate(prompt: String): String
    external fun releaseModel()
}

LlamaTest\app\src\main\cpp\llama_wrapper.cpp

#include <jni.h>
#include <string>
#include <vector>
#include <cstring>
#include <android/bitmap.h>
#include <android/log.h>

#define LOG_TAG "LlamaMM"
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, LOG_TAG, __VA_ARGS__)
#define LOGE(...) __android_log_print(ANDROID_LOG_ERROR, LOG_TAG, __VA_ARGS__)

#ifdef __cplusplus
extern "C" {
#endif

#include "llama.h"
#include "mtmd.h"

#ifdef __cplusplus
}
#endif

static llama_model*      g_model = nullptr;
static llama_context*    g_ctx   = nullptr;
static const llama_vocab* g_vocab = nullptr;
static mtmd_context*     g_mtmd  = nullptr;

static llama_token sample_token(llama_context* ctx) {
    LOGI("sample_token: sampling start");
    const float* logits = llama_get_logits(ctx);
    if (logits == nullptr) {
        // No logits were computed for the last batch; return EOS instead of
        // dereferencing a null pointer.
        LOGE("sample_token: logits are null");
        return llama_vocab_eos(g_vocab);
    }
    int n = llama_vocab_n_tokens(g_vocab);

    int best = 0;
    float max_val = -1e9;
    for (int i = 0; i < n; ++i) {
        if (logits[i] > max_val) {
            max_val = logits[i];
            best = i;
        }
    }
    LOGI("sample_token: sampled token = %d", best);
    return (llama_token)best;
}

extern "C" JNIEXPORT jboolean JNICALL
Java_com_example_llamatest_llamaBridge_loadMultimodal(
        JNIEnv* env, jobject /* thiz */,
        jstring modelPath, jstring mmprojPath)
{
    LOGI("==================================================");
    LOGI("loadMultimodal: Qwen2.5-Omni-3B load start");
    LOGI("==================================================");

    const char* model_path = env->GetStringUTFChars(modelPath, nullptr);
    const char* mmproj_path = env->GetStringUTFChars(mmprojPath, nullptr);

    llama_model_params mparams = llama_model_default_params();
    g_model = llama_model_load_from_file(model_path, mparams);
    env->ReleaseStringUTFChars(modelPath, model_path);

    if (!g_model) {
        LOGE("model load failed ❌");
        return JNI_FALSE;
    }
    LOGI("model loaded ✅");

    g_vocab = llama_model_get_vocab(g_model);
    if (!g_vocab) {
        LOGE("vocab fetch failed ❌");
        return JNI_FALSE;
    }
    LOGI("vocab fetched ✅");

    auto mtmd_params = mtmd_context_params_default();
    g_mtmd = mtmd_init_from_file(mmproj_path, g_model, mtmd_params);
    env->ReleaseStringUTFChars(mmprojPath, mmproj_path);

    if (!g_mtmd) {
        LOGE("mmproj load failed ❌");
        return JNI_FALSE;
    }
    LOGI("mmproj loaded ✅");

    bool support_vision = mtmd_support_vision(g_mtmd);
    LOGI("==================================================");
    LOGI("✅ mtmd vision support = %s", support_vision ? "yes" : "no");
    LOGI("==================================================");

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx      = 32768;
    cparams.n_batch    = 1024;
    cparams.n_ubatch   = 1024;
    cparams.n_threads  = 4;
    cparams.no_perf    = true;

    g_ctx = llama_init_from_model(g_model, cparams);

    if (!g_ctx) {
        LOGE("context init failed ❌");
        return JNI_FALSE;
    }

    LOGI("==================================================");
    LOGI("loadMultimodal: everything loaded ✅");
    LOGI("==================================================");
    return JNI_TRUE;
}

extern "C" JNIEXPORT jstring JNICALL
Java_com_example_llamatest_llamaBridge_chatImage(
        JNIEnv* env, jobject /* thiz */,
        jstring prompt, jobject bitmap)
{
    LOGI("==================================================");
    LOGI("chatImage: image chat start");
    LOGI("==================================================");

    if (!g_ctx || !g_mtmd || !g_vocab || !bitmap) {
        LOGE("error: model not initialized ❌");
        return env->NewStringUTF("model not initialized");
    }
    LOGI("all handles OK ✅");

    const char* prompt_c = env->GetStringUTFChars(prompt, nullptr);

    std::string input_text = "<|im_start|>user\n<__media__>\n";
    input_text += prompt_c;
    input_text += "<|im_end|>\n<|im_start|>assistant\n";

    env->ReleaseStringUTFChars(prompt, prompt_c);

    LOGI("final prompt:\n%s", input_text.c_str());

    LOGI("reading Bitmap");
    AndroidBitmapInfo info{};
    void* pixels = nullptr;
    AndroidBitmap_getInfo(env, bitmap, &info);
    AndroidBitmap_lockPixels(env, bitmap, &pixels);

    int w = info.width;
    int h = info.height;
    LOGI("image size: %d x %d", w, h);

    const int TARGET_SIZE = 224;
    std::vector<uint8_t> rgb(TARGET_SIZE * TARGET_SIZE * 3);
    uint32_t* src = (uint32_t*)pixels;

    for (int y = 0; y < TARGET_SIZE; y++) {
        for (int x = 0; x < TARGET_SIZE; x++) {
            int sx = x * w / TARGET_SIZE;
            int sy = y * h / TARGET_SIZE;
            uint32_t c = src[sy * info.width + sx];
            rgb[(y*TARGET_SIZE+x)*3+0] = (c >> 16) & 0xFF;
            rgb[(y*TARGET_SIZE+x)*3+1] = (c >> 8)  & 0xFF;
            rgb[(y*TARGET_SIZE+x)*3+2] = c & 0xFF;
        }
    }

    AndroidBitmap_unlockPixels(env, bitmap);
    LOGI("Bitmap → RGB done, scaled to 224x224 ✅");

    mtmd_bitmap* img = mtmd_bitmap_init(TARGET_SIZE, TARGET_SIZE, rgb.data());
    if (!img) {
        LOGE("mtmd_bitmap_init failed ❌");
        return env->NewStringUTF("image processing failed");
    }
    LOGI("mtmd_bitmap_init OK ✅");

    mtmd_input_chunks* chunks = mtmd_input_chunks_init();
    LOGI("mtmd_input_chunks_init OK ✅");

    mtmd_input_text text_cfg{};
    text_cfg.text         = input_text.c_str();
    text_cfg.add_special  = true;
    text_cfg.parse_special = true;

    const mtmd_bitmap* bitmaps[] = { img };

    LOGI("==== mtmd_tokenize start ====");
    mtmd_tokenize(g_mtmd, chunks, &text_cfg, bitmaps, 1);

    size_t chunk_cnt = mtmd_input_chunks_size(chunks);
    LOGI("==== mtmd_tokenize done ====");
    LOGI("chunk count = %zu", chunk_cnt);

    llama_pos pos = 0;
    int embd_dim = llama_model_n_embd_inp(g_model);
    LOGI("embd_dim = %d", embd_dim);

    for (size_t i = 0; i < chunk_cnt; i++) {
        const mtmd_input_chunk* chunk = mtmd_input_chunks_get(chunks, i);
        int type = mtmd_input_chunk_get_type(chunk);

        LOGI("--------------------------------");
        LOGI("processing chunk %zu | type = %d", i, type);

        if (type == MTMD_INPUT_CHUNK_TYPE_TEXT) {
            size_t ntok = 0;
            const llama_token* tok = mtmd_input_chunk_get_tokens_text(chunk, &ntok);
            LOGI("text token count = %zu", ntok);

            for (size_t j = 0; j < ntok; j++) {
                llama_token t = tok[j];
                LOGI("token [%zu] = %d", j, (int)t);

                // llama_batch_get_one takes (tokens, n_tokens): the second
                // argument is the token count, not the position
                llama_batch b = llama_batch_get_one((llama_token*)&t, 1);
                int ret = llama_decode(g_ctx, b);
                pos++;
                LOGI("text llama_decode ret = %d | pos = %d", ret, (int)pos);
            }
        }
        else if (type == MTMD_INPUT_CHUNK_TYPE_IMAGE) {
            LOGI("image chunk → encoding (224x224)");

            mtmd_encode_chunk(g_mtmd, chunk);

            LOGI("image encode done ✅");
            LOGI("decoding image embeddings ...");

            float* embd = mtmd_get_output_embd(g_mtmd);
            if (embd == nullptr) {
                LOGE("❌ fatal: mtmd_get_output_embd returned null, skipping image decode");
                continue;
            }
            LOGI("✅ mtmd_get_output_embd OK");

            size_t n_tokens = mtmd_input_chunk_get_n_tokens(chunk);
            LOGI("image token count = %zu", n_tokens);

            // ==========================
            // Decode the image embeddings one token at a time.
            // llama_batch_init(n_tokens, embd, n_seq_max) needs n_seq_max >= 1,
            // and n_seq_id / seq_id / logits must be filled in; otherwise
            // llama_decode reads uninitialized pointers and crashes.
            // ==========================
            for (size_t j = 0; j < n_tokens; j++) {
                llama_batch batch = llama_batch_init(1, embd_dim, 1);

                batch.n_tokens = 1;
                memcpy(batch.embd, embd + j * embd_dim, embd_dim * sizeof(float));
                batch.pos[0]       = pos;
                batch.n_seq_id[0]  = 1;
                batch.seq_id[0][0] = 0;
                // request logits only for the last embedding so sampling has data
                batch.logits[0]    = (j == n_tokens - 1);

                int ret = llama_decode(g_ctx, batch);
                LOGI("image decode ret = %d | idx=%zu pos=%d", ret, j, (int)pos);

                llama_batch_free(batch);
                pos++;
            }
            LOGI("✅ all image tokens decoded");
        }
        else {
            LOGI("unknown chunk type");
        }
    }

    LOGI("========================================");
    LOGI("all chunks decoded, generating the answer 🚀");
    LOGI("========================================");

    std::string output;
    const int MAX_GEN = 1024;
    llama_token eos = llama_vocab_eos(g_vocab);
    LOGI("EOS token = %d", (int)eos);

    for (int i = 0; i < MAX_GEN; i++) {
        LOGI("generating token %d", i);

        llama_token t = sample_token(g_ctx);

        if (t == eos || t == 0) {
            LOGI("generation finished: hit EOS");
            break;
        }

        char buf[256] = {0};
        llama_token_to_piece(g_vocab, t, buf, sizeof(buf)-1, 0, false);
        output += buf;
        LOGI("piece: %s", buf);

        // feed the sampled token back (second arg is the token count, not pos)
        llama_batch b = llama_batch_get_one(&t, 1);
        llama_decode(g_ctx, b);
        pos++;
    }

    LOGI("========================================");
    LOGI("final result: %s", output.c_str());
    LOGI("========================================");

    mtmd_input_chunks_free(chunks);
    mtmd_bitmap_free(img);

    return env->NewStringUTF(output.c_str());
}

extern "C" JNIEXPORT void JNICALL
Java_com_example_llamatest_llamaBridge_releaseModel(JNIEnv*, jobject)
{
    LOGI("releasing model resources...");
    if (g_ctx)   { llama_free(g_ctx);   LOGI("ctx freed"); }
    if (g_model) { llama_model_free(g_model); LOGI("model freed"); }
    if (g_mtmd)  { mtmd_free(g_mtmd);  LOGI("mtmd freed"); }

    g_ctx   = nullptr;
    g_model = nullptr;
    g_mtmd  = nullptr;
    g_vocab = nullptr;

    LOGI("all resources released ✅");
}

static llama_token sample_token_text() {
    float* logits = llama_get_logits_ith(g_ctx, -1);
    int n_vocab = llama_vocab_n_tokens(g_vocab);
    int best = 0;
    float max = -1e9;
    for (int i = 0; i < n_vocab; i++) {
        if (logits[i] > max) {
            max = logits[i];
            best = i;
        }
    }
    return (llama_token)best;
}

// Streaming generation (safe version)
extern "C" JNIEXPORT jstring JNICALL
Java_com_example_llamatest_llamaBridge_streamGenerate(
        JNIEnv* env, jobject /* thiz */, jstring prompt)
{
    if (!g_ctx || !g_model || !g_vocab) {
        return env->NewStringUTF("ERROR: model not loaded");
    }

    const char* prompt_c = env->GetStringUTFChars(prompt, nullptr);
    // Qwen models use the ChatML template; the <start_of_turn> markers from
    // the previous Gemma article do not apply here
    std::string input = "<|im_start|>user\n";
    input += prompt_c;
    input += "<|im_end|>\n<|im_start|>assistant\n";
    env->ReleaseStringUTFChars(prompt, prompt_c);

    std::vector<llama_token> tokens(512);
    int n_tokens = llama_tokenize(g_vocab, input.c_str(), input.size(), tokens.data(), 512, true, false);
    if (n_tokens <= 0) return env->NewStringUTF("tokenization failed");

    // feed the whole prompt in one batch
    llama_batch batch = llama_batch_get_one(tokens.data(), n_tokens);
    llama_decode(g_ctx, batch);

    std::string full_output;
    const int MAX_GEN = 40;
    const llama_token eos = llama_vocab_eos(g_vocab);

    for (int i = 0; i < MAX_GEN; i++) {
        llama_token tok = sample_token_text();
        if (tok == eos || tok == 0) break;

        char buf[256] = {0};
        llama_token_to_piece(g_vocab, tok, buf, sizeof(buf)-1, 0, false);
        full_output += buf;

        // decode the next token
        llama_batch b = llama_batch_get_one(&tok, 1);
        llama_decode(g_ctx, b);
    }

    return env->NewStringUTF(full_output.c_str());
}

IV. App Runtime Requirements

  • An arm64-v8a phone (covers virtually all current Android devices)
  • RAM ≥ 6 GB (8 GB recommended)
  • Model footprint ≈ 2.5 GB
  • Runs fully offline, no network access

The results are as follows:

Text chat runs fine; a video demo is here:

live.csdn.net/v/522336

Due to the phone's memory limits, the device locked up while processing the image and text together. After a long debugging session it still crashes while decoding the image embeddings; optimizing this is the next step.

04-18 12:33:28.660 26737 27804 I LlamaMM : loadMultimodal: Qwen2.5-Omni-3B load start
04-18 12:33:28.660 26737 27804 I LlamaMM : ==================================================
04-18 12:33:32.550 26737 27804 I LlamaMM : model loaded ✅
04-18 12:33:32.550 26737 27804 I LlamaMM : vocab fetched ✅
04-18 12:33:37.240 26737 27804 I LlamaMM : mmproj loaded ✅
04-18 12:33:37.240 26737 27804 I LlamaMM : ==================================================
04-18 12:33:37.240 26737 27804 I LlamaMM : ✅ mtmd vision support = yes
04-18 12:33:37.240 26737 27804 I LlamaMM : ==================================================
04-18 12:33:38.022 26737 27804 I LlamaMM : ==================================================
04-18 12:33:38.022 26737 27804 I LlamaMM : loadMultimodal: everything loaded ✅
04-18 12:33:38.022 26737 27804 I LlamaMM : ==================================================
04-18 12:34:04.671 26737 28036 I LlamaMM : ==================================================
04-18 12:34:04.671 26737 28036 I LlamaMM : chatImage: image chat start
04-18 12:34:04.671 26737 28036 I LlamaMM : ==================================================
04-18 12:34:04.671 26737 28036 I LlamaMM : all handles OK ✅
04-18 12:34:04.671 26737 28036 I LlamaMM : final prompt:
04-18 12:34:04.671 26737 28036 I LlamaMM : <|im_start|>user
04-18 12:34:04.671 26737 28036 I LlamaMM : <__media__>
04-18 12:34:04.671 26737 28036 I LlamaMM : 识别图片文字<|im_end|>
04-18 12:34:04.671 26737 28036 I LlamaMM : <|im_start|>assistant
04-18 12:34:04.671 26737 28036 I LlamaMM : reading Bitmap
04-18 12:34:04.671 26737 28036 I LlamaMM : image size: 448 x 448
04-18 12:34:04.681 26737 28036 I LlamaMM : Bitmap → RGB done, scaled to 224x224 ✅
04-18 12:34:04.683 26737 28036 I LlamaMM : mtmd_bitmap_init OK ✅
04-18 12:34:04.683 26737 28036 I LlamaMM : mtmd_input_chunks_init OK ✅
04-18 12:34:04.683 26737 28036 I LlamaMM : ==== mtmd_tokenize start ====
04-18 12:34:04.742 26737 28036 I LlamaMM : ==== mtmd_tokenize done ====
04-18 12:34:04.742 26737 28036 I LlamaMM : chunk count = 3
04-18 12:34:04.743 26737 28036 I LlamaMM : embd_dim = 2048
04-18 12:34:04.743 26737 28036 I LlamaMM : --------------------------------
04-18 12:34:04.743 26737 28036 I LlamaMM : processing chunk 0 | type = 0
04-18 12:34:04.743 26737 28036 I LlamaMM : text token count = 9
04-18 12:34:04.743 26737 28036 I LlamaMM : token [0] = 151644
04-18 12:34:04.743 26737 28036 I LlamaMM : text llama_decode ret = -1 | pos = 1
04-18 12:34:04.743 26737 28036 I LlamaMM : token [1] = 872
04-18 12:34:13.791 26737 28036 I LlamaMM : text llama_decode ret = 0 | pos = 2
04-18 12:34:13.792 26737 28036 I LlamaMM : token [2] = 198
04-18 12:34:27.729 26737 28036 I LlamaMM : text llama_decode ret = 0 | pos = 3
04-18 12:34:27.729 26737 28036 I LlamaMM : token [3] = 27
04-18 12:35:20.801 26737 28036 I LlamaMM : text llama_decode ret = 0 | pos = 4
04-18 12:35:20.801 26737 28036 I LlamaMM : token [4] = 91
04-18 12:35:20.805 26737 28036 I LlamaMM : text llama_decode ret = -1 | pos = 5
04-18 12:35:20.805 26737 28036 I LlamaMM : token [5] = 13013
04-18 12:35:20.805 26737 28036 I LlamaMM : text llama_decode ret = -1 | pos = 6
04-18 12:35:20.805 26737 28036 I LlamaMM : token [6] = 4906
04-18 12:35:20.805 26737 28036 I LlamaMM : text llama_decode ret = -1 | pos = 7
04-18 12:35:20.805 26737 28036 I LlamaMM : token [7] = 91
04-18 12:35:20.805 26737 28036 I LlamaMM : text llama_decode ret = -1 | pos = 8
04-18 12:35:20.805 26737 28036 I LlamaMM : token [8] = 29
04-18 12:35:20.805 26737 28036 I LlamaMM : text llama_decode ret = -1 | pos = 9
04-18 12:35:20.806 26737 28036 I LlamaMM : --------------------------------
04-18 12:35:20.806 26737 28036 I LlamaMM : processing chunk 1 | type = 1
04-18 12:35:20.806 26737 28036 I LlamaMM : image chunk → encoding (224x224)
04-18 12:41:05.321 26737 28036 I LlamaMM : image encode done ✅
04-18 12:41:05.322 26737 28036 I LlamaMM : decoding image embeddings ...
04-18 12:41:05.322 26737 28036 I LlamaMM : image token count = 64
	--------- beginning of crash
04-18 12:41:05.322 26737 28036 F libc    : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 28036 (Thread-12), pid 26737 (ample.llamatest)
04-18 12:41:07.112 30525 30525 F DEBUG   : Process name is com.example.llamatest, uid is 10432, not key_process
04-18 12:41:07.112 30525 30525 F DEBUG   : keyProcess: 0
04-18 12:41:07.112 30525 30525 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
04-18 12:41:07.112 30525 30525 F DEBUG   : Build fingerprint: 'OPPO/PJC110/OP591D:15/AP3A.240617.008/T.221035c-7152f-1:user/release-keys'
04-18 12:41:07.112 30525 30525 F DEBUG   : Revision: '0'
04-18 12:41:07.112 30525 30525 F DEBUG   : ABI: 'arm64'
04-18 12:41:07.112 30525 30525 F DEBUG   : Timestamp: 2026-04-18 12:41:05.910132574+0800
04-18 12:41:07.112 30525 30525 F DEBUG   : Process uptime: 492s
04-18 12:41:07.112 30525 30525 F DEBUG   : Cmdline: com.example.llamatest
04-18 12:41:07.112 30525 30525 F DEBUG   : pid: 26737, tid: 28036, name: Thread-12  >>> com.example.llamatest <<<
04-18 12:41:07.112 30525 30525 F DEBUG   : uid: 10432
04-18 12:41:07.112 30525 30525 F DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000000000000000
04-18 12:41:07.112 30525 30525 F DEBUG   : Cause: null pointer dereference
04-18 12:41:07.113 30525 30525 F DEBUG   :     x0  0000000000000001  x1  b4000077551e7480  x2  00000076da1a3cd8  x3  0000000000000003
04-18 12:41:07.113 30525 30525 F DEBUG   :     x4  0000000000000000  x5  8080000000000000  x6  000000006574616d  x7  0000000000008080
04-18 12:41:07.113 30525 30525 F DEBUG   :     x8  0000000000000000  x9  0000000000000000  x10 b4000076de36f240  x11 0000000000000000
04-18 12:41:07.113 30525 30525 F DEBUG   :     x12 00000076da1a3d10  x13 0000000000000019  x14 00000076da1a5048  x15 0000000034155555
04-18 12:41:07.113 30525 30525 F DEBUG   :     x16 0000000000000001  x17 00000077dfca74ac  x18 00000074d8184000  x19 b40000773a2c9c00
04-18 12:41:07.113 30525 30525 F DEBUG   :     x20 0000000000000000  x21 b40000773a2c9cd0  x22 0000000000000003  x23 0000000000000003
04-18 12:41:07.113 30525 30525 F DEBUG   :     x24 00000077e4b4e430  x25 0000000000000000  x26 00000077e4b4e430  x27 0000000000000000
04-18 12:41:07.113 30525 30525 F DEBUG   :     x28 00000076da1a59b0  x29 00000076da1a5980
04-18 12:41:07.113 30525 30525 F DEBUG   :     lr  00000076bb53569c  sp  00000076da1a54a0  pc  00000076bb535728  pst 0000000080001000
04-18 12:41:07.113 30525 30525 F DEBUG   : 24 total frames
04-18 12:41:07.113 30525 30525 F DEBUG   : backtrace:
04-18 12:41:07.113 30525 30525 F DEBUG   :       #00 pc 0000000000076728  /data/app/~~lQupF08BDgVSXtqvfKhhSA==/com.example.llamatest-bKMVgzwXB9VK2ogYVV0EWg==/base.apk!libllama_jni.so (offset 0xc0000) (Java_com_example_llamatest_llamaBridge_chatImage+2440) (BuildId: 2eac2bbe892efb7db4758ef558a9b2ffbff17079)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #01 pc 0000000000534170  /apex/com.android.art/lib64/libart.so (art_quick_generic_jni_trampoline+144) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #02 pc 000000000051d974  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+612) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #03 pc 000000000051b298  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+2192) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #04 pc 0000000000673388  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #05 pc 00000000005367d8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #06 pc 0000000000001534  /data/app/~~lQupF08BDgVSXtqvfKhhSA==/com.example.llamatest-bKMVgzwXB9VK2ogYVV0EWg==/base.apk (com.example.llamatest.MainActivity.onActivityResult$lambda$21+0)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #07 pc 0000000000512a7c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.6886435955882106544)+68) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #08 pc 0000000000512c34  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.6886435955882106544)+384) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #09 pc 000000000051b5e8  /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall<false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+3040) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #10 pc 0000000000673388  /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp<false>(art::interpreter::SwitchImplContext*)+16624) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #11 pc 00000000005367d8  /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #12 pc 0000000000001100  /data/app/~~lQupF08BDgVSXtqvfKhhSA==/com.example.llamatest-bKMVgzwXB9VK2ogYVV0EWg==/base.apk (com.example.llamatest.MainActivity$$ExternalSyntheticLambda3.run+0)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #13 pc 0000000000512a7c  /apex/com.android.art/lib64/libart.so (art::interpreter::ExecuteSwitch(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool) (.__uniq.112435418011751916792819755956732575238.llvm.6886435955882106544)+68) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #14 pc 0000000000512c34  /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.6886435955882106544)+384) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #15 pc 000000000051433c  /apex/com.android.art/lib64/libart.so (artQuickToInterpreterBridge+532) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #16 pc 0000000000534298  /apex/com.android.art/lib64/libart.so (art_quick_to_interpreter_bridge+88) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #17 pc 000000000202a840  /memfd:jit-cache (deleted) (offset 0x2000000) (java.lang.Thread.run+144)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #18 pc 000000000051d974  /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+612) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #19 pc 00000000004a2d70  /apex/com.android.art/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+140) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #20 pc 0000000000bb0c4c  /apex/com.android.art/lib64/libart.so (art::detail::ShortyTraits<(char)86>::Type art::ArtMethod::InvokeInstance<(char)86>(art::Thread*, art::ObjPtr<art::mirror::Object>, art::detail::ShortyTraits<>::Type...)+64) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #21 pc 00000000006a380c  /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+788) (BuildId: 6bcc64d9d5f20015626d4ef502940ca3)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #22 pc 00000000000a3ce8  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+196) (BuildId: 1d6558a3b88dbb195284ac1e713c1e3c)
04-18 12:41:07.113 30525 30525 F DEBUG   :       #23 pc 000000000009614c  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 1d6558a3b88dbb195284ac1e713c1e3c)
04-18 12:41:07.169   832   832 E tombstoned: Tombstone written to: tombstone_18
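The key line in this tombstone is frame #00: `signal 11 (SIGSEGV) ... Cause: null pointer dereference` inside `Java_com_example_llamatest_llamaBridge_chatImage` in `libllama_jni.so`. In other words, the image-chat JNI entry point dereferenced a pointer that was never initialized, typically the llama or mmproj context when the model has not finished loading before the button callback fires. Below is a minimal, non-JNI C++ sketch of the defensive checks such an entry point should perform before running inference. All names here (`chat_image`, the two handle structs, the error codes) are hypothetical stand-ins for illustration, not the actual project code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical opaque stand-ins for the real llama.cpp text context and
// the mtmd (multimodal projector) context loaded from the mmproj gguf.
struct llama_ctx_handle { /* opaque */ };
struct mtmd_ctx_handle  { /* opaque */ };

// Sketch of the guard logic a chatImage-style native entry point needs.
// Returns 0 on success and a distinct negative code for each missing
// precondition, instead of dereferencing a null pointer and crashing
// the whole app process with SIGSEGV.
int chat_image(llama_ctx_handle* lctx,
               mtmd_ctx_handle* mctx,
               const unsigned char* image_bytes,
               size_t image_len,
               std::string* out) {
    if (lctx == nullptr) {
        std::fprintf(stderr, "llama context not loaded\n");
        return -1;
    }
    if (mctx == nullptr) {
        std::fprintf(stderr, "mmproj (mtmd) context not loaded\n");
        return -2;
    }
    if (image_bytes == nullptr || image_len == 0) {
        std::fprintf(stderr, "empty image buffer\n");
        return -3;
    }
    if (out == nullptr) {
        return -4;
    }
    // ... the real image encode + decode loop would run here ...
    *out = "ok";
    return 0;
}
```

With guards like these, the native layer can return an error code (and a readable message) back through JNI to the Kotlin/Java side instead of taking down the process, which also makes "model not yet loaded" races much easier to diagnose than a tombstone.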

五、One-Sentence Summary

The Windows + Android Studio + llama.cpp + Qwen2.5-Omni-3B porting workflow is fully viable end to end: plain-text chat works normally, but in the image + text multimodal scenario the phone hardware cannot keep up. The next steps are to upgrade the hardware and optimize with heterogeneous computing.