GLM4-MILINT: Global Military Intelligence Powered by GLM4

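Project Overview

GLM4-MILINT is a platform for collecting and analyzing high-quality military intelligence from the public domain. It uses the GLM4 (General Language Model 4) model to turn global military information into structured data. The project aims to automate the conversion of large volumes of military data into accurate, usable intelligence through natural language processing, image recognition, and multimodal analysis.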

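Features

Global intelligence collection: automatically gathers military data from public sources (news, government statements, satellite imagery, social media, and so on).
- Recommended public sources for military intelligence and related information are listed below, covering news, government statements, satellite imagery, and social media platforms: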

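News Sites

BBC News: bbc.com/news
- Global news covering politics, military affairs, and international relations.
CNN: cnn.com
- Detailed international news and military reporting.
Reuters: reuters.com
- Global news and the latest military developments.
The New York Times: nytimes.com
- In-depth international reporting and military analysis.
Al Jazeera: aljazeera.com
- News on the Middle East and the wider world, including military conflicts and security issues.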

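Government Statements and Official Information

US Department of Defense (DoD): defense.gov
- US military policy, statements, and press releases.
UK Ministry of Defence (MOD): gov.uk/government/…
- News and announcements about the UK armed forces.
Ministry of National Defense of China: mod.gov.cn
- Chinese military news and official statements.
European Union Military Staff (EUMS): eeas.europa.eu
- Information on EU military and security policy.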

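Satellite Imagery Services

Google Earth: earth.google.com
- Global satellite imagery and geographic information.
Sentinel Hub: sentinel-hub.com
- Imagery and data services for the Sentinel satellites.
Maxar Technologies: maxar.com
- High-resolution commercial satellite imagery and geospatial data.
NASA Worldview: worldview.earthdata.nasa.gov
- Near real-time satellite imagery and environmental data.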

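Social Media Platforms

Twitter: twitter.com
- Follow the official accounts of military experts, news organizations, and government departments for immediate updates.
Reddit: reddit.com/r/Military
- A community for discussing and sharing military news and information.
Facebook: facebook.com
- Follow military news pages and related organizations for the latest information.
LinkedIn: linkedin.com
- Follow military analysts and defense experts for professional insight from within the industry.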

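Integrated Intelligence Platforms

Jane's Defence: janes.com
- Global military and defense news, analysis, and databases.
GlobalSecurity.org: globalsecurity.org
- Comprehensive information on military, defense, and security topics.

Together these sources cover a broad range of military and security information and can help with collecting, analyzing, and verifying intelligence. Choose the sources that best fit the collection and analysis task at hand.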

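GLM4-based intelligence analysis: uses the GLM4 large model for text processing and image recognition to extract key information such as military movements and strategic analysis.
Structured intelligence output: generates knowledge graphs, intelligence reports, geopolitical analysis, and military trend forecasts, with visual presentation.
Real-time updates: continuously collects and analyzes new data to keep the intelligence timely and accurate.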

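Technologies Used

Model: GLM4 (General Language Model 4)
Natural language processing (NLP): extracts military entities, events, and relations, and generates intelligence summaries and reports.
Computer vision (CV): identifies military facilities, equipment, and activity in satellite imagery.
Multimodal fusion: processes text and image data together to produce more complete intelligence.
Data visualization: presents intelligence using maps, charts, and timelines.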
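
The entity and relation extraction listed above is what would feed a knowledge graph. The following is a minimal sketch, assuming the model is prompted to reply with a JSON list of triples. The field names `head`, `relation`, and `tail` and the helpers `parse_triples` and `build_graph` are hypothetical and are not taken from this project's code.

```python
import json
from collections import defaultdict


def parse_triples(model_output: str) -> list:
    """Parse a JSON list of {"head", "relation", "tail"} objects, skipping malformed items."""
    try:
        items = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    triples = []
    for item in items:
        if not isinstance(item, dict):
            continue
        head, rel, tail = item.get("head"), item.get("relation"), item.get("tail")
        # Keep only triples whose three parts are non-empty strings.
        if all(isinstance(v, str) and v.strip() for v in (head, rel, tail)):
            triples.append((head.strip(), rel.strip(), tail.strip()))
    return triples


def build_graph(triples) -> dict:
    """Group triples into an adjacency map: head -> list of (relation, tail), without duplicates."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        if (rel, tail) not in graph[head]:
            graph[head].append((rel, tail))
    return dict(graph)


if __name__ == "__main__":
    # Assumed model reply; the second item is incomplete and is dropped.
    sample = (
        '[{"head": "Exercise A", "relation": "announced_by", "tail": "Ministry B"},'
        ' {"head": "Exercise A"}]'
    )
    print(build_graph(parse_triples(sample)))
```

Parsing defensively matters here because model output is not guaranteed to be valid JSON; malformed replies yield an empty list rather than an exception.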

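Installation and Deployment

Install the dependencies needed to download models from within China (via ModelScope) and to run Qwen2-7B-Instruct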

pip install modelscope
pip install transformers==4.41.2  
pip install accelerate

Model download code for users in China

from modelscope import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen2-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen2-7B-Instruct")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Install xfastertransformer

pip install xfastertransformer

Convert the model to xft format

python -c 'import xfastertransformer as xft; xft.Qwen2Convert().convert("/root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct","/data/qwen/Qwen2-7B-Instruct-xft")'

After conversion, run inference with the following code

import xfastertransformer
from transformers import AutoTokenizer, TextStreamer
# Assume the Hugging Face model dir is `/root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct` and the converted model dir is `/data/qwen/Qwen2-7B-Instruct-xft`.
MODEL_PATH="/data/qwen/Qwen2-7B-Instruct-xft"
TOKEN_PATH="/root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct"

INPUT_PROMPT = "Once upon a time, there existed a little girl who liked to have adventures."
tokenizer = AutoTokenizer.from_pretrained(TOKEN_PATH, use_fast=False, padding_side="left", trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_special_tokens=True, skip_prompt=False)

input_ids = tokenizer(INPUT_PROMPT, return_tensors="pt", padding=False).input_ids
model = xfastertransformer.AutoModel.from_pretrained(MODEL_PATH, dtype="bf16")
generated_ids = model.generate(input_ids, max_length=200, streamer=streamer)

Known Issues and Solutions

[INFO] SeqLen > FLASH_ATTN_THRESHOLD(8192) will enable FlashAttn.
[INFO] ENABLE_TUNED_COMM is enabled for faster reduceAdd.
[INFO] ENABLE_KV_TRANS is enabled for faster decoding.
[INFO] SINGLE_INSTANCE MODE.
Illegal instruction (core dumped)

The engineers replied that a 4th-generation or newer Intel CPU is required, so this experiment was rerun on a 16 vCPU Intel(R) Xeon(R) Platinum 8481C.
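
An `Illegal instruction` crash usually means the binary uses CPU instructions the host does not support. The sketch below checks the flags that Linux reports in `/proc/cpuinfo` before running inference. The required set (`avx512f`, `amx_bf16`) is an assumption chosen for illustration; the exact instruction sets xfastertransformer needs for `bf16` are not stated in this README, and `cpu_flags` and `missing_flags` are hypothetical helper names.

```python
from pathlib import Path

# Assumed requirement: AVX-512 foundation plus AMX bf16, using Linux flag names.
REQUIRED_FLAGS = ("avx512f", "amx_bf16")


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flag set of the first CPU listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, required=REQUIRED_FLAGS) -> list:
    """List the required flags that the CPU does not report."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in flags]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        missing = missing_flags(path.read_text())
        print("missing flags:", missing if missing else "none")
    else:
        print("/proc/cpuinfo is not available on this platform")
```

If any flag is reported missing, switching to a newer CPU (as was done in the experiment above) or choosing a data type the CPU supports are the likely remedies.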

Prerequisites

Python 3.8+
PyTorch or TensorFlow
CUDA (for GPU acceleration)
Required Python packages (see requirements.txt)

Installation Steps

Clone this project:

git clone https://github.com/yourusername/GLM4-MILINT.git
cd GLM4-MILINT

Install dependencies:
```
pip install -r requirements.txt
```
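Configure the GLM4 model:
- Download the GLM4 model weights and configure the path.
- Refer to the official GLM4 documentation for detailed model setup.
Configure the data collection module:
- Configure the crawler and the API keys for data collection (satellite imagery services, news APIs, and so on).
- Edit the collection sources and frequency settings in config.yaml.
Run the project:
1. Validation code for the xfastertransformer XPU environment: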
```
python xfastertransformer_military_intelligence_extractor.py
```
2. Intelligence parsing and database ingestion code for the xfastertransformer XPU environment:
```
python xfastertransformer_military_intelligence_parser.py
```
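3. Question answering through the pipeline: the user's question is combined with instructions and the MongoDB field schema into a prompt, GLM4 runs inference to generate a MongoDB command, and the command is validated. If it is valid, the answer is returned.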
```
python xft_intel_pipeline.py
```
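
The question-answering step in item 3 can be sketched as two pieces: building the prompt from the question and the field schema, and validating what the model returns before it touches the database. This is a minimal sketch, not the contents of `xft_intel_pipeline.py`. It assumes the model replies with a JSON filter for `find()`; the schema fields, the operator allowlist, and the names `build_prompt` and `validate_filter` are all hypothetical.

```python
import json

# Hypothetical collection schema; the real field names are not given in this README.
SCHEMA = {"title": "string", "country": "string", "event_type": "string", "date": "string"}

# Read-only query operators accepted by this sketch; anything else is rejected.
ALLOWED_OPERATORS = {"$and", "$or", "$in", "$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$regex"}


def build_prompt(question: str, schema: dict = SCHEMA) -> str:
    """Combine the instruction, the field schema, and the user's question."""
    fields = "\n".join(f"- {name}: {kind}" for name, kind in schema.items())
    return (
        "Translate the question into a MongoDB find() filter.\n"
        "Reply with a single JSON object and nothing else.\n"
        f"Collection fields:\n{fields}\n"
        f"Question: {question}\n"
    )


def _check(node, schema) -> bool:
    """Recursively verify that every key is a known field or an allowed operator."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("$"):
                if key not in ALLOWED_OPERATORS:
                    return False
            elif key not in schema:
                return False
            if not _check(value, schema):
                return False
        return True
    if isinstance(node, list):
        return all(_check(item, schema) for item in node)
    return True  # scalar values need no further checks


def validate_filter(model_output: str, schema: dict = SCHEMA):
    """Return the parsed filter if it is valid JSON using only known fields, else None."""
    try:
        query = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(query, dict) or not _check(query, schema):
        return None
    return query


if __name__ == "__main__":
    print(build_prompt("Which exercises were reported in 2024?"))
    print(validate_filter('{"event_type": "exercise", "date": {"$gte": "2024-01-01"}}'))
    print(validate_filter('{"$where": "sleep(1000)"}'))  # rejected: operator not allowed
```

A filter that passes validation could then be handed to a MongoDB client, for example pymongo's `collection.find(query)`. Using an allowlist rather than a blocklist keeps model-generated output from reaching operators the sketch has not explicitly approved.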

Project Structure

GLM4-MILINT/
│
├── data/               # Data storage directory
├── model/              # GLM4 model code
├── config.yaml         # Project configuration file
└── README.md           # Project documentation
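
The layout of `config.yaml` is not documented here. As a sketch of how its source and frequency settings might drive collection, the code below works on an already-parsed configuration mapping and decides which sources are due. The keys `sources`, `name`, `url`, and `interval_minutes`, and the helpers `load_sources` and `due_sources`, are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Source:
    name: str
    url: str
    interval_minutes: int


def load_sources(config: dict) -> list:
    """Validate the 'sources' section of an already-parsed config mapping."""
    sources = []
    for entry in config.get("sources", []):
        interval = int(entry["interval_minutes"])
        if interval <= 0:
            raise ValueError(f"interval must be positive for {entry['name']}")
        sources.append(Source(entry["name"], entry["url"], interval))
    return sources


def due_sources(sources, last_run: dict, now: datetime) -> list:
    """Return sources never collected, or whose collection interval has elapsed."""
    due = []
    for src in sources:
        previous = last_run.get(src.name)
        if previous is None or now - previous >= timedelta(minutes=src.interval_minutes):
            due.append(src)
    return due


if __name__ == "__main__":
    # Stand-in for the parsed contents of config.yaml (structure assumed).
    config = {"sources": [
        {"name": "reuters", "url": "https://reuters.com", "interval_minutes": 30},
        {"name": "dod", "url": "https://defense.gov", "interval_minutes": 120},
    ]}
    now = datetime(2024, 1, 1, 12, 0)
    last_run = {"reuters": now - timedelta(minutes=45), "dod": now - timedelta(minutes=60)}
    print([src.name for src in due_sources(load_sources(config), last_run, now)])
```

Passing `now` in explicitly keeps the scheduling logic deterministic and easy to test.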

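Usage Examples

1. Intelligence collection

python run_milint.py --collect-data

Automatically collects military intelligence data from multiple public sources.

2. Intelligence analysis and report generation

python run_milint.py --analyze

Runs the GLM4 model on the collected data and generates intelligence reports.

Screenshot of military knowledge graph extraction results (军事知识图谱抽取效果截图.png)

3. Viewing intelligence reports

Generated reports are saved in the reports/ directory in PDF or Markdown format for browsing.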
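
How reports are rendered is not described in this README. The following is a minimal sketch of the Markdown case, assuming a report is a title plus a list of headed sections; `render_report` and `save_report` are hypothetical helpers, and only the `reports/` directory name comes from the text above.

```python
from pathlib import Path


def render_report(title: str, findings: list) -> str:
    """Render a title and a list of (heading, body) pairs as Markdown text."""
    lines = [f"# {title}", ""]
    for heading, body in findings:
        lines += [f"## {heading}", "", body.strip(), ""]
    return "\n".join(lines)


def save_report(text: str, name: str, directory: str = "reports") -> Path:
    """Write the report to <directory>/<name>.md, creating the directory if needed."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.md"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    report = render_report("Weekly summary", [("Overview", "No notable changes this week.")])
    print(save_report(report, "weekly-summary"))
```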

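Contributing

Contributions and improvements are welcome! To contribute:

Fork this project.
Create a new branch and develop on it.
Open a pull request describing your changes.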