Automated Content Collection, SEO Processing, and AI Rewriting and Publishing System


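System Architecture

  1. Front-end interface (HTML5 + CSS + JS): the administrator control panel
  2. Back-end processing (PHP): handles the collection, rewriting, and publishing logic
  3. Data storage (JSON): stores configuration and temporary data
  4. AI API integration: calls an AI service to rewrite articles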

Implementation Steps

1. Front-end control panel (index.html)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Automatic Article Collection and Publishing System</title>
    <style>
        body {
            font-family: 'Arial', sans-serif;
            line-height: 1.6;
            margin: 0;
            padding: 20px;
            background-color: #f5f5f5;
            color: #333;
        }
        .container {
            max-width: 1200px;
            margin: 0 auto;
            background: #fff;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 0 10px rgba(0,0,0,0.1);
        }
        h1 {
            color: #2c3e50;
            text-align: center;
        }
        .panel {
            margin-bottom: 20px;
            padding: 15px;
            border: 1px solid #ddd;
            border-radius: 5px;
            background-color: #f9f9f9;
        }
        .form-group {
            margin-bottom: 15px;
        }
        label {
            display: block;
            margin-bottom: 5px;
            font-weight: bold;
        }
        input, select, textarea {
            width: 100%;
            padding: 8px;
            border: 1px solid #ddd;
            border-radius: 4px;
            box-sizing: border-box;
        }
        button {
            background-color: #3498db;
            color: white;
            border: none;
            padding: 10px 15px;
            border-radius: 4px;
            cursor: pointer;
            font-size: 16px;
        }
        button:hover {
            background-color: #2980b9;
        }
        #status {
            margin-top: 20px;
            padding: 10px;
            border-radius: 4px;
        }
        .success {
            background-color: #d4edda;
            color: #155724;
        }
        .error {
            background-color: #f8d7da;
            color: #721c24;
        }
        .log-container {
            max-height: 300px;
            overflow-y: auto;
            border: 1px solid #ddd;
            padding: 10px;
            background-color: #f8f9fa;
            font-family: monospace;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Automatic Article Collection and Publishing System</h1>
        
        <div class="panel">
            <h2>Collection Settings</h2>
            <div class="form-group">
                <label for="sourceUrl">Source site URL / RSS</label>
                <input type="text" id="sourceUrl" placeholder="https://example.com/feed">
            </div>
            <div class="form-group">
                <label for="crawlInterval">Crawl interval (minutes)</label>
                <input type="number" id="crawlInterval" value="60" min="15">
            </div>
            <button id="startCrawling">Start automatic crawling</button>
            <button id="stopCrawling">Stop crawling</button>
        </div>
        
        <div class="panel">
            <h2>AI Rewrite Settings</h2>
            <div class="form-group">
                <label for="aiApiKey">AI API Key</label>
                <input type="password" id="aiApiKey" placeholder="Enter the API key for the AI service">
            </div>
            <div class="form-group">
                <label for="rewriteStyle">Rewrite style</label>
                <select id="rewriteStyle">
                    <option value="professional">Professional</option>
                    <option value="casual">Casual</option>
                    <option value="academic">Academic</option>
                    <option value="creative">Creative</option>
                </select>
            </div>
            <div class="form-group">
                <label for="rewriteLevel">Rewrite level</label>
                <select id="rewriteLevel">
                    <option value="light">Light rewrite</option>
                    <option value="medium">Medium rewrite</option>
                    <option value="heavy">Heavy rewrite</option>
                </select>
            </div>
            <button id="testRewrite">Test rewrite</button>
        </div>
        
        <div class="panel">
            <h2>SEO Settings</h2>
            <div class="form-group">
                <label for="targetKeywords">Target keywords (comma-separated)</label>
                <input type="text" id="targetKeywords" placeholder="SEO, content marketing, site optimization">
            </div>
            <div class="form-group">
                <label for="metaTemplate">Meta description template</label>
                <textarea id="metaTemplate" rows="3">This article explores {TOPIC} and covers {KEYWORDS} and related topics in detail to help you better understand the subject.</textarea>
            </div>
            <div class="form-group">
                <label>
                    <input type="checkbox" id="autoInternalLinking" checked> Automatic internal linking
                </label>
            </div>
        </div>
        
        <div class="panel">
            <h2>Publishing Settings</h2>
            <div class="form-group">
                <label for="postStatus">Post status</label>
                <select id="postStatus">
                    <option value="publish">Publish immediately</option>
                    <option value="draft">Save as draft</option>
                    <option value="pending">Pending review</option>
                </select>
            </div>
            <div class="form-group">
                <label for="postCategory">Default category</label>
                <input type="text" id="postCategory" placeholder="Technical articles">
            </div>
            <div class="form-group">
                <label for="postTags">Default tags (comma-separated)</label>
                <input type="text" id="postTags" placeholder="AI, automation, SEO">
            </div>
            <button id="saveSettings">Save all settings</button>
        </div>
        
        <div class="panel">
            <h2>System Status</h2>
            <div id="status">System ready</div>
            <h3>Operation Log</h3>
            <div class="log-container" id="logOutput">
                <!-- Log entries are displayed here -->
            </div>
            <button id="clearLogs">Clear log</button>
        </div>
    </div>

    <script>
        document.addEventListener('DOMContentLoaded', function() {
            // Load the saved settings
            loadSettings();
            
            // Bind the button events
            document.getElementById('startCrawling').addEventListener('click', startCrawling);
            document.getElementById('stopCrawling').addEventListener('click', stopCrawling);
            document.getElementById('testRewrite').addEventListener('click', testRewrite);
            document.getElementById('saveSettings').addEventListener('click', saveSettings);
            document.getElementById('clearLogs').addEventListener('click', clearLogs);
            
            // Check whether an automatic crawl task is already running
            checkCrawlerStatus();
        });
        
        function loadSettings() {
            fetch('api.php?action=get_settings')
                .then(response => response.json())
                .then(data => {
                    if(data.success) {
                        const settings = data.settings;
                        document.getElementById('sourceUrl').value = settings.sourceUrl || '';
                        document.getElementById('crawlInterval').value = settings.crawlInterval || 60;
                        document.getElementById('aiApiKey').value = settings.aiApiKey || '';
                        document.getElementById('rewriteStyle').value = settings.rewriteStyle || 'professional';
                        document.getElementById('rewriteLevel').value = settings.rewriteLevel || 'medium';
                        document.getElementById('targetKeywords').value = settings.targetKeywords || '';
                        document.getElementById('metaTemplate').value = settings.metaTemplate || '';
                        document.getElementById('autoInternalLinking').checked = settings.autoInternalLinking !== false;
                        document.getElementById('postStatus').value = settings.postStatus || 'publish';
                        document.getElementById('postCategory').value = settings.postCategory || '';
                        document.getElementById('postTags').value = settings.postTags || '';
                        
                        updateStatus('Settings loaded', 'success');
                    } else {
                        updateStatus('Failed to load settings: ' + data.message, 'error');
                    }
                })
                .catch(error => {
                    updateStatus('Error while loading settings: ' + error, 'error');
                });
        }
        
        function saveSettings() {
            const settings = {
                sourceUrl: document.getElementById('sourceUrl').value,
                crawlInterval: document.getElementById('crawlInterval').value,
                aiApiKey: document.getElementById('aiApiKey').value,
                rewriteStyle: document.getElementById('rewriteStyle').value,
                rewriteLevel: document.getElementById('rewriteLevel').value,
                targetKeywords: document.getElementById('targetKeywords').value,
                metaTemplate: document.getElementById('metaTemplate').value,
                autoInternalLinking: document.getElementById('autoInternalLinking').checked,
                postStatus: document.getElementById('postStatus').value,
                postCategory: document.getElementById('postCategory').value,
                postTags: document.getElementById('postTags').value
            };
            
            fetch('api.php?action=save_settings', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify(settings)
            })
            .then(response => response.json())
            .then(data => {
                if(data.success) {
                    updateStatus('Settings saved', 'success');
                } else {
                    updateStatus('Failed to save settings: ' + data.message, 'error');
                }
            })
            .catch(error => {
                updateStatus('Error while saving settings: ' + error, 'error');
            });
        }
        
        function startCrawling() {
            fetch('api.php?action=start_crawling')
                .then(response => response.json())
                .then(data => {
                    if(data.success) {
                        updateStatus('Automatic crawling started', 'success');
                        addLog('Automatic crawl service started');
                    } else {
                        updateStatus('Failed to start crawling: ' + data.message, 'error');
                        addLog('Failed to start crawling: ' + data.message);
                    }
                })
                .catch(error => {
                    updateStatus('Error while starting crawling: ' + error, 'error');
                    addLog('Error while starting crawling: ' + error);
                });
        }
        
        function stopCrawling() {
            fetch('api.php?action=stop_crawling')
                .then(response => response.json())
                .then(data => {
                    if(data.success) {
                        updateStatus('Automatic crawling stopped', 'success');
                        addLog('Automatic crawl service stopped');
                    } else {
                        updateStatus('Failed to stop crawling: ' + data.message, 'error');
                        addLog('Failed to stop crawling: ' + data.message);
                    }
                })
                .catch(error => {
                    updateStatus('Error while stopping crawling: ' + error, 'error');
                    addLog('Error while stopping crawling: ' + error);
                });
        }
        
        function testRewrite() {
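            const testText = "Artificial intelligence is changing the way we interact with technology. From voice assistants to recommendation algorithms, AI has become an indispensable part of our daily lives.";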
            
            fetch('api.php?action=test_rewrite', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({
                    text: testText,
                    style: document.getElementById('rewriteStyle').value,
                    level: document.getElementById('rewriteLevel').value
                })
            })
            .then(response => response.json())
            .then(data => {
                if(data.success) {
                    updateStatus('Test rewrite succeeded', 'success');
                    addLog('Original text: ' + testText);
                    addLog('Rewritten text: ' + data.rewrittenText);
                } else {
                    updateStatus('Test rewrite failed: ' + data.message, 'error');
                    addLog('Test rewrite failed: ' + data.message);
                }
            })
            .catch(error => {
                updateStatus('Error during test rewrite: ' + error, 'error');
                addLog('Error during test rewrite: ' + error);
            });
        }
        
        function checkCrawlerStatus() {
            fetch('api.php?action=check_crawler_status')
                .then(response => response.json())
                .then(data => {
                    if(data.success && data.running) {
                        updateStatus('Automatic crawl service is running', 'success');
                    }
                })
                .catch(error => {
                    console.error('Error while checking crawler status:', error);
                });
        }
        
        function clearLogs() {
            document.getElementById('logOutput').innerHTML = '';
            addLog('Log cleared');
        }
        
        function updateStatus(message, type) {
            const statusElement = document.getElementById('status');
            statusElement.textContent = message;
            statusElement.className = type || '';
        }
        
        function addLog(message) {
            const logElement = document.getElementById('logOutput');
            const timestamp = new Date().toLocaleTimeString();
            const logEntry = document.createElement('div');
            logEntry.textContent = `[${timestamp}] ${message}`;
            logElement.appendChild(logEntry);
            logElement.scrollTop = logElement.scrollHeight;
        }
    </script>
</body>
</html>

2. Back-end API handler (api.php)

<?php
header('Content-Type: application/json');
header('Access-Control-Allow-Origin: *');
header('Access-Control-Allow-Methods: GET, POST');
header('Access-Control-Allow-Headers: Content-Type');

// Paths of the configuration file and the lock file
define('CONFIG_FILE', 'config.json');
define('CRAWLER_LOCK_FILE', 'crawler.lock');

// Main router
$action = $_GET['action'] ?? '';

switch ($action) {
    case 'get_settings':
        getSettings();
        break;
    case 'save_settings':
        saveSettings();
        break;
    case 'start_crawling':
        startCrawling();
        break;
    case 'stop_crawling':
        stopCrawling();
        break;
    case 'check_crawler_status':
        checkCrawlerStatus();
        break;
    case 'test_rewrite':
        testRewrite();
        break;
    default:
        echo json_encode(['success' => false, 'message' => 'Invalid action']);
        break;
}

// Get settings
function getSettings() {
    if (file_exists(CONFIG_FILE)) {
        $settings = json_decode(file_get_contents(CONFIG_FILE), true);
        echo json_encode(['success' => true, 'settings' => $settings]);
    } else {
        // Return the default settings
        $defaultSettings = [
            'sourceUrl' => '',
            'crawlInterval' => 60,
            'aiApiKey' => '',
            'rewriteStyle' => 'professional',
            'rewriteLevel' => 'medium',
            'targetKeywords' => '',
            'metaTemplate' => 'This article explores {TOPIC} and covers {KEYWORDS} and related topics in detail to help you better understand the subject.',
            'autoInternalLinking' => true,
            'postStatus' => 'publish',
            'postCategory' => '',
            'postTags' => ''
        ];
        echo json_encode(['success' => true, 'settings' => $defaultSettings]);
    }
}

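// Save settings
function saveSettings() {
    $json = file_get_contents('php://input');
    $data = json_decode($json, true);
    
    // Reject malformed JSON and any payload that is not a JSON object
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($data)) {
        echo json_encode(['success' => false, 'message' => 'Invalid JSON data']);
        return;
    }
    
    // Validate required fields
    if (empty($data['sourceUrl'])) {
        echo json_encode(['success' => false, 'message' => 'The source URL must not be empty']);
        return;
    }
    
    // Save to file; file_put_contents() returns false on failure
    $encoded = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
    if (file_put_contents(CONFIG_FILE, $encoded, LOCK_EX) !== false) {
        echo json_encode(['success' => true]);
    } else {
        echo json_encode(['success' => false, 'message' => 'Failed to save settings']);
    }
}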

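// Start crawling
function startCrawling() {
    $config = getConfig();
    
    // crawler.php exits immediately without a source URL, so check it up front
    if (empty($config['sourceUrl'])) {
        echo json_encode(['success' => false, 'message' => 'No source URL configured']);
        return;
    }
    
    // Check whether a crawl task is already running
    if (file_exists(CRAWLER_LOCK_FILE)) {
        $lockTime = filemtime(CRAWLER_LOCK_FILE);
        if (time() - $lockTime < 300) { // a lock file younger than 5 minutes counts as active
            echo json_encode(['success' => false, 'message' => 'The crawl service is already running']);
            return;
        }
    }
    
    // Create the lock file
    file_put_contents(CRAWLER_LOCK_FILE, time());
    
    // Launch the crawler script in the background (this shell syntax is Unix-only)
    $command = 'php ' . escapeshellarg(__DIR__ . '/crawler.php') . ' > /dev/null 2>&1 &';
    exec($command);
    
    echo json_encode(['success' => true]);
}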

// Stop crawling
function stopCrawling() {
    if (file_exists(CRAWLER_LOCK_FILE)) {
        unlink(CRAWLER_LOCK_FILE);
    }
    
    // Further stop logic, such as killing the process, can be added here
    
    echo json_encode(['success' => true]);
}

// Check crawler status
function checkCrawlerStatus() {
    $running = false;
    
    if (file_exists(CRAWLER_LOCK_FILE)) {
        $lockTime = filemtime(CRAWLER_LOCK_FILE);
        if (time() - $lockTime < 300) { // a lock file younger than 5 minutes counts as active
            $running = true;
        } else {
            // Stale lock file
            unlink(CRAWLER_LOCK_FILE);
        }
    }
    
    echo json_encode(['success' => true, 'running' => $running]);
}

// Test rewriting
function testRewrite() {
    $json = file_get_contents('php://input');
    $data = json_decode($json, true);
    
    if (empty($data['text'])) {
        echo json_encode(['success' => false, 'message' => 'Test text is required']);
        return;
    }
    
    $config = getConfig();
    
    if (empty($config['aiApiKey'])) {
        echo json_encode(['success' => false, 'message' => 'The AI API key is not configured']);
        return;
    }
    
    // Call the AI rewrite API; fall back to the default style and level when omitted
    $rewrittenText = callAIApi(
        $data['text'],
        $data['style'] ?? 'professional',
        $data['level'] ?? 'medium',
        $config['aiApiKey']
    );
    
    if ($rewrittenText) {
        echo json_encode(['success' => true, 'rewrittenText' => $rewrittenText]);
    } else {
        echo json_encode(['success' => false, 'message' => 'The AI API call failed']);
    }
}

// Call the AI API to rewrite an article
function callAIApi($text, $style, $level, $apiKey) {
    // This returns a mocked response; a real project should replace it with an actual AI API call,
    // for example OpenAI, GPT-3, or another text-rewriting API
    
    // Mock rewriting in different styles
    $rewrittenText = $text;
    
    if ($style === 'professional') {
        $rewrittenText = "From a professional perspective: " . str_replace("changing", "revolutionizing", $text);
    } elseif ($style === 'casual') {
        $rewrittenText = "Did you know? " . str_replace("has become", "is now", $text);
    } elseif ($style === 'academic') {
        $rewrittenText = "Research shows: " . str_replace("we interact", "humans interact", $text);
    } elseif ($style === 'creative') {
        $rewrittenText = "Imagine this: " . str_replace("Artificial intelligence", "AI magic", $text);
    }
    
    // Mock different rewrite levels
    if ($level === 'medium') {
        $rewrittenText = str_replace("technology", "tech", $rewrittenText);
    } elseif ($level === 'heavy') {
        $rewrittenText = str_replace("daily lives", "everyday routines", $rewrittenText);
        $rewrittenText = str_replace("indispensable", "essential", $rewrittenText);
    }
    
    // In a real project this would be an actual API call, for example:
    /*
    $client = new GuzzleHttp\Client();
    $response = $client->post('https://api.ai-service.com/rewrite', [
        'headers' => [
            'Authorization' => 'Bearer ' . $apiKey,
            'Content-Type' => 'application/json',
        ],
        'json' => [
            'text' => $text,
            'style' => $style,
            'level' => $level
        ]
    ]);
    
    $result = json_decode($response->getBody(), true);
    return $result['rewritten_text'] ?? null;
    */
    
    return $rewrittenText;
}

// Load the configuration
function getConfig() {
    if (file_exists(CONFIG_FILE)) {
        return json_decode(file_get_contents(CONFIG_FILE), true);
    }
    return [];
}

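// Mock SEO processing function
function processSEO($content, $keywords, $metaTemplate) {
    // Extract the topic
    $topic = extractTopic($content);
    
    // Generate the meta description
    $metaDescription = str_replace(
        ['{TOPIC}', '{KEYWORDS}'],
        [$topic, implode(', ', $keywords)],
        $metaTemplate
    );
    
    // Keyword density analysis. The mb_* functions (mbstring extension) handle
    // multibyte text such as Chinese, where str_word_count() finds no words.
    // Density here is the share of the content's characters covered by the keyword.
    $keywordDensity = [];
    $lowerContent = mb_strtolower($content);
    $contentLength = max(1, mb_strlen($content));
    foreach ($keywords as $keyword) {
        $keyword = trim($keyword);
        if ($keyword === '') {
            continue;
        }
        $count = mb_substr_count($lowerContent, mb_strtolower($keyword));
        $density = ($count * mb_strlen($keyword) / $contentLength) * 100;
        $keywordDensity[$keyword] = round($density, 2);
    }
    
    // Add H1 and H2 tags
    $content = addHeadingTags($content, $topic);
    
    // Add internal links
    $content = addInternalLinks($content);
    
    return [
        'content' => $content,
        'meta_description' => $metaDescription,
        'keyword_density' => $keywordDensity,
        'title' => generateTitle($topic, $keywords)
    ];
}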

// Helper functions: placeholders to be implemented in a real project
function extractTopic($text) { return "Artificial intelligence"; }
function addHeadingTags($content, $topic) { return $content; }
function addInternalLinks($content) { return $content; }
function generateTitle($topic, $keywords) { return $topic . " - " . implode(", ", $keywords); }
?>
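
Both start_crawling and check_crawler_status decide whether the crawler is alive purely from the age of crawler.lock. The rule can be sketched on its own as below; the function name isLockFresh and its parameters are hypothetical and chosen only for illustration, and the 300-second default mirrors the window used in api.php. A lock older than the window is treated as stale, so a long-running crawler has to refresh the file's modification time at least that often to keep being reported as running.

```javascript
// Hypothetical helper mirroring the lock-file age check in api.php.
// lockMtime and now are Unix timestamps in seconds; pass null when the
// lock file does not exist.
function isLockFresh(lockMtime, now, ttlSeconds = 300) {
    if (lockMtime === null) {
        return false; // no lock file means no crawler
    }
    return now - lockMtime < ttlSeconds;
}

const now = 1700000000;
console.log(isLockFresh(now - 60, now));   // true: touched one minute ago
console.log(isLockFresh(now - 301, now));  // false: older than 5 minutes
console.log(isLockFresh(null, now));       // false: lock file missing
```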

3. Crawler script (crawler.php)

<?php
// Paths of the configuration file, the lock file, and the log file
define('CONFIG_FILE', 'config.json');
define('CRAWLER_LOCK_FILE', 'crawler.lock');
define('CRAWLER_LOG_FILE', 'crawler.log');

// Check the lock file
if (!file_exists(CRAWLER_LOCK_FILE)) {
    die("No active crawl task\n");
}

// Load the configuration
$config = json_decode(file_get_contents(CONFIG_FILE), true);
if (!$config) {
    logMessage("Unable to load the configuration");
    exit;
}

// Check required settings
if (empty($config['sourceUrl'])) {
    logMessage("No source URL configured");
    exit;
}

// Main loop
while (file_exists(CRAWLER_LOCK_FILE)) {
    $startTime = time();
    // Refresh the lock's modification time: api.php treats a lock older than 5 minutes as stale
    touch(CRAWLER_LOCK_FILE);
    
    try {
        logMessage("Starting crawl cycle");
        
        // 1. Collect content
        $articles = fetchArticles($config['sourceUrl']);
        
        if (!empty($articles)) {
            logMessage(sprintf("Collected %d articles", count($articles)));
            
            foreach ($articles as $article) {
                // 2. AI rewrite
                $rewrittenContent = callAIApi(
                    $article['content'],
                    $config['rewriteStyle'],
                    $config['rewriteLevel'],
                    $config['aiApiKey']
                );
                
                if (!$rewrittenContent) {
                    logMessage("Failed to rewrite article: " . $article['title']);
                    continue;
                }
                
                // 3. SEO processing (trim each entry so that "a, b" does not yield " b")
                $keywords = !empty($config['targetKeywords']) ? 
                    array_map('trim', explode(',', $config['targetKeywords'])) : [];
                
                $seoResult = processSEO(
                    $rewrittenContent,
                    $keywords,
                    $config['metaTemplate']
                );
                
                // 4. Publish the article
                $postData = [
                    'title' => $seoResult['title'],
                    'content' => $seoResult['content'],
                    'meta_description' => $seoResult['meta_description'],
                    'status' => $config['postStatus'],
                    'category' => $config['postCategory'],
                    'tags' => !empty($config['postTags']) ? 
                        array_map('trim', explode(',', $config['postTags'])) : []
                ];
                
                $published = publishArticle($postData);
                
                if ($published) {
                    logMessage("Article published: " . $postData['title']);
                } else {
                    logMessage("Failed to publish article: " . $postData['title']);
                }
                
                // Avoid publishing too quickly
                sleep(5);
            }
        } else {
            logMessage("No new articles collected");
        }
        
        logMessage("Crawl cycle finished");
    } catch (Throwable $e) {
        logMessage("Error during crawl: " . $e->getMessage());
    }
    
    // Work out how long to wait before the next run
    $interval = isset($config['crawlInterval']) ? (int)$config['crawlInterval'] * 60 : 3600;
    $elapsed = time() - $startTime;
    $sleepTime = max(0, $interval - $elapsed);
    
    if ($sleepTime > 0) {
        logMessage(sprintf("Waiting %d seconds until the next crawl", $sleepTime));
        // Sleep in slices of at most 60 seconds: each slice refreshes the lock
        // and notices a stop request (a deleted lock file) promptly
        $wakeAt = time() + $sleepTime;
        while (time() < $wakeAt) {
            clearstatcache();
            if (!file_exists(CRAWLER_LOCK_FILE)) {
                break;
            }
            touch(CRAWLER_LOCK_FILE);
            sleep(min(60, $wakeAt - time()));
        }
    }
}

logMessage("Crawl task stopped");
exit;

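// Article collection function
function fetchArticles($sourceUrl) {
    // The real collection logic belongs here:
    // parsing RSS, scraping web pages, or calling an API
    
    // Return some mock articles
    return [
        [
            'title' => 'The Latest Developments in Artificial Intelligence',
            'content' => 'Artificial intelligence is changing the way we interact with technology. From voice assistants to recommendation algorithms, AI has become an indispensable part of our daily lives.',
            'url' => 'https://example.com/ai-advances'
        ],
        [
            'title' => 'SEO Best Practices 2023',
            'content' => 'Search engine optimization saw new trends in 2023. Content quality and user experience matter more than ever.',
            'url' => 'https://example.com/seo-2023'
        ]
    ];
    
    // A real project might use code along these lines: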
    /*
    if (isRssFeed($sourceUrl)) {
        $feed = new SimplePie();
        $feed->set_feed_url($sourceUrl);
        $feed->init();
        
        $articles = [];
        foreach ($feed->get_items() as $item) {
            $articles[] = [
                'title' => $item->get_title(),
                'content' => $item->get_content(),
                'url' => $item->get_permalink()
            ];
        }
        return $articles;
    } else {
        // Use a scraping library such as Goutte or SimpleHTMLDOM to crawl the page
        // and parse the article list and content
    }
    */
}

// Call the AI API (similar to the version in api.php)
function callAIApi($text, $style, $level, $apiKey) {
    // Replace with a real API call in an actual project
    
    // Mock rewrite
    $rewrittenText = $text;
    if ($style === 'professional') {
        $rewrittenText = "From a professional perspective: " . str_replace("changing", "revolutionizing", $text);
    } elseif ($style === 'casual') {
        $rewrittenText = "Did you know? " . str_replace("has become", "is now", $text);
    }
    
    if ($level === 'medium') {
        $rewrittenText = str_replace("technology", "tech", $rewrittenText);
    } elseif ($level === 'heavy') {
        $rewrittenText = str_replace("daily lives", "everyday routines", $rewrittenText);
    }
    
    return $rewrittenText;
}

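// SEO processing function
function processSEO($content, $keywords, $metaTemplate) {
    // The real SEO processing logic belongs here
    
    $topic = "Artificial intelligence"; // a real project should extract this from the content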
    
    $metaDescription = str_replace(
        ['{TOPIC}', '{KEYWORDS}'],
        [$topic, implode(', ', $keywords)],
        $metaTemplate
    );
    
    return [
        'content' => $content,
        'meta_description' => $metaDescription,
        'title' => generateTitle($topic, $keywords)
    ];
}

// Article publishing function
function publishArticle($postData) {
    // The real publishing logic belongs here: the WordPress REST API,
    // a direct database insert, or another CMS's API
    
    logMessage("Publishing article: " . $postData['title']);
    
    // Mock a successful publish
    return true;
    
    // A real project might use code along these lines. Note that wp_remote_post()
    // and is_wp_error() only exist inside WordPress, and that the REST API expects
    // 'categories' and 'tags' to be arrays of term IDs rather than names:
    /*
    $wpApiUrl = 'https://your-wordpress-site.com/wp-json/wp/v2/posts';
    $response = wp_remote_post($wpApiUrl, [
        'headers' => [
            'Authorization' => 'Basic ' . base64_encode('username:password')
        ],
        'body' => [
            'title' => $postData['title'],
            'content' => $postData['content'],
            'status' => $postData['status'],
            'meta' => [
                'description' => $postData['meta_description']
            ],
            'categories' => [$postData['category']],
            'tags' => $postData['tags']
        ]
    ]);
    
    return !is_wp_error($response) && $response['response']['code'] === 201;
    */
}

// Write a log entry
function logMessage($message) {
    $timestamp = date('Y-m-d H:i:s');
    $logEntry = "[$timestamp] $message\n";
    file_put_contents(CRAWLER_LOG_FILE, $logEntry, FILE_APPEND);
    echo $logEntry;
}

// Helper functions
function isRssFeed($url) {
    return strpos($url, 'rss') !== false || strpos($url, 'feed') !== false;
}

function generateTitle($topic, $keywords) {
    return $topic . " - " . implode(", ", $keywords);
}
?>
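
The wait between two crawl cycles is the configured interval minus the time the cycle itself took, never less than zero. A minimal sketch of that arithmetic follows; the function name secondsUntilNextCycle is hypothetical. One deliberate difference from the PHP above is labeled in the comments: a non-numeric interval falls back to one hour here, whereas PHP's (int) cast would turn it into 0.

```javascript
// Hypothetical helper reproducing the scheduling arithmetic of crawler.php.
// crawlIntervalMinutes may be a number or a string, because the control
// panel saves the value of an <input> element, which is always a string.
function secondsUntilNextCycle(crawlIntervalMinutes, elapsedSeconds) {
    const minutes = Number.parseInt(crawlIntervalMinutes, 10);
    // Assumption: fall back to 3600 seconds when the value is missing or
    // not a number (the PHP code only falls back when the key is unset).
    const intervalSeconds = Number.isNaN(minutes) ? 3600 : minutes * 60;
    return Math.max(0, intervalSeconds - elapsedSeconds);
}

console.log(secondsUntilNextCycle(60, 125));      // 3475
console.log(secondsUntilNextCycle('15', 1000));   // 0: the cycle overran the interval
console.log(secondsUntilNextCycle(undefined, 0)); // 3600
```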

4. System configuration file (config.json)

{
    "sourceUrl": "https://example.com/feed",
    "crawlInterval": 60,
    "aiApiKey": "your-ai-api-key-here",
    "rewriteStyle": "professional",
    "rewriteLevel": "medium",
    "targetKeywords": "AI, artificial intelligence, automation",
    "metaTemplate": "This article explores {TOPIC} and covers {KEYWORDS} and related topics in detail to help you better understand the subject.",
    "autoInternalLinking": true,
    "postStatus": "publish",
    "postCategory": "Technical articles",
    "postTags": "AI, automation, SEO"
}
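
targetKeywords and postTags are stored as single comma-separated strings, so whatever consumes them has to split them and strip the spaces that follow each comma. A small sketch of that step is shown below; parseCommaList is a hypothetical name, and accepting the full-width comma as a separator is an extra assumption that the original code does not make.

```javascript
// Hypothetical helper: turn a comma-separated setting into a clean array.
function parseCommaList(value) {
    if (typeof value !== 'string') {
        return [];
    }
    return value
        .split(/[,\uFF0C]/)                 // ASCII comma or full-width comma (assumption)
        .map(item => item.trim())           // drop the space after each comma
        .filter(item => item.length > 0);   // ignore empty entries
}

console.log(parseCommaList('AI, automation, SEO'));
// [ 'AI', 'automation', 'SEO' ]
console.log(parseCommaList(' SEO ,, content marketing,'));
// [ 'SEO', 'content marketing' ]
```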

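System Features

  1. Automatic collection:

    • Collects from an RSS feed or a web page URL
    • Configurable crawl interval
    • Crawler service runs continuously in the background
  2. AI rewriting:

    • Integrates an AI API to rewrite content
    • Configurable rewrite style and level
    • Rewriting can be tested from the control panel
  3. SEO processing:

    • Automatic keyword optimization
    • Meta description generation
    • Automatic internal linking
    • Title optimization
  4. Automatic publishing:

    • Multiple post statuses (publish immediately, draft, pending review)
    • Configurable default category and tags
    • Mocks the publishing interface of a CMS such as WordPress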
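
Of the SEO steps listed above, meta description generation and keyword counting are simple enough to sketch in a few lines. The function names buildMetaDescription and countOccurrences are hypothetical; the {TOPIC} and {KEYWORDS} placeholders are the ones used by the meta template setting. This is only an illustration of the idea, not the code the system runs.

```javascript
// Fill the {TOPIC} and {KEYWORDS} placeholders of a meta description template.
function buildMetaDescription(template, topic, keywords) {
    return template
        .split('{TOPIC}').join(topic)
        .split('{KEYWORDS}').join(keywords.join(', '));
}

// Count case-insensitive, non-overlapping occurrences of a keyword.
function countOccurrences(text, keyword) {
    if (keyword === '') {
        return 0;
    }
    return text.toLowerCase().split(keyword.toLowerCase()).length - 1;
}

console.log(buildMetaDescription('About {TOPIC}: covers {KEYWORDS}.', 'AI', ['SEO', 'automation']));
// About AI: covers SEO, automation.
console.log(countOccurrences('SEO tips: seo basics and Seo tools', 'seo'));
// 3
```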

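Deployment

  1. Upload the files above to a web server:

    • index.html (front-end control panel)
    • api.php (back-end API)
    • crawler.php (crawler script)
    • config.json (configuration file)
  2. Make sure PHP is installed and configured correctly. api.php starts the crawler with exec(), so that function must not be disabled, and the web server user needs write access to the directory for config.json, crawler.lock, and crawler.log

  3. Edit the settings in config.json:

    • Set the source URL
    • Configure the AI API key
    • Adjust the SEO and publishing settings
  4. Open index.html in a browser to start using the system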
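
Before starting the crawler it can help to sanity-check the edited configuration. The sketch below is one possible check, not part of the original system: validateConfig and ALLOWED_STATUSES are hypothetical names, the 15-minute minimum is taken from the min attribute of the crawl interval input, and the allowed statuses are the three options offered in the publishing panel.

```javascript
// Post statuses offered by the control panel.
const ALLOWED_STATUSES = ['publish', 'draft', 'pending'];

// Hypothetical validator: returns a list of problems, empty when the config looks usable.
function validateConfig(config) {
    const errors = [];

    try {
        const url = new URL(config.sourceUrl);
        if (url.protocol !== 'http:' && url.protocol !== 'https:') {
            errors.push('sourceUrl must use http or https');
        }
    } catch (e) {
        errors.push('sourceUrl is not a valid URL');
    }

    const interval = Number(config.crawlInterval);
    if (!Number.isInteger(interval) || interval < 15) {
        errors.push('crawlInterval must be a whole number of at least 15 minutes');
    }

    if (!config.aiApiKey) {
        errors.push('aiApiKey is empty');
    }

    if (!ALLOWED_STATUSES.includes(config.postStatus)) {
        errors.push('postStatus must be one of: ' + ALLOWED_STATUSES.join(', '));
    }

    return errors;
}

console.log(validateConfig({
    sourceUrl: 'https://example.com/feed',
    crawlInterval: 60,
    aiApiKey: 'your-ai-api-key-here',
    postStatus: 'publish'
})); // []
```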

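Security Notes

  1. Add user authentication before using the system in production
  2. Rate-limit API calls and protect them against abuse
  3. Store sensitive settings such as API keys in encrypted form
  4. Back up the configuration and the collected data regularly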
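
As an illustration of the rate-limiting point, here is a minimal fixed-window limiter. It is a sketch under stated assumptions rather than something the system ships with: the class name FixedWindowLimiter, the per-client key, and the in-memory Map are all hypothetical, and a real deployment behind PHP would need shared storage instead of process memory.

```javascript
// Hypothetical fixed-window rate limiter: at most maxRequests per client
// within each window of windowMs milliseconds.
class FixedWindowLimiter {
    constructor(maxRequests, windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
        this.windows = new Map(); // clientId -> { start, count }
    }

    // Returns true when the request is allowed, false when it should be rejected.
    allow(clientId, nowMs = Date.now()) {
        const entry = this.windows.get(clientId);
        if (!entry || nowMs - entry.start >= this.windowMs) {
            // First request, or the previous window has expired: start a new one
            this.windows.set(clientId, { start: nowMs, count: 1 });
            return true;
        }
        if (entry.count < this.maxRequests) {
            entry.count += 1;
            return true;
        }
        return false;
    }
}

const limiter = new FixedWindowLimiter(2, 1000);
console.log(limiter.allow('client-a', 0));    // true
console.log(limiter.allow('client-a', 10));   // true
console.log(limiter.allow('client-a', 20));   // false: limit reached in this window
console.log(limiter.allow('client-a', 1000)); // true: a new window has started
```

A fixed window is the simplest scheme to reason about; it allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.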

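Suggested Extensions

  1. Add support for collecting from multiple sources
  2. Implement content deduplication
  3. Add automatic image processing
  4. Add performance monitoring and alerting
  5. Support the publishing interfaces of more CMS platforms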
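
Content deduplication, for example, could start from a fingerprint of each article's normalized text. The sketch below makes several assumptions: the function names are hypothetical, the articles have the title/content/url shape returned by fetchArticles, and the fingerprint is a 32-bit FNV-1a-style hash over UTF-16 code units, which is compact but can collide, so a production system would likely prefer a cryptographic hash or compare the text itself.

```javascript
// Collapse case and whitespace so trivial differences do not defeat the check.
function normalize(text) {
    return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// 32-bit FNV-1a-style hash of the normalized text, returned as a hex string.
function fingerprint(text) {
    const normalized = normalize(text);
    let hash = 0x811c9dc5;
    for (let i = 0; i < normalized.length; i++) {
        hash ^= normalized.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193);
    }
    return (hash >>> 0).toString(16);
}

// Keep only articles whose fingerprint is not yet in the `seen` set,
// and record the fingerprints of the ones that are kept.
function filterNewArticles(articles, seen) {
    const fresh = [];
    for (const article of articles) {
        const key = fingerprint(article.content);
        if (!seen.has(key)) {
            seen.add(key);
            fresh.push(article);
        }
    }
    return fresh;
}

const seen = new Set();
const batch = [
    { title: 'A', content: 'Content quality matters.', url: 'https://example.com/a' },
    { title: 'A (copy)', content: '  content   QUALITY matters. ', url: 'https://example.com/a2' },
    { title: 'B', content: 'User experience matters.', url: 'https://example.com/b' }
];
console.log(filterNewArticles(batch, seen).map(a => a.title)); // [ 'A', 'B' ]
```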

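This system lays out the full workflow of automatic collection, AI rewriting, SEO processing, and publishing. The collection, rewriting, and publishing steps shown here are mocks, so replace them with real implementations and then customize and extend the system to fit your needs. More details: baijiahao.baidu.com/s?id=183050…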