Fixing images that do not display in scraped WeChat Official Account articles


0. Requirements

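Scrape all articles published by a given Official Account and store them in the database
Display the articles on our own site while preserving their original WeChat styling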

1. Fetch the Official Account's source data

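  • The article URL returned by the API is only temporary, so the content itself must be saved to our database to stay accessible permanently.
  • Call the WeChat Official Account article pagination API and save the original article content and thumb_url to our own database (do not replace the URLs inside yet).
    • Tencent's pagination API has a bug: I pass size = 20 and fetch 20 items at a time, but before the data is exhausted the API sometimes returns only 19 items, even though Tencent's API states size=20. So do not decide whether fetching is finished from the number of items returned; keep going until a page comes back empty.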
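The stop-on-empty rule above can be sketched as a loop. fetchAllArticles and the $fetchPage callable are hypothetical names standing in for the real WeChat pagination request; the sketch also assumes the offset is a position on the server side, so it advances by the requested page size even when a page comes back short.

```php
<?php

/**
 * Page through an API until it returns an empty page.
 * Hypothetical helper: $fetchPage is a callable(int $offset, int $count): array
 * that wraps the real WeChat pagination request.
 */
function fetchAllArticles(callable $fetchPage, int $pageSize = 20): array
{
    $articles = [];
    $offset = 0;

    while (true) {
        $page = $fetchPage($offset, $pageSize);

        // Only an empty page means there is no more data; a short page
        // (e.g. 19 items when 20 were requested) does not.
        if (empty($page)) {
            break;
        }

        foreach ($page as $item) {
            $articles[] = $item;
        }

        // Assumption: offset is a server-side position, so advance by the
        // requested size rather than by count($page).
        $offset += $pageSize;
    }

    return $articles;
}
```

With a fake API that returns 19, 19, 5 and then 0 items for offsets 0, 20, 40 and 60, this loop collects all 43 items, whereas stopping at the first page shorter than 20 would keep only the first 19.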

2. Add a referrer meta tag to the article's header

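  • Add <meta name="referrer" content="never"> so the browser sends no Referer header when it loads the article's images. WeChat's image server (mmbiz.qpic.cn) checks the Referer and serves a placeholder instead of the image to pages on other domains, which is why scraped images fail to display.
  • Use PHP to inject the tag into the article's <head>: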
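<?php
// Insert the referrer meta tag right after the opening <head> tag.
// Note: str_replace only matches a bare "<head>"; a tag with attributes
// (e.g. <head lang="zh">) is left untouched.
function process($content) {
    return str_replace('<head>', '<head><meta name="referrer" content="never">', $content);
}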
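The str_replace call only matches a bare <head> tag. A slightly more tolerant sketch, using a hypothetical helper injectNoReferrer, also matches <HEAD> or <head lang="..."> via preg_replace, and uses the standard value no-referrer (the HTML standard maps the legacy keyword never to no-referrer).

```php
<?php

/**
 * Insert a no-referrer meta tag right after the first opening <head> tag.
 * Hypothetical helper: matches <head>, <HEAD> and <head lang="...">,
 * but not <header>.
 */
function injectNoReferrer(string $html): string
{
    $meta = '<meta name="referrer" content="no-referrer">';

    // $0 is the whole matched <head ...> tag; limit 1 = first occurrence only
    $result = preg_replace('/<head\b[^>]*>/i', '$0' . $meta, $html, 1, $count);

    // On a regex error, or when there is no <head> (e.g. a body-only fragment),
    // return the input unchanged; the host page must then carry the meta tag.
    if ($result === null || $count === 0) {
        return $html;
    }

    return $result;
}
```

For a body-only fragment without <head>, the function returns the input unchanged, and the meta tag has to be on the host page instead, as described in the notes at the end.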

3. Configure a proxy server so the front end can load Tencent's images

server {
        listen 80;
#        listen 443 ssl http2;
# Ideally use the front-end domain here; if you use the back-end API domain instead, requests have to be proxied twice
        server_name  {你的前端域名};
#        ssl_certificate  /root/.acme.sh/admin.test.ceibsmoment.com/admin.test.ceibsmoment.com.cer;
#        ssl_certificate_key /root/.acme.sh/admin.test.ceibsmoment.com/admin.test.ceibsmoment.com.key;
#        ssl_session_timeout 5m;
#        ssl_protocols  TLSv1 TLSv1.1 TLSv1.2;
#        ssl_ciphers       EECDH+CHACHA20:EECDH+CHACHA20-draft:EECDH+AES128:RSA+AES128:EECDH+AES256:RSA+AES256:EECDH+3DES:RSA+3DES:!MD5;
#        ssl_prefer_server_ciphers on;

        error_log /opt/log/nginx/web.ceibsmoment.com.error.log;
        access_log /opt/log/nginx/web.ceibsmoment.com.access.log main;


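        # Note: browsers reject credentialed CORS requests when Access-Control-Allow-Origin is "*";
        # list an explicit origin instead if credentials are actually needed.
        add_header Access-Control-Allow-Origin "*";
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
        add_header 'Access-Control-Allow-Headers' 'DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization';
        add_header 'Access-Control-Allow-Credentials' 'true';
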
        location / {
              root /opt/www/mp-website/;
              index   index.php index.html index.htm;
        }

        # WeChat image CDN paths: mmbiz_jpg, mmbiz_png, mmbiz_gif and their sz_ variants.
        # proxy_pass has no URI part, so the request path is forwarded unchanged:
        # /mmbiz_jpg/xxx -> https://mmbiz.qpic.cn/mmbiz_jpg/xxx
        location ~ ^/(sz_)?mmbiz_(jpg|png|gif) {
            proxy_connect_timeout 30;
            proxy_send_timeout 30;
            proxy_read_timeout 30;
            # Defensive: do not forward the browser's Referer to the image CDN
            # (an empty value means the header is not sent upstream)
            proxy_set_header Referer "";
            proxy_pass https://mmbiz.qpic.cn;
        }

        # /mp/... -> https://mp.weixin.qq.com/mp/... (path forwarded unchanged)
        location ^~ /mp {
            proxy_connect_timeout 30;
            proxy_send_timeout 30;
            proxy_read_timeout 30;
            proxy_pass https://mp.weixin.qq.com;
        }

        # Article pages: proxy_pass has a URI part, so the matched prefix /proxy/url is replaced with /s:
        # /proxy/url?... -> https://mp.weixin.qq.com/s?...
        location ^~ /proxy/url {
            proxy_connect_timeout 30;
            proxy_send_timeout 30;
            proxy_read_timeout 30;
            proxy_pass https://mp.weixin.qq.com/s;
        }
}

4. When the API returns article content and thumb_url, replace Tencent's image URLs with the proxy server's URLs

<?php

/**
 * Rewrite WeChat image and article URLs to the proxy locations configured in step 3.
 * env() is Laravel's helper; each variable holds the proxy URL for that path prefix.
 */
function replaceUrl($content = "") {
    // WeChat image path prefix => proxy URL for that prefix
    $imageProxies = [
        'mmbiz_jpg'    => env('WECHAT_MP_PROXY_GPJ_URL'),
        'mmbiz_png'    => env('WECHAT_MP_PROXY_GPJ_PNG'),
        'mmbiz_gif'    => env('WECHAT_MP_PROXY_GPJ_GIF'),
        'sz_mmbiz_jpg' => env('WECHAT_MP_PROXY_GPJ_SZ_URL'),
        'sz_mmbiz_png' => env('WECHAT_MP_PROXY_GPJ_SZ_PNG'),
        'sz_mmbiz_gif' => env('WECHAT_MP_PROXY_GPJ_SZ_GIF'),
    ];

    foreach ($imageProxies as $prefix => $proxyUrl) {
        // Replace both the https and the http form of the CDN URL
        $content = str_replace(
            ['https://mmbiz.qpic.cn/' . $prefix, 'http://mmbiz.qpic.cn/' . $prefix],
            $proxyUrl,
            $content
        );
    }

    // Article links (WECHAT_MP_PROXY_S_URL presumably points at the /proxy/url location)
    return str_replace('https://mp.weixin.qq.com/s', env('WECHAT_MP_PROXY_S_URL'), $content);
}
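Because each image location in the nginx config forwards the request path unchanged, the six image env variables presumably all hold the same front-end origin plus the matching prefix. Under that assumption, a single regular expression can do the image rewriting. proxyWechatImages and $proxyBase are hypothetical names; unlike replaceUrl, this takes the base URL as a parameter, which also makes it testable without Laravel.

```php
<?php

/**
 * Rewrite http(s)://mmbiz.qpic.cn/{prefix}/... to {$proxyBase}/{prefix}/...
 * for the prefixes proxied by nginx: mmbiz_jpg|png|gif and their sz_ variants.
 * Hypothetical helper; $proxyBase is assumed to be the front-end origin.
 */
function proxyWechatImages(string $content, string $proxyBase): string
{
    $base = rtrim($proxyBase, '/');

    // A callback avoids "$" or "\" in $base being read as backreferences
    $result = preg_replace_callback(
        '#https?://mmbiz\.qpic\.cn/((?:sz_)?mmbiz_(?:jpg|png|gif))#',
        fn (array $m): string => $base . '/' . $m[1],
        $content
    );

    // preg_replace_callback returns null on a regex error
    return $result ?? $content;
}
```

The same function can be applied to thumb_url before returning it, since cover images live on the same CDN paths.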

Notes

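  • When the front end only shows the list of WeChat articles, inject <meta name="referrer" content="never"> into the <head> of the front-end page.
  • When displaying an article's content, the front end should embed it in an iframe, because the article relies on its own lazy-loading JavaScript.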
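One way to do the iframe embedding, assuming the stored article HTML is rendered inline rather than loaded from its own URL, is the iframe srcdoc attribute. renderArticleIframe is a hypothetical helper; the key detail is that the HTML becomes an attribute value, so it must be escaped with htmlspecialchars.

```php
<?php

/**
 * Render stored article HTML inside an iframe using srcdoc, so the article's
 * own scripts (such as lazy loading) run in a separate document.
 * Hypothetical helper; $articleHtml is the saved article after URL rewriting.
 */
function renderArticleIframe(string $articleHtml): string
{
    // srcdoc holds the HTML as an attribute value: escape &, <, >, " and '
    $escaped = htmlspecialchars($articleHtml, ENT_QUOTES, 'UTF-8');

    return '<iframe srcdoc="' . $escaped . '" style="width:100%;border:0;"></iframe>';
}
```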