Reproduction Series 1: "Video Retrieval" in 5 Minutes, Based on Content Understanding, No Labels Required

Today I'm starting a brand-new series called "Reproduction". As the name suggests, it is about reproducing interesting projects.

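There are plenty of posts online that look like detailed step-by-step tutorials, but things go wrong as soon as you actually follow them. A lot of the code gets copied from post to post, picking up errors along the way until it no longer runs. The goal of this series is to reproduce these projects by hand, clear the mines one by one, correct the errors, and end up with a version that runs on the first try.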

Today's project is "Video Retrieval in 5 Minutes: Based on Content Understanding, No Labels Required" (qq.com).

Let's dive right in!

The first few steps work as written, so I'll skip them. If you're trying this on a Mac, you need to install git-lfs:

brew install git-lfs
git lfs install

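If you're running on Ubuntu, use the commands from the original tutorial with one fix: the "&" between git and git-lfs should be a space. Otherwise the shell runs the first install in the background and then tries to execute git-lfs as a separate command.

sudo apt-get install git git-lfs
git lfs install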
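Since a missing git-lfs turns out to be the root cause of one of the errors later in this post, it can be worth confirming it is really installed before going further. Here is a small stdlib-only sketch; git_lfs_available is a hypothetical helper name, and it only checks that the binary exists and runs, not that git lfs install has already been executed.

```python
import shutil
import subprocess


def git_lfs_available() -> bool:
    """Return True if git-lfs is on PATH and `git lfs version` succeeds."""
    if shutil.which("git-lfs") is None:
        return False
    try:
        # When installed, this prints something like "git-lfs/3.x.x (...)"
        result = subprocess.run(
            ["git", "lfs", "version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError also covers the case where `git` itself is missing
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("git-lfs available:", git_lfs_available())
```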

Next comes the "Extract features and import vectors" part. Set device to match your environment; if you're running on a CPU, simply set it to cpu:

device = 'cpu'
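If you'd rather not edit the device line by hand on every machine, here is a sketch that picks it automatically. It assumes the clip4clip operator runs on PyTorch and accepts a PyTorch-style device string such as the two used in this post, 'cuda:0' and 'cpu'; pick_device is a hypothetical helper, not part of towhee.

```python
def pick_device() -> str:
    """Return 'cuda:0' if PyTorch can see a CUDA GPU, otherwise 'cpu'."""
    try:
        import torch  # assumption: the model backend is PyTorch
    except ImportError:
        # Without PyTorch there is no way to detect a GPU here; fall back to CPU
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("using device:", device)
```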

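If you get an error like "invalid load key, 'v'" or anything mentioning "unpickling", check whether towhee failed to finish downloading the operator. Go to the ~/.towhee directory, delete the files in it, and rerun the code so everything is downloaded again. If you skipped installing git-lfs in the previous step, this is guaranteed to fail.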
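Why does a missing git-lfs produce exactly this error? Without git-lfs, what lands on disk for a weights file is often just a Git LFS pointer: a small text file that begins with "version https://git-lfs.github.com/spec/v1". Unpickling it fails on its very first byte, 'v', hence "invalid load key, 'v'". The sketch below scans a directory for such pointer stubs. It assumes the towhee cache lives at ~/.towhee as described above; is_lfs_pointer and find_lfs_pointers are hypothetical helper names.

```python
from pathlib import Path

# First line of every Git LFS pointer file
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file is an LFS pointer stub rather than the real binary."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root: Path) -> list[Path]:
    """Recursively list files under `root` that are still LFS pointers."""
    if not root.exists():
        return []
    return [p for p in root.rglob("*") if p.is_file() and is_lfs_pointer(p)]


if __name__ == "__main__":
    cache = Path.home() / ".towhee"
    for p in find_lfs_pointers(cache):
        print("LFS pointer, real weights not downloaded:", p)
```

If this lists anything, install git-lfs first, then delete the cache and rerun as described above.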

The code in the original post contains syntax errors:

import os  
import towhee  
  
device = 'cuda:0'  
# device = 'cpu'  
  
# For the first time you run this line,   
# it will take some time   
# because towhee will download operator with weights on backend.  
dc = (  
    towhee.read_csv(test_sample_csv_path).unstream()  
      .runas_op['video_id', 'id'](func=lambda x: int(x[-4:] "'video_id', 'id'"))  
      .video_decode.ffmpeg['video_path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 12} "'video_path', 'frames'") \  
      .runas_op['frames', 'frames'](func=lambda x: [y for y in x] "'frames', 'frames'") \  
      .video_text_embedding.clip4clip['frames', 'vec'](model_name='clip_vit_b32', modality='video', device=device "'frames', 'vec'") \  
      .to_milvus['id', 'vec'](collection=collection, batch=30 "'id', 'vec'")  
)

It should be:

import os
import towhee

# device = 'cuda:0'
device = 'cpu'

# For the first time you run this line, 
# it will take some time 
# because towhee will download operator with weights on backend.
dc = (
    towhee.read_csv(test_sample_csv_path).unstream()
      .runas_op['video_id', 'id'](func=lambda x: int(x[-4:]))
      .video_decode.ffmpeg['video_path', 'frames'](sample_type='uniform_temporal_subsample', args={'num_samples': 12})
      .runas_op['frames', 'frames'](func=lambda x: [y for y in x])
      .video_text_embedding.clip4clip['frames', 'vec'](model_name='clip_vit_b32', modality='video', device=device)
      .to_milvus['id', 'vec'](collection=collection, batch=30)
)
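A quick way to confirm that a snippet really has a syntax error, without running any part of the pipeline, is Python's built-in compile(). The sketch below checks the broken runas_op lambda from the original post against the fixed one, and also shows what the fixed lambda computes. The sample id "video7010" is hypothetical and assumes, just as int(x[-4:]) does, that every video_id ends in four digits.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python. Nothing is executed."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# The lambda from the original post (stray string after x[-4:]) vs. the fixed one
broken = "f = lambda x: int(x[-4:] \"'video_id', 'id'\")"
fixed = "f = lambda x: int(x[-4:])"
print(compiles(broken))  # False
print(compiles(fixed))   # True


def to_id(video_id: str) -> int:
    """Same logic as the fixed runas_op: take the last four characters as an int."""
    return int(video_id[-4:])


print(to_id("video7010"))  # 7010
```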

In the "System evaluation" part, the original post's code also contains syntax errors:

dc = (  
    towhee.read_csv(test_sample_csv_path).unstream()  
      .video_text_embedding.clip4clip['sentence','text_vec'](model_name='clip_vit_b32', modality='text', device=device "'sentence','text_vec'")  
      .milvus_search['text_vec', 'top10_raw_res'](collection=collection, limit=10 "'text_vec', 'top10_raw_res'")  
      .runas_op['video_id', 'ground_truth'](func=lambda x : [int(x[-4:] "'video_id', 'ground_truth'")])  
      .runas_op['top10_raw_res', 'top1'](func=lambda res: [x.id for i, x in enumerate(res "'top10_raw_res', 'top1'") if i < 1])  
      .runas_op['top10_raw_res', 'top5'](func=lambda res: [x.id for i, x in enumerate(res "'top10_raw_res', 'top5'") if i < 5])  
      .runas_op['top10_raw_res', 'top10'](func=lambda res: [x.id for i, x in enumerate(res "'top10_raw_res', 'top10'") if i < 10])  
)

It should be:

dc = (
    towhee.read_csv(test_sample_csv_path).unstream()
      .video_text_embedding.clip4clip['sentence','text_vec'](model_name='clip_vit_b32', modality='text', device=device)
      .milvus_search['text_vec', 'top10_raw_res'](collection=collection, limit=10)
      .runas_op['video_id', 'ground_truth'](func=lambda x : [int(x[-4:])])
      .runas_op['top10_raw_res', 'top1'](func=lambda res: [x.id for i, x in enumerate(res) if i < 1])
      .runas_op['top10_raw_res', 'top5'](func=lambda res: [x.id for i, x in enumerate(res) if i < 5])
      .runas_op['top10_raw_res', 'top10'](func=lambda res: [x.id for i, x in enumerate(res) if i < 10])
)
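The pipeline above only builds the ground_truth and top1/top5/top10 columns. One common way to turn these into a single number is Recall@K: the fraction of queries whose ground-truth video shows up in the top K results. Here is a plain-Python sketch of that metric on made-up data; it is my own illustration, not the evaluation code from the original tutorial.

```python
def recall_at_k(ground_truths: list[list[int]], predictions: list[list[int]]) -> float:
    """Fraction of queries where any ground-truth id appears in the predicted ids.

    With one ground-truth id per query (as in the ground_truth column),
    this is the usual Recall@K.
    """
    if len(ground_truths) != len(predictions):
        raise ValueError("ground_truths and predictions must have the same length")
    if not ground_truths:
        return 0.0
    hits = sum(1 for gt, pred in zip(ground_truths, predictions) if set(gt) & set(pred))
    return hits / len(ground_truths)


# Hypothetical toy data: 4 queries, in the same shape as the ground_truth/topK columns
ground_truth = [[7010], [7011], [7012], [7013]]
top1 = [[7010], [7500], [7012], [7999]]
top5 = [
    [7010, 1, 2, 3, 4],
    [7500, 7011, 5, 6, 7],
    [7012, 8, 9, 10, 11],
    [7999, 12, 13, 14, 15],
]

print(recall_at_k(ground_truth, top1))  # 0.5
print(recall_at_k(ground_truth, top5))  # 0.75
```

As a side note, [x.id for i, x in enumerate(res) if i < 5] simply keeps the first five hits; if res supports slicing, [x.id for x in res[:5]] does the same thing more directly.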

Next comes the part with the most errors: the online demo.

The original post's code is:

milvus_search_function = (  
         api.video_text_embedding.clip4clip(model_name='clip_vit_b32', modality='text', device=device)  
            .milvus_search(collection=collection, limit=show_num)  
            .runas_op(func=lambda res: [os.path.join(raw_video_path, 'video' + str(x.id) + '.mp4') for x in res])  
            .as_function()  
    )
import gradio  
  
interface = gradio.Interface(milvus_search_function,   
                             inputs=[gradio.Textbox()],  
                             outputs=[gradio.Video(format='mp4') for _ in range(show_num)]  
                            )  
  
interface.launch(inline=True, share=True)

This part has syntax errors too. After fixing them, the code that runs correctly is:

import gradio

show_num = 3
with towhee.api() as api:
    milvus_search_function = (
         api.video_text_embedding.clip4clip(model_name='clip_vit_b32', modality='text', device=device)
            .milvus_search(collection=collection, limit=show_num)
            .runas_op(func=lambda res: [os.path.join(raw_video_path, 'video' + str(x.id) + '.mp4') for x in res])
            .as_function()
    )

interface = gradio.Interface(milvus_search_function, 
                             inputs=[gradio.Textbox()],
                             outputs=[gradio.Video(format='mp4') for _ in range(show_num)]
                            )

interface.launch(inline=True, share=False)

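Compare the two carefully and you'll notice that the original post never defines api, so running its code exactly as written cannot possibly work. As for the last line, the original post uses share=True, which asks gradio to create a public share link; if you just want to try it locally and don't want to expose it on the internet, set it to False.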
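In the demo, the search function returns one file path per hit, and gradio fills the show_num Video outputs from that list. The sketch below mirrors the runas_op path mapping in plain Python and adds a hypothetical pad_results helper: assuming gradio wants exactly one value per output component, padding with None would keep the demo from failing when Milvus returns fewer than show_num hits, for example on a nearly empty collection. The ids and the ./test_video folder are made up for illustration.

```python
import os
from typing import Optional


def ids_to_paths(ids: list[int], raw_video_path: str) -> list[str]:
    """Mirror the demo's runas_op: id 7010 -> <raw_video_path>/video7010.mp4."""
    return [os.path.join(raw_video_path, "video" + str(i) + ".mp4") for i in ids]


def pad_results(paths: list[str], show_num: int) -> list[Optional[str]]:
    """Return exactly show_num values, one per gradio.Video output slot.

    Extra hits are dropped; missing slots are filled with None.
    """
    padded: list[Optional[str]] = list(paths[:show_num])
    padded += [None] * (show_num - len(padded))
    return padded


paths = ids_to_paths([7010, 7011], "./test_video")  # hypothetical ids and folder
print(pad_results(paths, 3))
```

This is optional hardening: with limit=show_num and a fully loaded collection, the demo above already returns exactly show_num paths.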

With that, the whole project runs from start to finish without problems.

The original post claims it takes 5 minutes. By my count, reproducing and debugging everything took a little over an hour.

Oh, and I found an English version online (Search Video by Description, towhee.io). I'm not sure who copied whom, but the English version has comparatively fewer bugs.