kube-ovn dpdk kubevirt 问题梳理

206 阅读3分钟

1. 目前遇到一个环境 虚拟机使用 dpdk 后,无法收发包

问题表现:

虚拟机内部 ping 网关 ping 不通,

ovs-vsctl list interface 看到的 dpdk vhost unix sock 对应的 port 基于 ovs-tcpdump 根本就抓不到包,但是 ovs 服务正常,流表正常。

在 虚拟机内部 保持 ping 网关,但是 ping 不通, 而且 ifconfig 根本就没有 错误计数,错误计数始终为 0

目前使用的镜像是: docker pull 599230270/kube-ovn:v1.12.0-dpdk-no-avx512

该镜像是 Fedora 构建的, Fedora 本身是不支持 avx512 的。 由于 kubevirt 也是 红帽的,基于这个原因,不使用 avx512 估计是为了避坑

Fedora 是商业化的Red Hat Enterprise Linux发行版的上游源码。

该镜像是在 Fedora 支持多队列网卡的虚拟机中构建的,而我的宿主机是 基于 centos 的。 本身也是比较接近的。

centos fedora redhat 三者之间的关系:

RedHat Enterprise 发行之前:

image.png

RedHat Enterprise 发行后, 4.0 版本开始:

image.png

CentOS 停止,centos stream 发起之后:

image.png

参考: juejin.cn/post/724519…

fedora 作为 centos stream 的上游,编译出来的东西, centos 应该也是能用的。

1.1 目前存在两个环境 kube-ovn ovs-dpdk 版本是一致的,但是有一个环境不通。

通过对比 xml 发现 是没有配置共享内存,导致 dpdk vhost 基于 共享内存转发包,导致网络不通。

在定位中 发现 dpdk vhost unix sock 文件 和 linux 的 内核态 unix sock 存在不同。基于内核态的编码抓包是抓不到 dpdk vhost unix sock io 的数据流的

# 好的 虚拟机的 xml


<domain type='kvm' id='1'>
  <name>default_vm-fedora-dpdk-1</name>
  <uuid>83d8713e-a51a-5d06-ab4e-f11a670b5e98</uuid>
  <metadata>
    <kubevirt xmlns="http://kubevirt.io">
      <uid/>
    </kubevirt>
  </metadata>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages/>
    <source type='memfd'/> # 这里有配置共享内存
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <iothreads>1</iothreads>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>KubeVirt</entry>
      <entry name='product'>None</entry>
      <entry name='uuid'>83d8713e-a51a-5d06-ab4e-f11a670b5e98</entry>
      <entry name='family'>KubeVirt</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel9.2.0'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Broadwell</model>
    <vendor>Intel</vendor>
    <topology sockets='1' dies='1' cores='4' threads='1'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='umip'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='pschange-mc-no'/>
    <numa>
      <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk' model='virtio-non-transitional'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' discard='unmap'/>
      <source file='/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2' index='1'/>
      <backingStore type='file' index='2'>
        <format type='qcow2'/>
        <source file='/var/run/kubevirt/container-disks/disk_0.img'/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='ua-containerdisk'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'>
      <alias name='usb'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-non-transitional'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0' model='virtio-non-transitional'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x17'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x18'/>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='vhostuser'>
      <mac address='00:00:00:2c:ac:b6'/>
      <source type='unix' path='/var/run/vm/dpdk/pod6c270ef2f25' mode='server'/>
      <model type='virtio-non-transitional'/>
      <driver name='vhost' queues='4' rx_queue_size='1024' tx_queue_size='1024'/>
      <alias name='ua-net1'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='unix'>
      <source mode='bind' path='/var/run/kubevirt-private/5950bc0a-fde5-4b96-9958-e7ad9b87e11e/virt-serial0'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='unix'>
      <source mode='bind' path='/var/run/kubevirt-private/5950bc0a-fde5-4b96-9958-e7ad9b87e11e/virt-serial0'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-default_vm-fedora-dp/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <graphics type='vnc' socket='/var/run/kubevirt-private/5950bc0a-fde5-4b96-9958-e7ad9b87e11e/virt-vnc' sharePolicy='ignore'>
      <listen type='socket' socket='/var/run/kubevirt-private/5950bc0a-fde5-4b96-9958-e7ad9b87e11e/virt-vnc'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='vga' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <memballoon model='virtio-non-transitional' freePageReporting='off'>
      <stats period='10'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

# 坏的 虚拟机的 xml

<domain type='kvm' id='1'>
  <name>default_vm-dpdk</name>
  <uuid>662f44f6-ba16-5d08-95e7-75b1ca2ff10e</uuid>
  <metadata>
    <kubevirt xmlns="http://kubevirt.io">
      <uid/>
    </kubevirt>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <hugepages/> # 没有配置共享内存,导致包转发无法使用共享内存
  </memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <iothreads>1</iothreads>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>KubeVirt</entry>
      <entry name='product'>None</entry>
      <entry name='uuid'>662f44f6-ba16-5d08-95e7-75b1ca2ff10e</entry>
      <entry name='family'>KubeVirt</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel9.2.0'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Cascadelake-Server</model>
    <vendor>Intel</vendor>
    <topology sockets='1' dies='1' cores='2' threads='1'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='umip'/>
    <feature policy='require' name='pku'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='xsaves'/>
    <feature policy='require' name='rdctl-no'/>
    <feature policy='require' name='ibrs-all'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='mds-no'/>
    <feature policy='require' name='pschange-mc-no'/>
    <feature policy='disable' name='erms'/>
    <feature policy='disable' name='avx512vnni'/>
    <feature policy='disable' name='xgetbv1'/>
    <feature policy='disable' name='pdpe1gb'/>
    <feature policy='disable' name='mpx'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk' model='virtio-non-transitional'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native' discard='unmap'/>
      <source dev='/dev/root-device' index='2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='ua-root-device'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='disk' model='virtio-non-transitional'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' discard='unmap'/>
      <source file='/var/run/kubevirt-ephemeral-disks/cloud-init-data/default/vm-dpdk/configdrive.iso' index='1'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <alias name='ua-cloudinitdisk'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='none'>
      <alias name='usb'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-non-transitional'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0' model='virtio-non-transitional'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x17'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <interface type='vhostuser'>
      <mac address='00:00:00:9d:24:bf'/>
      <source type='unix' path='/var/run/kubevirt-private/dpdk01/sock' mode='server'/>
      <model type='virtio-non-transitional'/>
      <driver>
        <host mrg_rxbuf='off'/>
      </driver>
      <alias name='ua-dpdk01'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='00:00:00:fa:29:11'/>
      <source type='unix' path='/var/run/kubevirt-private/dpdk02/sock' mode='server'/>
      <model type='virtio-non-transitional'/>
      <driver>
        <host mrg_rxbuf='off'/>
      </driver>
      <alias name='ua-dpdk02'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </interface>
    <serial type='unix'>
      <source mode='bind' path='/var/run/kubevirt-private/ca990405-9b47-4c8f-bc3a-cb63a4700980/virt-serial0'/>
      <log file='/var/run/kubevirt-private/ca990405-9b47-4c8f-bc3a-cb63a4700980/virt-consolelog0.log' append='off'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='unix'>
      <source mode='bind' path='/var/run/kubevirt-private/ca990405-9b47-4c8f-bc3a-cb63a4700980/virt-serial0'/>
      <log file='/var/run/kubevirt-private/ca990405-9b47-4c8f-bc3a-cb63a4700980/virt-consolelog0.log' append='off'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-default_vm-dpdk/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <graphics type='vnc' socket='/var/run/kubevirt-private/ca990405-9b47-4c8f-bc3a-cb63a4700980/virt-vnc'>
      <listen type='socket' socket='/var/run/kubevirt-private/ca990405-9b47-4c8f-bc3a-cb63a4700980/virt-vnc'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='vga' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <memballoon model='virtio-non-transitional'>
      <stats period='10'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>


libvirt内存备份有三种类型:file、anonymous和memfd。它们的详细区别如下:

  • file:将内存备份保存在文件中。这种方式可以通过文件共享来实现内存共享。
  • anonymous:默认的内存备份类型。它将内存备份保存在匿名内存中,无法直接共享。
  • memfd:使用Linux的memfd机制来创建内存备份,可以通过文件描述符共享内存。

这些备份类型的选择取决于具体的需求和环境。