Kube-OVN Live Migration Fix


KubeVirt VM Live Migration LSP Options Fix

Issue Reference: #6220

Problem Statement

KubeVirt VM live migration fails with error:

ovs interface xxx is not ready after 30s

Root Cause: During consecutive migrations (e.g., A→B, then B→A), the code may read stale migration state from vmi.Status.MigrationState that belongs to the previous migration. This causes incorrect node detection and skipping of LSP migration options setup.

Evidence from Issue #6220 Logs

# Controller log shows node info from PREVIOUS migration:
status Scheduling, target Node k8s-worker-02, source Node k8s-worker-01

# Code incorrectly determines source == target:
VM pod migration setup skipped, source node: k8s-worker-01, target node: k8s-worker-01

User @SkalaNetworks confirmed: "Kube-OVN thinks the source pod is the destination and the destination pod the source. So it decides to do nothing."

Root Cause Analysis (CORRECTED)

Key Understanding: Where Does Stale Data Come From?

Misconception: I initially thought stale data came from vmiMigration.Status.MigrationState. This is INCORRECT.

Truth: Each migration creates a NEW VirtualMachineInstanceMigration object. The stale data comes from vmi.Status.MigrationState (the VMI object).

KubeVirt Data Flow

Object | Field | When Populated
vmiMigration.Status.MigrationState | SourceNode/TargetNode | ONLY after migration completes (copied from vmi when IsFinal)
vmi.Status.MigrationState | SourceNode/TargetNode | During MigrationScheduled phase (handleTargetPodHandoff)
vmi.Status.MigrationState.MigrationUID | UID | Set when migration controller starts processing

The Bug: Missing MigrationUID Validation

Master branch code reads from vmi.Status.MigrationState (correct source), BUT it doesn't validate that the MigrationUID matches the current migration:

// Master branch - Missing UID validation
if vmi.Status.MigrationState != nil {
    srcNodeName = vmi.Status.MigrationState.SourceNode   // May be STALE from previous migration!
    targetNodeName = vmi.Status.MigrationState.TargetNode
}

Timeline of Bug:

  1. First migration A→B succeeds, vmi.Status.MigrationState contains {Source: A, Target: B, MigrationUID: uid1}
  2. Second migration B→A starts, new vmiMigration object created with uid2
  3. Before KubeVirt updates vmi.Status.MigrationState with new info...
  4. Kube-OVN reads vmi.Status.MigrationState → gets stale data from first migration!
  5. Code sees stale {Source: A, Target: B}, but VM is now on B!
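  6. The stale source node (A) equals the new migration's actual target (A), so the code concludes source == target → SKIP! (This matches the log above: source node k8s-worker-01, target node k8s-worker-01.)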

The Fix: MigrationUID Validation

// Fixed code - Validate MigrationUID before using state
if vmi.Status.MigrationState != nil && vmi.Status.MigrationState.MigrationUID == vmiMigration.UID {
    // Only use vmi.Status.MigrationState if MigrationUID matches current migration
    srcNodeName = vmi.Status.MigrationState.SourceNode
    targetNodeName = vmi.Status.MigrationState.TargetNode
}

This ensures we only use migration state that belongs to the current migration, not stale data from a previous one.
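The guard can be exercised in isolation. This is a minimal sketch, not the Kube-OVN code: MigrationState is a hypothetical local stand-in that keeps only the three fields the check needs, and nodesFromState is an illustrative helper name.

```go
package main

import "fmt"

// MigrationState is a simplified local stand-in for KubeVirt's
// VirtualMachineInstanceMigrationState (only the fields used here).
type MigrationState struct {
	SourceNode   string
	TargetNode   string
	MigrationUID string
}

// nodesFromState returns the source/target nodes recorded in the VMI's
// MigrationState, but only if that state belongs to the migration identified
// by migrationUID. A nil state or a UID mismatch (stale state left over from
// a previous migration) yields ok=false.
func nodesFromState(state *MigrationState, migrationUID string) (src, target string, ok bool) {
	if state == nil || state.MigrationUID != migrationUID {
		return "", "", false
	}
	return state.SourceNode, state.TargetNode, true
}

func main() {
	// State left behind by the first migration A→B (uid1).
	stale := &MigrationState{SourceNode: "A", TargetNode: "B", MigrationUID: "uid1"}

	// The second migration (uid2) must not use it.
	_, _, ok := nodesFromState(stale, "uid2")
	fmt.Println(ok) // false

	// Once KubeVirt records the new migration, the state is usable.
	current := &MigrationState{SourceNode: "B", TargetNode: "A", MigrationUID: "uid2"}
	src, target, ok := nodesFromState(current, "uid2")
	fmt.Println(src, target, ok) // B A true
}
```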

Key Facts Summary

Fact 1️⃣: Each Migration Creates a NEW Migration Object ✅

  • Every migration creates a NEW VirtualMachineInstanceMigration object
  • The new object's Status.MigrationState is initially nil
  • Does NOT inherit data from previous migrations

Fact 2️⃣: Source of Stale Data 🎯

Stale data comes from vmi.Status.MigrationState (the VMI object), NOT from vmiMigration.Status.MigrationState (the Migration object)!

Why?

  • After the first migration succeeds, vmi.Status.MigrationState still contains {Source: A, Target: B, MigrationUID: uid1}
  • When the second migration (B→A) starts, if Kube-OVN reads vmi.Status.MigrationState before KubeVirt updates the VMI status, it reads stale data from the previous migration!

Fact 3️⃣: Master Branch Already Reads from Correct Data Source ✅

Master branch already reads from vmi.Status.MigrationState (VMI object), not from vmiMigration.Status.MigrationState.

The Actual Bug 🐛

Master branch lacks MigrationUID validation:

// Master branch - Missing UID validation
if vmi.Status.MigrationState != nil {
    srcNodeName = vmi.Status.MigrationState.SourceNode   // May be STALE from previous migration!
}

The Fix ✅

The fix-migration branch adds MigrationUID validation:

// Fixed - Validate MigrationUID
if vmi.Status.MigrationState != nil &&
   vmi.Status.MigrationState.MigrationUID == vmiMigration.UID {
    // Only use when UID matches
}

This ensures we only use state from the current migration, not stale data from a previous one.

Background

LSP Migration Options in OVN

OVN supports VM live migration through special LSP options:

  • requested-chassis: Specifies which chassis(es) the port should be bound to
  • activation-strategy=rarp: Triggers port activation on the target chassis when a RARP packet is received

Kube-OVN LSP Operations

Function | Purpose | LSP Changes
SetLogicalSwitchPortMigrateOptions | Migration start | requested-chassis=src,target, activation-strategy=rarp
ResetLogicalSwitchPortMigrateOptions(failed=false) | Migration succeeded | requested-chassis=target (removes activation-strategy)
ResetLogicalSwitchPortMigrateOptions(failed=true) | Migration failed | requested-chassis=src (removes activation-strategy)
CleanLogicalSwitchPortMigrateOptions | Cleanup | Removes all migration options
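The four operations can be modelled as edits to an in-memory options map. This is an illustrative sketch that only reproduces the option values in the table; setMigrateOptions, resetMigrateOptions and cleanMigrateOptions are hypothetical names, while the real Kube-OVN functions write these options to the Logical_Switch_Port row in the OVN Northbound database.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	optRequestedChassis   = "requested-chassis"
	optActivationStrategy = "activation-strategy"
)

// setMigrateOptions mirrors the "migration start" row: bind the port to both
// chassis and activate it on the target when a RARP packet arrives.
func setMigrateOptions(opts map[string]string, src, target string) {
	opts[optRequestedChassis] = strings.Join([]string{src, target}, ",")
	opts[optActivationStrategy] = "rarp"
}

// resetMigrateOptions mirrors the succeeded/failed rows: keep only the
// chassis the VM ends up on and drop activation-strategy.
func resetMigrateOptions(opts map[string]string, src, target string, failed bool) {
	if failed {
		opts[optRequestedChassis] = src
	} else {
		opts[optRequestedChassis] = target
	}
	delete(opts, optActivationStrategy)
}

// cleanMigrateOptions mirrors the cleanup row: remove all migration options.
func cleanMigrateOptions(opts map[string]string) {
	delete(opts, optRequestedChassis)
	delete(opts, optActivationStrategy)
}

func main() {
	opts := map[string]string{}
	setMigrateOptions(opts, "node1", "node2")
	fmt.Println(opts[optRequestedChassis], opts[optActivationStrategy]) // node1,node2 rarp

	resetMigrateOptions(opts, "node1", "node2", false)
	fmt.Println(opts[optRequestedChassis], len(opts)) // node2 1

	cleanMigrateOptions(opts)
	fmt.Println(len(opts)) // 0
}
```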

KubeVirt Migration Internals

Migration Phase Flow

MigrationPhaseUnset
    ↓
MigrationPending          ← Migration object created
    ↓
MigrationScheduling       ← Target pod scheduling starts
    ↓
MigrationScheduled        ← Target pod scheduled, handleTargetPodHandoff() called
                             *** vmi.Status.MigrationState.SourceNode/TargetNode SET HERE ***
    ↓
MigrationPreparingTarget  ← Target virt-handler preparing
    ↓
MigrationTargetReady      ← Target ready for migration
    ↓
MigrationRunning          ← QEMU migration in progress
    ↓
MigrationSucceeded/Failed ← Final state
                             *** vmiMigration.Status.MigrationState COPIED FROM vmi HERE ***
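Because the flow is linear up to one of two final phases, "has the migration reached phase X yet?" reduces to comparing positions. This sketch assumes local Phase constants whose string values mirror KubeVirt's migration phase names; rank, reached and isFinal are hypothetical helpers, and only the phases listed above are modelled.

```go
package main

import "fmt"

// Phase is a local stand-in for KubeVirt's VirtualMachineInstanceMigrationPhase.
type Phase string

const (
	Pending         Phase = "Pending"
	Scheduling      Phase = "Scheduling"
	Scheduled       Phase = "Scheduled"
	PreparingTarget Phase = "PreparingTarget"
	TargetReady     Phase = "TargetReady"
	Running         Phase = "Running"
	Succeeded       Phase = "Succeeded"
	Failed          Phase = "Failed"
)

// order lists the non-final phases in the sequence shown in the flow.
var order = []Phase{Pending, Scheduling, Scheduled, PreparingTarget, TargetReady, Running}

// isFinal reports whether the migration has finished (successfully or not).
func isFinal(p Phase) bool { return p == Succeeded || p == Failed }

// rank returns a phase's position in the flow; both final phases share the
// last rank. Unknown phases (including the unset phase "") return -1.
func rank(p Phase) int {
	if isFinal(p) {
		return len(order)
	}
	for i, q := range order {
		if q == p {
			return i
		}
	}
	return -1
}

// reached reports whether phase p is at or after phase ref in the flow.
func reached(p, ref Phase) bool {
	return rank(p) >= 0 && rank(p) >= rank(ref)
}

func main() {
	fmt.Println(reached(Scheduling, Scheduled)) // false
	fmt.Println(reached(Running, Scheduled))    // true
	fmt.Println(isFinal(Failed))                // true
}
```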

Key Data Structures

VirtualMachineInstance (VMI)
type VirtualMachineInstanceStatus struct {
    NodeName       string // Current node where VMI runs
    MigrationState *VirtualMachineInstanceMigrationState
}

type VirtualMachineInstanceMigrationState struct {
    SourceNode     string // Set in handleTargetPodHandoff()
    TargetNode     string // Set in handleTargetPodHandoff()
    TargetPod      string
    MigrationUID   types.UID
    StartTimestamp *metav1.Time
    EndTimestamp   *metav1.Time
    Completed      bool
    Failed         bool
    // ...
}
VirtualMachineInstanceMigration (VMIMigration)
type VirtualMachineInstanceMigrationStatus struct {
    Phase          VirtualMachineInstanceMigrationPhase
    MigrationState *VirtualMachineInstanceMigrationState  // Copied from VMI when IsFinal()
}

Critical Timing Analysis

When is vmi.Status.MigrationState.SourceNode/TargetNode set?

In handleTargetPodHandoff() (migration.go#L1183-L1201):

func (c *Controller) handleTargetPodHandoff(migration, vmi, pod) error {
    vmiCopy.Status.MigrationState.TargetNode = pod.Spec.NodeName
    vmiCopy.Status.MigrationState.SourceNode = vmi.Status.NodeName
    // ...
}

This happens during the MigrationScheduled phase, once the target pod is ready.

When is vmiMigration.Status.MigrationState populated?

In updateStatus() (migration.go#L548-L563):

func (c *Controller) updateStatus(migration, vmi, pods, syncError) error {
    if migration.IsFinal() {
        if vmi.IsMigrationSynchronized(migration) &&
           migration.UID == vmi.Status.MigrationState.MigrationUID {
            // ONLY copied when migration is FINAL (Succeeded/Failed)
            migrationCopy.Status.MigrationState = vmi.Status.MigrationState
        }
    }
}

Key Finding: vmiMigration.Status.MigrationState is ONLY populated after migration completes!
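The copy rule can be modelled in a few lines to show its consequence: until the migration is final, the migration object's MigrationState stays nil even though the VMI already carries the nodes. The types and copyOnFinal are hypothetical stand-ins, and IsMigrationSynchronized is reduced to the UID comparison for this sketch.

```go
package main

import "fmt"

// Local stand-ins for the KubeVirt objects (only the fields used here).
type MigrationState struct {
	SourceNode, TargetNode, MigrationUID string
}

type VMI struct {
	MigrationState *MigrationState // vmi.Status.MigrationState
}

type Migration struct {
	UID            string
	Phase          string
	MigrationState *MigrationState // nil until copied from the VMI
}

// copyOnFinal copies the VMI's MigrationState into the migration object only
// when the migration is final and the VMI state belongs to this migration.
func copyOnFinal(m *Migration, vmi *VMI) {
	final := m.Phase == "Succeeded" || m.Phase == "Failed"
	if final && vmi.MigrationState != nil && vmi.MigrationState.MigrationUID == m.UID {
		copied := *vmi.MigrationState
		m.MigrationState = &copied
	}
}

func main() {
	vmi := &VMI{MigrationState: &MigrationState{"node1", "node2", "uid1"}}
	m := &Migration{UID: "uid1", Phase: "Running"}

	copyOnFinal(m, vmi)
	fmt.Println(m.MigrationState == nil) // true: the VMI has the nodes, the migration object does not

	m.Phase = "Succeeded"
	copyOnFinal(m, vmi)
	fmt.Println(m.MigrationState.TargetNode) // node2
}
```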

Data Availability Matrix

Phase | vmi.Status.MigrationState | vmiMigration.Status.MigrationState
Pending | nil or stale | nil
Scheduling | nil or stale | nil
Scheduled | SET (handleTargetPodHandoff) | nil
PreparingTarget | SET | nil
TargetReady | SET | nil
Running | SET | nil
Succeeded/Failed | SET | COPIED from vmi
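One way to keep handler logic and tests aligned with the matrix is to encode it as a lookup table. The sketch below simply transcribes the rows; the Availability values and the usable helper are assumptions made for illustration, and "nil or stale" is treated as not usable, since it cannot be trusted without the MigrationUID check.

```go
package main

import "fmt"

// Availability describes what a MigrationState field holds in a given phase.
type Availability string

const (
	NilOrStale Availability = "nil or stale"
	Set        Availability = "set"
	Nil        Availability = "nil"
	Copied     Availability = "copied from vmi"
)

// row holds the two columns of the data availability matrix.
type row struct {
	VMI       Availability // vmi.Status.MigrationState
	Migration Availability // vmiMigration.Status.MigrationState
}

// matrix transcribes the table, keyed by migration phase.
var matrix = map[string]row{
	"Pending":         {NilOrStale, Nil},
	"Scheduling":      {NilOrStale, Nil},
	"Scheduled":       {Set, Nil},
	"PreparingTarget": {Set, Nil},
	"TargetReady":     {Set, Nil},
	"Running":         {Set, Nil},
	"Succeeded":       {Set, Copied},
	"Failed":          {Set, Copied},
}

// usable reports, for a phase, whether vmi.Status.MigrationState and
// vmiMigration.Status.MigrationState carry node info according to the matrix.
func usable(phase string) (vmiState, migrationState bool) {
	r, ok := matrix[phase]
	if !ok {
		return false, false
	}
	return r.VMI == Set, r.Migration == Copied
}

func main() {
	for _, p := range []string{"Scheduling", "Scheduled", "Succeeded"} {
		v, m := usable(p)
		fmt.Println(p, v, m)
	}
	// Scheduling false false
	// Scheduled true false
	// Succeeded true true
}
```

Note that Scheduling, the phase in which the current code wants to call SetLogicalSwitchPortMigrateOptions, is a row where neither object can be trusted.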

Current Implementation Analysis

Current Code (Simplified Version)

func (c *Controller) handleAddOrUpdateVMIMigration(key string) error {
    vmiMigration, _ := c.config.KubevirtClient.VirtualMachineInstanceMigration(namespace).Get(...)

    if vmiMigration.Status.MigrationState == nil {
        return nil  // Wait for next event
    }

    srcNodeName := vmiMigration.Status.MigrationState.SourceNode
    targetNodeName := vmiMigration.Status.MigrationState.TargetNode

    if srcNodeName == "" || targetNodeName == "" {
        return nil  // Wait for next event
    }

    switch vmiMigration.Status.Phase {
    case MigrationScheduling:
        SetLogicalSwitchPortMigrateOptions(portName, srcNodeName, targetNodeName)
    case MigrationSucceeded:
        ResetLogicalSwitchPortMigrateOptions(portName, srcNodeName, targetNodeName, false)
    case MigrationFailed:
        ResetLogicalSwitchPortMigrateOptions(portName, srcNodeName, targetNodeName, true)
    }
}

Problem with Current Code

MigrationScheduling Phase Issue:

  • At MigrationScheduling, vmiMigration.Status.MigrationState is nil (not yet copied from vmi)
  • Even if not nil, SourceNode/TargetNode would be empty
  • Code returns nil waiting for next event, but the data will NEVER be available until migration completes
  • SetLogicalSwitchPortMigrateOptions will never be called during MigrationScheduling!
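Replaying the simplified handler over the phase sequence makes the dead branch visible. This is a hypothetical model: currentAction stands in for handleAddOrUpdateVMIMigration and returns the chosen operation as a string instead of calling OVN, and the migration object's MigrationState is nil until a final phase, as in the matrix above.

```go
package main

import "fmt"

// MigrationState is a local stand-in for vmiMigration.Status.MigrationState.
type MigrationState struct{ SourceNode, TargetNode string }

// currentAction replays the simplified current code: it bails out while the
// migration object's MigrationState is nil or incomplete, and otherwise
// picks an LSP operation by phase.
func currentAction(phase string, state *MigrationState) string {
	if state == nil || state.SourceNode == "" || state.TargetNode == "" {
		return "wait"
	}
	switch phase {
	case "Scheduling":
		return "set"
	case "Succeeded":
		return "reset(failed=false)"
	case "Failed":
		return "reset(failed=true)"
	}
	return "none"
}

func main() {
	final := &MigrationState{SourceNode: "node1", TargetNode: "node2"}
	for _, phase := range []string{"Pending", "Scheduling", "Scheduled", "Running", "Succeeded"} {
		// vmiMigration.Status.MigrationState is only copied in final phases.
		var state *MigrationState
		if phase == "Succeeded" || phase == "Failed" {
			state = final
		}
		fmt.Println(phase, currentAction(phase, state))
	}
	// "set" never appears: by the time the state exists, the phase is no longer Scheduling.
}
```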

MigrationSucceeded/Failed Phase:

  • At final phases, vmiMigration.Status.MigrationState IS available (copied from vmi)
  • Reset operations will work correctly

Why PR #6066 Introduced Pod-based Approach

PR #6066 recognized that during early migration phases, node information is not available in vmiMigration.Status.MigrationState. The solution was to:

  1. Get sourceNode from vmi.Status.NodeName (VMI's current location)
  2. Get targetNode from the target Pod's Spec.NodeName (via label selector)
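
// PR #6066 approach (simplified)
case MigrationScheduling:
    // Source node: the node the VMI is currently running on
    vmi, _ := c.config.KubevirtClient.VirtualMachineInstance(namespace).Get(vmiName)
    sourceNode := vmi.Status.NodeName

    // Target node: the node the migration target pod was scheduled to
    pods, _ := c.config.KubeClient.CoreV1().Pods(namespace).List(
        ListOptions{LabelSelector: "kubevirt.io/migration-job-name=" + migrationName})
    if len(pods.Items) == 0 || pods.Items[0].Spec.NodeName == "" {
        return nil // target pod not created/scheduled yet; wait for the next event
    }
    targetNode := pods.Items[0].Spec.NodeName

    SetLogicalSwitchPortMigrateOptions(portName, sourceNode, targetNode)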

Proposed Solutions

Option 1: Restore Pod-based Approach for MigrationScheduling

Pros:

  • Proven to work (was in production via PR #6066)
  • Gets correct node info at the right time

Cons:

  • Additional API calls (get vmi, list pods)
  • More complex code

Option 2: Use Different Phases

Instead of MigrationScheduling, act in later phases, where vmi.Status.MigrationState (validated by MigrationUID) already carries the node information:

Phase | Action
MigrationScheduled or later | Set (if not already set)
MigrationSucceeded | Reset(false)
MigrationFailed | Reset(true)

Problem: by the time the migration reaches MigrationScheduled, traffic should already be able to flow to the target. Setting the LSP options this late may cause packet loss.

Option 3: Query vmi.Status.MigrationState with MigrationUID Validation (Implemented)

This is the approach implemented in the current fix:

func (c *Controller) handleAddOrUpdateVMIMigration(key string) error {
    vmiMigration, _ := Get(...)
    vmi, _ := Get(vmiName)

    var srcNodeName, targetNodeName string
    // CRITICAL: Only use vmi.Status.MigrationState if MigrationUID matches current migration
    if vmi.Status.MigrationState != nil &&
       vmi.Status.MigrationState.MigrationUID == vmiMigration.UID {
        srcNodeName = vmi.Status.MigrationState.SourceNode
        targetNodeName = vmi.Status.MigrationState.TargetNode
    }

    switch vmiMigration.Status.Phase {
    case MigrationScheduling:
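        // Early phase: MigrationState may not be populated yet
        if srcNodeName == "" {
            srcNodeName = vmi.Status.NodeName  // VMI's current location
        }
        // Target node comes from the target pod, selected by the migration UID label
        pods, _ := List(LabelSelector: MigrationJobLabel=vmiMigration.UID)
        if len(pods.Items) == 0 || pods.Items[0].Spec.NodeName == "" {
            return nil  // target pod not scheduled yet; wait for the next event
        }
        targetNodeName = pods.Items[0].Spec.NodeName

        SetLogicalSwitchPortMigrateOptions(portName, srcNodeName, targetNodeName)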

    case MigrationSucceeded:
        ResetLogicalSwitchPortMigrateOptions(..., false)

    case MigrationFailed:
        ResetLogicalSwitchPortMigrateOptions(..., true)
    }
}

Key Points:

  • MigrationUID validation prevents using stale state from previous migrations
  • During MigrationScheduling, uses Pod label selector to get target node
  • Falls back to vmi.Status.NodeName if SourceNode not yet set

Implementation Summary

The fix implements a multi-layer defense approach:

Layer 1: MigrationUID Validation (kubevirt.go)

Validate that vmi.Status.MigrationState.MigrationUID matches the current migration before using the state:

if vmi.Status.MigrationState != nil && vmi.Status.MigrationState.MigrationUID == vmiMigration.UID {
    // MigrationUID matches - safe to use this state
    migrationStateValid = true
    srcNodeName = vmi.Status.MigrationState.SourceNode
    targetNodeName = vmi.Status.MigrationState.TargetNode
} else {
    // MigrationState is nil or its MigrationUID does not match (STALE state from a previous migration)
    // Clean up any residual migrate options immediately
    CleanLogicalSwitchPortMigrateOptions(portName)
}

Layer 2: Stale State Cleanup (kubevirt.go)

When stale state is detected, immediately clean LSP migrate options to prevent inconsistencies:

if vmiMigrationUID != migrationUID {
    klog.Warningf("Migration %s - VMI MigrationState is STALE (UID mismatch), cleaning residual migrate options", key)
    c.OVNNbClient.CleanLogicalSwitchPortMigrateOptions(portName)
}

Layer 3: Conflict Detection (ovn-nb-logical_switch_port.go)

SetLogicalSwitchPortMigrateOptions now validates that no conflicting migration is in progress:

// Check 1: If requested-chassis has two nodes (migration in progress)
if src != "" && target != "" {
    if src != srcNodeName || target != targetNodeName {
        return fmt.Errorf("conflicting migrate options: current=%s,%s but trying to set %s,%s",
            src, target, srcNodeName, targetNodeName)
    }
}

// Check 2: If requested-chassis has single node (previous migration completed)
// The single node must equal the new migration's source
if currentChassis != "" && currentChassis != srcNodeName {
    return fmt.Errorf("inconsistent state: current requested-chassis=%s but new migration source=%s",
        currentChassis, srcNodeName)
}

Layer 4: Pod Deletion Cleanup (pod.go - existing)

When VM Pod is deleted, LSP migrate options are cleaned:

if isVMPod && c.config.EnableKeepVMIP {
    for _, port := range ports {
        c.OVNNbClient.CleanLogicalSwitchPortMigrateOptions(port.Name)
    }
}

Migration Lifecycle Markers

Clear log markers for debugging:

Marker | Meaning
>>> [MIGRATION START] | New migration started (MigrationPending)
--- [MIGRATION PROGRESS] | Migration in progress (intermediate phases)
>>> [MIGRATION LSP SET] | Setting LSP migrate options
<<< [MIGRATION LSP RESET] | Resetting LSP migrate options
<<< [MIGRATION END] | Migration completed (Succeeded/Failed)
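A small helper keeps the start/progress/end markers consistent across log lines. This is a hypothetical sketch: phaseMarker and the key value are illustrative, the standard log package stands in for klog, and the LSP SET/RESET markers are emitted where the options are changed rather than derived from the phase.

```go
package main

import "log"

const (
	markerStart    = ">>> [MIGRATION START]"
	markerProgress = "--- [MIGRATION PROGRESS]"
	markerLSPSet   = ">>> [MIGRATION LSP SET]"
	markerLSPReset = "<<< [MIGRATION LSP RESET]"
	markerEnd      = "<<< [MIGRATION END]"
)

// phaseMarker maps a migration phase to its lifecycle marker.
func phaseMarker(phase string) string {
	switch phase {
	case "Pending":
		return markerStart
	case "Succeeded", "Failed":
		return markerEnd
	default:
		return markerProgress
	}
}

func main() {
	key := "default/migration-1" // hypothetical migration key
	for _, phase := range []string{"Pending", "Scheduling", "Succeeded"} {
		log.Printf("%s Migration %s phase=%s", phaseMarker(phase), key, phase)
	}
	// LSP markers are logged at the point where the options are written.
	log.Printf("%s Migration %s requested-chassis=node1,node2", markerLSPSet, key)
	log.Printf("%s Migration %s requested-chassis=node2", markerLSPReset, key)
}
```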

Validation Matrix

Scenario | requested-chassis | Action | Result
First migration | (empty) | Set node1,node2 | ✅ Success
Idempotent | node1,node2 | Set node1,node2 | ✅ Skip (already set)
Migration in progress | node1,node2 | Set node2,node1 | ❌ Error: conflicting
After migration (consistent) | node2 | Set node2,node1 | ✅ Success (source matches)
After migration (inconsistent) | node2 | Set node1,node3 | ❌ Error: inconsistent state
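The matrix translates directly into a pure function that can be checked row by row. validateMigrateOptions is a hypothetical helper written for this sketch from the Layer 3 rules and the table, not the code in ovn-nb-logical_switch_port.go; it returns skip=true for the idempotent case and an error for the two rejected rows.

```go
package main

import (
	"fmt"
	"strings"
)

// validateMigrateOptions decides whether requested-chassis=src,target may be
// written, given the current requested-chassis value.
// It returns skip=true when the same options are already set (idempotent).
func validateMigrateOptions(current, src, target string) (skip bool, err error) {
	if current == "" {
		return false, nil // first migration: nothing to conflict with
	}
	nodes := strings.Split(current, ",")
	if len(nodes) == 2 {
		// Two nodes: a migration is in progress; only the same pair is allowed.
		if nodes[0] == src && nodes[1] == target {
			return true, nil
		}
		return false, fmt.Errorf("conflicting migrate options: current=%s but trying to set %s,%s",
			current, src, target)
	}
	// Single node: the previous migration finished there, so it must be the new source.
	if current != src {
		return false, fmt.Errorf("inconsistent state: current requested-chassis=%s but new migration source=%s",
			current, src)
	}
	return false, nil
}

func main() {
	rows := []struct{ scenario, current, src, target string }{
		{"First migration", "", "node1", "node2"},
		{"Idempotent", "node1,node2", "node1", "node2"},
		{"Migration in progress", "node1,node2", "node2", "node1"},
		{"After migration (consistent)", "node2", "node2", "node1"},
		{"After migration (inconsistent)", "node2", "node1", "node3"},
	}
	for _, r := range rows {
		skip, err := validateMigrateOptions(r.current, r.src, r.target)
		fmt.Printf("%-32s skip=%v err=%v\n", r.scenario, skip, err)
	}
}
```

Each printed line should agree with the Result column of the corresponding table row.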

Testing Considerations

  1. Unit Tests: Mock KubeVirt client responses for different phases
  2. E2E Tests:
    • Single migration: A→B
    • Consecutive migrations: A→B→A (key scenario for this bug)
    • Failed migration recovery
    • VM Pod force deletion and reschedule

How This Fix Solves Issue #6220

The Bug Scenario

1. Migration A (node1 → node2) succeeds
   - vmi.Status.MigrationState = {Source: node1, Target: node2, UID: uid1}
   - LSP requested-chassis = "node2"

2. Migration B (node2 → node1) starts immediately
   - New vmiMigration created with uid2
   - KubeVirt hasn't updated vmi.Status.MigrationState yet
   - Kube-OVN reads STALE state: {Source: node1, Target: node2, UID: uid1}

3. OLD CODE: Uses stale data without validation
   - source=node1 from the stale state (WRONG! the VM is on node2), while the new migration's target is node1
   - Detects "source == target" (node1 == node1) → SKIPS LSP setup!

4. RESULT: OVS interface not ready → migration fails

The Fix

1. MigrationUID Validation
   - vmi.Status.MigrationState.MigrationUID (uid1) != vmiMigration.UID (uid2)
   - Stale state detected!

2. Immediate Cleanup
   - CleanLogicalSwitchPortMigrateOptions() called
   - Removes stale requested-chassis

3. Continue with Current Migration
   - Get source from vmi.Status.NodeName = node2 (correct!)
   - Get target from Pod label selector = node1 (correct!)
   - SetLogicalSwitchPortMigrateOptions(node2, node1) ← CORRECT!

4. RESULT: LSP options set correctly → migration succeeds
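The walkthrough can be replayed as a small simulation. Both helpers are hypothetical: oldNodes models the failure mode described above by combining the stale SourceNode with the new target pod's node (an assumption about how the old code obtained its target), and fixedNodes applies the MigrationUID check with the vmi.Status.NodeName fallback.

```go
package main

import "fmt"

// Local stand-ins for the VMI status fields used in the walkthrough.
type MigrationState struct{ SourceNode, TargetNode, MigrationUID string }

type VMI struct {
	NodeName       string          // vmi.Status.NodeName
	MigrationState *MigrationState // vmi.Status.MigrationState
}

// oldNodes trusts vmi.Status.MigrationState.SourceNode without checking the UID.
func oldNodes(vmi VMI, targetPodNode string) (string, string) {
	src := vmi.NodeName
	if vmi.MigrationState != nil && vmi.MigrationState.SourceNode != "" {
		src = vmi.MigrationState.SourceNode
	}
	return src, targetPodNode
}

// fixedNodes only uses MigrationState when it belongs to this migration and
// otherwise falls back to the VMI's current node and the target pod's node.
func fixedNodes(vmi VMI, migrationUID, targetPodNode string) (string, string) {
	if s := vmi.MigrationState; s != nil && s.MigrationUID == migrationUID &&
		s.SourceNode != "" && s.TargetNode != "" {
		return s.SourceNode, s.TargetNode
	}
	return vmi.NodeName, targetPodNode
}

func main() {
	// After migration A (node1 → node2, uid1), migration B (uid2) targets node1.
	vmi := VMI{
		NodeName:       "node2",
		MigrationState: &MigrationState{SourceNode: "node1", TargetNode: "node2", MigrationUID: "uid1"},
	}

	src, target := oldNodes(vmi, "node1")
	fmt.Printf("old:   %s -> %s (source == target: %v)\n", src, target, src == target)
	// old:   node1 -> node1 (source == target: true)

	src, target = fixedNodes(vmi, "uid2", "node1")
	fmt.Printf("fixed: %s -> %s\n", src, target)
	// fixed: node2 -> node1
}
```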

References

  • KubeVirt migration controller: pkg/virt-controller/watch/migration/migration.go
  • KubeVirt VMI types: staging/src/kubevirt.io/api/core/v1/types.go
  • OVN LSP migration: OVN documentation on live migration
  • PR #6066: Original Pod-based approach for timing issues