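Understanding C#'s yield syntactic sugar by comparing it with the iterator interfaces, and implementing complex iteration logic with ease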


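I came across the yield syntax back when I wrote Python, but never understood it in depth. Now that I write C#, I found that C# has the same construct, so I decided to dig into it.

The articles I read online mostly make two points:

  1. It enables lazy evaluation, which saves work
  2. It is syntactic sugar for the iterator interfaces, which simplifies code
  • On the first point: this is not really a property of yield, but of the iterator interfaces as opposed to collections. A hand-written iterator without yield is just as lazy, and even for asynchronous iteration you can implement the IAsyncEnumerable interface yourself, so yield is not a requirement
  • On the second point: in which scenarios does yield actually have a clear advantage? That is what the rest of this post looks at~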
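
To make the first point concrete, here is a minimal sketch of a lazy sequence written without yield. The Naturals class and its nested Enumerator are hypothetical names invented for this illustration. The sequence is endless, yet consuming it is fine because each value is only computed when MoveNext() is called.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Consume only the first three values of an endless sequence.
var seen = new List<int>();
foreach (var n in new Naturals())
{
    if (n > 3) break;   // stopping early is fine: later values were never computed
    seen.Add(n);
}
Console.WriteLine(string.Join(", ", seen)); // 1, 2, 3

// Hypothetical endless sequence 1, 2, 3, ... implemented by hand, no yield involved.
sealed class Naturals : IEnumerable<int>
{
    public IEnumerator<int> GetEnumerator() => new Enumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<int>
    {
        private int _value;                                 // explicit state field
        public int Current => _value;
        object IEnumerator.Current => _value;
        public bool MoveNext() { _value++; return true; }   // one value per call
        public void Reset() => throw new NotSupportedException();
        public void Dispose() { }
    }
}
```

The laziness comes entirely from the MoveNext()/Current protocol, which is why it is a property of the interface rather than of yield.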

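How it works

The yield keyword is syntactic sugar for the iterator interfaces. Without yield, implementing an iterator takes the following work:

  1. Implement the IEnumerator<T> interface, which means providing MoveNext(), Current, Reset() and Dispose()
  2. Manage the iteration state explicitly in class fields

And if you want to traverse it with foreach, you also need to wrap it in an implementation of IEnumerable<T>, so the amount of code is fairly large.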
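
For reference, this is roughly how foreach drives those two interfaces. It is a hand-expanded sketch rather than the compiler's exact output, and the names items and viaLoop are only for illustration.

```csharp
using System;
using System.Collections.Generic;

IEnumerable<string> items = new List<string> { "a", "b", "c" };

// Roughly what `foreach (var item in items) { ... }` expands to:
var viaLoop = new List<string>();
using (IEnumerator<string> e = items.GetEnumerator())   // IEnumerable<T> hands out the enumerator
{
    while (e.MoveNext())          // advance the state; false means the sequence is finished
    {
        var item = e.Current;     // read the value for the current state
        viaLoop.Add(item);
    }
}                                 // leaving the using block calls Dispose()

Console.WriteLine(string.Join(",", viaLoop)); // a,b,c
```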

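With the yield keyword, all of that work collapses into a single function that contains several yield return statements:

  1. The code between two consecutive yield return statements corresponds to the state change performed by one call to MoveNext() in the hand-written iterator
  2. The value handed to yield return corresponds to what the Current property (a getter) returns
  3. When the function returns at a yield return, its state is preserved, so state is managed implicitly through the function's local variables
  4. Once execution runs past the last yield return and the function ends, that corresponds to MoveNext() returning false, which signals the end of the iteration
  5. Code placed after the last yield return can do the cleanup that would otherwise go in Dispose(). Note that this code only runs if the consumer iterates all the way to the end; cleanup that must also run when the consumer stops early belongs in a try/finally block around the yield return statements, because Dispose() runs the pending finally blocks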
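
The mapping above can be observed by driving an iterator by hand. The sketch below uses a hypothetical Numbers() function and a log list, both made up for this illustration, to record when each piece of the function body actually runs.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Hypothetical iterator used only to trace the execution order.
IEnumerable<int> Numbers()
{
    log.Add("before first yield");
    yield return 1;
    log.Add("between yields");
    yield return 2;
    log.Add("after last yield");
}

using var e = Numbers().GetEnumerator();   // nothing inside Numbers() has run yet
log.Add("enumerator created");
while (e.MoveNext())                       // each call runs the body up to the next yield return
    log.Add($"Current = {e.Current}");
log.Add("MoveNext returned false");

foreach (var line in log) Console.WriteLine(line);
```

The log comes out as: enumerator created, before first yield, Current = 1, between yields, Current = 2, after last yield, MoveNext returned false. The body only advances when MoveNext() is called, and the code after the last yield return runs during the final MoveNext() call, the one that returns false.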

Caveats

Iterators generated from yield do not support Reset(): the state cannot be reset, and calling Reset() throws NotSupportedException.
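
A small sketch of that caveat, using a hypothetical OneTwo() iterator. To iterate again, request a fresh enumerator rather than resetting the old one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical two-element iterator, only here to obtain a compiler-generated enumerator.
IEnumerable<int> OneTwo()
{
    yield return 1;
    yield return 2;
}

var threw = false;
using (var e = OneTwo().GetEnumerator())
{
    e.MoveNext();
    try
    {
        e.Reset();                 // not implemented by compiler-generated enumerators
    }
    catch (NotSupportedException)
    {
        threw = true;
    }
}
Console.WriteLine(threw);          // True

// Iterating the IEnumerable<int> a second time simply creates a new enumerator.
var second = new List<int>(OneTwo());
Console.WriteLine(second.Count);   // 2
```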

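An example of yield's advantage in complex iteration logic

The advantages of the yield keyword are:

  1. It cuts down the boilerplate needed to implement an iterator
  2. The logic that changes the state and the value that is returned (the result of that state change) are two closely related concepts, and with yield they are written side by side, so the code is intuitive both to write and to read
  3. Implicit state management also removes many intermediate state variables

Advantages 2 and 3 simplify the code considerably when the iteration logic is complex and irregular, which greatly improves readability and extensibility.

An example:

  • Suppose we have this requirement: given a number of iterations, output the column number, row number and running count of a matrix in order. For example, with 17 iterations and 5 rows per column: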
COL: [01] ROW: [01] COUNT: [01]
COL: [01] ROW: [02] COUNT: [02]
COL: [01] ROW: [03] COUNT: [03]
COL: [01] ROW: [04] COUNT: [04]
COL: [01] ROW: [05] COUNT: [05]
COL: [02] ROW: [01] COUNT: [06]
COL: [02] ROW: [02] COUNT: [07]
COL: [02] ROW: [03] COUNT: [08]
COL: [02] ROW: [04] COUNT: [09]
COL: [02] ROW: [05] COUNT: [10]
COL: [03] ROW: [01] COUNT: [11]
COL: [03] ROW: [02] COUNT: [12]
COL: [03] ROW: [03] COUNT: [13]
COL: [03] ROW: [04] COUNT: [14]
COL: [03] ROW: [05] COUNT: [15]
COL: [04] ROW: [01] COUNT: [16]
COL: [04] ROW: [02] COUNT: [17]

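With a hand-written iterator, this can be implemented as follows:

using System;
using System.Collections;
using System.Collections.Generic;

public class MyEnumerable : IEnumerable<string>
{
    private readonly int _eleNum;
    private readonly int _colEleNum;

    public MyEnumerable(int eleNum, int colEleNum)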
    {
        _eleNum = eleNum;
        _colEleNum = colEleNum;
    }

    public IEnumerator<string> GetEnumerator()
    {
        return new MyEnumerator(_eleNum, _colEleNum);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return new MyEnumerator(_eleNum, _colEleNum);
    }

    private class MyEnumerator : IEnumerator<string>
    {
        private readonly int _eleNum;
        private readonly int _colEleNum;
        private int _count = 0;
        private int _col = 1;
        private int _row = 0;

        public string Current => $"COL: [{_col:D2}] ROW: [{_row:D2}] COUNT: [{_count:D2}]";
        object IEnumerator.Current => $"COL: [{_col:D2}] ROW: [{_row:D2}] COUNT: [{_count:D2}]";

        public MyEnumerator(int eleNum, int colEleNum)
        {
            _eleNum = eleNum;
            _colEleNum = colEleNum;
        }

        public void Dispose()
        {
            Console.WriteLine("Releasing resources...");
        }

        public bool MoveNext()
        {
            _count++;
            if (_count > _eleNum)
                return false;

            _row++;
            if (_row > _colEleNum)
            {
                _row = 1;
                _col++;
            }
            return true;
        }

        // The yield syntactic sugar does not support Reset either, so it is left unimplemented here
        public void Reset()
        {
            throw new NotSupportedException();
        }
    }
}

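There is a pile of boilerplate, but the logic itself still looks fairly simple, right?

Now suppose a new requirement comes in: after the iteration, output one more line, COMPLETE, like this: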

COL: [01] ROW: [01] COUNT: [01]
COL: [01] ROW: [02] COUNT: [02]
COL: [01] ROW: [03] COUNT: [03]
COL: [01] ROW: [04] COUNT: [04]
COL: [01] ROW: [05] COUNT: [05]
COL: [02] ROW: [01] COUNT: [06]
COL: [02] ROW: [02] COUNT: [07]
COL: [02] ROW: [03] COUNT: [08]
COL: [02] ROW: [04] COUNT: [09]
COL: [02] ROW: [05] COUNT: [10]
COL: [03] ROW: [01] COUNT: [11]
COL: [03] ROW: [02] COUNT: [12]
COL: [03] ROW: [03] COUNT: [13]
COL: [03] ROW: [04] COUNT: [14]
COL: [03] ROW: [05] COUNT: [15]
COL: [04] ROW: [01] COUNT: [16]
COL: [04] ROW: [02] COUNT: [17]
COMPLETE

The IEnumerator implementation can be modified like this:

    private class MyEnumerator : IEnumerator<string>
    {
        private readonly int _eleNum;
        private readonly int _colEleNum;
        private int _count = 0;
        private int _col = 1;
        private int _row = 0;

        public string Current => _count <= _eleNum ? $"COL: [{_col:D2}] ROW: [{_row:D2}] COUNT: [{_count:D2}]" : "COMPLETE";
        object IEnumerator.Current => _count <= _eleNum ? $"COL: [{_col:D2}] ROW: [{_row:D2}] COUNT: [{_count:D2}]" : "COMPLETE";

        public MyEnumerator(int eleNum, int colEleNum)
        {
            _eleNum = eleNum;
            _colEleNum = colEleNum;
        }

        public void Dispose()
        {
            Console.WriteLine("Releasing resources...");
        }

        public bool MoveNext()
        {
            _count++;
            if (_count > _eleNum + 1)
                return false;

            _row++;
            if (_row > _colEleNum)
            {
                _row = 1;
                _col++;
            }
            return true;
        }

        // The yield syntactic sugar does not support Reset either, so it is left unimplemented here
        public void Reset()
        {
            throw new NotSupportedException();
        }
    }

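The Current property gained a ternary expression, and the exit condition of MoveNext() was pushed back by one. That amounts to a small patch, which is still acceptable.

Now let's turn up the difficulty: output a COL CHANGE line every time the column changes: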

COL: [01] ROW: [01] COUNT: [01]
COL: [01] ROW: [02] COUNT: [02]
COL: [01] ROW: [03] COUNT: [03]
COL: [01] ROW: [04] COUNT: [04]
COL: [01] ROW: [05] COUNT: [05]
COL CHANGE
COL: [02] ROW: [01] COUNT: [06]
COL: [02] ROW: [02] COUNT: [07]
COL: [02] ROW: [03] COUNT: [08]
COL: [02] ROW: [04] COUNT: [09]
COL: [02] ROW: [05] COUNT: [10]
COL CHANGE
COL: [03] ROW: [01] COUNT: [11]
COL: [03] ROW: [02] COUNT: [12]
COL: [03] ROW: [03] COUNT: [13]
COL: [03] ROW: [04] COUNT: [14]
COL: [03] ROW: [05] COUNT: [15]
COL CHANGE
COL: [04] ROW: [01] COUNT: [16]
COL: [04] ROW: [02] COUNT: [17]
COMPLETE

Then it has to be written like this:

    private class MyEnumerator : IEnumerator<string>
    {
        private readonly int _eleNum;
        private readonly int _colEleNum;
        private string _curVal;
        private int _count = 0;
        private int _col = 1;
        private int _row = 1;
        private bool _printColChange = false;

        public string Current => _curVal;
        object IEnumerator.Current => _curVal;

        public MyEnumerator(int eleNum, int colEleNum)
        {
            _eleNum = eleNum;
            _colEleNum = colEleNum;
        }

        public void Dispose()
        {
            Console.WriteLine("Releasing resources...");
        }

        public bool MoveNext()
        {
            _count++;
            if (_count > _eleNum + 1)
                return false;

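            if (_count == _eleNum + 1)
            {
                _curVal = "COMPLETE";
            }
            else
            {
                _curVal = $"COL: [{_col:D2}] ROW: [{_row:D2}] COUNT: [{_count:D2}]";

                _row++;
                if (_row > _colEleNum)
                {
                    if (!_printColChange)
                    {
                        // First pass over the column boundary: emit the last row as usual and
                        // roll _count back, so the next call (which emits "COL CHANGE") does not use up a count
                        _printColChange = true;
                        _count--;
                    }
                    else
                    {
                        // Second pass: emit "COL CHANGE" instead of a row, then start the next column
                        _curVal = "COL CHANGE";
                        _printColChange = false;
                        _row = 1;
                        _col++;
                    }
                }
            }

            return true;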
        }

        // The yield syntactic sugar does not support Reset either, so it is left unimplemented here
        public void Reset()
        {
            throw new NotSupportedException();
        }
    }

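Let's look at what changed:

  1. Current now takes several forms, and a ternary expression would no longer be readable, so two intermediate variables are introduced: _curVal records the output, and _printColChange tracks whether a COL CHANGE line is due. Both are updated in MoveNext()
  2. Because of that change in logic, the initial value of the state variable _row had to change
  3. To keep the extra intermediate variables correct, MoveNext() gained several levels of nesting, which is easy to get wrong when writing and hard to follow when reading

So with the traditional iterator interfaces, an extension that is only slightly more complex requires many extra variables and branches (there may be a more elegant way to write it, but that takes considerable mental effort too).

What about the yield form?

The initial requirement can be written like this (Enumerable.Range needs using System.Linq):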

    static IEnumerable<string> GetEnumerable(int eleNum, int colEleNum)
    {
        var count = 1;
        var col = 1;

        while (true)
        {
            foreach (var row in Enumerable.Range(1, colEleNum))
            {
                yield return $"COL: [{col:D2}] ROW: [{row:D2}] COUNT: [{count++:D2}]";
                if (count > eleNum)
                {
                    Console.WriteLine("Releasing resources...");
                    yield break;
                }
            }
            col++; // every row of this column has been emitted, move on to the next column
        }
    }

To add the COMPLETE output, one extra line is enough:

    static IEnumerable<string> GetEnumerable(int eleNum, int colEleNum)
    {
        var count = 1;
        var col = 1;

        while (true)
        {
            foreach (var row in Enumerable.Range(1, colEleNum))
            {
                yield return $"COL: [{col:D2}] ROW: [{row:D2}] COUNT: [{count++:D2}]";
                if (count > eleNum)
                {
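                    yield return "COMPLETE"; // the added line
                    Console.WriteLine("Releasing resources...");
                    yield break;
                }
            }
            col++; // every row of this column has been emitted, move on to the next column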
        }
    }

To also output COL CHANGE, again just add one line:

    static IEnumerable<string> GetEnumerable(int eleNum, int colEleNum)
    {
        var count = 1;
        var col = 1;

        while (true)
        {
            foreach (var row in Enumerable.Range(1, colEleNum))
            {
                yield return $"COL: [{col:D2}] ROW: [{row:D2}] COUNT: [{count++:D2}]";
                if (count > eleNum)
                {
                    yield return "COMPLETE";
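                    Console.WriteLine("Releasing resources...");
                    yield break;
                }
            }
            yield return "COL CHANGE"; // the added line
            col++; // every row of this column has been emitted, move on to the next column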
        }
    }
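
One thing to keep in mind with the versions above: the release message is only printed when the consumer iterates to the end. As a possible variation (the try/finally placement, the GetEnumerableSafe name and the log list are assumptions of this sketch, not part of the original code), the cleanup can be moved into a finally block so that it also runs when the consumer leaves the loop early.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();

// Hypothetical variant of GetEnumerable with the cleanup moved into finally.
IEnumerable<string> GetEnumerableSafe(int eleNum, int colEleNum)
{
    var count = 1;
    var col = 1;
    try
    {
        while (true)
        {
            foreach (var row in Enumerable.Range(1, colEleNum))
            {
                yield return $"COL: [{col:D2}] ROW: [{row:D2}] COUNT: [{count++:D2}]";
                if (count > eleNum)
                {
                    yield return "COMPLETE";
                    yield break;
                }
            }
            yield return "COL CHANGE";
            col++;
        }
    }
    finally
    {
        // Runs on normal completion and also when the enumerator is disposed early.
        log.Add("released");
    }
}

var firstThree = new List<string>();
foreach (var s in GetEnumerableSafe(17, 5))
{
    firstThree.Add(s);
    if (firstThree.Count == 3) break;     // leaving foreach early disposes the enumerator
}

Console.WriteLine(firstThree.Count);      // 3
Console.WriteLine(string.Join(",", log)); // released
```

Because foreach disposes the enumerator when the loop exits, the finally block runs even though only three of the lines were consumed.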

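Elegant, and no mental overhead~

Summary

Compared with the traditional iterator interfaces, the yield style lets you add and change branches in the iteration logic flexibly, and it reduces the declaration and maintenance of state variables. In some situations it is really quite handy~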