Type Families in Haskell: The Definitive Guide

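Type families are one of the most powerful type-level programming features in Haskell. You can think of them as type-level functions, but that doesn't really cover the whole picture. By the end of this article, you will know what exactly they are and how to use them.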

Flavors of type constructors

In Haskell, a given type constructor T can belong to one of several categories:

data T a b = ... -- data type
newtype T a b = ... -- newtype
class T a b where ... -- type class
type T a b = ... -- type synonym

The TypeFamilies extension introduces two more categories:

type family T a b where ... -- type family
data family T a b              -- data family

Type families are further divided into closed and open ones, and open type families can be either top-level or associated with a class:

Type families	Top-level	Associated
Open	✔️	✔️
Closed	✔️	❌

Data families are always open, but they too can be top-level or associated:

Data families	Top-level	Associated
Open	✔️	✔️
Closed	❌	❌

Although this variety of type constructors may seem overwhelming at first, each of them has valid use cases, and a few common principles underpin them all.

Let's start with closed type families, because they resemble another well-known concept: functions.

Closed type families

[Illustration: Closed type families. Made by Impure Pics.]

At the term level, when we need to perform a computation, we define a function. For example, here is list concatenation:

append :: forall a. [a] -> [a] -> [a]    -- type signature
append []     ys = ys                    -- clause 1
append (x:xs) ys = x : append xs ys      -- clause 2

At the type level, we define such computations using closed type families:

type Append :: forall a. [a] -> [a] -> [a]  -- kind signature
type family Append xs ys where              -- header
  Append '[]    ys = ys                     -- clause 1
  Append (x:xs) ys = x : Append xs ys       -- clause 2

Here is a GHCi session demonstrating how we use each of them:

ghci> append [1, 2, 3] [4, 5, 6]
[1,2,3,4,5,6]
ghci> :kind! Append [1, 2, 3] [4, 5, 6]
Append [1, 2, 3] [4, 5, 6] :: [Nat]
= '[1, 2, 3, 4, 5, 6]
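To experiment outside GHCi, here is a minimal self-contained sketch that puts both definitions in one module. It assumes GHC 9.2 or later. The KnownNats class and its natsVal method are hypothetical helpers introduced only so that the type-level result can be turned back into an ordinary list and printed.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, PolyKinds, StandaloneKindSignatures,
             ScopedTypeVariables, TypeApplications, TypeOperators,
             FlexibleInstances #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Term-level list concatenation.
append :: forall a. [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys

-- Type-level counterpart, written as a closed type family.
type Append :: forall a. [a] -> [a] -> [a]
type family Append xs ys where
  Append '[]    ys = ys
  Append (x:xs) ys = x : Append xs ys

-- Hypothetical helper: reflect a type-level list of Nats to [Integer].
class KnownNats (ns :: [Nat]) where
  natsVal :: Proxy ns -> [Integer]

instance KnownNats '[] where
  natsVal _ = []

instance (KnownNat n, KnownNats ns) => KnownNats (n ': ns) where
  natsVal _ = natVal (Proxy @n) : natsVal (Proxy @ns)

main :: IO ()
main = do
  -- Computed at run time.
  print (append [1, 2, 3] [4, 5, 6 :: Int])
  -- Computed by the type checker, then reflected to a value.
  print (natsVal (Proxy @(Append '[1, 2, 3] '[4, 5, 6])))
```

Both lines of output should show the same list: the first is computed by append at run time, the second by Append during type checking.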

Although the two definitions are strikingly similar, there are a few differences:

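As with other type constructors, the name of a type family must start with an uppercase letter, hence Append rather than append.
Instead of a type signature, we use a standalone kind signature, which must be prefixed with the type keyword (enabled by the StandaloneKindSignatures extension).
The nil constructor [] is written as '[] to distinguish it from the list type constructor []. This quirk stems from how DataKinds works and has nothing to do with type families per se.
The clauses of a type family are grouped under a type family header, whereas term-level functions have no header.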

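The header is probably the most notable difference here. To understand why it matters, we first need to discuss the notion of arity. Before that, let's look at a few more examples of closed type families to get familiar with the syntax.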

Term-level function	Closed type family

not :: Bool -> Bool
not True = False
not False = True

type Not :: Bool -> Bool
type family Not a where
  Not True = False
  Not False = True

fromMaybe :: a -> Maybe a -> a
fromMaybe d Nothing = d
fromMaybe _ (Just x) = x

type FromMaybe :: a -> Maybe a -> a
type family FromMaybe d x where
  FromMaybe d Nothing = d
  FromMaybe _ (Just x) = x

fst :: (a, b) -> a
fst (x, _) = x

type Fst :: (a, b) -> a
type family Fst t where
  Fst '(x, _) = x

Exercise: Use the :kind! command in GHCi to apply these type families to various inputs, and compare the results with what you expect.
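If you prefer a compiled check to the :kind! exercise, one option is to ascribe each application to a Proxy of the expected result, so that the module only compiles if every family reduces as stated. This is a sketch that assumes GHC 9.2 or later. The binding names notTrue, fromNothing, fromJustBool and fstPair are illustrative, and the promoted constructors are ticked explicitly.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, PolyKinds, StandaloneKindSignatures #-}

import Data.Proxy (Proxy (..))

type Not :: Bool -> Bool
type family Not a where
  Not 'True  = 'False
  Not 'False = 'True

type FromMaybe :: a -> Maybe a -> a
type family FromMaybe d x where
  FromMaybe d 'Nothing  = d
  FromMaybe _ ('Just x) = x

type Fst :: (a, b) -> a
type family Fst t where
  Fst '(x, _) = x

-- Each binding compiles only if the family application on the right
-- reduces to the type in the signature; edit an expected type to see
-- GHC reject the module.
notTrue :: Proxy 'False
notTrue = Proxy :: Proxy (Not 'True)

fromNothing :: Proxy Int
fromNothing = Proxy :: Proxy (FromMaybe Int 'Nothing)

fromJustBool :: Proxy Bool
fromJustBool = Proxy :: Proxy (FromMaybe Int ('Just Bool))

fstPair :: Proxy Char
fstPair = Proxy :: Proxy (Fst '(Char, Bool))

main :: IO ()
main = putStrLn "all type-level checks passed"
```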

Arity of type constructors

The arity of a type constructor is the number of arguments it requires at use sites. It comes into play when we work with higher-kinded types:

type S :: (Type -> Type) -> Type
data S k = MkS (k Bool) (k Integer)

Now, what constitutes a valid argument to S? One might think that any type constructor of kind Type -> Type could be used here. Let's try:

MkS (Just True) Nothing :: S Maybe
MkS (Left "Hi") (Right 42) :: S (Either String)
MkS (Identity False) (Identity 0) :: S Identity

So Maybe, Either String, and Identity all work. But what about a type synonym?

type Pair :: Type -> Type
type Pair a = (a, a)

From the standalone kind signature, we can see that it has the appropriate kind Type -> Type. GHCi confirms this as well:

ghci> :kind Pair
Pair :: Type -> Type

However, any attempt to use S Pair fails:

ghci> MkS (True, False) (0, 1) :: S Pair
<interactive>:6:29: error:
    • The type synonym 'Pair' should have 1 argument,
      but has been given none

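Due to certain design decisions in GHC's type system, type synonyms cannot be partially applied. In the case of Pair, we say that its arity is 1, because it needs one argument. Pair Bool, Pair Integer, and Pair String are all fine. On the other hand, S Pair and Functor Pair are not. A use of a type constructor is called saturated if it meets the arity requirement, and unsaturated otherwise.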

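Note that we only need the notion of arity for type constructors that can reduce to other types when applied to arguments. For example, Pair Bool is equal not only to itself but also to (Bool, Bool):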

Pair Bool ~ Pair Bool -- reflexivity
Pair Bool ~ (Bool, Bool) -- reduction

On the other hand, Maybe Bool is equal only to itself:

Maybe Bool ~ Maybe Bool -- reflexivity

That is why we call Maybe a generative type constructor and Pair a non-generative one.

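Non-generative type constructors have an arity assigned to them and must be used saturated. Generative type constructors are not subject to this restriction, so we don't apply the notion of arity to them.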

Type family applications can also reduce to other types:

Append [1,2] [3,4] ~ Append [1,2] [3,4] -- reflexivity
Append [1,2] [3,4] ~ [1, 2, 3, 4] -- reduction

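Therefore, type families are non-generative and have an arity assigned to them. The arity is determined at the definition site by taking into account both the kind signature and the header: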

type Append :: forall a. [a] -> [a] -> [a]
type family Append xs ys where

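In the header we have Append xs ys, not Append xs or simply Append. So, at first glance, the arity of Append is 2. However, we must also account for the forall-bound variable a. In fact, even when you write Append [1,2] [3,4], internally it becomes Append @Nat [1,2] [3,4]. Hence, the arity of Append is 3.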

This holds even if we don't write the forall explicitly:

type Append :: [a] -> [a] -> [a]
type family Append xs ys where

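All right, so why does the header matter? Couldn't we derive the arity by counting the quantifiers in the kind signature? Well, that would work in most cases, but here is an interesting counterexample: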

type MaybeIf :: Bool -> Type -> Type
type family MaybeIf b t where
  MaybeIf True  t = Maybe t
  MaybeIf False t = Identity t

This definition is assigned an arity of 2, and we can use it by applying it to two arguments:

data PlayerInfo b =
  MkPlayerInfo { name  :: MaybeIf b String,
                 score :: MaybeIf b Integer }

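This could be useful when working with a database. When reading a player's record, we expect all fields to be present, but a database update might touch only some of them: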

dbReadPlayerInfo :: IO (PlayerInfo False)
dbUpdatePlayerInfo :: PlayerInfo True -> IO ()

In PlayerInfo False, the fields are simply wrapped in Identity, e.g. MkPlayerInfo { name = Identity "Jack", score = Identity 8 }. In PlayerInfo True, the fields are wrapped in Maybe and can therefore be Nothing, e.g. MkPlayerInfo { name = Nothing, score = Just 10 }.

However, MaybeIf cannot be passed to S:

ghci> newtype AuxInfo b = MkAuxInfo (S (MaybeIf b))
<interactive>:33:21: error:
    • The type family 'MaybeIf' should have 2 arguments,
      but has been given 1
    • In the definition of data constructor 'MkAuxInfo'
      In the newtype declaration for 'AuxInfo'

Fortunately, this can be fixed with a small adjustment to the definition of MaybeIf:

type MaybeIf :: Bool -> Type -> Type
type family MaybeIf b where
  MaybeIf True  = Maybe
  MaybeIf False = Identity

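Note that the kind signature is unchanged, but the t parameter has been removed from the header and the clauses. With this adjustment, the arity of MaybeIf becomes 1, and the definition of AuxInfo is accepted.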
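Here is a minimal sketch that assembles this section into one module and compiles with the arity-1 MaybeIf. It assumes GHC 9.2 or later. The values readResult, partialUpdate and aux are illustrative, and the database functions are omitted.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, StandaloneKindSignatures #-}

import Data.Functor.Identity (Identity (..))
import Data.Kind (Type)

type S :: (Type -> Type) -> Type
data S k = MkS (k Bool) (k Integer)

-- Arity 1: the Type argument is not bound in the header.
type MaybeIf :: Bool -> Type -> Type
type family MaybeIf b where
  MaybeIf 'True  = Maybe
  MaybeIf 'False = Identity

-- Accepted: MaybeIf b is a saturated use of an arity-1 family.
newtype AuxInfo b = MkAuxInfo (S (MaybeIf b))

data PlayerInfo b =
  MkPlayerInfo { name  :: MaybeIf b String,
                 score :: MaybeIf b Integer }

-- A fully populated record, as returned by a read.
readResult :: PlayerInfo 'False
readResult = MkPlayerInfo { name = Identity "Jack", score = Identity 8 }

-- A partial record, as passed to an update.
partialUpdate :: PlayerInfo 'True
partialUpdate = MkPlayerInfo { name = Nothing, score = Just 10 }

-- S (MaybeIf 'True) reduces to S Maybe.
aux :: AuxInfo 'True
aux = MkAuxInfo (MkS (Just True) Nothing)

main :: IO ()
main = do
  putStrLn (runIdentity (name readResult))
  print (runIdentity (score readResult))
  print (name partialUpdate, score partialUpdate)
```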

Exercise: Determine the arities of Not, FromMaybe, and Fst.

Synergy with GADTs

The need for closed type families arises most often when working with GADTs. Here is a definition of a heterogeneous list:

type HList :: [Type] -> Type
data HList xs where
  HNil :: HList '[]
  (:&) :: x -> HList xs -> HList (x : xs)
infixr 5 :&

It can be used to represent sequences of values of different types:

h1 :: HList [Integer, String, Bool]
h1 = 42 :& "Hello" :& True :& HNil

h2 :: HList [Char, Bool]
h2 = 'x' :& False :& HNil

Just as with ordinary lists, we can define operations on it, such as computing its length:

hlength :: HList xs -> Int
hlength HNil = 0
hlength (_ :& xs) = 1 + hlength xs

ghci> hlength h1
3

ghci> hlength h2
2

However, even for something as trivial as concatenation, we need type-level computation:

happend :: HList xs -> HList ys -> HList ??

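What should the type of happend h1 h2 be? Well, it must contain the elements of the first list followed by the elements of the second. This is exactly what the Append type family implements: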

happend :: HList xs -> HList ys -> HList (Append xs ys)

And this is the typical reason people reach for closed type families: to implement operations on GADTs.
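The text gives only the signature of happend. The following minimal sketch supplies a definition and assumes GHC 9.2 or later. Each clause type-checks because, once the pattern match reveals the shape of xs, Append reduces by the matching equation. The name h3 is illustrative.

```haskell
{-# LANGUAGE DataKinds, GADTs, TypeFamilies, PolyKinds,
             StandaloneKindSignatures, TypeOperators #-}

import Data.Kind (Type)

type Append :: forall a. [a] -> [a] -> [a]
type family Append xs ys where
  Append '[]       ys = ys
  Append (x ': xs) ys = x ': Append xs ys

type HList :: [Type] -> Type
data HList xs where
  HNil :: HList '[]
  (:&) :: x -> HList xs -> HList (x ': xs)
infixr 5 :&

hlength :: HList xs -> Int
hlength HNil      = 0
hlength (_ :& xs) = 1 + hlength xs

-- Clause 1: xs ~ '[], so Append '[] ys reduces to ys.
-- Clause 2: xs ~ (x ': xs'), so the result type reduces to x ': Append xs' ys.
happend :: HList xs -> HList ys -> HList (Append xs ys)
happend HNil      ys = ys
happend (x :& xs) ys = x :& happend xs ys

h1 :: HList '[Integer, String, Bool]
h1 = 42 :& "Hello" :& True :& HNil

h2 :: HList '[Char, Bool]
h2 = 'x' :& False :& HNil

h3 :: HList '[Integer, String, Bool, Char, Bool]
h3 = happend h1 h2

main :: IO ()
main = print (hlength h3)
```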

Evaluation order, or lack thereof

Haskell is a lazy language, and its evaluation strategy allows us to write code like this:

ghci> take 10 (iterate (+5) 0)
[0,5,10,15,20,25,30,35,40,45]

Now let's attempt a similar feat at the type level. First, we define type families corresponding to take and iterate (+5):

type IteratePlus5 :: Nat -> [Nat]
type family IteratePlus5 k where
  IteratePlus5 k = k : IteratePlus5 (k+5)
    
type Take :: Nat -> [a] -> [a]
type family Take n a where
  Take 0 xs = '[]
  Take n (x : xs) = x : Take (n-1) xs

We can see that Take works as expected:

ghci> :kind! Take 3 [0, 1, 2, 3, 4, 5]
Take 3 [0, 1, 2, 3, 4, 5] :: [Nat]
= '[0, 1, 2]

IteratePlus5, on the other hand, sends the type checker into an infinite loop:

ghci> :kind! Take 10 (IteratePlus5 0)
^CInterrupted.

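Clearly, type family evaluation is not lazy. In fact, it isn't eager either: the evaluation order is simply not defined. Even when working with finite data, it is impossible to reason about the time or space complexity of algorithms implemented as type families. #18965 is a GHC issue that proposes a solution to this problem. Until it is resolved, this is a pitfall to be aware of.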
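Until then, one workaround is to make every recursion bounded, for example by passing an explicit element count. This is our own sketch, not the approach proposed in #18965. IteratePlus5N and the KnownNats helper are hypothetical names, and the code assumes GHC 9.2 or later.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, PolyKinds, StandaloneKindSignatures,
             ScopedTypeVariables, TypeApplications, TypeOperators,
             UndecidableInstances, FlexibleInstances #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal, type (+), type (-))

-- Hypothetical bounded variant of IteratePlus5: it stops after n
-- elements, so reduction terminates whatever order GHC uses.
type IteratePlus5N :: Nat -> Nat -> [Nat]
type family IteratePlus5N n k where
  IteratePlus5N 0 k = '[]
  IteratePlus5N n k = k ': IteratePlus5N (n - 1) (k + 5)

type Take :: Nat -> [a] -> [a]
type family Take n xs where
  Take 0 xs        = '[]
  Take n (x ': xs) = x ': Take (n - 1) xs

-- Hypothetical helper: reflect a type-level list of Nats to [Integer].
class KnownNats (ns :: [Nat]) where
  natsVal :: Proxy ns -> [Integer]

instance KnownNats '[] where
  natsVal _ = []

instance (KnownNat n, KnownNats ns) => KnownNats (n ': ns) where
  natsVal _ = natVal (Proxy @n) : natsVal (Proxy @ns)

main :: IO ()
main = do
  print (take 10 (iterate (+5) 0) :: [Integer])
  print (natsVal (Proxy @(IteratePlus5N 10 0)))
  print (natsVal (Proxy @(Take 3 (IteratePlus5N 10 0))))
```

Because IteratePlus5N 10 0 describes a finite list, it reduces fully before Take consumes it. The cost is that the caller must choose the bound up front.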

Open type families

[Illustration: Open type families. Made by Impure Pics.]

Suppose we want to assign a textual label to some types, perhaps for serialization purposes:

type Label :: Type -> Symbol
type family Label t where
  Label Double = "number"
  Label String = "string"
  Label Bool   = "boolean"
  ...

We can then reify the label at the term level using the KnownSymbol class:

label :: forall t. KnownSymbol (Label t) => String
label = symbolVal (Proxy @(Label t))

ghci> label @Double
"number"

But what if a user defines their own type MyType in another module? How could they assign a label to it, say label @MyType = "mt"?

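With a closed type family, this is impossible. This is where open type families come into the picture. To make a type family open, we omit the where keyword from its header: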

type Label :: Type -> Symbol
type family Label t

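The instances are no longer indented. Instead, they are declared at the top level, possibly in different modules, and are prefixed with the type instance keyword sequence: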

type instance Label Double = "number"
type instance Label String = "string"
type instance Label Bool   = "boolean"

Now users can easily define a Label instance for their own types:

data MyType = MT
type instance Label MyType = "mt"

ghci> label @MyType
"mt"
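The following sketch assembles the pieces above into one runnable module and assumes GHC 9.2 or later. In a real project the MyType instance would usually live in the user's own module. Here everything sits in Main for brevity.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, StandaloneKindSignatures,
             AllowAmbiguousTypes, ScopedTypeVariables, TypeApplications,
             FlexibleContexts #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Open family: no `where`, so instances may be added anywhere.
type Label :: Type -> Symbol
type family Label t

type instance Label Double = "number"
type instance Label String = "string"
type instance Label Bool   = "boolean"

-- t appears only under a type family, hence AllowAmbiguousTypes;
-- callers pick t with a type application.
label :: forall t. KnownSymbol (Label t) => String
label = symbolVal (Proxy @(Label t))

-- Normally this would be in the user's module.
data MyType = MT
type instance Label MyType = "mt"

main :: IO ()
main = do
  putStrLn (label @Double)
  putStrLn (label @MyType)
```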

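At this point, one might start to wonder why anyone would prefer closed type families when open ones seem more powerful and extensible. The reason is that extensibility comes at a cost: the equations of an open type family are not allowed to overlap. Yet overlapping equations are often useful!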

Overlapping equations

The clauses of a closed type family are ordered and matched from top to bottom. This allows us to define logical conjunction as follows:

type And :: Bool -> Bool -> Bool
type family And a b where
  And True True = True
  And _    _    = False

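If we reordered the clauses, the And _ _ equation would match every input. But because it comes second, the And True True clause gets a chance to match first. This is the key advantage of closed type families over open ones: their equations may overlap.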

An open type family has to enumerate every possibility, which leads to a combinatorial explosion:

type And' :: Bool -> Bool -> Bool
type family And' a b

type instance And' True  True  = True
type instance And' True  False = False
type instance And' False True  = False
type instance And' False False = False
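To compare the two definitions side by side, a small sketch can print both truth tables. It assumes GHC 9.2 or later. KnownBool and boolVal are hypothetical helpers that reflect a type-level Bool to the term level, in the same spirit as KnownSymbol.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, StandaloneKindSignatures,
             KindSignatures, TypeApplications, FlexibleInstances #-}

import Data.Proxy (Proxy (..))

-- Closed: the catch-all second equation overlaps the first.
type And :: Bool -> Bool -> Bool
type family And a b where
  And 'True 'True = 'True
  And _     _     = 'False

-- Open: every combination must be listed without overlap.
type And' :: Bool -> Bool -> Bool
type family And' a b
type instance And' 'True  'True  = 'True
type instance And' 'True  'False = 'False
type instance And' 'False 'True  = 'False
type instance And' 'False 'False = 'False

-- Hypothetical helper: reflect a type-level Bool.
class KnownBool (b :: Bool) where
  boolVal :: Proxy b -> Bool
instance KnownBool 'True  where boolVal _ = True
instance KnownBool 'False where boolVal _ = False

main :: IO ()
main = do
  print [ boolVal (Proxy @(And 'True  'True))
        , boolVal (Proxy @(And 'True  'False))
        , boolVal (Proxy @(And 'False 'True))
        , boolVal (Proxy @(And 'False 'False)) ]
  print [ boolVal (Proxy @(And' 'True  'True))
        , boolVal (Proxy @(And' 'True  'False))
        , boolVal (Proxy @(And' 'False 'True))
        , boolVal (Proxy @(And' 'False 'False)) ]
```

Both lists should be identical. The closed version needs two equations where the open one needs four, and the gap grows quickly with more arguments.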

Compatible equations

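Saying that overlapping equations are forbidden in open type families but allowed in closed ones is an oversimplification. In practice, the rules are a bit more involved.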

Instances of an open type family must be compatible. Two type family instances are compatible if at least one of the following holds:

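Their left-hand sides are apart (that is, they don't overlap).
Their left-hand sides unify with a substitution under which the right-hand sides are equal.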

The second condition allows GHC to accept more programs. Consider the following example:

type family F a
type instance F a    = [a]
type instance F Char = String

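Although the left-hand sides clearly overlap (a is more general than Char), it makes no difference in the end. If the user needs to reduce F Char, both equations yield [Char]. Mathematically inclined readers will recognize this property as confluence.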

Here is a more interesting example with several type variables:

type family G a b
type instance G a    Bool = a -> Bool
type instance G Char b    = Char -> b

The left-hand sides unify under the substitution a=Char, b=Bool. Under that substitution, the right-hand sides are equal:

type instance G Char Bool = Char -> Bool

Therefore, it is safe to accept both instances: they are compatible.

Instance compatibility also plays a role in closed type families. Consider FInteger and FString:

type family FInteger a where
  FInteger Char = Integer
  FInteger a    = [a]
  
type family FString a where
  FString Char = String
  FString a    = [a]

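Now, can GHC reduce FInteger x to [x] for an unknown x? No, because the equations are matched from top to bottom, so GHC first needs to check whether x is Char, in which case FInteger x would reduce to Integer.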

The equations of FString, on the other hand, are compatible. So if we have FString x, it doesn't matter whether x is Char, because both equations reduce it to [x].
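The difference becomes visible once a polymorphic function relies on the reduction. In the sketch below, GHC accepts [x] as FString x for any x, while the analogous definition for FInteger, left commented out, is rejected. The sketch also includes the open family F from earlier to show that its overlapping but compatible instances compile. The function names asFString and asFInteger are illustrative.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Proxy (Proxy (..))

-- Open family with overlapping but compatible instances: accepted.
type family F a
type instance F a    = [a]
type instance F Char = String

type family FInteger a where
  FInteger Char = Integer
  FInteger a    = [a]

type family FString a where
  FString Char = String
  FString a    = [a]

-- Accepted: the equations of FString are compatible, so GHC reduces
-- FString x to [x] without knowing whether x is Char.
asFString :: [x] -> FString x
asFString xs = xs

-- Rejected if uncommented: FInteger x stays stuck, because x might be Char.
-- asFInteger :: [x] -> FInteger x
-- asFInteger xs = xs

main :: IO ()
main = do
  putStrLn (asFString "abc")
  print (asFString [True, False])
```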

Injective type families

Some type families are injective: that is, their inputs can be deduced from their outputs. For example, consider boolean negation:

type Not :: Bool -> Bool
type family Not x where
  Not True = False
  Not False = True

If we know that Not x is True, we can conclude that x is False. By default, however, the compiler does not apply this reasoning:

s :: forall x. (Not x ~ True, Typeable x) => String
s = show (typeRep @x)

ghci> s
<interactive>:7:1: error:
    • Couldn't match type 'Not x0' with ''True'
        arising from a use of 's'
      The type variable 'x0' is ambiguous

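Even though the compiler could instantiate x to False based on the fact that Not x is True, it doesn't do so. Of course, we can instantiate it manually, and GHC will check that we did it correctly: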

ghci> s @False
"'False"

ghci> s @True
<interactive>:12:1: error:
    • Couldn't match type ''False' with ''True'
        arising from a use of 's'

When we instantiate x to False, the Not x ~ True constraint is satisfied. When we try to instantiate it to True, the constraint is not satisfied, and we get a type error.

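There is only one valid way to instantiate x. Wouldn't it be nice if GHC could do this automatically? This is exactly what injective type families allow us to achieve. Change the header of the Not type family as follows: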

type family Not x = r | r -> x where

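First, we give a name to the result of Not x; here I called it r. Then, using the syntax of functional dependencies, we specify that r determines x. GHC makes use of this information whenever it needs to instantiate x: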

ghci> s
"'False"

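This feature is enabled by the TypeFamilyDependencies extension. As with ordinary functional dependencies, it is only used to guide type inference and cannot be used to produce equalities. So, unfortunately, the following is rejected: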

not_lemma :: Not x :~: True -> x :~: False
not_lemma Refl = Refl
    -- Could not deduce: x ~ 'False
    -- from the context: 'True ~ Not x

This is a known limitation.
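Here is a sketch that collects the injective Not and the function s from this section into a standalone module, assuming GHC 9.2 or later. typeRep is taken from Type.Reflection, whose version accepts the type as a visible type argument. The not_lemma from above is left out because GHC rejects it.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeFamilyDependencies, TypeOperators,
             AllowAmbiguousTypes, ScopedTypeVariables, TypeApplications #-}

import Type.Reflection (Typeable, typeRep)

-- The annotation `r -> x` declares that the result determines the argument.
type family Not x = r | r -> x where
  Not 'True  = 'False
  Not 'False = 'True

s :: forall x. (Not x ~ 'True, Typeable x) => String
s = show (typeRep @x)

main :: IO ()
main = do
  putStrLn s            -- x is inferred from Not x ~ 'True via injectivity
  putStrLn (s @'False)  -- explicit instantiation still works
```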

Associated types

From the standpoint of code organization, it sometimes makes sense to associate an open type family with a class.

Consider the notions of containers and elements:

type family Elem a
class Container a where
  elements :: a -> [Elem a]
  
type instance Elem [a] = a
instance Container [a] where
  elements = id

type instance Elem ByteString = Word8
instance Container ByteString where
  elements = ByteString.unpack

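We would only ever use Elem with types that also have a Container instance, so it would be clearer to move it into the class. This is exactly what associated types enable us to do: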

class Container a where
  type Elem a
  elements :: a -> [Elem a]
  
instance Container [a] where
  type Elem [a] = a
  elements = id
 
instance Container ByteString where
  type Elem ByteString = Word8
  elements = ByteString.unpack

Associated types are mostly equivalent to open type families, and which one to prefer is often a matter of style.

One advantage of associated types is that they can have defaults:

type family Unwrap x where
  Unwrap (f a) = a
  
class Container a where
  type Elem a
  type Elem x = Unwrap x
  elements :: a -> [Elem a]

This way, we can avoid defining Elem explicitly in most cases:

instance Container [a] where
  elements = id
  
instance Container (Maybe a) where
  elements = maybeToList
  
instance Container ByteString where
  type Elem ByteString = Word8
  elements = ByteString.unpack
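Here is a sketch of a runnable version of the default-based Container class. It assumes GHC 9.2 or later and the bytestring package that ships with GHC. UndecidableInstances is enabled because the default Elem a = Unwrap a does not shrink its argument, which GHC's termination check for type family instances may otherwise reject. A standalone kind signature is given for Unwrap to pin its kind.

```haskell
{-# LANGUAGE TypeFamilies, StandaloneKindSignatures, UndecidableInstances #-}

import qualified Data.ByteString as ByteString
import Data.ByteString (ByteString)
import Data.Kind (Type)
import Data.Maybe (maybeToList)
import Data.Word (Word8)

type Unwrap :: Type -> Type
type family Unwrap x where
  Unwrap (f a) = a

class Container a where
  type Elem a
  type Elem a = Unwrap a      -- default, used when an instance omits Elem
  elements :: a -> [Elem a]

instance Container [a] where
  elements = id               -- Elem [a] = Unwrap [a] = a

instance Container (Maybe a) where
  elements = maybeToList      -- Elem (Maybe a) = Unwrap (Maybe a) = a

instance Container ByteString where
  type Elem ByteString = Word8  -- overrides the default
  elements = ByteString.unpack

main :: IO ()
main = do
  print (elements "abc")
  print (elements (Just True))
  print (elements (ByteString.pack [104, 105]))
```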

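Current research suggests that associated types are a more promising abstraction mechanism than top-level open type families. See the ICFP 2017 paper Constrained Type Families.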

Data families

Data families can be thought of as type families whose instances are always new, dedicated data types.

Consider the following example:

data family Vector a
newtype instance Vector () = VUnit Int
newtype instance Vector Word8 = VBytes ByteArray
data instance Vector (a, b) = VPair !(Vector a) !(Vector b)

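A Vector is a sequence of elements, but for the unit type we can simply store the length as an Int, which is far more efficient than allocating memory for each unit value.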

Note that we can choose between data and newtype on a per-instance basis.

This example can be rewritten with a type family as follows:

type family VectorF a
 
type instance VectorF () = VectorUnit
data VectorUnit = VUnit Int
    
type instance VectorF Word8 = VectorWord8
data VectorWord8 = VBytes ByteArray

type instance VectorF (a, b) = VectorPair a b
data VectorPair a b = VPair (VectorF a) (VectorF b)

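In this translation, each type family instance is paired with its own data type. However, even setting the boilerplate aside, the translation is imperfect. Data families give us something more: the type constructors they introduce are generative, so we don't have to worry about their arity.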

For example, the following code is valid:

data Pair1 f x = P1 (f x) (f x)
type VV = Pair1 Vector

On the other hand, Pair1 VectorF would be rejected, because VectorF is not applied to its argument.

Data families can also be associated with a class:

class Vectorizable a where
  data Vector a
  vlength :: Vector a -> Int

As with associated types and open type families, this is mostly a matter of code organization.
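Here is a sketch of the Vector example written with an associated data family. To stay within base, the Word8/ByteArray instance is replaced by a hypothetical list-backed Vector Int instance. The pair instance follows the original. The last part shows that Vector can be passed unapplied to Pair1, because data families are generative.

```haskell
{-# LANGUAGE TypeFamilies #-}

class Vectorizable a where
  data Vector a
  vlength :: Vector a -> Int

-- Unit vectors only need to store their length.
instance Vectorizable () where
  newtype Vector () = VUnit Int
  vlength (VUnit n) = n

-- Hypothetical list-backed instance, standing in for the ByteArray one.
instance Vectorizable Int where
  newtype Vector Int = VInts [Int]
  vlength (VInts xs) = length xs

-- Struct-of-arrays layout; both halves are assumed to have equal length.
instance (Vectorizable a, Vectorizable b) => Vectorizable (a, b) where
  data Vector (a, b) = VPair !(Vector a) !(Vector b)
  vlength (VPair xs _) = vlength xs

-- Vector is generative, so it may appear unapplied as an argument.
data Pair1 f x = P1 (f x) (f x)
type VV = Pair1 Vector

units :: VV ()
units = P1 (VUnit 3) (VUnit 5)

main :: IO ()
main = do
  print (vlength (VPair (VUnit 3) (VInts [10, 20, 30])))
  case units of
    P1 a b -> print (vlength a + vlength b)
```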

Non-parametric quantification

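At the term level, forall is a parametric quantifier, and this fact can be used to reason about functions. For example, consider the type signature of the identity function: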

id :: forall a. a -> a

There is only one thing it can do with its argument: return it unchanged. For instance, it cannot return 42 when given an integer:

id :: forall a. a -> a
id (x :: Int) = 42      -- Rejected!
id x = x

This matters not only for reasoning about code, but also for guaranteeing type erasure.

However, none of this applies to type families, which have their own interpretation of forall:

type F :: forall a. a -> a
type family F a where
  F (a :: Nat) = 42
  F a = a

This code is accepted and works without errors:

ghci> :kind! F 0
F 0 :: Nat
= 42

ghci> :kind! F "Hello"
F "Hello" :: Symbol
= "Hello"

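On the one hand, this hinders our ability to reason about type families. On the other hand, it amounts to having Π-types at the type level, which can be put to good use.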
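The :kind! session above can also be reproduced in a compiled module by reflecting the results with natVal and symbolVal from GHC.TypeLits. This is a sketch that assumes GHC 9.2 or later. The kind variable is named k here, so that a is not reused for both the kind and the argument.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, PolyKinds, StandaloneKindSignatures,
             TypeApplications #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (Nat, natVal, symbolVal)

-- The kind looks like that of an identity function, yet the first
-- equation inspects the invisible kind argument k.
type F :: forall k. k -> k
type family F x where
  F (x :: Nat) = 42
  F x          = x

main :: IO ()
main = do
  print (natVal (Proxy @(F 0)))               -- k = Nat: first equation
  putStrLn (symbolVal (Proxy @(F "Hello")))   -- k = Symbol: second equation
```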

Non-linear patterns

In term-level functions, a variable cannot be bound more than once:

dedup (x : x : xs) = dedup (x : xs)   -- Rejected!
dedup (y : xs) = y : dedup xs
dedup [] = []

If we want to check whether two inputs are equal, we must do so explicitly with the == operator:

dedup (x1 : x2 : xs) | x1==x2 = dedup (x1 : xs)
dedup (y : xs) = y : dedup xs
dedup [] = []

In type family instances, on the other hand, the former style is allowed:

type family Dedup xs where
  Dedup (x : x : xs) = Dedup (x : xs)
  Dedup (y : xs) = y : Dedup xs
  Dedup '[] = '[]

This feature is called non-linear patterns, but don't confuse it with linear types: the two are unrelated.
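Here is a sketch that runs the term-level dedup and reduces the Dedup family on the same input, so you can check that both versions agree. It assumes GHC 9.2 or later. KnownNats is again a hypothetical helper for printing a type-level list of Nats.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, PolyKinds, StandaloneKindSignatures,
             ScopedTypeVariables, TypeApplications, TypeOperators,
             UndecidableInstances, FlexibleInstances #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Term level: equality must be tested explicitly with ==.
dedup :: Eq a => [a] -> [a]
dedup (x1 : x2 : xs) | x1 == x2 = dedup (x1 : xs)
dedup (y : xs) = y : dedup xs
dedup [] = []

-- Type level: the repeated x in the first equation is a non-linear pattern.
type Dedup :: [a] -> [a]
type family Dedup xs where
  Dedup (x ': x ': xs) = Dedup (x ': xs)
  Dedup (y ': xs)      = y ': Dedup xs
  Dedup '[]            = '[]

-- Hypothetical helper: reflect a type-level list of Nats to [Integer].
class KnownNats (ns :: [Nat]) where
  natsVal :: Proxy ns -> [Integer]

instance KnownNats '[] where
  natsVal _ = []

instance (KnownNat n, KnownNats ns) => KnownNats (n ': ns) where
  natsVal _ = natVal (Proxy @n) : natsVal (Proxy @ns)

main :: IO ()
main = do
  print (dedup [1, 1, 2, 2, 2, 3, 1 :: Int])
  print (natsVal (Proxy @(Dedup '[1, 1, 2, 2, 2, 3, 1])))
```

Only adjacent duplicates are collapsed, so the trailing 1 survives in both versions.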

Conclusion

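Type families are a powerful and widely used feature (about 20% of Hackage packages use them). They were introduced in 2005 in the form of associated type synonyms and remain a subject of active research, with innovations such as closed type families (2013), injective type families (2015), and constrained type families (2017).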

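Although type families are a useful tool, they must be used with great care because of open issues such as #8095 ("TypeFamilies painfully slow") and #12088 ("Type/data family instances in kind checking"). Work on resolving these issues is under way, however. Serokell's GHC department is committed to improving Haskell's type-level programming facilities.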
