[Personal Notes] A Brief Analysis of the Django 3.2 Login Strategy (Part 1)

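Some time ago I took over an SPA project that uses Django 3.2 as its backend framework, with its user features built directly on the framework's built-in user system. When I logged in to the database, I found that the password field does not store the user's plaintext password, but an encoded string of the form pbkdf2_sha256$260000$Rq3gdKydANFcvIPzPKEouX$JEDoJH6n2Eu1JouHMIxF7ZMCIrz1yZBBn8z1/oSPf3A=.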

Out of curiosity, I took a look at the code that implements Django's user login system.

In my project, the source of the wrapped login endpoint looks roughly like this:

from django.contrib import auth

class UserLoginAPI(APIView):
    def post(self, request):
        data = request.data
        user = auth.authenticate(username=data["username"], password=data["password"])
        # None is returned if username or password is wrong
        if user:
            if not user.two_factor_auth:
                auth.login(request, user)
                return self.success("Succeeded")
            # Code for handling two-factor authentication omitted...
        else:
            return self.error("Invalid username or password")

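It is easy to spot the two key functions involved in the login process: auth.authenticate and auth.login. The former checks whether the requested username exists in the database and whether the submitted password is correct. The latter sets up a server-side session for the logged-in user; the session key is written to the Set-Cookie field of the response header so that it is returned to the frontend.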

In these notes, I am mainly interested in how auth.authenticate is implemented.

Following the import statement, its source is easy to find. It lives in django\contrib\auth\__init__.py:

def authenticate(request=None, **credentials):
    """
    If the given credentials are valid, return a User object.
    """
    for backend, backend_path in _get_backends(return_tuples=True):
    
        backend_signature = inspect.signature(backend.authenticate)
        try:
            backend_signature.bind(request, **credentials)
        except TypeError:
            # This backend doesn't accept these credentials as arguments. Try the next one.
            continue
        try:
            user = backend.authenticate(request, **credentials)
        except PermissionDenied:
            # This backend says to stop in our tracks - this user should not be allowed in at all.
            break
        if user is None:
            continue
        # Annotate the user object with the path of the backend.
        user.backend = backend_path
        return user

    # The credentials supplied are invalid to all backends, fire signal
    user_login_failed.send(sender=__name__, credentials=_clean_credentials(credentials), request=request)

A side note here: this requires knowing where Python's third-party packages are installed on your machine. On my machine, the path is C:\Users\Administrator\AppData\Local\Programs\Python\Python38\Lib\site-packages\django\
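One way to locate an installed package is to ask Python itself. The following is a minimal sketch using only the standard library. The helper name package_dir is made up for these notes, and json stands in for django so that the snippet runs even where Django is not installed.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an importable top-level package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py (or the module file).
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Replace "json" with "django" in an environment where Django is installed.
    print(package_dir("json"))
```

In a Django environment, printing django.__file__ after import django should give the same information.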

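In this function, the outer loop iterates over the settings.AUTHENTICATION_BACKENDS list and, together with the first try...except statement, picks out the authentication backends whose authenticate signature accepts the supplied credentials. This list is usually configured in your project's settings.py file.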
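For reference, such a settings.py entry might look like the fragment below. The first dotted path is Django's default backend; the second is a hypothetical custom backend invented purely for illustration and does not come from my project.

```python
# settings.py (fragment)
AUTHENTICATION_BACKENDS = [
    "django.contrib.auth.backends.ModelBackend",  # Django's default backend
    "myproject.backends.TokenBackend",  # hypothetical custom backend, tried second
]
```

The backends are tried in the order they are listed.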

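Once a suitable backend is found (in my project, django.contrib.auth.backends.ModelBackend), backend.authenticate is called to check the username and password, and on success the user's Model object is returned. If the check fails (the method returns None), the continue statement is triggered and the remaining backends are tried.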
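To make the signature check concrete, here is a small self-contained sketch of the same loop without Django. The two toy backends, the in-memory user table and the dict used as a "user" are all assumptions made for illustration; only the inspect.signature(...).bind(...) trick mirrors the source above. PermissionDenied handling and the login-failed signal are left out.

```python
import inspect


class PasswordBackend:
    """Toy backend that accepts username/password credentials."""

    USERS = {"alice": "s3cret"}  # hypothetical in-memory user store

    def authenticate(self, request, username=None, password=None):
        if username in self.USERS and self.USERS[username] == password:
            return {"username": username}
        return None


class TokenBackend:
    """Toy backend that only accepts a token keyword argument."""

    def authenticate(self, request, token=None):
        return {"username": "token-user"} if token == "valid-token" else None


def authenticate(backends, request=None, **credentials):
    """Try each backend in order, skipping those whose signature does not fit."""
    for backend in backends:
        signature = inspect.signature(backend.authenticate)
        try:
            # bind() raises TypeError when the credentials do not match the parameters.
            signature.bind(request, **credentials)
        except TypeError:
            continue
        user = backend.authenticate(request, **credentials)
        if user is not None:
            return user
    return None


backends = [TokenBackend(), PasswordBackend()]
print(authenticate(backends, username="alice", password="s3cret"))
print(authenticate(backends, token="valid-token"))
```

With username and password as keyword arguments, TokenBackend is skipped by the bind check before its authenticate method is ever called.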

Next, let's see how backend.authenticate is implemented. In my project, its source is in django\contrib\auth\backends.py.

from django.contrib.auth import get_user_model

UserModel = get_user_model()

class ModelBackend(BaseBackend):
    def authenticate(self, request, username=None, password=None, **kwargs):
        # Fallback handling omitted...
        try:
            user = UserModel._default_manager.get_by_natural_key(username)
        except UserModel.DoesNotExist:
            # Fallback handling omitted...
            ...
        else:
            if user.check_password(password) and self.user_can_authenticate(user):
                return user
    # Other methods omitted...

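First, a brief word on get_user_model, which retrieves the project's user model class. Its implementation is fairly simple: roughly speaking, it returns the user model class specified by AUTH_USER_MODEL in your project's settings.py. In my project, the user model is a custom class built on top of AbstractBaseUser from the django.contrib.auth.base_user module.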

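So this method is really just another layer of wrapping. Its main job is to look up the user model object in the database by the given username, then call its check_password method to check whether the submitted password matches the real password stored in the database.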

Next, let's look at the implementation of the check_password method. Its source is in django\contrib\auth\base_user.py.

from django.contrib.auth.hashers import check_password

class AbstractBaseUser(models.Model):
    password = models.CharField(_('password'), max_length=128)
    last_login = models.DateTimeField(_('last login'), blank=True, null=True)

    def check_password(self, raw_password):
        # The setter function, used to automatically refresh the stored password, is omitted here
        return check_password(raw_password, self.password, setter)

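Yet another layer of wrapping. Besides the plaintext raw_password submitted at login, the encoded value stored in the database is passed in as well, so we must be getting close to the code that does the actual comparison.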

Next, find the source of the check_password function:

def check_password(password, encoded, setter=None, preferred='default'):
    
    # Intermediate code omitted...
    
    try:
        hasher = identify_hasher(encoded)
    except ValueError:
        # encoded is gibberish or uses a hasher that's no longer installed.
        return False

    is_correct = hasher.verify(password, encoded)

    # Intermediate code omitted...
    
    return is_correct

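Two key functions are involved here: identify_hasher and hasher.verify. The former determines which hashing algorithm the password uses and returns the hasher class/object that handles that algorithm. The latter performs the actual password check.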

Let's take a quick look at the former first. Its source is also in hashers.py:

def identify_hasher(encoded):
    if ((len(encoded) == 32 and '$' not in encoded) or
            (len(encoded) == 37 and encoded.startswith('md5$$'))):
        algorithm = 'unsalted_md5'
    # Ancient versions of Django accepted SHA1 passwords with an empty salt.
    elif len(encoded) == 46 and encoded.startswith('sha1$$'):
        algorithm = 'unsalted_sha1'
    else:
        algorithm = encoded.split('$', 1)[0]
    return get_hasher(algorithm)

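This function is fairly simple. It first checks whether the encoded string has the telltale features of certain legacy formats (unsalted MD5 or unsalted SHA1). If the algorithm still cannot be determined, it takes the part before the first $ and passes it to get_hasher for further lookup. I won't analyze get_hasher any further.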
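As a quick check of that branching, the sketch below copies the same conditions into a standalone function that returns only the algorithm name (the name identify_algorithm is my own; the real function goes on to call get_hasher). It is applied to the sample string from the beginning of these notes.

```python
def identify_algorithm(encoded):
    """Return the algorithm name using the same branching as identify_hasher."""
    if (len(encoded) == 32 and "$" not in encoded) or (
        len(encoded) == 37 and encoded.startswith("md5$$")
    ):
        return "unsalted_md5"
    # Legacy SHA1 hashes stored with an empty salt.
    if len(encoded) == 46 and encoded.startswith("sha1$$"):
        return "unsalted_sha1"
    # Everything else: the algorithm name is the text before the first "$".
    return encoded.split("$", 1)[0]


sample = "pbkdf2_sha256$260000$Rq3gdKydANFcvIPzPKEouX$JEDoJH6n2Eu1JouHMIxF7ZMCIrz1yZBBn8z1/oSPf3A="
print(identify_algorithm(sample))  # pbkdf2_sha256
```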

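Looking at the encoded strings stored in my project's database, it is clear that the project uses the pbkdf2_sha256 algorithm (PBKDF2 + HMAC + SHA256), and the class that handles it is also easy to find (again in hashers.py):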

from django.utils.crypto import (
    constant_time_compare, pbkdf2,
)

class PBKDF2PasswordHasher(BasePasswordHasher):
    algorithm = "pbkdf2_sha256"
    iterations = 260000
    digest = hashlib.sha256

    def encode(self, password, salt, iterations=None):
        # Fallback code omitted...
        hash = pbkdf2(password, salt, iterations, digest=self.digest)
        hash = base64.b64encode(hash).decode('ascii').strip()
        return "%s$%d$%s$%s" % (self.algorithm, iterations, salt, hash)

    def decode(self, encoded):
        algorithm, iterations, salt, hash = encoded.split('$', 3)
        # Fallback code omitted...
        return {
            'algorithm': algorithm,
            'hash': hash,
            'iterations': int(iterations),
            'salt': salt,
        }

    def verify(self, password, encoded):
        decoded = self.decode(encoded)
        encoded_2 = self.encode(password, decoded['salt'], decoded['iterations'])
        return constant_time_compare(encoded, encoded_2)
    
    # Other methods omitted...

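The verify method here matches up with the line is_correct = hasher.verify(password, encoded) in check_password above. We can see clearly that Django checks a submitted password by encoding it into the same hashed form and comparing the result with the hash stored in the database for the real password.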

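This approach greatly improves the security of user passwords. Put simply, SHA256 is a one-way, many-to-one mapping, so even someone who obtains the hashed values in the database cannot directly reverse them to recover the plaintext passwords.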

The code is fairly simple, so I'll only note a few key points:

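The constant_time_compare function actually calls Python's built-in secrets.compare_digest. It compares the two values passed in using a constant-time algorithm, in order to guard against timing attacks.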
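The decode method does not decrypt the encoded string from the database. It only extracts the relevant parameters from it, so that the plaintext password submitted at login can be hashed with the same parameters. From it we can see that the encoded string is split by $ into four parts: the algorithm, the iteration count, the salt, and the hash of the plaintext password combined with the salt.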
The pbkdf2 function returns a bytes-like object, which is then base64-encoded and finally converted to a string so that it can be stored in the database or compared.
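The points above can be put together in a standalone sketch that uses only the standard library. It assumes that hashlib.pbkdf2_hmac is an adequate stand-in for Django's pbkdf2 helper, and I have not checked its output against a real Django installation. The password, the salt and the low iteration count are made-up values chosen to keep the example fast.

```python
import base64
import hashlib
import secrets


def encode(password, salt, iterations):
    """Build 'algorithm$iterations$salt$hash' using PBKDF2-HMAC-SHA256."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), iterations)
    digest = base64.b64encode(raw).decode("ascii")
    return "pbkdf2_sha256$%d$%s$%s" % (iterations, salt, digest)


def verify(password, encoded):
    """Re-encode the candidate password with the stored parameters and compare."""
    _algorithm, iterations, salt, _hash = encoded.split("$", 3)
    candidate = encode(password, salt, int(iterations))
    # Constant-time comparison, so timing does not reveal how many characters match.
    return secrets.compare_digest(encoded.encode(), candidate.encode())


stored = encode("correct horse", "hypotheticalsalt", 1000)  # illustrative values only
print(stored.split("$")[:3])  # ['pbkdf2_sha256', '1000', 'hypotheticalsalt']
print(verify("correct horse", stored), verify("wrong", stored))  # True False
```

Note that verify never recovers the original password: it only repeats the same one-way computation with the stored salt and iteration count.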