django REST fromework 序列化时多次查询数据库的解决方案

3,036 阅读2分钟

django REST framework 在django程序中不只是帮助提供REST风格的接口;还有接口的系列化输出;权限管理等; 不过在序列化过程中,如果什么都使用默认的方式,可能导致序列化时多次查询数据库,造成性能问题(示例来源):

  • model:
from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100)


class Blogpost(models.Model):
    title = models.CharField(max_length=100)
    categories = models.ManyToManyField(Category)
  • url + view:
# urls.py
from rest_framework import routers
from . import views


router = routers.DefaultRouter()
router.register(r'blogposts', views.BlogpostViewSet)
---------------------------------------------------------
# views.py
from rest_framework import viewsets

class BlogpostViewSet(viewsets.ModelViewSet):
    queryset = Blogpost.objects.all().order_by('date')
    serializer_class = serializers.BlogpostSerializer

等这些做完,调用接口,我们在控制台将sql输出的话,就会看到像这样的东西:

# sql输出设置
LOGGING = {
    ...
    "loggers": {
        ...
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
            "propagate": False,
        }
    }
}
-- sql 输出结果
SELECT "app_blogpost"."id", "app_blogpost"."title" FROM "app_blogpost";

SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1025;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 193;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 757;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 853;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1116;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1126;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 964;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 591;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1112;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1034;
...

是不是吓了一跳?在序列化时,原生的方式是逐条拿出来,再去查找对应的关联对象;笔者的业务表中有四五个外键关联对象,点完那个接口后好几屏的sql铺面而来;如果数据量大的话,这对数据库绝对是个很大的压力!后面去网上找了下,找到了比较适合的解决方案,下面分享下: 1. 序列化中需要使用到一对多的对象中的字段的:

# views.py
queryset = BusinessTable.objects.all().select_related(
        'one_to_one_field_1',
        'one_to_one_field_2',
        ...
    )
---------------------
# serializer
class xxxSerializer(serializers.ModelSerializer):
    target_field = serializers.SerializerMethodField()

    def get_target_field(self, obj):
        ...
        return obj.one_to_one_field_1.target_field

2. 序列化中需要使用到一对多,多对多的对象中的字段的

queryset = BusinessTable.objects.all().prefetch_related(
        'many_to_many_field_1',
        'many_to_many_field_2',
        ...
    )
---------------------
# serializer
class xxxSerializer(serializers.ModelSerializer):
    target_field_set = serializers.SerializerMethodField()

    def get_target_field(self, obj):
        ...
        return [it.arget_field for it in obj.many_to_many_field_1.all()]

3. 序列化中需要对多对多的字段聚合的

queryset = BusinessTable.objects.all().annotate(
        cnt_xxx=Count("many_to_many_field")
    )
---------------------
# serializer
class xxxSerializer(serializers.ModelSerializer):
    target_field = serializers.SerializerMethodField()

    def get_target_field(self, obj):
        return obj.cnt_xxx

4. 总结

queryset = BlessingBlog.objects.all().annotate(
        cnt_xxx=Count("xxx") # 需要聚合的新命名字段,因annotate对语句的解释,django其实会在外部语句执行联合查询,再分组,而不是直接走子查询。所以这里其实是取巧,所以要这样用只能加一个,否则要出错,要用在子查询来聚合的可以用RowSql或extra!
    ).select_related( # 一对一关联表数据
        'yyy1',
        'yyy2',
    ).prefetch_related( # 一对多,多对多关联表数据
        'xxx1',
        'xxx3',
    ).order_by('zzz1', 'zzz2')

最后,如果使用了annotate,系统给出的查询语句对聚合字段的处理不是通过子查询实现的,是通过联合查询再分组来实现的,这对性能也有很大的影响,而且聚合多个字段的话还要注意去重,如果对性能要求高的,可以考虑自己手写查询语句。

参考: 1 stack overflow 2 博客 3 django官网 4 博客