学习Spring Data findAll的反模式

129 阅读9分钟

简介

在这篇文章中,我们将看到为什么Spring Data findAll方法实际上是一个糟糕的反模式。

一张图片胜过1000字。

The Spring Data findAll Anti-PatternThe Spring Data findAll Anti-Pattern

你会惊讶地发现,有多少次我看到这样的问题:从数据库中获取大量的数据,只是为了在Java中进行过滤,并丢弃绝大部分不需要的记录,以满足特定的业务需求。

反模式

首先,让我们从维基百科给出的反模式定义开始。

在软件工程、项目管理和业务流程中,反模式是对一个反复出现的问题的常见反应,通常是无效的,并有可能产生很大的反作用。

这正是你在Spring DataRepository 基础接口中提供一个findAll 方法时得到的结果,所有的数据访问对象将自动继承这个方法。

Spring Data Repository的层次结构

通常,当你创建一个Spring Data JPARepository 时,你的自定义接口会扩展JpaRepository

@Repository
public interface PostRepository 
        extends JpaRepository<Post, Long> {
        
}

然而,JpaRepository 提供了一系列的方法,它要么直接声明,要么从底层基础接口继承,如下图所示。

Spring Data findAll Anti-PatternSpring Data findAll Anti-Pattern

即使JpaReository 没有明确地提供一个findAll 方法,它仍然会从所有Spring Data模块共享的CrudRepository 继承一个。

Spring Data findAll反模式

findAll 方法的问题是,作为一个API设计者,你不应该为你的客户提供一个获取整个数据库表的方法,因为如果你这样做了,有些开发者确实会这样做。

让我们假设我们的系统中有以下PostTag 实体。

List<Tag> tags = List.of(
    new Tag()
        .setId(1L)
        .setName("JDBC"),
    new Tag()
        .setId(2L)
        .setName("JPA"),
    new Tag()
        .setId(3L)
        .setName("Hibernate")
);

tags.forEach(tag -> entityManager.persist(tag));

for (long i = 1; i <= POST_SIZE; i++) {
    entityManager.persist(
        new Post()
            .setId(i)
            .setTitle(
                String.format(
                    "High-Performance Java Persistence, Part %d",
                    i
                )
            )
            .addTag(tags.get((int) i % 3))
    );
}

我们想获得所有JDBC或Hibernate帖子的标题,那么我们如何才能做到这一点?

最简单的方法当然是使用一个单一的SQL查询,看起来如下。

public List<String> findPostTitleByTags(List<String> tags) {
    return entityManager.createNativeQuery("""
        select p.title
        from post p
        where exists (
            select 1
            from post_tag pt 
            join tag t on pt.tag_id = t.id and pt.post_id = p.id
            where t.name in (:tags)
        )
        order by p.id
        """)
    .setParameter("tags", tags)
    .getResultList();
}

但是,并不是每个Java开发者都喜欢用一个简单的SQL查询来解决这个问题。他们中的一些人会不惜一切代价来避免编写任何SQL查询。

而且,由于我们离 "解决 "这个问题只差一个findAll 方法的调用,在有人提供以下解决方案之前,不会花很长时间。

List<String> postTitlesStreamRecords = postRepository.findAll()
    .stream()
    .filter(
        post -> post.getTags().stream()
                    .map(Tag::getName)
                    .anyMatch(matchingTags::contains)
    )
    .sorted(Comparator.comparing(Post::getId))
    .map(Post::getTitle)
    .collect(Collectors.toList());

当执行findAll 工作方法时,Hibernate将执行以下查询。

Query:["select post0_.id as id1_1_, post0_.title as title2_1_ from post post0_"], Params:[()]

Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(1)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(2)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(3)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(4)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(5)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(6)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(7)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(8)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(9)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(10)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(11)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(12)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(13)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(14)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(15)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(16)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(17)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(18)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(19)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(20)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(21)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(22)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(23)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(24)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(25)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(26)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(27)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(28)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(29)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(30)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(31)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(32)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(33)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(34)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(35)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(36)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(37)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(38)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(39)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(40)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(41)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(42)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(43)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(44)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(45)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(46)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(47)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(48)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(49)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(50)]

不太好吧!

你之所以看到这么多的二次查询是由于filter 谓词需要获取tags 集合以匹配提供的标签。

防止Spring Data fetchAll反模式的发生

findAll 方法不应该被默认提供,这就是为什么 HibernateRepository从Hibernate Types项目中,废弃了它。

此外,如果你调用它,它将抛出一个 UnsupportedOperationException.

为了确保findAll 方法不会被错误地调用,只需让你的Repository 类扩展HibernateRepository 接口。

@Repository
public interface PostRepository extends 
    HibernateRepository<Post>,
    JpaRepository<Post, Long>, CustomPostRepository {

}

并且,将com.vladmihalcea.spring.repository 包提供给@EnableJpaRepositories 注解。

@EnableJpaRepositories(
    basePackages = {       
        "com.vladmihalcea.spring.repository",
        ...
    }
)

这样一来,你就会立刻发现,findAll 方法是应该避免的。

HibernateRepository FindAllHibernateRepository FindAll

然而,如果你确定对于特定的使用情况,有一个findAll 方法是没有问题的,那么你可以简单地覆盖它,如下所示。

@Repository
public interface PostRepository extends HibernateRepository<Post>,
    JpaRepository<Post, Long>, CustomPostRepository {

    @Query("from Post")
    @Override
    List<Post> findAll();
}

这样一来,你就会明确说明确实需要findAll 方法,尽管我怀疑在大多数非琐碎的应用中会出现这种情况。

Hibernate隐式多态查询反模式

findAll 反模式让我想起了Hibernate的一个不太知名的 "功能",它允许你用一个查询来选择整个数据库。

//Don't try this at home!

List<Object> objects = entityManager.createQuery("""
    select o 
    from java.lang.Object o
    """)
.getResultList();

这个查询之所以有效,是因为Hibernate提供的隐式多态查询功能。你之所以没有看到这种反模式在实践中的应用,是因为Hibernate的开发者认为鼓励一次性获取整个数据库并不是一个好主意,所以他们在文档中省略了这一点。

在内部,Hibernate团队在测试中确实使用了这样的查询。

protected void cleanupTestData() {
    doInHibernate(this::sessionFactory, session -> {
        session.createQuery(
            "delete from java.lang.Object"
        ).executeUpdate();
    });
}

但是,由于这只限于Hibernate的集成测试,所以使用这样一个危险的查询是没有问题的。而且,有趣的是,并不是所有的Hibernate测试都使用它。有些Hibernate集成测试只是在测试之间删除并重新创建DB模式,所以删除所有记录是没有意义的。

结论

Spring Data的findAll方法完全遵循了维基百科对反模式的定义,如果你默认避免它,那就更好了,因为这样一来,你可以防止一些非常低效的数据访问操作,在你的生产系统中拥有足够多的数据之前,这些操作可能会被忽略掉。

尽管如此,仅仅因为你废弃了默认的findAll 方法,这并不意味着你不能明确地声明它。然而,这样一来,你会通知其他开发者,你真的知道自己在做什么,而且你仍然可以享受在你的应用程序内存中加载整个数据库表。