How to Use Spring Boot Batch Processing

Getting Started with Spring Boot Batch Processing

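Spring Batch is a lightweight, open-source framework for developing scalable batch processing applications. Batch processing is mostly used by applications that handle large volumes of data at a scheduled time. For example, a payroll system uses batch processing to send payments to employees at a specific time each month.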

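Spring Batch does not include a built-in scheduling framework. It can be used together with a scheduler such as Quartz or Control-M to process data at a scheduled time.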

In this tutorial, we will develop a Spring Boot application that reads data from a CSV file and stores it in an SQL database (H2).

Prerequisites

  1. The Java Development Kit (JDK) installed on your computer.
  2. Some knowledge of Spring Boot.

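Application setup

  • In your browser, navigate to Spring Initializr.
  • Set the project name to springbatch.
  • Add Lombok, Spring Web, H2 Database, Spring Data JPA, and Spring Batch as project dependencies.
  • Click Generate to download the generated project zip file.
  • Unzip the downloaded file and open it in your favorite IDE.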

Data layer

  • In the root project package, create a new package named domain.
  • In the domain package created above, create a file named Customer and add the following code.
@Entity(name = "person")
@Getter // Lombok annotation to generate Getters for the fields
@Setter // Lombok annotation to generate Setters for the fields
@AllArgsConstructor // Lombok annotation to generate a constructor with all of the fields in the class, in declaration order
@NoArgsConstructor // Lombok annotation to generate an empty constructor for the class
@EntityListeners(AuditingEntityListener.class)
public class Customer {
    @Id // Sets the id field as the primary key in the database table
    @Column(name = "id") // sets the column name for the id property
    @GeneratedValue(strategy = GenerationType.AUTO) // States that the id field should be autogenerated
    private Long id;

    @Column(name = "last_name")
    private String lastName;
    @Column(name = "first_name")
    private String firstName;

    // Returns the firstName and lastName when an object of the class is logged
    @Override
    public String toString() {
        return "firstName: " + firstName + ", lastName: " + lastName;
    }
}

The class above has an id field, used as the primary key in the database, and the lastName and firstName fields, which we will read from the data.csv file.
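The tutorial does not show the contents of data.csv. The sketch below assumes a hypothetical file with one customer per line, no header row, and the columns in the order firstName,lastName. It uses plain Java rather than Spring Batch's reader, and `CsvMappingSketch` and `CustomerRow` are illustrative names that are not part of the project. It only shows how each delimited line maps to the two fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of mapping comma-delimited lines to objects by column position.
// CsvMappingSketch and CustomerRow are hypothetical names used only for this sketch.
class CsvMappingSketch {

    // Minimal stand-in for the Customer entity: only the two CSV-backed fields.
    static final class CustomerRow {
        final String firstName;
        final String lastName;

        CustomerRow(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public String toString() {
            return "firstName: " + firstName + ", lastName: " + lastName;
        }
    }

    // Splits one line: column 0 becomes firstName, column 1 becomes lastName.
    static CustomerRow mapLine(String line) {
        String[] columns = line.split(",", -1);
        if (columns.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 2 columns but got " + columns.length + ": " + line);
        }
        return new CustomerRow(columns[0].trim(), columns[1].trim());
    }

    // Maps every non-blank line to a CustomerRow.
    static List<CustomerRow> mapAll(List<String> lines) {
        List<CustomerRow> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(mapLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed sample contents of data.csv; the names are made up.
        List<String> lines = Arrays.asList("Jane,Doe", "John,Smith");
        for (CustomerRow row : mapAll(lines)) {
            System.out.println(row);
        }
    }
}
```

The id column is deliberately absent from the file, since the entity marks it as generated by the database.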

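Repository layer

  • In the root project package, create a new package named repositories.
  • In the repositories package created above, create an interface named CustomerRepository and add the following code.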
// The interface extends JpaRepository that has the CRUD operation methods
public interface CustomerRepository extends JpaRepository<Customer, Long> {
}

Processor

  • In the root project package, create a new package named processor.
  • In the processor package, create a new Java file named CustomerProcessor and add the code below.
public class CustomerProcessor implements ItemProcessor<Customer, Customer> {
    // Creates a logger
    private static final Logger logger = LoggerFactory.getLogger(CustomerProcessor.class);

    // Transforms each item read from the file before it is written.
    @Override
    public Customer process(final Customer customer) throws Exception {
        final String firstName = customer.getFirstName().toUpperCase();
        final String lastName = customer.getLastName().toUpperCase();
        // Creates a new Customer. The Lombok all-args constructor follows the field declaration
        // order (id, lastName, firstName); the id is left null so the database generates it.
        final Customer transformedCustomer = new Customer(null, lastName, firstName);
        // Logs the original and the transformed customer
        logger.info("Converting ({}) into ({})", customer, transformedCustomer);
        return transformedCustomer;
    }
}

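The class above transforms data from one form to another. ItemProcessor<I, O> accepts input data (I), transforms it, and returns the result as output data (O).

In our case, we declare the Customer entity as both the input and the output, so the item type stays the same and only the names are converted to upper case.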
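To make the input and output type parameters concrete, here is a minimal plain-Java sketch. The `Processor` interface is a hypothetical stand-in that mirrors the shape of Spring Batch's ItemProcessor<I, O> (the real interface's process method also declares `throws Exception`), and `Name` is an illustrative value class. The first processor keeps the type, as in this tutorial; the second changes it.

```java
import java.util.Locale;

// Plain-Java illustration of a processor with input type I and output type O.
// ProcessorSketch, Processor, and Name are hypothetical names for this sketch only.
class ProcessorSketch {

    // Stand-in for ItemProcessor<I, O>; checked exceptions are omitted for brevity.
    interface Processor<I, O> {
        O process(I item);
    }

    // Simple immutable pair of names.
    static final class Name {
        final String firstName;
        final String lastName;

        Name(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Same input and output type: Name in, upper-cased Name out.
    static final Processor<Name, Name> UPPER_CASE = name -> new Name(
            name.firstName.toUpperCase(Locale.ROOT),
            name.lastName.toUpperCase(Locale.ROOT));

    // Different input and output types: Name in, String out.
    static final Processor<Name, String> FULL_NAME =
            name -> name.firstName + " " + name.lastName;

    public static void main(String[] args) {
        Name original = new Name("jane", "doe");
        Name upper = UPPER_CASE.process(original);
        System.out.println(FULL_NAME.process(upper));
    }
}
```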

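Configuration layer

  • In the root project package, create a new package named config. This package will contain all of our configuration.
  • In the config package, create a new Java file named BatchConfiguration and add the following code.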
@Configuration // Informs Spring that this class contains configurations
@EnableBatchProcessing // Enables batch processing for the application
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    @Lazy
    public CustomerRepository customerRepository;

    // Reads the data.csv file from the classpath and creates a Customer instance for each line.
    @Bean
    public FlatFileItemReader<Customer> reader() {
        return new FlatFileItemReaderBuilder<Customer>()
                .name("customerReader")
                .resource(new ClassPathResource("data.csv"))
                .delimited()
                .names(new String[]{"firstName", "lastName"})
                .fieldSetMapper(new BeanWrapperFieldSetMapper<>() {{
                    setTargetType(Customer.class);
                }})
                .build();
    }

    // Creates the Writer, configuring the repository and the method that will be used to save the data into the database
    @Bean
    public RepositoryItemWriter<Customer> writer() {
        RepositoryItemWriter<Customer> iwriter = new RepositoryItemWriter<>();
        iwriter.setRepository(customerRepository);
        iwriter.setMethodName("save");
        return iwriter;
    }

    // Creates an instance of CustomerProcessor, which converts the first and last names to upper case.
    @Bean
    public CustomerProcessor processor() {
        return new CustomerProcessor();
    }

    // Batch jobs are built from steps. A step contains the reader, processor and the writer.
    @Bean
    public Step step1(ItemReader<Customer> itemReader, ItemWriter<Customer> itemWriter)
            throws Exception {
        return this.stepBuilderFactory.get("step1")
                .<Customer, Customer>chunk(5)
                .reader(itemReader)
                .processor(processor())
                .writer(itemWriter)
                .build();
    }

    // Defines the job that saves the data from the .csv file into the database.
    @Bean
    public Job customerUpdateJob(JobCompletionNotificationListener listener, Step step1)
            throws Exception {
        return this.jobBuilderFactory.get("customerUpdateJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .start(step1)
                .build();
    }
}
  • In the config package, create another Java class named JobCompletionNotificationListener and add the code below.
@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {
    // Creates an instance of the logger
    private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);
    private final CustomerRepository customerRepository;

    @Autowired
    public JobCompletionNotificationListener(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    // Callback from JobExecutionListenerSupport that runs when the batch job finishes
    @Override
    public void afterJob(JobExecution jobExecution) {
        // When the job completes successfully, the customers in the database are retrieved and logged
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("!!! JOB COMPLETED! verify the results");
            customerRepository.findAll()
                    .forEach(customer -> log.info("Found ({}) in the database.", customer));
        }
    }
}
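The step in BatchConfiguration is built with chunk(5), which means items are read and processed in groups of five and each group is handed to the writer in one call. The plain-Java sketch below illustrates that read, process, write cycle. It is a simplified model under stated assumptions, not Spring Batch's implementation: `ChunkSketch` and `run` are hypothetical names, and transactions, restarts, and error handling are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Simplified model of chunk-oriented processing; ChunkSketch is a hypothetical name.
class ChunkSketch {

    // Returns the chunks in the order they would be passed to a writer.
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<O>> writtenChunks = new ArrayList<>();
        while (reader.hasNext()) {
            // 1. Read up to chunkSize items, one at a time.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // 2. Process each item that was read.
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                processed.add(processor.apply(item));
            }
            // 3. "Write" the whole chunk at once (here it is only collected).
            writtenChunks.add(processed);
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l");
        List<List<String>> chunks = ChunkSketch.<String, String>run(
                names.iterator(), String::toUpperCase, 5);
        for (List<String> chunk : chunks) {
            System.out.println("Writing chunk of " + chunk.size() + ": " + chunk);
        }
    }
}
```

With twelve input items and a chunk size of five, the writer is called three times: twice with five items and once with the remaining two.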

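Controller layer

  • In the root project package, create a new package named controllers.
  • In the controllers package created above, create a Java class named BatchController and add the code snippet below.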
@RestController
@RequestMapping(path = "/batch") // Root path
public class BatchController {
    @Autowired
    private JobLauncher jobLauncher;
    @Autowired
    private Job job;

    // Accepts a GET request, launches the batch job, and responds with "Batch Process started!!".
    @GetMapping(path = "/start") // Start batch process path
    public ResponseEntity<String> startBatch() {
        // A unique "startAt" parameter lets the job be launched again on every request
        JobParameters parameters = new JobParametersBuilder()
                .addLong("startAt", System.currentTimeMillis()).toJobParameters();
        try {
            jobLauncher.run(job, parameters);
        } catch (JobExecutionAlreadyRunningException | JobRestartException
                | JobInstanceAlreadyCompleteException | JobParametersInvalidException e) {
            e.printStackTrace();
        }
        return new ResponseEntity<>("Batch Process started!!", HttpStatus.OK);
    }
}

Application configuration

In the resources directory, add the code below to the application.properties file.

# Sets the server port from where we can access our application
server.port=8080
# Disables our batch process from automatically running on application startup
spring.batch.job.enabled=false

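Testing

Open Postman and send a `GET` request to **http://localhost:8080/batch/start** to start the batch process.

After sending the GET request, we can see from the application logs that the batch process is running.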

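Conclusion

Now that you have learned how to perform batch processing, configure the application we developed to use the Spring Boot scheduler to run the job automatically at a specified time, instead of sending an HTTP request to start it.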
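As a starting point for that exercise, the sketch below shows the general idea of launching a task on a timer instead of on request. It deliberately swaps in the JDK's ScheduledExecutorService so that it runs without Spring; in a Spring Boot application the usual route would be @EnableScheduling together with a method annotated with @Scheduled. `SchedulerSketch`, `runRepeatedly`, and the `jobLaunch` task are hypothetical names, and the task stands in for a call such as the one made in BatchController.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only illustration of running a task at a fixed rate; not Spring's scheduler.
class SchedulerSketch {

    // Runs jobLaunch every periodMillis until it has run `runs` times, then stops.
    // Returns the number of completed runs.
    static int runRepeatedly(Runnable jobLaunch, int runs, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        AtomicInteger completed = new AtomicInteger();
        try {
            scheduler.scheduleAtFixedRate(() -> {
                if (remaining.getCount() > 0) {
                    jobLaunch.run();
                    completed.incrementAndGet();
                    remaining.countDown();
                }
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            // Wait for the requested number of runs, with a safety timeout.
            remaining.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for scheduled runs", e);
        } finally {
            scheduler.shutdownNow();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int count = runRepeatedly(
                () -> System.out.println("Launching batch job (placeholder)"), 3, 100);
        System.out.println("Completed runs: " + count);
    }
}
```

A real schedule would use a much longer period or a cron-style expression; the short period here only keeps the demonstration quick.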