Doris Stream Load write path

Background

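The goal is to trace the Stream Load write flow. The first backtrace was captured with lldb on the reader side. It is stopped at a breakpoint in NewJsonReader::_simdjson_set_column_value: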

* thread #431, name = 'rs_normal [work', stop reason = breakpoint 6.1
  * frame #0: 0x0000556a4b123c4b doris_be`doris::vectorized::NewJsonReader::_simdjson_set_column_value(this=0x00007f9433b9a180, value=0x00007f96db7721f8, block=0x00007f97c81a8ce8, slot_descs=size=4, valid=0x00007f96db7721ef) at new_json_reader.cpp:935:13
    frame #1: 0x0000556a4b1234c4 doris_be`doris::vectorized::NewJsonReader::_simdjson_handle_simple_json_write_columns(this=0x00007f9433b9a180, block=0x00007f97c81a8ce8, slot_descs=size=4, is_empty_row=0x00007f96db7723ef, eof=0x00007f9433b9a388) at new_json_reader.cpp:705:17
    frame #2: 0x0000556a4b11ffea doris_be`doris::vectorized::NewJsonReader::_simdjson_handle_simple_json(this=0x00007f9433b9a180, (null)=0x00007f953b52ba00, block=0x00007f97c81a8ce8, slot_descs=size=4, is_empty_row=0x00007f96db7723ef, eof=0x00007f9433b9a388) at new_json_reader.cpp:668:9
    frame #3: 0x0000556a4b11e737 doris_be`doris::vectorized::NewJsonReader::_read_json_column(this=0x00007f9433b9a180, state=0x00007f953b52ba00, block=0x00007f97c81a8ce8, slot_descs=size=4, is_empty_row=0x00007f96db7723ef, eof=0x00007f9433b9a388) at new_json_reader.cpp:502:12
    frame #4: 0x0000556a4b11bcee doris_be`doris::vectorized::NewJsonReader::get_next_block(this=0x00007f9433b9a180, block=0x00007f97c81a8ce8, read_rows=0x00007f96db7724b8, eof=0x00007f97c81a8a30) at new_json_reader.cpp:217:9
    frame #5: 0x0000556a4b60e080 doris_be`doris::vectorized::FileScanner::_get_block_wrapped(this=0x00007f97c81a8000, state=0x00007f953b52ba00, block=0x00007f978f75cfe0, eof=0x00007f96db7730d7) at file_scanner.cpp:465:13
    frame #6: 0x0000556a4b607e20 doris_be`doris::vectorized::FileScanner::_get_block_impl(this=0x00007f97c81a8000, state=0x00007f953b52ba00, block=0x00007f978f75cfe0, eof=0x00007f96db7730d7) at file_scanner.cpp:402:17
    frame #7: 0x0000556a4b6eeaab doris_be`doris::vectorized::Scanner::get_block(this=0x00007f97c81a8000, state=0x00007f953b52ba00, block=0x00007f978f75cfe0, eof=0x00007f96db7730d7) at scanner.cpp:143:17
    frame #8: 0x0000556a4b6ee67e doris_be`doris::vectorized::Scanner::get_block_after_projects(this=0x00007f97c81a8000, state=0x00007f953b52ba00, block=0x00007f978f75cfe0, eos=0x00007f96db7730d7) at scanner.cpp:119:16
    frame #9: 0x0000556a4b6f67a3 doris_be`doris::vectorized::ScannerScheduler::_scanner_scan(ctx=std::__shared_ptr<doris::vectorized::ScannerContext, __gnu_cxx::_S_atomic>::element_type @ 0x00007f953b0f8210, scan_task=std::__shared_ptr<doris::vectorized::ScanTask, __gnu_cxx::_S_atomic>::element_type @ 0x00007f953b542290) at scanner_scheduler.cpp:177:5
    frame #10: 0x0000556a4b6f569d doris_be`doris::vectorized::ScannerScheduler::submit(this=0x00007f96db773620)::$_0::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const at scanner_scheduler.cpp:75:17
    frame #11: 0x0000556a4b6f54b7 doris_be`doris::vectorized::ScannerScheduler::submit(this=0x00007f97cafd0300)::$_0::operator()() const::'lambda'()::operator()() const at scanner_scheduler.cpp:74:27
    frame #12: 0x0000556a4b6f5475 doris_be`bool std::__invoke_impl<bool, doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()() const::'lambda'()&>((null)=__invoke_other @ 0x00007f96db77366f, __f=0x00007f97cafd0300) at invoke.h:61:14
    frame #13: 0x0000556a4b6f5435 doris_be`std::enable_if<is_invocable_r_v<bool, doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()() const::'lambda'()&>, bool>::type std::__invoke_r<bool, doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()() const::'lambda'()&>(__fn=0x00007f97cafd0300) at invoke.h:114:9
    frame #14: 0x0000556a4b6f52ed doris_be`std::_Function_handler<bool (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()() const::'lambda'()>::_M_invoke(__functor=0x00007f97c8daab38) at std_function.h:290:9
    frame #15: 0x0000556a43737e1e doris_be`std::function<bool ()>::operator()(this=0x00007f97c8daab38) const at std_function.h:591:9
    frame #16: 0x0000556a4b6f4782 doris_be`doris::vectorized::ScannerSplitRunner::process_for(this=0x00007f97c8daab10, (null)=(__r = 1000000000)) at scanner_scheduler.cpp:414:25
    frame #17: 0x0000556a4b75987c doris_be`doris::vectorized::PrioritizedSplitRunner::process(this=0x00007f97ce996b90) at prioritized_split_runner.cpp:103:35
    frame #18: 0x0000556a4b73fac0 doris_be`doris::vectorized::TimeSharingTaskExecutor::_dispatch_thread(this=0x00007f97651fe810) at time_sharing_task_executor.cpp:566:77
    frame #19: 0x0000556a4b74e5c2 doris_be`void std::__invoke_impl<void, void (doris::vectorized::TimeSharingTaskExecutor::*&)(), doris::vectorized::TimeSharingTaskExecutor*&>((null)=__invoke_memfun_deref @ 0x00007f96db7746cf, __f=0x00007f978f80af80, __t=0x00007f978f80af90) at invoke.h:74:14
    frame #20: 0x0000556a4b74e50d doris_be`std::__invoke_result<void (doris::vectorized::TimeSharingTaskExecutor::*&)(), doris::vectorized::TimeSharingTaskExecutor*&>::type std::__invoke<void (doris::vectorized::TimeSharingTaskExecutor::*&)(), doris::vectorized::TimeSharingTaskExecutor*&>(__fn=0x00007f978f80af80, __args=0x00007f978f80af90) at invoke.h:96:14
    frame #21: 0x0000556a4b74e4dd doris_be`void std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>::__call<void, 0ul>(this=0x00007f978f80af80, __args=0x00007f96db774767, (null)=_Index_tuple<0UL> @ 0x00007f96db77473f) at functional:513:11
    frame #22: 0x0000556a4b74e496 doris_be`void std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>::operator()<void>(this=0x00007f978f80af80) at functional:598:17
    frame #23: 0x0000556a4b74e465 doris_be`void std::__invoke_impl<void, std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>&>((null)=__invoke_other @ 0x00007f96db77478f, __f=0x00007f978f80af80) at invoke.h:61:14
    frame #24: 0x0000556a4b74e425 doris_be`std::enable_if<is_invocable_r_v<void, std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>&>, void>::type std::__invoke_r<void, std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>&>(__fn=0x00007f978f80af80) at invoke.h:111:2
    frame #25: 0x0000556a4b74e28d doris_be`std::_Function_handler<void (), std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>>::_M_invoke(__functor=0x00007f978f821620) at std_function.h:290:9
    frame #26: 0x0000556a436da65e doris_be`std::function<void ()>::operator()(this=0x00007f978f821620) const at std_function.h:591:9
    frame #27: 0x0000556a4587d962 doris_be`doris::Thread::supervise_thread(arg=0x00007f978f821610) at thread.cpp:460:5
    frame #28: 0x00007f9898d66b7b libc.so.6`___lldb_unnamed_symbol3696 + 667
    frame #29: 0x00007f9898de47b8 libc.so.6`___lldb_unnamed_symbol4129 + 7

How JSON is written into a Block (excerpt from new_json_reader.cpp):

Status NewJsonReader::_simdjson_set_column_value(simdjson::ondemand::object* value, Block& block,
                                                 const std::vector<SlotDescriptor*>& slot_descs,
                                                 bool* valid) {

...

RETURN_IF_ERROR(_simdjson_write_data_to_column<false>(
    val, slot_descs[column_index]->type(), column_ptr,
    slot_descs[column_index]->col_name(), _serdes[column_index], valid));

...
}
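The following toy sketch shows what this step does conceptually: it takes one JSON object (one row) and appends one value to every target column. This is not Doris code. Several simplifications are assumptions made only for illustration:

- The parsed object is modelled as a std::map instead of simdjson::ondemand::object.
- Slots are plain column names, and columns hold nullable strings.
- A missing field becomes NULL.
- `valid` is treated as "the object matched at least one column".

ToyBlock and set_column_value are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins (not Doris types): a parsed JSON object is a field-name -> raw-text
// map, a column is a vector of nullable strings, and a block is a set of named columns.
using JsonObject = std::map<std::string, std::string>;
using Column = std::vector<std::optional<std::string>>;

struct ToyBlock {
    std::vector<std::string> names;  // one name per slot, same order as `columns`
    std::vector<Column> columns;
};

// Appends one JSON object as one row: each slot looks up its field by column name.
// Toy rules (assumptions): a missing field becomes NULL; an object that matches no slot
// at all is rejected (returns false, the analogue of *valid = false) and nothing is appended.
bool set_column_value(const JsonObject& value, ToyBlock& block) {
    assert(block.names.size() == block.columns.size());
    bool any_match = false;
    for (const std::string& name : block.names) {
        if (value.count(name) != 0) {
            any_match = true;
            break;
        }
    }
    if (!any_match) {
        return false;
    }
    for (std::size_t i = 0; i < block.names.size(); ++i) {
        auto it = value.find(block.names[i]);
        if (it != value.end()) {
            block.columns[i].push_back(it->second);  // row-oriented field -> column cell
        } else {
            block.columns[i].push_back(std::nullopt);
        }
    }
    return true;
}
```

Suppose you append `{"id":"1","name":"a"}` and then `{"id":"2"}` to a block with columns `id` and `name`. The `id` column becomes `"1"`, `"2"` and the `name` column becomes `"a"`, NULL. The data is now laid out per column instead of per row.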

Below is the step that writes a single value into the column (numeric types shown):

template <typename T>
Status DataTypeNumberSerDe<T>::deserialize_one_cell_from_json(IColumn& column, Slice& slice,
                                                              const FormatOptions& options) const {
...

column_data.insert_value(val);
...

}

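Once column_data.insert_value(val) runs, the JSON value has been converted into a columnar structure. This in-memory layout still differs from the final storage format, but the conversion into columns is essentially complete at this point.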

In other words, the data is parsed into columns at the very start. Everything after that is the LSM-tree process.
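The next toy sketch shows the per-cell step. A hypothetical ToyNumberSerDe parses the text of one cell with std::from_chars. It then appends the result to a std::vector<T> that stands in for the numeric column, the way column_data.insert_value(val) appends to the real column. It differs from the real method in three ways:

- The real method takes a Slice and FormatOptions; the toy takes a std::string_view.
- The real method returns a Status; the toy returns bool.
- The toy handles only integral types.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>
#include <type_traits>
#include <vector>

// Toy analogue of a numeric SerDe (hypothetical, not the Doris class): the text of one
// JSON cell is parsed into T and appended to a std::vector<T> standing in for the column.
template <typename T>
class ToyNumberSerDe {
    static_assert(std::is_integral_v<T>, "sketch only handles integral types");

public:
    // Returns false and leaves the column untouched if the slice is not exactly one number.
    bool deserialize_one_cell_from_json(std::vector<T>& column_data, std::string_view slice) const {
        T val{};
        const char* first = slice.data();
        const char* last = slice.data() + slice.size();
        auto [ptr, ec] = std::from_chars(first, last, val);
        if (ec != std::errc() || ptr != last) {
            return false;  // empty input, parse error, overflow, or trailing characters
        }
        column_data.push_back(val);  // plays the role of column_data.insert_value(val)
        return true;
    }
};
```

Here, "42" and "-7" are appended, while "3x" and "" are rejected and the column is left unchanged.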

The LSM part

LSM write (this backtrace is stopped at a breakpoint in MemTable::insert):

* thread #519, name = 'brpc_heavy', stop reason = breakpoint 4.1
  * frame #0: 0x000055ad89c6a1e6 doris_be`doris::MemTable::insert(this=0x00007f62caea3600, input_block=0x00007f64cb71fe98, row_idxs=size=1) at memtable.cpp:199:5
    frame #1: 0x000055ad89cacc15 doris_be`doris::MemTableWriter::write(this=0x00007f6579f88000, block=0x00007f64cb71fe98, row_idxs=size=1) at memtable_writer.cpp:118:27
    frame #2: 0x000055ad8a76e50b doris_be`doris::DeltaWriter::write(this=0x00007f6579f08e00, block=0x00007f64cb71fe98, row_idxs=size=1) at delta_writer.cpp:160:30
    frame #3: 0x000055ad8aad6744 doris_be`doris::BaseTabletsChannel::_write_block_data(doris::PTabletWriterAddBlockRequest const&, long, std::unordered_map<long, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>>>>&, doris::PTabletWriterAddBlockResult*)::$_0::operator()(this=0x00007f64cb71fd60, writer=0x00007f6579f08e00) const at tablets_channel.cpp:619:9
    frame #4: 0x000055ad8aad66fb doris_be`doris::Status std::__invoke_impl<doris::Status, doris::BaseTabletsChannel::_write_block_data(doris::PTabletWriterAddBlockRequest const&, long, std::unordered_map<long, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>>>>&, doris::PTabletWriterAddBlockResult*)::$_0&, doris::BaseDeltaWriter*>((null)=__invoke_other @ 0x00007f64cb71f927, __f=0x00007f64cb71fd60, __args=0x00007f64cb71f9d8) at invoke.h:61:14
    frame #5: 0x000055ad8aad6688 doris_be`std::enable_if<is_invocable_r_v<doris::Status, doris::BaseTabletsChannel::_write_block_data(doris::PTabletWriterAddBlockRequest const&, long, std::unordered_map<long, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>>>>&, doris::PTabletWriterAddBlockResult*)::$_0&, doris::BaseDeltaWriter*>, doris::Status>::type std::__invoke_r<doris::Status, doris::BaseTabletsChannel::_write_block_data(doris::PTabletWriterAddBlockRequest const&, long, std::unordered_map<long, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>>>>&, doris::PTabletWriterAddBlockResult*)::$_0&, doris::BaseDeltaWriter*>(__fn=0x00007f64cb71fd60, __args=0x00007f64cb71f9d8) at invoke.h:114:9
    frame #6: 0x000055ad8aad6588 doris_be`std::_Function_handler<doris::Status (doris::BaseDeltaWriter*), doris::BaseTabletsChannel::_write_block_data(doris::PTabletWriterAddBlockRequest const&, long, std::unordered_map<long, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>>>>&, doris::PTabletWriterAddBlockResult*)::$_0>::_M_invoke(__functor=0x00007f64cb71fd60, __args=0x00007f64cb71f9d8) at std_function.h:290:9
    frame #7: 0x000055ad8aae0b59 doris_be`std::function<doris::Status (doris::BaseDeltaWriter*)>::operator()(this=0x00007f64cb71fd60, __args=0x00007f6579f08e00) const at std_function.h:591:9
    frame #8: 0x000055ad8aad5ca5 doris_be`doris::BaseTabletsChannel::_write_block_data(doris::PTabletWriterAddBlockRequest const&, long, std::unordered_map<long, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, std::vector<unsigned int, doris::CustomStdAllocator<unsigned int, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>>>>>>&, doris::PTabletWriterAddBlockResult*)::$_2::operator()(this=0x00007f64cb71fde0, tablet_id=1772371123463, write_func=function<doris::Status (doris::BaseDeltaWriter *)> @ 0x00007f64cb71fd60) const at tablets_channel.cpp:599:21
    frame #9: 0x000055ad8aad5855 doris_be`doris::BaseTabletsChannel::_write_block_data(this=0x00007f6579d54b10, request=0x00007f631fa65d00, cur_seq=0, tablet_to_rowidxs=size=1, response=0x00007f61bef4f9c0) at tablets_channel.cpp:619:9
    frame #10: 0x000055ad8aad05ad doris_be`doris::TabletsChannel::add_batch(this=0x00007f6579d54b10, request=0x00007f631fa65d00, response=0x00007f61bef4f9c0) at tablets_channel.cpp:657:12
    frame #11: 0x000055ad8a919e7c doris_be`doris::LoadChannel::add_batch(this=0x00007f6579c1a600, request=0x00007f631fa65d00, response=0x00007f61bef4f9c0) at load_channel.cpp:195:9
    frame #12: 0x000055ad8a90e6c1 doris_be`doris::LoadChannelMgr::add_batch(this=0x00007f65e4f0d280, request=0x00007f631fa65d00, response=0x00007f61bef4f9c0) at load_channel_mgr.cpp:178:26
    frame #13: 0x000055ad8ac5dc47 doris_be`doris::PInternalService::tablet_writer_add_block(google::protobuf::RpcController*, doris::PTabletWriterAddBlockRequest const*, doris::PTabletWriterAddBlockResult*, google::protobuf::Closure*)::$_0::operator()(this=0x00007f65e2726640) const at internal_service.cpp:502:54
    frame #14: 0x000055ad8ac5db85 doris_be`void std::__invoke_impl<void, doris::PInternalService::tablet_writer_add_block(google::protobuf::RpcController*, doris::PTabletWriterAddBlockRequest const*, doris::PTabletWriterAddBlockResult*, google::protobuf::Closure*)::$_0&>((null)=__invoke_other @ 0x00007f64cb72057f, __f=0x00007f65e2726640) at invoke.h:61:14
    frame #15: 0x000055ad8ac5db45 doris_be`std::enable_if<is_invocable_r_v<void, doris::PInternalService::tablet_writer_add_block(google::protobuf::RpcController*, doris::PTabletWriterAddBlockRequest const*, doris::PTabletWriterAddBlockResult*, google::protobuf::Closure*)::$_0&>, void>::type std::__invoke_r<void, doris::PInternalService::tablet_writer_add_block(google::protobuf::RpcController*, doris::PTabletWriterAddBlockRequest const*, doris::PTabletWriterAddBlockResult*, google::protobuf::Closure*)::$_0&>(__fn=0x00007f65e2726640) at invoke.h:111:2
    frame #16: 0x000055ad8ac5da2d doris_be`std::_Function_handler<void (), doris::PInternalService::tablet_writer_add_block(google::protobuf::RpcController*, doris::PTabletWriterAddBlockRequest const*, doris::PTabletWriterAddBlockResult*, google::protobuf::Closure*)::$_0>::_M_invoke(__functor=0x00007f64cb7206d0) at std_function.h:290:9
    frame #17: 0x000055ad88d2265e doris_be`std::function<void ()>::operator()(this=0x00007f64cb7206d0) const at std_function.h:591:9
    frame #18: 0x000055ad8ac7e6e2 doris_be`doris::WorkThreadPool<false>::work_thread(this=0x00007f658ae2ac10, thread_id=1) at work_thread_pool.hpp:159:17
    frame #19: 0x000055ad8ac7f1cc doris_be`void std::__invoke_impl<void, void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>((null)=__invoke_memfun_deref @ 0x00007f64cb7207df, __f=0x00007f658108b2a8, __t=0x00007f658108b2c0, __args=0x00007f658108b2b8) at invoke.h:74:14
    frame #20: 0x000055ad8ac7f155 doris_be`std::__invoke_result<void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>::type std::__invoke<void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>(__fn=0x00007f658108b2a8, __args=0x00007f658108b2c0, __args=0x00007f658108b2b8) at invoke.h:96:14
    frame #21: 0x000055ad8ac7f125 doris_be`decltype(std::__invoke((*this)._M_pmf, std::forward<doris::WorkThreadPool<false>*&>(fp), std::forward<int&>(fp))) std::_Mem_fn_base<void (doris::WorkThreadPool<false>::*)(int), true>::operator()<doris::WorkThreadPool<false>*&, int&>(this=0x00007f658108b2a8, __args=0x00007f658108b2c0, __args=0x00007f658108b2b8) const at functional:177:11
    frame #22: 0x000055ad8ac7f0f5 doris_be`void std::__invoke_impl<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>((null)=__invoke_other @ 0x00007f64cb72086f, __f=0x00007f658108b2a8, __args=0x00007f658108b2c0, __args=0x00007f658108b2b8) at invoke.h:61:14
    frame #23: 0x000055ad8ac7f065 doris_be`std::enable_if<is_invocable_r_v<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>, void>::type std::__invoke_r<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>(__fn=0x00007f658108b2a8, __args=0x00007f658108b2c0, __args=0x00007f658108b2b8) at invoke.h:111:2
    frame #24: 0x000055ad8ac7f032 doris_be`void std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>::__call<void, 0ul, 1ul>(this=0x00007f658108b2a8, __args=0x00007f64cb720907, (null)=_Index_tuple<0UL, 1UL> @ 0x00007f64cb7208df) at functional:661:11
    frame #25: 0x000055ad8ac7efc6 doris_be`void std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>::operator()<>(this=0x00007f658108b2a8) at functional:720:17
    frame #26: 0x000055ad8ac7ef95 doris_be`void std::__invoke_impl<void, std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>((null)=__invoke_other @ 0x00007f64cb72092f, __f=0x00007f658108b2a8) at invoke.h:61:14
    frame #27: 0x000055ad8ac7ef55 doris_be`std::__invoke_result<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>::type std::__invoke<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>(__fn=0x00007f658108b2a8) at invoke.h:96:14
    frame #28: 0x000055ad8ac7ef2d doris_be`void std::thread::_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>>::_M_invoke<0ul>(this=0x00007f658108b2a8, (null)=_Index_tuple<0UL> @ 0x00007f64cb72096f) at std_thread.h:301:13
    frame #29: 0x000055ad8ac7ef05 doris_be`std::thread::_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>>::operator()(this=0x00007f658108b2a8) at std_thread.h:308:11
    frame #30: 0x000055ad8ac7ee49 doris_be`std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>>>::_M_run(this=0x00007f658108b2a0) at std_thread.h:253:13
    frame #31: 0x00007f66b50e1224 libstdc++.so.6`___lldb_unnamed_symbol8036 + 20
    frame #32: 0x00007f66b4d66b7b libc.so.6`___lldb_unnamed_symbol3696 + 667
    frame #33: 0x00007f66b4de47b8 libc.so.6`___lldb_unnamed_symbol4129 + 7
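The MemTable::insert frame receives an input block plus row_idxs. The sketch below is a generic LSM-style illustration using a hypothetical ToyMemTable, not Doris's MemTable. It buffers only the selected rows. On flush, it sorts them by key into an immutable run that stands in for a segment file. The row type, the flush threshold and the return-value convention are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy row: (sort key, value). A batch is a vector of rows; row_idxs selects the rows that
// belong to this memtable, mirroring the row_idxs argument seen in MemTable::insert.
using Row = std::pair<int64_t, std::string>;
using ToyRowBatch = std::vector<Row>;
using SortedRun = std::vector<Row>;  // stands in for an immutable, sorted on-disk segment

class ToyMemTable {
public:
    explicit ToyMemTable(std::size_t flush_threshold) : flush_threshold_(flush_threshold) {}

    // Appends the selected rows; returns true once the table is full and should be flushed.
    bool insert(const ToyRowBatch& block, const std::vector<uint32_t>& row_idxs) {
        for (uint32_t idx : row_idxs) {
            rows_.push_back(block.at(idx));
        }
        return rows_.size() >= flush_threshold_;
    }

    // Sorts buffered rows by key (stable, so equal keys keep arrival order), hands them
    // off as one sorted run, and leaves the memtable empty for new writes.
    SortedRun flush() {
        std::stable_sort(rows_.begin(), rows_.end(),
                         [](const Row& a, const Row& b) { return a.first < b.first; });
        SortedRun run = std::move(rows_);
        rows_.clear();
        return run;
    }

private:
    std::size_t flush_threshold_;
    std::vector<Row> rows_;
};
```

With a threshold of 3, inserting rows {0, 1} and then {3} from the batch (5,"e"), (1,"a"), (3,"c"), (2,"b") fills the table. Flushing returns keys 1, 2, 5 in order.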

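  • Thread #431 stack (reader side / producer)

    • This is a Scanner thread.
    • Its job is to read. It is running NewJsonReader, which parses the JSON with simdjson (SIMD-accelerated parsing) and converts it into Doris's internal vectorized::Block format.
    • Key call chain: ScannerScheduler -> FileScanner -> NewJsonReader::get_next_block
  • Thread #519 stack (writer side / consumer)

    • This is a receiver/writer thread.
    • Its job is to write. It has already received a data block sent over RPC (PTabletWriterAddBlockRequest) and is about to insert it into the in-memory table (MemTable::insert).
    • Key call chain: PInternalService::tablet_writer_add_block -> DeltaWriter::write -> MemTable::insert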
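Frame #9 of the writer stack shows _write_block_data receiving a tablet_to_rowidxs map. This means the rows of one incoming block are grouped per tablet before each DeltaWriter::write(block, row_idxs) call.

The sketch below shows that grouping and dispatch. It assumes, for illustration only, that every row comes with the id of its target tablet. The names group_rows_by_tablet, dispatch_to_writers and write_fn are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Groups row positions by target tablet: the shape of the tablet_to_rowidxs map seen in the
// _write_block_data frame. Assumption: the caller knows one tablet id per row.
std::unordered_map<int64_t, std::vector<uint32_t>> group_rows_by_tablet(
        const std::vector<int64_t>& tablet_id_per_row) {
    std::unordered_map<int64_t, std::vector<uint32_t>> tablet_to_rowidxs;
    for (std::size_t row = 0; row < tablet_id_per_row.size(); ++row) {
        tablet_to_rowidxs[tablet_id_per_row[row]].push_back(static_cast<uint32_t>(row));
    }
    return tablet_to_rowidxs;
}

// Invokes one write callback per tablet with that tablet's row indices, the way each
// per-tablet writer receives (block, row_idxs). Returns how many writers were invoked.
std::size_t dispatch_to_writers(
        const std::unordered_map<int64_t, std::vector<uint32_t>>& tablet_to_rowidxs,
        const std::function<void(int64_t, const std::vector<uint32_t>&)>& write_fn) {
    for (const auto& [tablet_id, row_idxs] : tablet_to_rowidxs) {
        write_fn(tablet_id, row_idxs);
    }
    return tablet_to_rowidxs.size();
}
```

For rows tagged with tablets 10, 20, 10, 10, the map becomes {10: [0, 2, 3], 20: [1]}, and two writers are called.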

How the two threads relate

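During a Doris load, these two threads cooperate. They are not part of the same execution flow:

  1. Scanner thread (#431): reads the JSON input through FileScanner -> parses it into Blocks -> sends them out over the network.
  2. Network transfer (RPC): the data is passed between nodes.
  3. brpc thread (#519, named brpc_heavy): receives the data request -> dispatches it to a DeltaWriter -> writes it into the MemTable.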
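The producer-consumer relationship can be sketched with a bounded blocking queue. This is only an in-process stand-in. In Doris, the hand-off between #431 and #519 goes through the tablet_writer_add_block RPC, not a shared queue. BoundedQueue, run_pipeline and the vector<int> "blocks" are hypothetical. In this sketch, the bounded capacity makes the scanner wait when the writer falls behind.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Bounded blocking queue standing in for the channel between producer and consumer.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(T item) {
        std::unique_lock<std::mutex> lock(mu_);
        not_full_.wait(lock, [&] { return q_.size() < capacity_; });  // wait while full
        q_.push(std::move(item));
        not_empty_.notify_one();
    }

    // Marks end of stream; pop() returns std::nullopt once the queue has drained.
    void close() {
        std::lock_guard<std::mutex> lock(mu_);
        closed_ = true;
        not_empty_.notify_all();
    }

    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        not_empty_.wait(lock, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) {
            return std::nullopt;  // closed and fully drained
        }
        T item = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return item;
    }

private:
    std::size_t capacity_;
    std::queue<T> q_;
    bool closed_ = false;
    std::mutex mu_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// A "scanner" thread produces num_blocks blocks of rows_per_block rows; a "writer" thread
// consumes them. Returns the number of rows the writer received.
std::size_t run_pipeline(std::size_t num_blocks, std::size_t rows_per_block) {
    BoundedQueue<std::vector<int>> channel(2);
    std::size_t rows_written = 0;  // touched only by the writer thread until join()
    std::thread scanner([&] {
        for (std::size_t b = 0; b < num_blocks; ++b) {
            channel.push(std::vector<int>(rows_per_block, static_cast<int>(b)));  // "parsed" block
        }
        channel.close();
    });
    std::thread writer([&] {
        while (auto block = channel.pop()) {
            rows_written += block->size();  // stands in for inserting into the MemTable
        }
    });
    scanner.join();
    writer.join();
    return rows_written;
}
```

For example, run_pipeline(5, 3) pushes 5 blocks of 3 rows through a queue of capacity 2 and returns 15.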

Summary

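The two threads are two stages of a classic producer-consumer model. Thread #431 turns the raw JSON into in-memory Blocks. Thread #519 takes the Blocks once they arrive and lands them in the storage engine's in-memory structure (the MemTable).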