Doris Stream Load write process


Background

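The goal is to trace the Stream Load flow. The following lldb backtrace was taken on a doris_be scanner thread stopped at a breakpoint. It shows the call path from the task executor down to the JSON reader: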

* thread #431, name = 'rs_normal [work', stop reason = breakpoint 6.1
  * frame #0: 0x0000556a4b123c4b doris_be`doris::vectorized::NewJsonReader::_simdjson_set_column_value(this=0x00007f9433b9a180, value=0x00007f96db7721f8, block=0x00007f97c81a8ce8, slot_descs=size=4, valid=0x00007f96db7721ef) at new_json_reader.cpp:935:13
    frame #1: 0x0000556a4b1234c4 doris_be`doris::vectorized::NewJsonReader::_simdjson_handle_simple_json_write_columns(this=0x00007f9433b9a180, block=0x00007f97c81a8ce8, slot_descs=size=4, is_empty_row=0x00007f96db7723ef, eof=0x00007f9433b9a388) at new_json_reader.cpp:705:17
    frame #2: 0x0000556a4b11ffea doris_be`doris::vectorized::NewJsonReader::_simdjson_handle_simple_json(this=0x00007f9433b9a180, (null)=0x00007f953b52ba00, block=0x00007f97c81a8ce8, slot_descs=size=4, is_empty_row=0x00007f96db7723ef, eof=0x00007f9433b9a388) at new_json_reader.cpp:668:9
    frame #3: 0x0000556a4b11e737 doris_be`doris::vectorized::NewJsonReader::_read_json_column(this=0x00007f9433b9a180, state=0x00007f953b52ba00, block=0x00007f97c81a8ce8, slot_descs=size=4, is_empty_row=0x00007f96db7723ef, eof=0x00007f9433b9a388) at new_json_reader.cpp:502:12
    frame #4: 0x0000556a4b11bcee doris_be`doris::vectorized::NewJsonReader::get_next_block(this=0x00007f9433b9a180, block=0x00007f97c81a8ce8, read_rows=0x00007f96db7724b8, eof=0x00007f97c81a8a30) at new_json_reader.cpp:217:9
    frame #5: 0x0000556a4b60e080 doris_be`doris::vectorized::FileScanner::_get_block_wrapped(this=0x00007f97c81a8000, state=0x00007f953b52ba00, block=0x00007f978f75cfe0, eof=0x00007f96db7730d7) at file_scanner.cpp:465:13
    frame #6: 0x0000556a4b607e20 doris_be`doris::vectorized::FileScanner::_get_block_impl(this=0x00007f97c81a8000, state=0x00007f953b52ba00, block=0x00007f978f75cfe0, eof=0x00007f96db7730d7) at file_scanner.cpp:402:17
    frame #7: 0x0000556a4b6eeaab doris_be`doris::vectorized::Scanner::get_block(this=0x00007f97c81a8000, state=0x00007f953b52ba00, block=0x00007f978f75cfe0, eof=0x00007f96db7730d7) at scanner.cpp:143:17
    frame #8: 0x0000556a4b6ee67e doris_be`doris::vectorized::Scanner::get_block_after_projects(this=0x00007f97c81a8000, state=0x00007f953b52ba00, block=0x00007f978f75cfe0, eos=0x00007f96db7730d7) at scanner.cpp:119:16
    frame #9: 0x0000556a4b6f67a3 doris_be`doris::vectorized::ScannerScheduler::_scanner_scan(ctx=std::__shared_ptr<doris::vectorized::ScannerContext, __gnu_cxx::_S_atomic>::element_type @ 0x00007f953b0f8210, scan_task=std::__shared_ptr<doris::vectorized::ScanTask, __gnu_cxx::_S_atomic>::element_type @ 0x00007f953b542290) at scanner_scheduler.cpp:177:5
    frame #10: 0x0000556a4b6f569d doris_be`doris::vectorized::ScannerScheduler::submit(this=0x00007f96db773620)::$_0::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const at scanner_scheduler.cpp:75:17
    frame #11: 0x0000556a4b6f54b7 doris_be`doris::vectorized::ScannerScheduler::submit(this=0x00007f97cafd0300)::$_0::operator()() const::'lambda'()::operator()() const at scanner_scheduler.cpp:74:27
    frame #12: 0x0000556a4b6f5475 doris_be`bool std::__invoke_impl<bool, doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()() const::'lambda'()&>((null)=__invoke_other @ 0x00007f96db77366f, __f=0x00007f97cafd0300) at invoke.h:61:14
    frame #13: 0x0000556a4b6f5435 doris_be`std::enable_if<is_invocable_r_v<bool, doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()() const::'lambda'()&>, bool>::type std::__invoke_r<bool, doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()() const::'lambda'()&>(__fn=0x00007f97cafd0300) at invoke.h:114:9
    frame #14: 0x0000556a4b6f52ed doris_be`std::_Function_handler<bool (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_0::operator()() const::'lambda'()>::_M_invoke(__functor=0x00007f97c8daab38) at std_function.h:290:9
    frame #15: 0x0000556a43737e1e doris_be`std::function<bool ()>::operator()(this=0x00007f97c8daab38) const at std_function.h:591:9
    frame #16: 0x0000556a4b6f4782 doris_be`doris::vectorized::ScannerSplitRunner::process_for(this=0x00007f97c8daab10, (null)=(__r = 1000000000)) at scanner_scheduler.cpp:414:25
    frame #17: 0x0000556a4b75987c doris_be`doris::vectorized::PrioritizedSplitRunner::process(this=0x00007f97ce996b90) at prioritized_split_runner.cpp:103:35
    frame #18: 0x0000556a4b73fac0 doris_be`doris::vectorized::TimeSharingTaskExecutor::_dispatch_thread(this=0x00007f97651fe810) at time_sharing_task_executor.cpp:566:77
    frame #19: 0x0000556a4b74e5c2 doris_be`void std::__invoke_impl<void, void (doris::vectorized::TimeSharingTaskExecutor::*&)(), doris::vectorized::TimeSharingTaskExecutor*&>((null)=__invoke_memfun_deref @ 0x00007f96db7746cf, __f=0x00007f978f80af80, __t=0x00007f978f80af90) at invoke.h:74:14
    frame #20: 0x0000556a4b74e50d doris_be`std::__invoke_result<void (doris::vectorized::TimeSharingTaskExecutor::*&)(), doris::vectorized::TimeSharingTaskExecutor*&>::type std::__invoke<void (doris::vectorized::TimeSharingTaskExecutor::*&)(), doris::vectorized::TimeSharingTaskExecutor*&>(__fn=0x00007f978f80af80, __args=0x00007f978f80af90) at invoke.h:96:14
    frame #21: 0x0000556a4b74e4dd doris_be`void std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>::__call<void, 0ul>(this=0x00007f978f80af80, __args=0x00007f96db774767, (null)=_Index_tuple<0UL> @ 0x00007f96db77473f) at functional:513:11
    frame #22: 0x0000556a4b74e496 doris_be`void std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>::operator()<void>(this=0x00007f978f80af80) at functional:598:17
    frame #23: 0x0000556a4b74e465 doris_be`void std::__invoke_impl<void, std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>&>((null)=__invoke_other @ 0x00007f96db77478f, __f=0x00007f978f80af80) at invoke.h:61:14
    frame #24: 0x0000556a4b74e425 doris_be`std::enable_if<is_invocable_r_v<void, std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>&>, void>::type std::__invoke_r<void, std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>&>(__fn=0x00007f978f80af80) at invoke.h:111:2
    frame #25: 0x0000556a4b74e28d doris_be`std::_Function_handler<void (), std::_Bind<void (doris::vectorized::TimeSharingTaskExecutor::* (doris::vectorized::TimeSharingTaskExecutor*))()>>::_M_invoke(__functor=0x00007f978f821620) at std_function.h:290:9
    frame #26: 0x0000556a436da65e doris_be`std::function<void ()>::operator()(this=0x00007f978f821620) const at std_function.h:591:9
    frame #27: 0x0000556a4587d962 doris_be`doris::Thread::supervise_thread(arg=0x00007f978f821610) at thread.cpp:460:5
    frame #28: 0x00007f9898d66b7b libc.so.6`___lldb_unnamed_symbol3696 + 667
    frame #29: 0x00007f9898de47b8 libc.so.6`___lldb_unnamed_symbol4129 + 7
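Read bottom-up, the backtrace shows a pull model. A TimeSharingTaskExecutor thread runs a ScannerSplitRunner, ScannerScheduler::_scanner_scan calls Scanner::get_block, FileScanner::_get_block_impl forwards to the format reader, and NewJsonReader::get_next_block fills the Block. The minimal sketch below models only the scanner-to-reader part of that chain. All class names are simplified stand-ins rather than Doris's real types, and the batch size of 2 is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model of the pull chain in the backtrace:
// scanner_scan -> FileScanner::get_block -> Reader::get_next_block.
// Real Doris types (Block, RuntimeState, Status) are replaced by stand-ins.
struct Block {
    std::vector<std::string> rows;  // stand-in for columnar data
};

// Stand-in for a format reader such as NewJsonReader.
class Reader {
public:
    explicit Reader(std::vector<std::string> lines) : _lines(std::move(lines)) {}

    // Fill `block` with up to `batch_size` rows; set `eof` once input is exhausted.
    void get_next_block(Block& block, std::size_t batch_size, bool& eof) {
        while (block.rows.size() < batch_size && _pos < _lines.size()) {
            block.rows.push_back(_lines[_pos++]);
        }
        eof = _pos >= _lines.size();
    }

private:
    std::vector<std::string> _lines;
    std::size_t _pos = 0;
};

// Stand-in for FileScanner: delegates block production to its reader.
class FileScanner {
public:
    explicit FileScanner(Reader reader) : _reader(std::move(reader)) {}

    void get_block(Block& block, bool& eof) { _reader.get_next_block(block, kBatchSize, eof); }

    static constexpr std::size_t kBatchSize = 2;  // hypothetical batch size

private:
    Reader _reader;
};

// Stand-in for ScannerScheduler::_scanner_scan: pull blocks until EOF.
std::vector<Block> scanner_scan(FileScanner& scanner) {
    std::vector<Block> out;
    bool eof = false;
    while (!eof) {
        Block block;
        scanner.get_block(block, eof);
        if (!block.rows.empty()) out.push_back(std::move(block));
    }
    return out;
}
```

The point of the pull design is that each layer only asks the layer below for "the next block", so the scheduler can stop or reschedule a scanner between blocks.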

How a JSON row is written into a Block:

Status NewJsonReader::_simdjson_set_column_value(simdjson::ondemand::object* value, Block& block,
                                                 const std::vector<SlotDescriptor*>& slot_descs,
                                                 bool* valid) {
    ...
    // write `val` into the target column through that column's serde
    RETURN_IF_ERROR(_simdjson_write_data_to_column<false>(
            val, slot_descs[column_index]->type(), column_ptr,
            slot_descs[column_index]->col_name(), _serdes[column_index], valid));
    ...
}
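_simdjson_set_column_value appears to match the fields of one JSON object against the slot descriptors (the destination columns) and write each value into its column, reporting through `valid` whether the row can be used. As a rough illustration only, the sketch below models the parsed object as key/value strings instead of simdjson::ondemand::object. The function name set_column_value, the Column struct, and the "missing non-nullable field rejects the row" rule are hypothetical; Doris's real handling of missing fields and invalid rows may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed JSON object: a list of key/value pairs.
using JsonObject = std::vector<std::pair<std::string, std::string>>;

// Hypothetical stand-in for a destination column; nullopt represents NULL.
struct Column {
    std::string name;
    std::vector<std::optional<std::string>> values;
};

// Match each JSON field to a column by name and append one row.
// Returns false (the row is invalid) if a non-nullable column has no value;
// in that case nothing is appended, so columns stay the same length.
bool set_column_value(const JsonObject& obj, std::vector<Column>& columns,
                      const std::vector<bool>& nullable) {
    std::unordered_map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < columns.size(); ++i) index[columns[i].name] = i;

    std::vector<bool> seen(columns.size(), false);
    std::vector<std::optional<std::string>> row(columns.size());
    for (const auto& [key, val] : obj) {
        auto it = index.find(key);
        if (it == index.end()) continue;  // field not in the target schema: skip it
        row[it->second] = val;
        seen[it->second] = true;
    }
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (!seen[i] && !nullable[i]) return false;  // reject the whole row
    }
    for (std::size_t i = 0; i < columns.size(); ++i) columns[i].values.push_back(row[i]);
    return true;
}
```

Building the row first and appending only after validation keeps all columns the same length, which a columnar Block requires.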

The value is then deserialized and written into the column (the numeric SerDe is shown here):

template <typename T>
Status DataTypeNumberSerDe<T>::deserialize_one_cell_from_json(IColumn& column, Slice& slice,
                                                              const FormatOptions& options) const {
    ...
    // append the parsed value `val` to the column's data
    column_data.insert_value(val);
    ...
}
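For numeric types, deserialize_one_cell_from_json turns the text slice into a value of type T and appends it with column_data.insert_value(val). The sketch below illustrates that "parse, then append one cell" step under simplifying assumptions: std::vector<T> stands in for the Doris column, std::string_view for Slice, a bool return for Status, and only integral T is handled via std::from_chars. The function name deserialize_one_cell is hypothetical, and the real SerDe may accept input forms (quotes, whitespace, floats) that this sketch rejects.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>
#include <type_traits>
#include <vector>

// Hypothetical sketch: parse `slice` as an integer of type T and append it
// to `column_data`. Returns false, leaving the column unchanged, if the text
// is not a valid number, is out of range for T, or has trailing characters.
template <typename T>
bool deserialize_one_cell(std::vector<T>& column_data, std::string_view slice) {
    static_assert(std::is_integral_v<T>, "sketch handles integral types only");
    T val{};
    const char* first = slice.data();
    const char* last = slice.data() + slice.size();
    auto [ptr, ec] = std::from_chars(first, last, val);
    if (ec != std::errc() || ptr != last) {
        return false;
    }
    column_data.push_back(val);  // plays the role of column_data.insert_value(val)
    return true;
}
```

Rejecting the cell before touching the column mirrors why the real code returns a Status: the caller can decide whether to mark the row invalid instead of writing a partial value.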