ETL Tool Comparison

| | Luigi | Airflow | Pinball |
|---|---|---|---|
| repo | github.com/spotify/lui… | github.com/airbnb/airf… | github.com/pinterest/p… |
| docs | luigi.readthedocs.org | airflow.readthedocs.org | none |
| review | bytepawn.com/luigi.html | bytepawn.com/airflow.htm… | bytepawn.com/pinball.htm… |
| github forks | 750 | 345 | 58 |
| github stars | 4029 | 1798 | 506 |
| github watchers | 319 | 166 | 47 |
| commits in last 30 days | lots of commits | lots of commits | 3 commits |
| architecture | | | |
| web dashboard | not really, minimal | very nice | yes |
| code/dsl | code | code | python dict + python code |
| files/datasets | yes, targets | not really, as special tasks | ? |
| calendar scheduling | no, use cron | yes, LocalScheduler | yes |
| datadoc'able [1] | maybe, doesn't really fit | probably, by convention | yes, dicts would be easy to parse |
| backfill jobs | yes | yes | ? |
| persists state | kind of | yes, to db | yes, to db |
| tracks history | yes | yes, in db | yes, in db |
| code shipping | no | yes, pickle | workflow is shipped using pickle, jobs are not? |
| priorities | yes | yes | ? |
| parallelism | yes, workers, threads per worker | yes, workers | ? |
| control parallelism | yes, resources | yes, pools | ? |
| cross-dag deps | yes, using targets | yes, using sensors | yes |
| finds new deployed tasks | no | yes | ? |
| executes dag | no, have to create special sink task | yes | yes |
| multiple dags | no, just one | yes, also several dag instances (dagruns) | yes |
| scheduler/workers | | | |
| starting workers | users start worker processes | scheduler spawns worker processes | users start worker processes |
| comms | scheduler's HTTP API | minimal, in state db | through master module using Thrift |
| workers execute | worker can execute tasks that it has locally | worker reads pickled tasks from db | worker can execute tasks that it has locally? |
| contrib | | | |
| hadoop | yes | yes | yes |
| pig | yes | doc mentions PigOperator, it's not in the source | no |
| hive | yes | yes | yes |
| pgsql | yes | yes | no |
| mysql | yes | yes | no |
| redshift | yes | no | no |
| s3 | yes | yes | yes |
| source | | | |
| written in | python | python | python |
| loc | 18,000 | 21,000 | 18,000 |
| tests | lots | minimal | lots |
| maturity | fair | low | low |
| other serious users | yes | not really | no |
| pip install | yes | yes | broken |
| niceties | - | sla, xcom, variables, trigger rules, celery, charts | pass data between jobs |
| does it for you | | | |
| sync tasks to workers | no | yes | no |
| scheduling | no | yes | yes |
| monitoring | no | no | no |
| alerting | no | slas, but probably not enough | sends emails |
| dashboards | no | yes | yes |
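
Two of the Luigi entries above ("files/datasets: yes, targets" and "executes dag: no, have to create special sink task") may be easier to picture with a small sketch. The code below is plain Python and does not use Luigi's actual API. The `Task` class, the `build` function, and the `.done` marker files are all hypothetical names chosen for illustration. It assumes a task counts as complete once its target file exists, and that running "the whole DAG" means requesting one sink task that depends on every leaf.

```python
import os
import tempfile


class Task:
    """Hypothetical stand-in for a workflow task; not Luigi's actual API."""

    def __init__(self, name, out_dir, requires=()):
        self.name = name
        self.requires = list(requires)
        # The "target" is a marker file: the task counts as done once it exists.
        self.target = os.path.join(out_dir, name + ".done")

    def complete(self):
        return os.path.exists(self.target)

    def run(self):
        with open(self.target, "w") as f:
            f.write(self.name)


def build(task, executed):
    """Run `task` after its dependencies, skipping anything already complete."""
    if task.complete():
        return
    for dep in task.requires:
        build(dep, executed)
    task.run()
    executed.append(task.name)


out_dir = tempfile.mkdtemp()
extract = Task("extract", out_dir)
transform = Task("transform", out_dir, requires=[extract])
report = Task("report", out_dir, requires=[extract])
# The sink task does no work of its own; it only ties the leaves together
# so that a single entry point pulls in the whole graph.
sink = Task("all_done", out_dir, requires=[transform, report])

first_run = []
build(sink, first_run)
second_run = []
build(sink, second_run)  # every target exists now, so nothing reruns
print(first_run)
print(second_run)
```

In this sketch the target files are also the only persisted state, which is one way to read the "kind of" in the "persists state" row for Luigi.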