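The field of distributed systems has a vast and dizzying amount of material. Taking advantage of the holiday, I put together a learning roadmap for distributed systems, based mainly on the leveling-up guide by Uncle Haozi (耗子叔). While collecting these links, I gradually formed an ambitious goal: to read every article linked here, translate each one, and write up a summary after each study session. With this many links, getting through all of them will probably take a long time.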
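1、Introduction to Distributed Architecture
1、Scalable Web Architecture and Distributed Systems: how a distributed architecture addresses system scalability
2、Scalability, Availability & Stability Patterns: a broad architectural view in terms of scalability, availability, and stability
3、System Design Primer: a central collection of resources on distributed systems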
2、Distributed Systems Theory
An introduction to distributed systems: a knowledge map of distributed systems
The Byzantine Generals Problem
Byzantine Generals Problem: the generals problem
Dr.Dobb’s - The Byzantine Generals Problem
The Byzantine Generals Problem
Practical Byzantine Fault Tolerance
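CAP, FLP, and DLS Theory
The Eight Fallacies of Distributed Computing
1、Fallacies of Distributed Computing: the eight fallacies
2、Fallacies of Distributed Computing Explained: why these assumptions are wrong
3、An article by Gareth Wilson: uses plain examples to explain why these assumptions are wrong
Papers on Consistency
1、CAP Twelve Years Later: How the Rules Have Changed (Chinese translation available): discusses the ways in which CAP is misleading
2、Harvest, Yield, and Scalable Tolerant Systems: a more detailed discussion of the harvest and yield concepts
3、Base: An Acid Alternative (Chinese translation available): the classic article on eventual consistency, discussing the fundamental differences between the BASE and ACID principles
4、Eventually Consistent: explains eventual consistency, the theoretical cornerstone of NoSQL databases, and is a good complement to the traditional relational database model (ACID, transactions)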
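3、Classic Books
1、Distributed Systems for fun and profit: the core ideas behind distributed systems
2、Designing Data-Intensive Applications: covers the pitfalls of data partitioning and data replication in big data architectures and offers good solutions
3、Distributed Systems: Principles and Paradigms (Chinese edition: 《分布式系统原理与范型》, 2nd edition): a classic textbook on distributed systems
4、Scalable Web Architecture and Distributed Systems (Chinese edition: 《可扩展的 Web 架构和分布式系统》)
5、Principles of Distributed Systems: the algorithms needed for distributed system architecture design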
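4、Classic Papers
4-1 Distributed Transactions
Transaction Across DataCenter (YouTube video): a talk from Google I/O 2009
分布式系统的事务处理 (Transaction Processing in Distributed Systems): a post on Uncle Haozi's (耗子叔) blog
4-2 The Paxos Consensus Algorithm
1、The following papers cover the theory behind Paxos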
Bigtable: A Distributed Storage System for Structured Data
The Chubby lock service for loosely-coupled distributed systems
MapReduce: Simplified Data Processing on Large Clusters
2、Problems encountered when implementing Paxos, and their solutions
Paxos Made Live - An Engineering Perspective
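3、Articles that make Paxos easier to understand
4-3 The Raft Consensus Algorithm
1、The original Raft paper
In search of an Understandable Consensus Algorithm (Extended Version) (Chinese translation: 《Raft 一致性算法论文译文》)
2、Animated demonstrations of the Raft algorithm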
Raft - The Secret Lives of Data
Raft Distributed Consensus Algorithm Visualization
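4-4 The Gossip Protocol
Gossip Visualization: an animation that makes the Gossip protocol easy to understand
Efficient Reconciliation and Flow Control for Anti-Entropy Protocols: the original Gossip protocol paper
Understanding Gossip (Cassandra Internals): the data protocol used in the NoSQL database Cassandra
Dynamo: Amazon’s Highly Available Key Value Store: describes how Amazon's Dynamo achieves high availability, high scalability, and high reliability
The clock synchronization problem
The vector clock problem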
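To make the vector clock entry above a little more concrete, here is a minimal Python sketch of the idea, not taken from any of the linked articles. The function names (`tick`, `merge`, `happened_before`, `concurrent`) and the dict-based clock representation are assumptions chosen for illustration only.

```python
from typing import Dict

# A vector clock is modeled here as {node_id: counter}; a missing entry counts as 0.
Clock = Dict[str, int]


def tick(clock: Clock, node: str) -> Clock:
    """Return a copy of `clock` with `node`'s counter incremented (local event or send)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(local: Clock, received: Clock, node: str) -> Clock:
    """On receiving a message: take the element-wise maximum, then tick the receiver."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, node)


def _leq(a: Clock, b: Clock) -> bool:
    """True if every counter in `a` is <= the matching counter in `b`."""
    return all(a.get(k, 0) <= b.get(k, 0) for k in set(a) | set(b))


def happened_before(a: Clock, b: Clock) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither event causally precedes the other."""
    return not _leq(a, b) and not _leq(b, a)


if __name__ == "__main__":
    a1 = tick({}, "A")        # node A records a local event
    b1 = tick({}, "B")        # node B records an independent event
    b2 = merge(b1, a1, "B")   # B receives a message stamped a1
    print(concurrent(a1, b1), happened_before(a1, b2))
```

In this sketch, two events on different nodes with no message between them compare as concurrent, while a receive is ordered after the send whose timestamp it merged.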
4-5 Distributed Storage and Data
1、Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
2、Spanner: Google’s Globally-Distributed Database
3、Spanner, TrueTime & The CAP Theorem
4、F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business
5、Cassandra: A Decentralized Structured Storage System
6、CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data
7、RADOS - A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters
4-6 Distributed Messaging Systems
1、Kafka: a Distributed Messaging System for Log Processing: the foundational Kafka paper
2、Wormhole: Reliable Pub-Sub to Support Geo-replicated Internet Services: a Pub-Sub system used internally at Facebook, useful as a reference for an implementation
3、All Aboard the Databus! LinkedIn’s Scalable Consistent Change Data Capture Platform (related links for this paper: the PDF paper and the slides)
4-7 Logs and Data
1、The Log: What every software engineer should know about real-time data’s unifying abstraction
2、The Log-Structured Merge-Tree (LSM-Tree) (article 1, article 2)
3、Immutability Changes Everything (related video talk)
4、Tango: Distributed Data Structures over a Shared Log: illustrates the advantages of immutability in architecture design
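4-8 Distributed Monitoring and Tracing
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure: Google's paper on distributed tracing and monitoring
There are three open-source implementations: Zipkin, Pinpoint, and HTrace
4-9 Data Analytics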
The Unified Logging Infrastructure for Data Analytics at Twitter
Scaling Big Data Mining Infrastructure: The Twitter Experience
Dremel: Interactive Analysis of Web-Scale Datasets
Resilient Distributed Datasets: a Fault-Tolerant Abstraction for In-Memory Cluster Computing
4-10 Programming-Related Papers
1、Distributed Programming Model
2、PSync: a partially synchronous language for fault-tolerant distributed algorithms
3、Programming Models for Distributed Computing
4、Logic and Lattices for Distributed Programming
5、Services Engineering Reading List
6、Readings in Distributed Systems
7、Google Research - Distributed Systems and Parallel Computing
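5、Distributed Architecture
5-1 Distributed Architecture
Designs, Lessons and Advice from Building Large Distributed Systems: design principles for distributed architecture
Building Software Systems At Google and Lessons Learned (YouTube video)
The Twelve-Factor App (Chinese translation available): a methodology for building SaaS applications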
Notes on Distributed Systems for Young Bloods
On Designing and Deploying Internet-Scale Services (Chinese translation available)
4 Things to Keep in Mind When Building a Platform for the Enterprise
Principles of Chaos Engineering
Building Fast & Resilient Web Applications
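Automate and Abstract: Lessons from Facebook on Engineering for Scale: software automation and software abstraction
5-2 Design Patterns
AWS Cloud Pattern: design patterns for the AWS cloud platform
Design patterns for container-based distributed systems: design patterns for containerized distributed architectures
Patterns for distributed systems (slides): architectural patterns for distributed systems
A Pattern Language for Micro-Services: microservice architecture
SOA Patterns: SOA architecture
5-3 Failure Testing for Distributed Systems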
FIT: Failure Injection Testing
Automating Failure Testing Research at Internet Scale
5-4 Elastic Scaling
4 Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO
Scale Up vs Scale Out: Hidden Costs
Best Practices for Scaling Out
Reddit: Lessons Learned From Mistakes Made Scaling To 1 Billion Pageviews A Month
Square: Autoscaling Based on Request Queuing
PayPal: Autoscaling Applications
Trivago: Your Definite Guide For Autoscaling Jenkins
Scryer: Netflix’s Predictive Auto Scaling Engine
5-5 Consistent Hashing
Consistent Hashing: Algorithmic Tradeoffs
Distributing Content to Open Connect
Consistent Hashing in Cassandra
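As a rough companion to the consistent hashing articles above, the following Python sketch shows one common form of the technique: a hash ring with virtual nodes. It is not the implementation from any of the linked articles. The class name `HashRing`, the `vnodes` parameter, and the choice of SHA-256 as the hash function are all assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place `vnodes` points for `node` on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point not in self._owner:  # skip the (unlikely) position collision
                bisect.insort(self._points, point)
                self._owner[point] = node

    def remove(self, node: str) -> None:
        """Drop every point owned by `node`."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first point clockwise from the key's position."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.remove("node-c")
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after removing node-c")
```

The property this sketch is meant to show is that removing a node only remaps the keys that node owned, and adding a node only pulls keys onto the new node; the other assignments stay put.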
5-6 Distributed Databases
Life Beyond Distributed Transactions
How Sharding Works: an article exploring data sharding
How to Scale Big Data Applications
5-7 Caching
Netflix: Caching for a Global Netflix
Facebook: An analysis of Facebook photo caching
How trivago Reduced Memcached Memory Usage by 50%
Caching Internal Service Calls at Yelp
5-8 Message Queues
Understanding When to use RabbitMQ or Apache Kafka
Trello: Why We Chose Kafka For The Trello Socket Architecture
LinkedIn: Running Kafka At Scale
Should You Put Several Event Types in the Same Kafka Topic?
Billions of Messages a Day - Yelp’s Real-time Data Pipeline
Uber: Building Reliable Reprocessing and Dead Letter Queues with Kafka
Uber: Introducing Chaperone: How Uber Engineering Audits Kafka End-to-End
Publishing with Apache Kafka at The New York Times
Salesforce: How Apache Kafka Inspired Our Platform Events Architecture
Exactly-once Semantics are Possible: Here’s How Kafka Does it
Delivering billions of messages exactly once
Benchmarking Streaming Computation Engines at Yahoo!
5-9 Logging
Using Logs to Build a Solid Data Infrastructure - Martin Kleppmann
Building DistributedLog: High-performance replicated log service
LogDevice: a distributed data store for logs
5-10 Performance
Common Bottlenecks
Performance is a Feature
Make Performance Part of Your Workflow
CloudFlare: How we built rate limiting capable of scaling to millions of domains
5-11 Search
Instagram: Search Architecture
eBay: The Architecture of eBay Search
eBay: Improving Search Engine Efficiency by over 25%
LinkedIn: Introducing LinkedIn’s new search architecture
LinkedIn: Search Federation Architecture at LinkedIn
DoorDash: Search and Recommendations at DoorDash
Twitter: Search Service at Twitter (2014)
Pinterest: Manas: High Performing Customized Search System
Sherlock: Near Real Time Search Indexing at Flipkart
Airbnb: Nebula: Storage Platform to Build Search Backends