[Featured Post] The Road to Learning Distributed Systems

2024-09-15

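The field of distributed systems has a vast, dizzying amount of material. Taking advantage of today's holiday, I put together a learning roadmap for distributed systems, based mainly on the leveling-up guide by Uncle Haozi (耗子叔). While collecting these links, I gradually formed an ambitious goal: read every article linked here, translate each one, and write up a summary after every study session. With this many links, getting through them all will probably take a long time.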

1、Introduction to Distributed Architecture

1、Scalable Web Architecture and Distributed Systems: how distributed architecture solves system scalability problems

2、Scalability, Availability & Stability Patterns: a big-picture architectural view covering scalability, availability, and stability

3、System Design Primer: a central collection of distributed systems resources

2、Distributed Systems Theory

An introduction to distributed systems: a knowledge map of distributed systems

The Byzantine Generals Problem

Byzantine Generals Problem: a fictional model used to explain the consistency problem

Dr.Dobb’s - The Byzantine Generals Problem

The Byzantine Generals Problem

Practical Byzantine Fault Tolerance

CAP, FLP, and DLS Theory

FLP impossibility

The DLS paper: upper bounds on fault tolerance

The 8 Fallacies of Distributed Computing

1、Fallacies of Distributed Computing: the eight fallacies

2、Fallacies of Distributed Computing Explained: why these assumptions are wrong

3、Gareth Wilson's article: uses plain, everyday examples to show why these assumptions are wrong

Papers on Consistency

1、CAP Twelve Years Later: How the Rules Have Changed (Chinese translation): discusses the ways in which CAP can be misleading

2、Harvest, Yield, and Scalable Tolerant Systems: a more detailed discussion of the harvest and yield concepts

3、Base: An Acid Alternative (Chinese translation): a classic article on eventual consistency that discusses the basic differences between the BASE and ACID principles

4、Eventually Consistent: lays out eventual consistency, the theoretical foundation of NoSQL databases, as a useful complement to traditional relational databases (ACID, transactions)

3、Classic Books

1、Distributed Systems for fun and profit: the core ideas behind distributed systems

2、Designing Data-Intensive Applications: covers the pitfalls of data partitioning and data replication in big-data architectures and offers good solutions

3、Distributed Systems: Principles and Paradigms (Chinese edition: 《分布式系统原理与范型》, 2nd edition): a classic textbook on distributed systems

4、Scalable Web Architecture and Distributed Systems (Chinese edition: 可扩展的 Web 架构和分布式系统)

5、Principles of Distributed Systems: the algorithms needed when designing distributed system architectures

4、Classic Papers

4-1 Distributed Transactions

Transactions Across Datacenters (YouTube video): a talk from Google I/O 2009

《分布式系统的事务处理》 (Transaction Processing in Distributed Systems): from Uncle Haozi's blog

4-2 The Paxos Consensus Algorithm

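1、The following Google papers provide background for Paxos (of these, the Chubby paper is the one that describes a service built on Paxos)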

Bigtable: A Distributed Storage System for Structured Data

The Chubby lock service for loosely-coupled distributed systems

The Google File System

MapReduce: Simplified Data Processing on Large Clusters

2、Problems encountered when implementing Paxos, and their solutions

Paxos Made Live - An Engineering Perspective

3、Articles that make Paxos easier to understand

Neat Algorithms - Paxos

Paxos by Examples

4-3 The Raft Consensus Algorithm

1、The original Raft paper

In Search of an Understandable Consensus Algorithm (Extended Version) (Chinese translation: 《Raft 一致性算法论文译文》)

2、Animated demonstrations of the Raft algorithm

Raft - The Secret Lives of Data

Raft Consensus Algorithm

Raft Distributed Consensus Algorithm Visualization

4-4 The Gossip Consistency Algorithm

Gossip Visualization: an animation that makes the Gossip consistency algorithm easy to understand

Efficient Reconciliation and Flow Control for Anti-Entropy Protocols: a paper on the original Gossip protocol

Understanding Gossip (Cassandra Internals): the data protocol used in the NoSQL database Cassandra

Dynamo: Amazon’s Highly Available Key-value Store: describes how Amazon's Dynamo store achieves high availability, high scalability, and high reliability

Clock synchronization problems

Vector clock problems
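To make the vector clock entry above a little more concrete, here is a minimal Python sketch. The function names (`tick`, `merge`, `compare`) and the dict-based representation are my own assumptions for illustration; they are not taken from any of the linked articles.

```python
from typing import Dict

# A vector clock maps a node name to the number of events seen from that node.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced (a local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, node: str) -> VectorClock:
    """On message receipt: take the element-wise maximum, then tick the receiver."""
    nodes = local.keys() | received.keys()
    merged = {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}
    return tick(merged, node)


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify `a` relative to `b`: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

Two events on different nodes that have not exchanged messages compare as "concurrent"; once one node merges the other's clock, the earlier event compares as "before" the merged one. That distinction is what a single wall-clock timestamp cannot express reliably.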

4-5 Distributed Storage and Data

1、Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases

2、Spanner: Google’s Globally-Distributed Database

3、Spanner, TrueTime & The CAP Theorem

4、F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business

5、Cassandra: A Decentralized Structured Storage System

6、CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data

7、RADOS - A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters

8、Ceph architecture documentation

4-6 Distributed Messaging Systems

1、Kafka: a Distributed Messaging System for Log Processing: the foundational Kafka paper

2、Wormhole: Reliable Pub-Sub to Support Geo-replicated Internet Services: a Pub-Sub system used internally at Facebook, useful as an implementation reference

3、All Aboard the Databus! LinkedIn’s Scalable Consistent Change Data Capture Platform (related links for this paper: PDF paper, slides)

4-7 Logs and Data

1、The Log: What every software engineer should know about real-time data’s unifying abstraction (Chinese translation: 日志：每个软件工程师都应该知道的有关实时数据的统一概念)

2、The Log-Structured Merge-Tree (LSM-Tree) (article 1, article 2)

3、Immutability Changes Everything (related video talk)

4、Tango: Distributed Data Structures over a Shared Log: illustrates the advantages of immutability in architecture design

4-8 Distributed Monitoring and Tracing

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure: Google's paper on distributed tracing and monitoring

There are three open-source implementations: Zipkin, Pinpoint, and HTrace

4-9 Data Analytics

The Unified Logging Infrastructure for Data Analytics at Twitter

Scaling Big Data Mining Infrastructure: The Twitter Experience

Dremel: Interactive Analysis of Web-Scale Datasets

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

4-10 Programming-Related Papers

1、Distributed Programming Model

2、PSync: a partially synchronous language for fault-tolerant distributed algorithms

3、Programming Models for Distributed Computing

4、Logic and Lattices for Distributed Programming

5、Services Engineering Reading List

6、Readings in Distributed Systems

7、Google Research - Distributed Systems and Parallel Computing

5、Distributed Architecture

5-1 Distributed Architecture

Designs, Lessons and Advice from Building Large Distributed Systems: design principles for distributed architecture

YouTube video: Building Software Systems At Google and Lessons Learned

The Twelve-Factor App (Chinese translation): a methodology for building SaaS applications

Notes on Distributed Systems for Young Bloods

On Designing and Deploying Internet-Scale Services (Chinese translation)

4 Things to Keep in Mind When Building a Platform for the Enterprise

Principles of Chaos Engineering

Building Fast & Resilient Web Applications

Design for Resiliency

Design for Self-healing

Design for Scaling Out

Design for Evolution

Eventually Consistent: eventual consistency

Writing Code that Scales

Automate and Abstract: Lessons from Facebook on Engineering for Scale: software automation and software abstraction

5-2 Design Patterns

Cloud Design Patterns

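Design patterns: availability

Design patterns: data management

Design patterns: design and implementation

Design patterns: messaging

Design patterns: management and monitoring

Design patterns: performance and scalability

Design patterns: resiliency

Design patterns: security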

AWS Cloud Pattern: design patterns for the AWS cloud platform

Design patterns for container-based distributed systems: design patterns for containerized distributed architectures

Slides: Patterns for distributed systems: architectural patterns for distributed systems

A Pattern Language for Micro-Services: microservice architecture

SOA Patterns: service-oriented architecture

5-3 Failure Testing for Distributed Systems

FIT: Failure Injection Testing

Automated Failure Testing

Automating Failure Testing Research at Internet Scale

5-4 Elastic Scaling

4 Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO

Scaling Stateful Objects

Scale Up vs Scale Out: Hidden Costs

Best Practices for Scaling Out

Scalability Worst Practices

Reddit: Lessons Learned From Mistakes Made Scaling To 1 Billion Pageviews A Month

Autoscaling Pinterest

Square: Autoscaling Based on Request Queuing

PayPal: Autoscaling Applications

Trivago: Your Definite Guide For Autoscaling Jenkins

Scryer: Netflix’s Predictive Auto Scaling Engine

5-5 Consistent Hashing

Consistent Hashing

Consistent Hashing: Algorithmic Tradeoffs

Distributing Content to Open Connect

Consistent Hashing in Cassandra
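As a companion to the consistent hashing articles above, here is a minimal Python sketch of a hash ring with virtual nodes. The class name `HashRing`, the `vnodes` parameter, the `cache-a`-style node names, and the use of SHA-256 are all my own assumptions for illustration; the linked articles and real systems such as Cassandra make their own choices.

```python
import bisect
import hashlib


class HashRing:
    """A small consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes      # virtual points per physical node (assumed default)
        self._positions = []      # sorted hash positions on the ring
        self._owners = {}         # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # Use the first 8 bytes of SHA-256 as a 64-bit ring position.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owners:
                continue  # extremely unlikely collision: keep the existing owner
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                self._positions.remove(pos)  # linear scan; acceptable for a sketch

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring position clockwise from the key."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, self._hash(key))
        return self._owners[self._positions[idx % len(self._positions)]]
```

The property worth checking is that adding a node only moves keys onto the new node, and keys never shuffle between the existing nodes. With plain `hash(key) % n` placement, changing `n` would remap most keys instead.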

5-6 Distributed Databases

Life Beyond Distributed Transactions

How Sharding Works: an article exploring data sharding

Why you don’t want to shard

How to Scale Big Data Applications

MySQL Sharding with ProxySQL

5-7 Caching

缓存更新的套路 (Cache Update Patterns)

Design Of A Modern Cache

Netflix: Caching for a Global Netflix

Facebook: An analysis of Facebook photo caching

How trivago Reduced Memcached Memory Usage by 50%

Caching Internal Service Calls at Yelp

5-8 Message Queues

Understanding When to use RabbitMQ or Apache Kafka

Trello: Why We Chose Kafka For The Trello Socket Architecture

LinkedIn: Running Kafka At Scale

Should You Put Several Event Types in the Same Kafka Topic?

Billions of Messages a Day - Yelp’s Real-time Data Pipeline

Uber: Building Reliable Reprocessing and Dead Letter Queues with Kafka

Uber: Introducing Chaperone: How Uber Engineering Audits Kafka End-to-End

Publishing with Apache Kafka at The New York Times

Kafka Streams on Heroku

Salesforce: How Apache Kafka Inspired Our Platform Events Architecture

Exactly-once Semantics are Possible: Here’s How Kafka Does it

Delivering billions of messages exactly once

Benchmarking Streaming Computation Engines at Yahoo!

5-9 Logging

Using Logs to Build a Solid Data Infrastructure - Martin Kleppmann

Building DistributedLog: High-performance replicated log service

distributedlog.io

Twitter 高性能分布式日志系统架构解析 (An Analysis of the Architecture of Twitter's High-Performance Distributed Log System)

LogDevice: a distributed data store for logs

5-10 Performance

Understand Latency

Common Bottlenecks

Performance is a Feature

Make Performance Part of Your Workflow

CloudFlare: How we built rate limiting capable of scaling to millions of domains

5-11 Search

Instagram: Search Architecture

eBay: The Architecture of eBay Search

eBay: Improving Search Engine Efficiency by over 25%

LinkedIn: Introducing LinkedIn’s new search architecture

LinkedIn: Search Federation Architecture at LinkedIn

Slack: Search at Slack

DoorDash: Search and Recommendations at DoorDash

Twitter: Search Service at Twitter (2014)

Pinterest: Manas: High Performing Customized Search System

Sherlock: Near Real Time Search Indexing at Flipkart

Airbnb: Nebula: Storage Platform to Build Search Backends

6、Architecture Practices at Various Companies (High Scalability)

YouTube Architecture

Scaling Pinterest

Google Architecture

Scaling Twitter

The WhatsApp Architecture

Flickr Architecture

Amazon Architecture

Stack Overflow Architecture

Pinterest Architecture

Tumblr Architecture

Instagram Architecture

TripAdvisor Architecture

Scaling Mailbox

Salesforce Architecture

ESPN Architecture

Uber Architecture

Splunk Architecture