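Storage Comparison
Single-machine storage
- File system
- KV store
- Does not support massive numbers of files
Single-machine database
- Semi-structured/structured data
- Does not support massive data volumes
Distributed database
- Semi-structured/structured data
- Data items are related to one another
Distributed storage
- Object storage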
vs. HDFS
Interface
HDFS:
- Pseudo-[[Posix]] file interface (POSIX: Portable Operating System Interface)
- Via TCP
- Data stored as blocks
- Operations such as mkdirs, append, create
Object-Based Storage Device:
- Via RESTful API
- Data stored as objects, each with its own ID and metadata
Storage
HDFS: NameNode
- The number of files in HDFS is limited by the NameNode.
- The NameNode stores the file system metadata, including the directory structure, file permissions, and file-to-block mapping.
Object Storage: Bucket/Object
- A bucket is a container that stores objects.
- An object contains:
  - Key
  - Data
  - Metadata
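The bucket/object model above can be sketched as a flat key-to-object map. This is a minimal in-memory illustration, not a real storage API: `StoredObject`, `Bucket`, `put`, and `get` are hypothetical names chosen for this note.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                        # unique name within the bucket
    data: bytes                                     # opaque payload
    metadata: dict = field(default_factory=dict)    # e.g. content type, owner


class Bucket:
    """Hypothetical in-memory bucket: a flat container mapping keys to objects."""

    def __init__(self, name: str):
        self.name = name
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata) -> None:
        # Writing an existing key replaces the whole object.
        self._objects[key] = StoredObject(key, data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]


bucket = Bucket("photos")
bucket.put("2024/cat.jpg", b"\xff\xd8", content_type="image/jpeg")
obj = bucket.get("2024/cat.jpg")
```

Note that "2024/cat.jpg" is just a key string here; the slash does not imply a directory tree, which is one way the model differs from the HDFS namespace.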
Usage
RESTful
Multipart Upload
For a large file, split it into several parts and upload each part with uploadPart.
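A minimal sketch of the split-and-upload idea, assuming an in-memory stand-in for the server side. `MultipartUpload`, `upload_part`, `complete`, and `split_into_parts` are hypothetical names for illustration and do not correspond to any particular SDK.

```python
def split_into_parts(data: bytes, part_size: int):
    """Yield (part_number, chunk) pairs; part numbers start at 1."""
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]


class MultipartUpload:
    """Hypothetical in-memory stand-in for a server-side multipart session."""

    def __init__(self):
        self.parts = {}

    def upload_part(self, part_number: int, chunk: bytes) -> None:
        # Parts may arrive in any order or be retried, so key them by number.
        self.parts[part_number] = chunk

    def complete(self) -> bytes:
        # Assemble the object by concatenating parts in part-number order.
        return b"".join(self.parts[n] for n in sorted(self.parts))


payload = b"0123456789" * 3              # 30 bytes standing in for a large file
upload = MultipartUpload()
for number, chunk in reversed(list(split_into_parts(payload, part_size=8))):
    upload.upload_part(number, chunk)    # deliberately sent out of order
assembled = upload.complete()
```

Because each part carries its own number, parts can be uploaded in parallel and a failed part can be retried without resending the whole file.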
Optimization
Partition
A cluster of storage system instances
Distributed + single-machine storage
There are two types of partition logic:
Hash
Evenly distributes data across partitions using hash function
Pros:
- Balanced workloads
- Simple and straightforward
Cons:
- Increased data traffic
- Uneven Data Sizes
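A minimal sketch of hash partitioning, assuming string keys and a fixed partition count. `hash_partition` is a hypothetical helper; MD5 is used only as a stable, well-mixed hash (Python's built-in `hash()` on strings varies between processes), not for security. The last lines illustrate one common reading of the "increased data traffic" drawback: with plain modulo placement, changing the partition count relocates most keys.

```python
import hashlib
from collections import Counter


def hash_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


keys = [f"object-{i}" for i in range(10_000)]

# Key counts per partition come out roughly equal.
counts = Counter(hash_partition(k, 4) for k in keys)

# Growing from 4 to 5 partitions: count keys whose partition changes.
moved = sum(hash_partition(k, 4) != hash_partition(k, 5) for k in keys)
```

Balanced key counts do not guarantee balanced bytes: a few very large objects can still make one partition much bigger than the others.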
Range
Divides data into partitions based on ranges or intervals of a data attribute.
Pros:
- Data within a specific range is stored together, so retrieval is faster
- Efficient for range queries
Cons:
- Uneven distribution
- Limited scalability
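A minimal sketch of range partitioning, assuming string keys and three hand-picked split points. `BOUNDARIES`, `range_partition`, and `partitions_for_range` are hypothetical names for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Assumed split points: key < "g" -> 0, "g" <= key < "n" -> 1,
# "n" <= key < "t" -> 2, key >= "t" -> 3.
BOUNDARIES = ["g", "n", "t"]


def range_partition(key: str) -> int:
    """Return the index of the partition whose range contains the key."""
    return bisect_right(BOUNDARIES, key)


def partitions_for_range(start: str, end: str) -> list:
    """Partitions that may hold keys in the closed interval [start, end]."""
    return list(range(range_partition(start), range_partition(end) + 1))


# A narrow range query touches a single partition.
narrow = partitions_for_range("h", "k")

# Skewed keys pile up in one partition (the uneven-distribution drawback).
skewed = Counter(range_partition(k)
                 for k in ["apple", "apricot", "avocado", "banana", "zebra"])
```

The same ordering that makes range scans cheap also causes the skew: keys that sort close together always land in the same partition.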
Replication/Erasure Coding
See SD Notes [[Distributed Key-Value Store]]
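As a rough illustration of the trade-off between the two, here is a minimal sketch of the simplest erasure code: a single XOR parity block over equal-length data blocks. Production systems typically use stronger codes such as Reed-Solomon; `encode` and `recover` are hypothetical helpers, and the 3+1 layout is an assumption for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(blocks):
    """Compute one parity block as the XOR of all data blocks."""
    parity = blocks[0]
    for block in blocks[1:]:
        parity = xor_bytes(parity, block)
    return parity


def recover(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and parity."""
    missing = parity
    for block in surviving_blocks:
        missing = xor_bytes(missing, block)
    return missing


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]    # assumed equal-length blocks
parity = encode(data_blocks)

# Simulate losing the second block and rebuilding it.
rebuilt = recover([data_blocks[0], data_blocks[2]], parity)

# Raw storage used per byte of user data.
replication_overhead = 3 / 1                 # three full copies
ec_overhead = (3 + 1) / 3                    # 3 data blocks + 1 parity block
```

The two schemes are not equivalent in durability: this 3+1 layout survives one lost block, while three replicas survive two lost copies. The point is only that erasure coding buys fault tolerance with less raw storage, at the cost of compute and reads from several nodes during recovery.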
Hot/cold tiering
Final architecture
- API: access layer
- Distributed Storage Pool: three replicas/EC
- GC: background garbage collection
- Lifecycle: hot/cold tiering
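The Lifecycle component could be sketched as a periodic background pass that demotes objects not accessed recently. This is only an assumption about how such a rule might look: `ObjectRecord`, `apply_lifecycle`, the day-counter timestamps, and the 30-day threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ObjectRecord:
    key: str
    last_access_day: int     # day counter of the most recent read
    tier: str = "hot"


def apply_lifecycle(records, today: int, cold_after_days: int = 30):
    """Move hot objects idle for at least cold_after_days to the cold tier."""
    moved = []
    for record in records:
        idle_days = today - record.last_access_day
        if record.tier == "hot" and idle_days >= cold_after_days:
            record.tier = "cold"
            moved.append(record.key)
    return moved


records = [
    ObjectRecord("logs/january.gz", last_access_day=1),
    ObjectRecord("images/banner.png", last_access_day=95),
]
moved_keys = apply_lifecycle(records, today=100)
```

A real lifecycle service would also need the reverse path (promoting cold objects that become hot again), which this sketch leaves out.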