[Netflix Hollow Series] Some Hollow Terminology

Preface

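This article is fairly short. Before covering Hollow in detail, it pins down some key terms that Hollow uses frequently. Some of these terms may be obscure and others familiar, but all of them are essential to understanding Hollow.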

Original work by @空歌白石.

Terminology

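The official documentation lists these terms alphabetically. I have regrouped them by the module each one belongs to, in the hope that this makes Hollow easier to understand.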

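The definitions below are kept in the original English from the official documentation rather than translated into Chinese, mainly because each one is short and easy to understand, so a translation did not seem necessary. If there is demand, I can find time to put one together.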

| Category | Term | Description |
| --- | --- | --- |
| model | data model | A data model defines the structure of a dataset. It is specified with a set of schemas. |
| model | field | A single value encoded inside of a Hollow record. |
| model | hash key | A user-defined specification of one or more fields used to hash elements into a set or entries into a map. |
| model | inline | A field for which the value is encoded directly into a record, as opposed to referenced via another record. |
| model | namespace (references) | The deliberate creation of a type to hold a specific referenced field's data in order to reduce the cardinality of the referenced records. |
| model | primary key | A user-defined specification of one or more fields used to uniquely identify a record within a type. |
| model | record | A strongly-typed collection of fields or references, the structure of which is specified by a schema. |
| model | reference | A field type which indicates a pointer to another field. Can also refer to the technique of pulling out a specific field into a record type of its own to deliberately allow Hollow to deduplicate the values. |
| model | schema | Metadata about a Hollow type which defines the structure of the records. |
| model | type | A collection of records all conforming to a specific schema. |
| data | blob | A blob is a file used by consumers to update their dataset. A blob will be either a snapshot, delta, or reverse delta. |
| data | blob store | A blob store is a file store to which blobs can be published by a producer and retrieved by a consumer. |
| data | broken delta chain | When a blob namespace contains a state which is not adjacent to any prior states, the delta chain is said to be broken. In this scenario, consumers may need to load a double snapshot. |
| data | deduplication | Two records which have identical data in Hollow will be consolidated into a single record. Any references to duplicate records will be mapped to the canonical one when a dataset is represented with Hollow. |
| data | delta | A set of encoded instructions to transition from one data state to an adjacent state. Deltas are encoded as a set of ordinals to remove and a set of ordinals to add, along with the accompanying data to add. 'Delta' may refer specifically to a transition between an earlier state and a later state, contrasted with 'reverse delta', which specifically refers to a transition between a later state and an earlier state. |
| data | delta chain | A series of states which are all connected via contiguous deltas. |
| data | double snapshot | When a consumer already has an initialized state and an announcement signals to move to a new state for which a path of deltas is not available, the consumer may transition to that state via a snapshot. In this scenario two full copies of the dataset must be loaded in memory. |
| data | namespace (blobs) | An addressable, logical separation of both published artifacts in a blob store and announcement location. Used to allow multiple publishers to communicate on separate channels to specific groups of consumers. |
| data | patch (states) | Creating a series of two deltas between states in a delta chain. |
| data | reverse delta | A delta from a later state to an earlier state. Generally used during pinning scenarios. |
| data | snapshot | A blob type which contains a serialization of all of the records in a type. Consumed during initialization, and possibly in a broken delta chain scenario. |
| producer | producer | A single machine that retrieves all data from a source of truth and produces a delta chain. |
| producer | cycle | A producer runs in an infinite loop. Each execution of the loop is called a cycle. Each cycle produces a single data state. |
| producer | publish | Writing blobs to a blob store. |
| producer | restore | Initializing a HollowWriteStateEngine with data from a previously produced state so that a delta may be created during a producer's first cycle. |
| producer | write state engine | A HollowWriteStateEngine, the root handle to a Hollow dataset as a producer. |
| consumer | consumer | One of many machines on which a dataset is made accessible. Consumers are updated in lock-step based on the actions of the producer. |
| consumer | read state engine | A HollowReadStateEngine, the root handle to a Hollow dataset as a consumer. |
| announce | announce | After the blobs for a state have been published to a blob store by a producer, the state must be announced to consumers. The announcement signals to consumers that they should transition to the announced state. |
| announce | pinning | Overriding the state version announcement from the producer, to force clients to go back to or stay at an older state. |
| state | data state | A dataset changes over time. The timeline for a changing dataset can be broken down into discrete data states, each of which is a complete snapshot of the data at a particular point in time. |
| state | state | See data state. |
| state | adjacent state | If state A is connected via a single delta to state B, then A and B are adjacent to each other. |
| state | diff | A comprehensive accounting for the differences between two data states. |
| state | ingestion | Gathering data from a source of truth and importing it into Hollow. |
| state | state version | A unique identifier for a state. Should be monotonically increasing as time passes. |
| state | state engine | Both the producer and consumers handle datasets with a state engine. A state engine can be transitioned between data states. A producer uses a write state engine and a consumer uses a read state engine. |
| memory | object longevity | A technique used to ensure that stale references to Hollow Objects always return the same data they did initially upon creation. Configured via the HollowObjectMemoryConfig. |
| memory | ordinal | An integer value uniquely identifying a record within a type. Because records are represented with a fixed-length number of bits, the only necessary information to locate a record in memory is the record's type and ordinal. Ordinals are automatically assigned by Hollow, and are recycled as records are removed and added. Consequently, they lie in the range of 0-n, where n is generally not much larger than the total number of records for the type. |

Closing Remarks

Explaining Hollow thoroughly takes a fair amount of space. All of the articles completed so far can be found in the Netflix Hollow series column.

Best wishes.