2.3 Extracting Information from Data

1. Exam Points

  • Identify the data or data sets needed to get the desired information.
  • Identify problems or challenges with data processing for a given scenario.
  • Identify what information can be extracted from given data sets.

2. Knowledge Points

(1) Data and Metadata

  • Metadata are data about data: they provide additional information about the primary data.
    • Data: e.g. the image itself (the primary data).
    • Metadata: e.g. date of creation, file size, file name, etc. (data about the data).
  • Metadata are used for finding, organizing, and managing information.
  • Changes and deletions made to metadata do not change the primary data.
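
The last point can be checked with a minimal Python sketch. The `photo` record, its field names, and the `fingerprint` helper are hypothetical and chosen only for illustration; the bytes are a stand-in for real image data.

```python
import hashlib

# Hypothetical record: the primary data (image bytes) is kept apart from its metadata.
photo = {
    "data": bytes([255, 0, 0, 0, 255, 0]),  # stand-in for real pixel data
    "metadata": {"file_name": "cat.png", "created": "2024-01-15", "size_bytes": 6},
}

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest; it changes whenever the data changes."""
    return hashlib.sha256(data).hexdigest()

before = fingerprint(photo["data"])

# Change one metadata field and delete another.
photo["metadata"]["file_name"] = "kitten.png"
del photo["metadata"]["created"]

after = fingerprint(photo["data"])

# The primary data has the same fingerprint before and after the metadata edits.
data_unchanged = before == after
print(data_unchanged)  # True
```

The digest is used here only as a convenient way to show that the image bytes are identical before and after the metadata edits.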

(2) Process Data

  • Information is the collection of facts and patterns (trends) extracted from data.
  • We therefore process data to extract information.
  • Common steps of processing data:
    • Combining: combine data from different sources.
    • Cleaning: remove corrupt or incomplete data and make the data uniform (handle invalid values, missing values, etc., so that the data are consistent).
      • Example: remove invalid ages; change "cn" to "China" so that "China" is used for every record of that place.
    • Filtering: identify and extract useful subsets.
      • Example: keep only the records of females.
    • Classifying: group data based on common features.
      • Example: group data based on categories.
    • Finding patterns: identify patterns (trends) in data.
      • Example: find a weather trend to use for prediction.
  • Scalability of systems is an important consideration when processing data.
  • Scalability is the ability to increase the capacity of a resource without having to switch to a completely new solution.
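
The processing steps above (combining, cleaning, filtering, classifying, finding patterns) can be sketched in a few lines of Python. All records, names, field names, and the 0 to 120 age range are made-up assumptions for illustration, not data from any real source.

```python
from collections import defaultdict

# Hypothetical records from two different sources.
source_a = [
    {"name": "Ana", "age": 17, "gender": "F", "country": "cn"},
    {"name": "Ben", "age": -3, "gender": "M", "country": "China"},
]
source_b = [
    {"name": "Cai", "age": 16, "gender": "F", "country": "China"},
    {"name": "Dee", "age": None, "gender": "F", "country": "US"},
    {"name": "Eli", "age": 18, "gender": "M", "country": "US"},
]

# Combining: merge the data from both sources into one list.
combined = source_a + source_b

# Cleaning: drop records with a missing or invalid age (assumed valid range: 0 to 120)
# and make country names uniform ("cn" becomes "China").
def is_valid_age(age):
    return isinstance(age, int) and 0 <= age <= 120

COUNTRY_NAMES = {"cn": "China"}
cleaned = [
    {**record, "country": COUNTRY_NAMES.get(record["country"], record["country"])}
    for record in combined
    if is_valid_age(record["age"])
]

# Filtering: extract a useful subset, here the records of females.
females = [record for record in cleaned if record["gender"] == "F"]

# Classifying: group the cleaned records by a common feature (country).
by_country = defaultdict(list)
for record in cleaned:
    by_country[record["country"]].append(record["name"])

# Finding patterns: check whether hypothetical daily temperatures rise every day.
temperatures = [18, 19, 21, 22]
rising = all(later > earlier for earlier, later in zip(temperatures, temperatures[1:]))

print([r["name"] for r in cleaned])  # ['Ana', 'Cai', 'Eli']
print([r["name"] for r in females])  # ['Ana', 'Cai']
print(dict(by_country))              # {'China': ['Ana', 'Cai'], 'US': ['Eli']}
print(rising)                        # True
```

Each step works on the output of the previous one, which is why cleaning comes before filtering and classifying: invalid ages and inconsistent country names would otherwise distort the groups.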

3. Exercises