Advanced Data Analytics (1) - Phases, Types, Scales of Measurement and Operations



Outline:

  • Data Analysis Phases
  • Different types of data
  • Scales of Measurement
  • Operations in Different Scales

1. Data Analysis Phases

Preparation → Preprocessing → Analysis → Postprocessing

Preparation: assess and select data

  • Planning
  • Data collection
  • Feature generation
  • Data selection

Preprocessing: clean and filter the data

  • Cleaning
  • Filtering
  • Completion
  • Correction
  • Standardization
  • Transformation
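Several of the preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the raw readings, the validity rule (readings must be non-negative), mean imputation for completion, and z-score standardization are all choices made for the example, and the function names `clean`, `complete`, and `standardize` are hypothetical.

```python
import statistics

# Hypothetical raw readings: None marks a missing value, negatives are invalid.
raw = [12.0, None, 15.5, -1.0, 14.0, None, 13.5]

def clean(values):
    """Cleaning/filtering: drop readings outside the assumed valid range (>= 0)."""
    return [v for v in values if v is None or v >= 0]

def complete(values):
    """Completion: replace missing values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Standardization: z-scores, (x - mean) / sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]

prepared = standardize(complete(clean(raw)))
print([round(z, 2) for z in prepared])
```

Chaining small functions keeps each step separately testable; other completion strategies (median imputation, interpolation) could replace `complete` without touching the rest.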

Analysis: visualize and analyze the data

  • Visualization
  • Correlation
  • Regression
  • Forecasting
  • Classification
  • Clustering

Postprocessing: interpret and evaluate the analysis results

  • Interpretation
  • Documentation
  • Evaluation

2. Different types of data

Data is a collection of measurements or observations. In general, there are two types of data: qualitative and quantitative.

Quantitative (numerical) data

Quantitative, or numerical, data can be broken down into two types:

  • Discrete data: counting, e.g., the number of pets a person has
  • Continuous data: measuring, e.g., the length of the pet

Qualitative (categorical) data

Qualitative data describes the qualities of data points and is non-numerical, e.g., the colour of a pet.

3. Scales of Measurement

| Scale of Measurement | Description |
|---|---|
| Nominal | Categories (no ordering or direction) |
| Ordinal | Ordered categories (rankings, order, or scaling) |
| Interval | Differences between measurements are meaningful, but there is no true zero |
| Ratio | Differences between measurements are meaningful, and a true zero exists |

Nominal Scale

  • Qualitative/Categorical

  • Order does not matter

  • Examples: names, colours, labels, gender, etc.

    • For example, we surveyed 10 individuals: 5 said red is their favourite colour, 3 chose blue, and 2 chose green.
    • One way to use these data is to build a frequency distribution: Red 5, Blue 3, Green 2.
    • Another way is to take the mode, defined as the value that occurs most frequently. In this example, the mode of the survey is Red.
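The nominal-scale survey can be summarized with Python's standard `collections.Counter`, which gives the frequency distribution and the mode directly. This is a small sketch that hard-codes the survey counts from the example; nothing beyond counting is applied, since no ordering exists between the colours.

```python
from collections import Counter

# Survey answers from the example: 5 x Red, 3 x Blue, 2 x Green.
answers = ["Red"] * 5 + ["Blue"] * 3 + ["Green"] * 2

freq = Counter(answers)                # frequency distribution
mode, count = freq.most_common(1)[0]   # most frequent category and its count

for colour, n in freq.most_common():
    print(f"{colour}: {n} ({n / len(answers):.0%})")
print("Mode:", mode)
# Red: 5 (50%)
# Blue: 3 (30%)
# Green: 2 (20%)
# Mode: Red
```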

Ordinal Scale

  • Ranking/Placement

  • The order matters

  • Differences cannot be measured

  • Example: 1st, 2nd, and 3rd place in an Olympic event

    • For example, the 1st-place finisher took about 4 m 53 s, the 2nd about 4 m 56 s, and the 3rd about 6 m. The ranks keep the order but hide the gaps: 3 s between 1st and 2nd versus about 1 m 4 s between 2nd and 3rd.
    • Suppose 100 respondents rate the statement "Climate change is England's most serious environmental problem".
    • Coding: 1 = strongly agree, 2 = agree, 3 = unsure, 4 = disagree, 5 = strongly disagree.
    • Assume 1 person answered 1, 2 answered 2, 3 answered 3, 4 answered 4, and the remaining 90 answered 5.
    • The median (a measure of central tendency) is 5, which means at least 50% of the respondents strongly disagree with the statement.
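Both ordinal examples can be checked with the standard `statistics` module. This sketch assumes the response counts 1, 2, 3, 4, and 90 (100 responses in total) and takes the approximate race times at face value. It uses `statistics.median_low` as a design choice: on an ordinal scale the median should be an actual category, whereas `statistics.median` averages the two middle codes when the count is even.

```python
import statistics

# Likert codes: 1 = strongly agree ... 5 = strongly disagree.
# Assumed counts per code: 1, 2, 3, 4 and 90 respondents (100 in total).
counts = {1: 1, 2: 2, 3: 3, 4: 4, 5: 90}
responses = [code for code, n in counts.items() for _ in range(n)]

# median_low always returns one of the observed codes.
median = statistics.median_low(responses)
print(len(responses), median)  # 100 5

# Ranks keep the order of the race times (in seconds) but hide the gap sizes.
times_s = {1: 4 * 60 + 53, 2: 4 * 60 + 56, 3: 6 * 60}
gaps = [times_s[rank + 1] - times_s[rank] for rank in (1, 2)]
print(gaps)  # [3, 64]
```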

Interval Scale

  • The order matters

  • Differences can be measured

  • No true "0" starting point

  • Examples: time (e.g., calendar years or time of day), temperature in °C

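    • Differences can be measured: for temperatures of 10 °C, 20 °C, and 30 °C, the difference between neighbouring values is 10 °C.
    • However, ratios are not meaningful: we cannot say 30 °C is three times hotter than 10 °C, because 0 °C is an arbitrary reference point, not the absence of heat.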
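A quick way to see why interval-scale ratios are not meaningful is to express the same two temperatures in a different unit. This is an illustrative sketch using the standard conversion formulas; the helper names `c_to_f` and `c_to_k` are hypothetical. The ratio changes with the unit, while differences convert consistently. Kelvin, which has a true zero, is included for contrast.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert degrees Celsius to kelvin (which has a true zero)."""
    return c + 273.15

low, high = 10.0, 30.0

# The ratio depends on where the arbitrary zero sits, so it is not meaningful.
print(high / low)                          # 3.0  (°C)
print(c_to_f(high) / c_to_f(low))          # 1.72 (86 °F / 50 °F)
print(round(c_to_k(high) / c_to_k(low), 2))  # 1.07 (kelvin)

# Differences convert consistently: a 20 °C gap is always a 36 °F gap.
print(c_to_f(high) - c_to_f(low))          # 36.0
```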

Mean, Median, Mode, Standard Deviation

  • Example data set: 21, 23, 42, 56, 65, 77, 189, 21, 32
  • The mean is the average of a data set (the sum divided by the number of values)
  • The median is the middle value of a data set once it is sorted
  • The mode is the most common value in a data set
  • Standard deviation measures the amount of variation in a set of values:
    • A low standard deviation means values tend to be close to the mean
    • A high standard deviation means values are spread out over a wider range
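The four statistics for the example data set can be computed with Python's standard `statistics` module. This sketch shows both the sample standard deviation (`stdev`, dividing by n - 1) and the population version (`pstdev`, dividing by n), since the text does not specify which one is meant.

```python
import statistics

data = [21, 23, 42, 56, 65, 77, 189, 21, 32]

mean = statistics.mean(data)      # 526 / 9, about 58.44
median = statistics.median(data)  # sorted: 21, 21, 23, 32, 42, ... -> 42
mode = statistics.mode(data)      # 21 appears twice -> 21
sd = statistics.stdev(data)       # sample standard deviation (n - 1)
psd = statistics.pstdev(data)     # population standard deviation (n)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"sample sd={sd:.2f} population sd={psd:.2f}")
```

The single large value 189 pulls the mean well above the median, which is why the median is often preferred as a measure of central tendency for skewed data.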

Ratio Scale

  • The order matters

  • Differences are measurable

  • Has a true "0" starting point

  • Example: weight, height

    • Contains a true "0", e.g., weight, height
    • Example: grades 70, 30, 56, 82, 90
    • Order matters: 30, 56, 70, 82, 90
    • Differences can be measured: 56 - 30 = 26, 70 - 56 = 14
    • Ratios can be measured: 90 / 30 = 3, meaning the highest score is three times the lowest score
    • True "0" starting point: a score of 0 means the student answered no question correctly
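The ratio-scale operations on the grades example can be sketched as follows. The coefficient of variation (standard deviation divided by the mean) is included because it is one of the ratio-only operations in the operations table; using the sample standard deviation for it is an assumption of this sketch.

```python
import statistics

grades = [70, 30, 56, 82, 90]

ordered = sorted(grades)                                 # [30, 56, 70, 82, 90]
diffs = [b - a for a, b in zip(ordered, ordered[1:])]    # [26, 14, 12, 8]
ratio = max(grades) / min(grades)                        # 90 / 30 = 3.0

# Coefficient of variation: spread relative to the mean.
# It only makes sense when 0 is a true zero, i.e. on a ratio scale.
cv = statistics.stdev(grades) / statistics.mean(grades)

print(ordered, diffs, ratio, round(cv, 3))
```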

4. Operations in Different Scales

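| Operation | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Frequency distribution | Y | Y | Y | Y |
| Mode | Y | Y | Y | Y |
| Median | N | Y | Y | Y |
| Addition and subtraction | N | N | Y | Y |
| Mean, standard deviation | N | N | Y | Y |
| Multiplication and division | N | N | N | Y |
| Ratios, coefficient of variation | N | N | N | Y |
| Geometric mean | N | N | N | Y |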
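In the operations table, every scale permits all operations allowed on the scales to its left, so the table can be encoded as the lowest scale at which each operation becomes meaningful. This is a hypothetical sketch: the `Scale` enum, the `MIN_SCALE` mapping, and the `allowed` helper are illustrative names, not part of any library.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Measurement scales, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest scale on which each operation is permitted (from the operations table).
MIN_SCALE = {
    "frequency distribution": Scale.NOMINAL,
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "addition and subtraction": Scale.INTERVAL,
    "mean, standard deviation": Scale.INTERVAL,
    "multiplication and division": Scale.RATIO,
    "ratios, coefficient of variation": Scale.RATIO,
    "geometric mean": Scale.RATIO,
}

def allowed(operation: str, scale: Scale) -> bool:
    """Return True if the operation is meaningful on the given scale."""
    return scale >= MIN_SCALE[operation]

print(allowed("median", Scale.NOMINAL))           # False
print(allowed("median", Scale.ORDINAL))           # True
print(allowed("geometric mean", Scale.INTERVAL))  # False
```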

Last updated date: 2023/11/10