Advanced Data Analytics (1) - Phases, Types, Scales of Measurement and Operations
Outline:
- Data Analysis Phases
- Different types of data
- Scales of Measurement
- Operations in Different Scales
1. Data Analysis Phases
Preparation → Preprocessing → Analysis → Postprocessing
Preparation: assess and select data
- Planning
- Data collection
- Feature generation
- Data selection
Preprocessing: clean and filter the data
- Cleaning
- Filtering
- Completion
- Correction
- Standardization
- Transformation
Analysis: visualize and analyze the data
- Visualization
- Correlation
- Regression
- Forecasting
- Classification
- Clustering
Postprocessing: the analysis results are interpreted and evaluated
- Interpretation
- Documentation
- Evaluation
2. Different types of data
Data is a collection of measurements or observations. In general, there are two types of data: qualitative and quantitative.
Quantitative (numerical) data
Quantitative, or numerical, data can be broken down into two types:
- Discrete data: obtained by counting, e.g., the number of pets a person has
- Continuous data: obtained by measuring, e.g., the length of a pet
Qualitative (categorical) data
Qualitative, or categorical, data describes the qualities of data points and is non-numerical.
3. Scales of Measurement
| Scale of Measurement | Description |
|---|---|
| Nominal | Categories (no ordering or direction) |
| Ordinal | Ordered categories (rankings, order, or scaling) |
| Interval | Differences between measurements are meaningful, but there is no true zero |
| Ratio | Differences between measurements are meaningful, and a true zero exists |
Nominal Scale
- Qualitative/categorical
- Order does not matter
- Examples: names, colors, labels, gender, etc.
- For example, suppose we survey 10 individuals: 5 say red is their favourite colour, 3 choose blue, and 2 choose green.
  - One way to use the data is to calculate the frequency distribution (red: 5, blue: 3, green: 2).
  - Another way is to use the mode, defined as the value that occurs most frequently. In this example, the mode of the survey is red.
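The survey example can be tallied in a few lines of Python. This is a minimal sketch using only the standard library; the `responses` list is an assumed encoding of the 10 answers, not data loaded from a real survey.

```python
from collections import Counter

# Assumed encoding of the 10 survey answers from the example.
responses = ["Red"] * 5 + ["Blue"] * 3 + ["Green"] * 2

# Frequency distribution: how often each category occurs.
frequencies = Counter(responses)

# Relative frequencies: the share of respondents per category.
relative = {colour: count / len(responses) for colour, count in frequencies.items()}

# Mode: the most frequent category. most_common(1) returns [(value, count)].
mode, mode_count = frequencies.most_common(1)[0]

print(dict(frequencies))
print(relative)
print(mode, mode_count)
```

Counting and picking the most frequent category are the only operations used here, which is all a nominal scale supports.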
Ordinal Scale
- Ranking/placement
- The order matters
- Differences cannot be measured
- Example: 1st, 2nd, and 3rd place in the Olympics
- For example, 1st place finishes a race in about 4m53s, 2nd place in about 4m56s, and 3rd place in about 6m. The ranks preserve the order but not the size of the gaps: about 3 seconds between 1st and 2nd, but roughly a minute between 2nd and 3rd.
- Suppose 100 respondents rate the statement “Climate change is England’s most serious environmental problem”:
  - 1 = strongly agree, 2 = agree, 3 = unsure, 4 = disagree, 5 = strongly disagree
  - Assume 1 person answers 1, 2 people answer 2, 3 people answer 3, 4 people answer 4, and the remaining 90 answer 5
  - The median is 5, which means at least 50% of the respondents strongly disagree with the statement (central tendency)
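The median of the agreement question can be checked with a short sketch. It assumes the answer counts listed in the example (1, 2, 3, 4, and 90), which add up to 100 answers; the `counts` and `labels` dictionaries are illustrative names. `statistics.median_low` is used here because it always returns a value that occurs in the data, which suits ordinal codes where averaging two categories would not be meaningful.

```python
import statistics

# Answer counts from the example: response code -> number of respondents.
counts = {1: 1, 2: 2, 3: 3, 4: 4, 5: 90}
labels = {
    1: "strongly agree",
    2: "agree",
    3: "unsure",
    4: "disagree",
    5: "strongly disagree",
}

# Expand the counts into one entry per respondent.
answers = [code for code, n in counts.items() for _ in range(n)]

# median_low returns an actual data value, never an average of two codes.
median_code = statistics.median_low(answers)

print(len(answers), median_code, labels[median_code])
# prints: 100 5 strongly disagree
```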
Interval Scale
- The order matters
- Differences can be measured
- No true “0” starting point
- Examples: clock time, temperature in °C or °F
- Differences can be measured:
  - Temperature: 10 °C, 20 °C, 30 °C
  - The difference between consecutive values is 10 °C
  - However, ratios cannot be measured: we cannot say that 30 °C is three times as hot as 10 °C
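One way to see why ratios are not meaningful on an interval scale is to express the same two temperatures in a different unit. The sketch below is illustrative only; the helper name `celsius_to_fahrenheit` is an assumption, not part of the original notes.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

low_c, high_c = 10.0, 30.0
low_f = celsius_to_fahrenheit(low_c)    # 50.0
high_f = celsius_to_fahrenheit(high_c)  # 86.0

# The "ratio" depends on where the arbitrary zero sits.
ratio_c = high_c / low_c  # 3.0 in Celsius
ratio_f = high_f / low_f  # 1.72 in Fahrenheit

# Differences convert by a fixed factor of 1.8 (9/5), regardless of the zero point.
diff_c = high_c - low_c   # 20.0
diff_f = high_f - low_f   # 36.0

print(ratio_c, ratio_f)
print(diff_c, diff_f, diff_f / diff_c)
```

The same pair of temperatures gives a ratio of 3 in Celsius but 1.72 in Fahrenheit, while the difference converts by the same fixed factor in both directions.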
Mean, Median, Mode, Standard Deviation
- Example data set: 21, 23, 42, 56, 65, 77, 189, 21, 32
  - The mean is the average of a data set (here 526 / 9 ≈ 58.4)
  - The median is the middle value of the sorted data set (here 42)
  - The mode is the most common value in a data set (here 21)
  - Standard deviation is a measure of the amount of variation in a set of values
  - A low standard deviation: values tend to be close to the mean
  - A high standard deviation: values are spread out over a wider range
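These four statistics can be computed for the nine values with Python's standard `statistics` module. This is a minimal sketch; the notes do not say whether the sample or the population standard deviation is meant, so both are shown.

```python
import statistics

data = [21, 23, 42, 56, 65, 77, 189, 21, 32]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Sample standard deviation divides by n - 1, population by n.
sample_sd = statistics.stdev(data)
population_sd = statistics.pstdev(data)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"sample sd={sample_sd:.2f} population sd={population_sd:.2f}")
```

The single large value 189 pulls the mean well above the median and inflates the standard deviation, which is worth keeping in mind when choosing a measure of central tendency.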
Ratio Scale
- The order matters
- Differences are measurable
- Contains a true “0” starting point
- Examples: weight, height
- Example: grades 70, 30, 56, 82, 90
  - Order matters: 30, 56, 70, 82, 90
  - Differences can be measured: 56 - 30 = 26, 70 - 56 = 14
  - Ratios can be measured: 90 / 30 = 3, which means the highest score is three times the lowest score
  - True “0” starting point: the student did not answer any question, or did not answer anything correctly
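The grade example can be reproduced step by step. This is a small sketch using the five grades from the text; the variable names are illustrative.

```python
grades = [70, 30, 56, 82, 90]

# Order matters: sort from lowest to highest.
ordered = sorted(grades)

# Differences between neighbouring grades are meaningful.
differences = [high - low for low, high in zip(ordered, ordered[1:])]

# Ratios are meaningful because a grade of 0 is a true zero.
ratio = max(grades) / min(grades)

print(ordered)      # [30, 56, 70, 82, 90]
print(differences)  # [26, 14, 12, 8]
print(ratio)        # 3.0
```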
4. Operations in Different Scales
| Operation | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Frequency distribution | Y | Y | Y | Y |
| Mode | Y | Y | Y | Y |
| Median | N | Y | Y | Y |
| Addition and subtraction | N | N | Y | Y |
| Mean, standard deviation | N | N | Y | Y |
| Multiplication and division | N | N | N | Y |
| Ratios, coefficient of variation | N | N | N | Y |
| Geometric mean | N | N | N | Y |
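The table is cumulative: every operation permitted on one scale is also permitted on all scales to its right. The sketch below encodes that observation as a lookup. The names `SCALES`, `MIN_SCALE`, and `is_permitted` are hypothetical helpers introduced for illustration, not part of any library.

```python
# Scales ordered from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Lowest scale on which each operation from the table is permitted.
MIN_SCALE = {
    "frequency distribution": "nominal",
    "mode": "nominal",
    "median": "ordinal",
    "addition and subtraction": "interval",
    "mean, standard deviation": "interval",
    "multiplication and division": "ratio",
    "ratios, coefficient of variation": "ratio",
    "geometric mean": "ratio",
}

def is_permitted(operation: str, scale: str) -> bool:
    """Return True if the table marks `operation` as Y for `scale`."""
    return SCALES.index(scale) >= SCALES.index(MIN_SCALE[operation])

print(is_permitted("median", "nominal"))                     # False
print(is_permitted("mean, standard deviation", "interval"))  # True
print(is_permitted("geometric mean", "interval"))            # False
```

A check like this could be used as a guard before computing a statistic on a column whose scale of measurement is known.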
Last updated date: 2023/11/10