
Measures of Central Value

EveryDayIsNewDay 2023. 9. 7. 09:09

Finding a Central Value

1. 2 Numbers

  • With just 2 numbers the answer is easy: go halfway between them.
What is the central value of 2 and 8? Halfway between 2 and 8, which is 5.

Add the 2 numbers and divide the result by 2:
$$ \frac{(2 + 8)}{2} = 5 $$

 

2. 3 or More Numbers

  • We can use that idea of "adding then dividing" when we have 3 or more numbers:
What is the central value of 1, 4 and 10?
Add 1, 4 and 10, then divide the result by 3 (because there are 3 numbers):
$$ \frac{(1 + 4 + 10)}{3} = \frac{15}{3} = 5 $$
Let's generalize a little more: what is the central value of A1, A2, …, An?
$$ \frac{A_{1} + A_{2} + \cdots + A_{n}}{n} = \frac {\sum_{i=1}^{n}A_{i}}{n} $$
Don't worry, you will learn about ∑ (summation notation) later :)
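The "add up, then divide by how many" rule can be sketched in a few lines of Python. The function name `mean` is our own choice for illustration; Python's standard library also offers `statistics.mean` for the same job. The two calls reproduce the examples above.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Add up the numbers, then divide by how many numbers there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


print(mean([2, 8]))      # 5.0  (halfway between 2 and 8)
print(mean([1, 4, 10]))  # 5.0  (15 / 3)
```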

 

3. The Mean (or Average)

  • The mean: add up the numbers, then divide by how many numbers there are.
  • Why calculate the mean (average)?
    • Central Tendency: The mean is used to describe the central tendency of a data set: a single representative value that summarizes the data. It gives you a rough idea of where the "center" of the data lies. For example, if you have a set of test scores, the mean score tells you the average performance of the group.
    • Data Summary: The mean is a concise way to summarize a large amount of data into a single number. This can make it easier to compare different data sets or understand the overall characteristics of a data distribution.
    • Comparison: When comparing different groups or populations, the mean can help identify differences or trends. For instance, if you want to compare the average income of people in different countries, calculating the mean income for each country can provide a basis for comparison.

 

  • However, the mean has limitations. It can be heavily influenced by outliers (extreme values), and it may not represent the data's central tendency well if the distribution is highly skewed. It also says nothing about how spread out the values are. Suppose the average height of a platoon is 160cm and they have to cross a river 150cm deep. Can everyone cross safely? Not necessarily: the average is above the depth, but some members may be shorter than 150cm and would be submerged.
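To make the river example concrete, here is a small Python check. The heights are hypothetical values, made up so that the average comes out to exactly 160cm; they are not real data. The average clears the 150cm depth, yet one member does not.

```python
# Hypothetical platoon heights in cm (illustrative only, not real data)
heights = [145, 155, 160, 165, 175]
river_depth = 150  # cm, from the example in the text

average = sum(heights) / len(heights)
# Members whose height does not exceed the river depth
too_short = [h for h in heights if h <= river_depth]

print(average)    # 160.0
print(too_short)  # [145]
```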
In my class, there are 3 friends who are 150cm tall, 2 friends who are 140cm tall, and 2 friends who are 160cm tall. What is the average?
$$ \frac {150+150+150+140+140+160+160} {3+2+2} = \frac {1050} {7} = 150$$
The Weighted Mean (or Weighted Average)

ᆞA statistical measure that takes into account the importance, or weight, of each data point when calculating the mean.
ᆞSome data points may have more significance or relevance than others, so they are assigned different weights.
ᆞTo calculate a weighted mean, you multiply each data point by its corresponding weight, sum up these products, and then divide by the sum of the weights. This formula allows you to give more importance to certain data points while calculating the mean, making it a useful tool in situations where not all data points are equally significant.
$$ Weighted \, Mean = \frac {\sum_{i=1}^{n} W_i X_i} {\sum_{i=1}^{n} W_i} $$
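A minimal Python sketch of the formula above, applied to the class-height example. The function name `weighted_mean` is hypothetical. The idea is that each distinct height is a value X_i and the number of friends with that height is its weight W_i.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of W_i * X_i divided by the sum of W_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Class example: heights (cm), weighted by how many friends have each height
heights = [150, 140, 160]
counts = [3, 2, 2]
print(weighted_mean(heights, counts))  # 150.0  (1050 / 7)
```

Using counts as weights gives the same answer as listing all seven heights individually and taking the plain mean. With all weights equal to 1, the weighted mean reduces to the ordinary mean.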

 

4. The Median

  • The median is the middle value of a sorted list of numbers.
  • Why use the median?
    • Robustness to Outliers: The median is less sensitive to extreme outliers or unusual data points compared to the mean. When you have a dataset with extreme values that could skew the results, the median provides a more robust estimate of the central tendency. This makes it particularly useful in situations where data may be subject to significant outliers or errors.
    • Ordinal Data: The median is well-suited for ordinal data, which is a type of data that has a natural order but lacks meaningful intervals between values. For example, when analyzing survey responses with options like "strongly disagree," "disagree," "neutral," "agree," and "strongly agree," the median can help identify the central or typical response.
    • Skewed Distributions: In cases where the data distribution is skewed (i.e., it is not symmetric), the median often provides a better representation of the center of the data compared to the mean. For instance, in a positively skewed distribution (where there are a few high values), the mean can be significantly influenced by these high values, while the median remains closer to the bulk of the data.
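    • Handling Non-Numeric Data: The median can be applied to any data that can be put in order, even when the values are not numbers on a meaningful scale, such as letter grades or the survey answers above. It cannot be applied to unordered categories such as favorite colors. You can also pick the "median city" from a list of cities: sort the cities by population size and take the one in the middle.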
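For ordinal data, one way to find the median is to replace each answer with its position on the scale, take the median position, and map it back to a label. The Python sketch below does this. The scale comes from the survey example above, but the seven answers and the function name `ordinal_median` are hypothetical. It uses `statistics.median_low`, which always returns one of the actual data values, so the result is a real answer rather than something like "2.5".

```python
import statistics
from typing import Sequence

# Answer options in order from lowest to highest (from the survey example)
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def ordinal_median(responses: Sequence[str]) -> str:
    """Rank each answer by its position on SCALE, take the median rank,
    and map that rank back to its label."""
    ranks = [SCALE.index(r) for r in responses]
    # median_low picks the lower of the two middle ranks when the count is even
    return SCALE[statistics.median_low(ranks)]


# Hypothetical survey answers, for illustration only
answers = ["agree", "neutral", "strongly agree", "disagree",
           "agree", "agree", "strongly disagree"]
print(ordinal_median(answers))  # agree
```

Sorting the strings directly would not work here, because alphabetical order ("agree" < "disagree" < "neutral" ...) is not the order of the scale.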

 

  • Whether the median is the right choice depends on the nature of the data and the specific goals of the analysis. The mean is also a valuable measure of central tendency, but the median is preferred when robustness to outliers, skewness, or ordinal data matters.
  • While the median has its advantages, it's important to note that it may not always be the best measure of central tendency, depending on the specific research question and the characteristics of the data. In many cases, using a combination of the median and other descriptive statistics, such as the mean and standard deviation, can provide a more comprehensive understanding of the data.
Example 1: Find the Median of 15, 9, 3
   ᆞPut them in order: 3, 9, 15
   ᆞThe middle number is 9, so the median is 9

Example 2: Find the Median of 15, 9, 3, 17
   ᆞPut them in order: 3, 9, 15, 17
   ᆞThe middle numbers are 9 and 15
   ᆞTo find the value halfway between them:
$$ 9 + 15 = 24 $$
$$ 24 / 2 = 12 $$
   ᆞThe median is 12
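Both examples follow the same procedure: sort, then take the middle value, or the halfway point of the two middle values when the count is even. A short Python sketch (the function name `median` is our own; the standard library's `statistics.median` behaves the same way for numbers):

```python
from typing import Sequence


def median(values: Sequence[float]) -> float:
    """Middle value of the sorted list; for an even count,
    the value halfway between the two middle numbers."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([15, 9, 3]))      # 9
print(median([15, 9, 3, 17]))  # 12.0
```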

 

5. The Mode

  • The mode is the most frequently occurring value (or values) in a dataset.
  • Why use the mode?
    • Identifying Common Values: The mode helps identify the values or categories that occur most frequently in a dataset. This is valuable in various fields such as market research, sociology, and education, where understanding what is most common or popular can be crucial.
    • Categorical Data: The mode is particularly useful for categorical or nominal data, where you have distinct categories rather than numerical values. For example, in a survey asking about people's favorite colors, the mode would indicate the most popular color choice.
    • Multimodal Distributions: In datasets with multiple modes (more than one value occurring with the same highest frequency), the mode can help reveal interesting patterns in the data. For instance, in a bimodal distribution of test scores, it's important to know that there are two common performance levels.
    • Descriptive Statistics: Including the mode as a measure of central tendency alongside the mean and median can provide a more complete picture of the data's distribution. This is especially true when working with non-normal or skewed distributions.
    • Decision-Making: In some decision-making contexts, knowing the mode can be helpful. For example, in inventory management, knowing the most frequently sold product can guide restocking decisions.

 

  • Not every dataset has a single mode. Some have several modes (multimodal), and some have no meaningful mode at all, for example when every value occurs exactly once, as in a uniform distribution. When the mode is not meaningful or informative, use another measure of central tendency, such as the mean or median. The choice of measure depends on the nature of the data and the specific goals of the analysis.
Example 1: Find the Mode:
$$ 1, 1, 1, 2, 3, 5, 7, 7, 7, 7 $$
   ᆞ"1" occurs 3 times, "2", "3", and "5" occur only once, and "7" occurs 4 times, so the mode is 7.

Example 2: Find the Mode:
$$ 1, 1, 1, 2, 3, 5, 7, 7, 7 $$
   ᆞ"1" occurs 3 times, but "7" also occurs 3 times, so both 1 and 7 are modes.
   ᆞWhen there are two modes it is called "bimodal"; when there are three or more modes we call it "multimodal".
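Counting how often each value occurs and keeping every value that ties for the highest count handles both examples, including the bimodal case. The sketch below uses `collections.Counter`. The function name `modes` is hypothetical. For comparison it also calls the standard library's `statistics.multimode` (available since Python 3.8).

```python
from collections import Counter
from statistics import multimode


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps first-seen order, so modes come out in order of first appearance
    return [value for value, count in counts.items() if count == top]


print(modes([1, 1, 1, 2, 3, 5, 7, 7, 7, 7]))  # [7]
print(modes([1, 1, 1, 2, 3, 5, 7, 7, 7]))     # [1, 7]  (bimodal)
print(multimode([1, 1, 1, 2, 3, 5, 7, 7, 7]))  # [1, 7]
```

If every value appears exactly once, both functions return all of the values, which is the "no meaningful mode" case described above. Because `Counter` accepts any hashable values, the same function also works for categories, such as the most popular favorite color.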

 

6. Outliers

  • Outliers are values that "lie outside" the other values.
  • Outliers can affect the mean a lot, so we can either leave them out or use the median or mode instead. This is the same principle as in sports scored by several judges (such as figure skating), where the highest and lowest scores are dropped before the average is calculated.
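The judges' rule of dropping the extremes before averaging is a simple form of what is usually called a trimmed mean. The Python sketch below is a simplified version that drops exactly one highest and one lowest value; real competition rules may differ in detail. The function name `drop_extremes_mean` is hypothetical, and it is applied here to the example data used below.

```python
from typing import Sequence


def drop_extremes_mean(scores: Sequence[float]) -> float:
    """Drop one highest and one lowest score, then average the rest."""
    if len(scores) < 3:
        raise ValueError("need at least 3 scores to drop the highest and lowest")
    ordered = sorted(scores)
    kept = ordered[1:-1]  # everything except the smallest and the largest
    return sum(kept) / len(kept)


print(drop_extremes_mean([4, 5, 5, 6, 100]))  # about 5.33, i.e. (5 + 5 + 6) / 3
```

Dropping the extremes pulls the result back near the bulk of the data, much like the median does, while still averaging several values.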
Example: 4, 5, 5, 6, 100
   ᆞMean :
$$ Mean \; : \; \frac {4 + 5 + 5 + 6 + 100} {5} = 24 $$
   ᆞWithout 100 the mean is:
$$ Mean \; : \; \frac {4 + 5 + 5 + 6} {4} = 5 $$
   ᆞMedian: 5
$$ 4, 5, \textcolor{red}{5}, 6, 100 $$
   ᆞMode: 5
$$ 4, \textcolor{red}{5, 5}, 6, 100 $$
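The numbers in this example can be checked with Python's standard `statistics` module. `fmean` (Python 3.8+) always returns a float, while `median` and `mode` return values taken from the data.

```python
from statistics import fmean, median, mode

data = [4, 5, 5, 6, 100]
without_outlier = [x for x in data if x != 100]  # leave out the outlier

print(fmean(data))             # 24.0
print(fmean(without_outlier))  # 5.0
print(median(data))            # 5
print(mode(data))              # 5
```

The single value 100 moves the mean from 5 to 24, while the median and mode stay at 5.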

 

7. Other Means

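  • Geometric Mean: multiply the numbers together, then take the nth root, where n is how many numbers there are (a square root for 2 numbers, a cube root for 3, and so on).
  • Harmonic Mean: take "1 divided by the number" for each value, average those, then flip the result.

$$ Geometric \, Mean = \sqrt[n]{X_{1} \times X_{2} \times \cdots \times X_{n}} $$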
$$ Harmonic \, Mean = \frac {n} { \frac {1} {X_{1}} + \frac {1} {X_{2}} + \cdots + \frac {1} {X_{n}}} $$

  • etc...
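For comparison, here are the geometric and harmonic means of 2 and 8, the numbers from the first section, computed directly from the formulas in Python. The function names `geo_mean` and `harm_mean` are our own. The standard library's `statistics.geometric_mean` and `statistics.harmonic_mean` are called to cross-check the results. Both formulas assume positive numbers.

```python
import math
from statistics import geometric_mean, harmonic_mean


def geo_mean(values):
    """n-th root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))


def harm_mean(values):
    """n divided by the sum of the reciprocals 1/x."""
    return len(values) / sum(1 / x for x in values)


data = [2, 8]
print(sum(data) / len(data))  # 5.0  (arithmetic mean)
print(geo_mean(data))         # 4.0  (square root of 16)
print(harm_mean(data))        # 3.2  (2 / (1/2 + 1/8))

# Cross-check against the standard library (allowing for rounding)
print(math.isclose(geo_mean(data), geometric_mean(data)))  # True
print(math.isclose(harm_mean(data), harmonic_mean(data)))  # True
```

For positive numbers, the arithmetic mean is always at least the geometric mean, which is at least the harmonic mean; here 5 ≥ 4 ≥ 3.2.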