Saturday, September 14, 2019

Descriptive Statistics

Descriptive Statistics:         

Central Tendency (or Groups’ “Middle Values”)

       Mean, Median, Mode

Variation (or Summary of Differences Within Groups)

     Range, Interquartile Range, Variance, Standard Deviation

Mean:

  • Most commonly called the “average.”
  • Add up the values for each case and divide by the total number of cases.
  • The mean is sensitive to outliers (data points with extreme values unlike the rest).
  • Outliers can therefore make the mean a poor measure of central tendency or of the common experience.

Y-bar = (Y1 + Y2 + ... + Yn) / n   OR   Y-bar = ΣYi / n

Example:

Find the mean of the numbers below:

102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110

ΣYi = 102 + 115 + 128 + 109 + 131 + 89 + 98 + 106 + 140 + 119 + 93 + 97 + 110 = 1437

Y-bar = ΣYi / n = 1437 / 13 ≈ 110.54
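The same calculation can be sketched in a few lines of Python. This is only an illustration using the numbers from the example; the variable names (values, total, n, mean) are chosen here and are not part of the original notation.

```python
# Values from the mean example
values = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

total = sum(values)   # Σ Yi: add up the value for each case
n = len(values)       # total number of cases
mean = total / n      # Y-bar

print(total, n, round(mean, 2))  # 1437 13 110.54
```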

Median:

  • The middle value when a variable’s values are ranked in order; the point that divides a distribution into two equal halves.
  • When data are listed in order, the median is the point at which 50% of the cases are above and 50% below it.
  • The median is the 50th percentile.
  • The median is largely unaffected by outliers, which makes it a better measure of central tendency than the mean when data are skewed: it better describes the “typical person.”
  • If the recorded values for a variable form a symmetric distribution, the median and mean are identical.
  • In skewed data, the mean lies further toward the skew than the median.

Example:

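Consider the numbers below, listed in order:

89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140

Median = 109 (the 7th of the 13 ordered values, with six values below it and six above)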
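As a rough check, the median of these numbers can be computed with Python's standard statistics module. The sketch also appends one extreme value (1000, a made-up outlier that is not in the original data) to show how far the mean moves while the median barely changes.

```python
import statistics

# Ordered values from the median example
values = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

median = statistics.median(values)  # middle (7th) of the 13 ordered values
mean = statistics.mean(values)

# Hypothetical outlier, added only for illustration
with_outlier = values + [1000]
median_out = statistics.median(with_outlier)  # average of the two middle values
mean_out = statistics.mean(with_outlier)

print(median, round(mean, 2))          # 109 110.54
print(median_out, round(mean_out, 2))  # 109.5 174.07
```

With the outlier added, the mean jumps from about 110.54 to about 174.07, while the median moves only from 109 to 109.5.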

Mode:

  • The most common data point is called the mode.
  • It is possible to have more than one mode.
  • It may give you the most likely experience rather than the “typical” or “central” experience.
  • In symmetric distributions with a single peak, the mean, median, and mode are the same.
  • In skewed data, the mean and median lie further toward the skew than the mode.

Example:

Consider the numbers below:

80 87 89 93 93 96 97 98 102 103 105 106 109 109 109 110 111 115 119 120 127 128 131 131 140 162

Mode = 109 (it occurs three times, more often than any other value)
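One way to find the mode of these numbers is to count how often each value occurs. The sketch below uses collections.Counter and statistics.multimode (available in Python 3.8 and later), which returns every value tied for most frequent, so it also handles data with more than one mode.

```python
import statistics
from collections import Counter

# Values from the mode example
values = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
          109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

counts = Counter(values)              # value -> number of occurrences
modes = statistics.multimode(values)  # all values with the highest count

print(modes)        # [109]
print(counts[109])  # 3
```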

Range:

  • The spread, or the distance, between the lowest and highest values of a variable.
  • To get the range for a variable, you subtract its lowest value from its highest value.

Example:

102,115,128,109,131,89,98,106,140,119,93,97,110

Range = 140 - 89 = 51
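In code, the range is simply the maximum minus the minimum. A minimal Python sketch with the same numbers (the name value_range is chosen here to avoid shadowing Python's built-in range):

```python
# Values from the range example
values = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

lowest = min(values)
highest = max(values)
value_range = highest - lowest  # subtract the lowest value from the highest

print(highest, lowest, value_range)  # 140 89 51
```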

Interquartile Range:

  • A quartile is one of the values that divide an ordered series of values into four equal parts.
  • The median is the second quartile and divides the cases in half.
  • The 25th percentile is the quartile that divides the first ¼ of cases from the latter ¾.
  • The 75th percentile is the quartile that divides the first ¾ of cases from the latter ¼.
  • The interquartile range is the distance between the 75th and 25th percentiles, so it spans the middle 50% of cases.
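As an illustration, the quartiles and the interquartile range (the 75th percentile minus the 25th) can be computed with statistics.quantiles (Python 3.8 and later). This section gives no data of its own, so the sketch assumes the 13 numbers from the earlier examples. Note that several conventions exist for interpolating quartiles; the function's default "exclusive" method is used here, and other tools or methods may give slightly different values.

```python
import statistics

# Assumption: reuse the 13 values from the earlier examples
values = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

# n=4 splits the data into four equal parts and returns the three cut points
q1, q2, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"

iqr = q3 - q1  # spread of the middle 50% of cases

print(q1, q2, q3, iqr)  # 97.5 109.0 123.5 26.0
```

With method="inclusive" the same data gives quartiles of 98 and 119, so it is worth stating which convention was used when reporting an interquartile range.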

Variance:

  • A measure of the spread of the recorded values on a variable. A measure of dispersion.
  • The larger the variance, the further the individual cases are, on average, from the mean.
  • The smaller the variance, the closer the individual scores are, on average, to the mean.
  • Variance is a number that at first seems complex to calculate.
  • Calculating variance starts with a “deviation.”
  • A deviation is the distance of a case’s score from the mean.
  • Square each deviation, add up the squares, and divide by n – 1 (one less than the number of cases).

Variance = Σ(Yi – Y-bar)² / (n – 1)
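The variance formula can be worked through step by step in Python. This section has no example data, so the sketch assumes the 13 numbers from the earlier examples; the intermediate names (deviations, sum_sq) are chosen here for illustration.

```python
import statistics

# Assumption: reuse the 13 values from the earlier examples
values = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

n = len(values)
mean = sum(values) / n                   # Y-bar
deviations = [y - mean for y in values]  # Yi - Y-bar for each case
sum_sq = sum(d ** 2 for d in deviations) # Σ(Yi - Y-bar)²
variance = sum_sq / (n - 1)              # divide by n - 1, not n

print(round(variance, 2))  # 240.94

# Cross-check against the standard library's sample variance
print(abs(variance - statistics.variance(values)) < 1e-9)  # True
```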

Standard Deviation:

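  • Variance is expressed in squared units, which makes it hard to interpret; taking its square root gives the standard deviation, which is in the same units as the data.
  • The square root of the variance gives roughly the average deviation of the observations from the mean.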

S.D. = Square root ( Σ(Yi – Y-bar)² / (n – 1) )
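A short Python sketch of the standard deviation formula, again assuming the 13 numbers from the earlier examples since this section has no data of its own. It takes the square root of the sample variance and compares the result with statistics.stdev, which uses the same n – 1 divisor.

```python
import math
import statistics

# Assumption: reuse the 13 values from the earlier examples
values = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

n = len(values)
mean = sum(values) / n
variance = sum((y - mean) ** 2 for y in values) / (n - 1)
sd = math.sqrt(variance)  # back in the same units as the data

print(round(sd, 2))  # 15.52

# Cross-check against the standard library's sample standard deviation
print(abs(sd - statistics.stdev(values)) < 1e-9)  # True
```

For these numbers a typical value lies roughly 15.5 units from the mean of about 110.54.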
