Notes on Topic 6:
Probability & Distributions

    1. Overview
    2. Why Study Probability?
      We study probability because the relationship between samples and populations is usually stated in terms of probability. Probability plays a central role in inferential statistics.
      Probability and Inferential Statistics
      The goal of inferential statistics is to use the limited information we have in a sample to draw general conclusions about the population. Probability allows us to start with a population and predict what kind of sample is likely to be obtained from it. Inferential statistics works in the other direction: it allows us to say how likely it is that a sample came from a particular population.
      Probability and Frequency Distributions
      We are usually concerned about probability when we have a population frequency distribution: We want to know how probable it is to obtain a specific sample. For this reason we discuss Normal and Binomial distributions, the two most commonly used in statistics.

    3. Introduction to Probability
    4. Definition
      Probability is defined for a specific outcome in a situation where several different outcomes are possible. If the possible outcomes are denoted A, B, C, D, etc., then the probability of A is defined as:
        p(A) = (number of outcomes classified as A) / (total number of possible outcomes)
      Examples
      Tossing Coins: When you toss a balanced coin, the outcome is either heads or tails. Thus, there are a total of 2 possible outcomes. The probability of tossing a head is
        p(Head) = 1/2 = .5 = 50%

      Selecting Cards: There are 52 cards in an ordinary deck of cards. Thus, there are a total of 52 possible outcomes.
      • The probability of drawing a Heart (there are 13 hearts) is:
          p(Heart) = 13/52 = 1/4 = .25 = 25%
      • The probability of drawing an Ace (there are 4 aces) is:
          p(Ace) = 4/52 = 1/13 = .0769 = 7.69%
      • The probability of drawing a Green card (there are 0 green cards) is:
          p(Green) = 0/52 = .00 = 0%
      • The probability of drawing a card (there are 52 cards) is:
          p(Card) = 52/52 = 1.00 = 100%
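      The card examples above all follow the same pattern: the number of favorable outcomes divided by the total number of outcomes. A minimal Python sketch of that arithmetic follows; the function name probability and the variable names are our own, chosen for illustration.

```python
from fractions import Fraction

def probability(favorable: int, total: int) -> Fraction:
    """Probability as (favorable outcomes) / (total possible outcomes)."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need total > 0 and 0 <= favorable <= total")
    return Fraction(favorable, total)

DECK_SIZE = 52  # cards in an ordinary deck

p_heart = probability(13, DECK_SIZE)  # 13 hearts
p_ace = probability(4, DECK_SIZE)     # 4 aces
p_green = probability(0, DECK_SIZE)   # no green cards
p_card = probability(52, DECK_SIZE)   # every outcome is a card

for name, p in [("Heart", p_heart), ("Ace", p_ace),
                ("Green", p_green), ("Card", p_card)]:
    # Show the reduced fraction, the decimal, and the percentage.
    print(f"p({name}) = {p} = {float(p):.4f} = {float(p):.2%}")
```

      Using exact fractions keeps the reduced forms (1/4, 1/13) visible before they are converted to decimals and percentages.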
      Definition
      Random Sampling: An independent random sample must satisfy two requirements:
      1. Each individual in the population must have an equal chance of being selected.
      2. If more than one individual is selected for the sample, the probability of selection must stay constant from one selection to the next. (This requires sampling with replacement.)
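      To see why the second requirement calls for sampling with replacement, the sketch below compares the selection probability on successive draws with and without replacement. The five-member population and the function names are hypothetical, used only for illustration.

```python
import random

population = ["A", "B", "C", "D", "E"]  # hypothetical population of five individuals
rng = random.Random(0)                  # fixed seed so the run is repeatable

def sample_with_replacement(pop, n, rng):
    """Draw n individuals; each draw chooses uniformly from the full population."""
    return [rng.choice(pop) for _ in range(n)]

def probabilities_without_replacement(pop_size, n):
    """Chance that a given remaining individual is picked on each successive
    draw when selected individuals are NOT returned to the population."""
    return [1 / (pop_size - k) for k in range(n)]

sample = sample_with_replacement(population, 3, rng)

# With replacement the probability is the same on every draw.
with_repl = [1 / len(population)] * 3
# Without replacement it rises as the population shrinks.
without_repl = probabilities_without_replacement(len(population), 3)

print("sample:", sample)
print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

      Without replacement the probability changes from 1/5 to 1/4 to 1/3, which violates the constant-probability requirement.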

    5. Probability and the Normal Distribution
    6. Definition
      The normal distribution is defined by a complicated equation that we don't need to know or understand.
      Why use Normal Distributions?
      What is important is to understand that the normal distribution is used very frequently because:
      1. Many characteristics of interest, such as IQ, height, and weight of people, have population distributions that are approximately normal.
      2. It can be shown mathematically that this shape is guaranteed in certain situations that will be important to us in inferential statistics.
      Characteristics:
      The normal distribution:
      1. is symmetrical (the left side is a mirror image of the right side).
      2. has 50% of the scores below the mean and 50% above. (Mean = Median)
      3. has most scores in the middle and few scores at the edges.
      The Standard Normal Distribution
      We have a standard normal distribution when the scores in a normal distribution are expressed in standardized z-scores. Thus, the standard normal distribution
      1. has a mean of 0 and a standard deviation of 1, just like any other standardized distribution.
      2. has a normal shape, just like any other normal distribution.
      For a standard normal distribution it can be shown that
      1. 34.13% of the scores are between the mean and +1.00.
      2. 34.13% of the scores are between the mean and -1.00.
      3. 13.59% of the scores are between +1.00 and +2.00.
      4. 13.59% of the scores are between -1.00 and -2.00.
      5. 2.28% of the scores are above +2.00.
      6. 2.28% of the scores are below -2.00.
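      These proportions can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module (not ViSta); the helper name proportion_between is our own.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution

def proportion_between(z_low, z_high):
    """Proportion of scores falling between two z-scores."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

mean_to_plus1 = proportion_between(0.0, 1.0)
mean_to_minus1 = proportion_between(-1.0, 0.0)
plus1_to_plus2 = proportion_between(1.0, 2.0)
above_plus2 = 1.0 - std_normal.cdf(2.0)
below_minus2 = std_normal.cdf(-2.0)

print(f"mean to +1.00:  {mean_to_plus1:.2%}")
print(f"mean to -1.00:  {mean_to_minus1:.2%}")
print(f"+1.00 to +2.00: {plus1_to_plus2:.2%}")
print(f"above +2.00:    {above_plus2:.2%}")
print(f"below -2.00:    {below_minus2:.2%}")
```

      Because the distribution is symmetrical, each proportion on the negative side matches its counterpart on the positive side.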
      Answering Probability Questions with the Unit Normal Table:
      The unit normal table provides a listing of proportions (probabilities) corresponding to many z-scores in the standard normal distribution. Take the following steps to answer probability questions using this table:
      1. Sketch the distribution, showing the mean and standard deviation in raw scores.
      2. On the sketch, locate the specific score identified in the problem, and draw a vertical line through the distribution at this location.
      3. Determine whether you need to find out about values greater than (to the right side of) or less than (to the left side of) the specific score.
      4. Shade the appropriate portion of the distribution (to the right or left of your line).
      5. Transform the specific score into a z-score so that you can look it up in the Unit Normal Table, which lists z-scores for the standard normal distribution.
      6. Look at the shaded portion in your sketch to determine which column (B or C) in the table corresponds with the proportion you are trying to find.
      7. Ignore the sign of your z-score and look it up in the table, taking the appropriate value from column B or C.
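      The steps above can be sketched in Python, with a computed proportion standing in for the printed table. We assume here that the table's two columns list the larger portion (the body) and the smaller portion (the tail) for each positive z-score. The mean of 100 and standard deviation of 15 are hypothetical values, and all function names are our own.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Step 5: transform a raw score into a z-score."""
    return (x - mean) / sd

def body_and_tail(z):
    """Step 7: ignore the sign of z and return (body, tail) proportions,
    as a table that lists only positive z-scores would."""
    body = NormalDist().cdf(abs(z))
    return body, 1.0 - body

def proportion(x, mean, sd, side):
    """Steps 3, 4 and 6: choose the body or the tail according to which
    side of the score is shaded ('above' or 'below')."""
    z = z_score(x, mean, sd)
    body, tail = body_and_tail(z)
    if side == "above":
        return tail if z >= 0 else body
    if side == "below":
        return body if z >= 0 else tail
    raise ValueError("side must be 'above' or 'below'")

# Hypothetical distribution: mean = 100, standard deviation = 15.
p_above_130 = proportion(130, 100, 15, "above")  # z = +2.00, shaded tail
p_below_85 = proportion(85, 100, 15, "below")    # z = -1.00, shaded tail

print(f"p(X > 130) = {p_above_130:.4f}")
print(f"p(X < 85)  = {p_below_85:.4f}")
```

      The sketch drawn in steps 1 through 4 is what tells you whether the shaded region is the body or the tail; the code only mirrors that decision.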
      Percentiles and the Normal Distribution
      You can use the normal distribution to determine percentiles (and percentile ranks).
      Because the percentile rank of a score is the percentage of scores that fall at or below it, you need to find the proportion of the distribution that lies to the left of the score.
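      As a rough illustration, the following Python sketch converts a score to its percentile rank and, going the other way, finds the score at a given percentile. The distribution (mean 100, standard deviation 15) is hypothetical and the function names are our own.

```python
from statistics import NormalDist

def percentile_rank(x, mean, sd):
    """Percentage of the distribution at or below the score x."""
    return 100.0 * NormalDist(mean, sd).cdf(x)

def score_at_percentile(pct, mean, sd):
    """Score below which pct percent of the distribution falls (0 < pct < 100)."""
    return NormalDist(mean, sd).inv_cdf(pct / 100.0)

# Hypothetical distribution: mean = 100, standard deviation = 15.
rank_115 = percentile_rank(115, 100, 15)         # z = +1.00
median_score = score_at_percentile(50, 100, 15)  # the 50th percentile is the mean

print(f"percentile rank of 115: {rank_115:.2f}")
print(f"score at the 50th percentile: {median_score:.1f}")
```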
      ViSta and the Normal Distribution
      With ViSta, you can get the proportion of the scores that are below (to the left of) a given z-score by typing, in the listener window, the function:
      (normal-cdf z)
      where z is replaced with the z-score value in which you are interested.
      Multiply this by 100 for the percentile. You can do this by typing:
      (* 100 (normal-cdf z))
      Subtract the value returned by the function from one to get the proportion to the right of z. This can be done by typing:
      (- 1 (normal-cdf z))

    7. Probability and the Binomial Distribution
    8. Definition
      When a variable is measured on a scale consisting of only two categories, the data are called binary or binomial. In this situation the researcher often knows the population probabilities associated with the two categories. When this is the case, the data have a known population distribution, called the binomial distribution.
      Distribution Shape
      There is a whole family of different binomial population distributions. The exact shape of a member of the family depends on:
      • N, which is the number of observations or individuals in a sample.
      • P, which is the probability of one of the two events (Q=1-P is the probability of the other event).
      Some examples of specific binomial distributions are given here.
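      To see how N and P determine the shape, the sketch below computes the probability of each possible count of the first event using the standard binomial formula, C(N, k) P^k Q^(N-k). That formula is not stated in these notes, and the function names are our own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences of the event in n observations."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

def binomial_distribution(n, p):
    """Probabilities for k = 0, 1, ..., n occurrences."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]

fair = binomial_distribution(2, 0.5)     # two tosses of a balanced coin
skewed = binomial_distribution(4, 0.25)  # P far from .5 gives a lopsided shape

print("N=2, P=.50:", fair)
print("N=4, P=.25:", [round(x, 4) for x in skewed])
```

      With P = .5 the distribution is symmetrical; moving P away from .5 pulls the bulk of the probability toward one end.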
      Normality of Binomials
      When the product of N and P and the product of N and Q are both greater than or equal to 10, the binomial distribution is very nearly normal. Under these circumstances:
      • The population mean is NP
      • The population standard deviation is SQRT(NPQ)
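      As a rough check on this approximation, the sketch below computes the mean and standard deviation for a hypothetical case (N = 100, P = .5) and compares an exact binomial probability with its normal approximation. The half-unit adjustment (using k + 0.5 as the upper limit) is a common refinement that these notes do not mention, and all function names are our own.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_mean_sd(n, p):
    """Mean NP and standard deviation SQRT(NPQ) of a binomial distribution."""
    q = 1.0 - p
    return n * p, sqrt(n * p * q)

def normal_approx_ok(n, p):
    """Rule from the notes: both NP and NQ must be at least 10."""
    return n * p >= 10 and n * (1.0 - p) >= 10

def exact_at_most(k, n, p):
    """Exact binomial probability of k or fewer occurrences."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))

def approx_at_most(k, n, p):
    """Normal approximation to the same probability.
    Assumption: k + 0.5 is used as the upper limit (continuity correction)."""
    mean, sd = binomial_mean_sd(n, p)
    return NormalDist(mean, sd).cdf(k + 0.5)

n, p = 100, 0.5  # hypothetical example values
mean, sd = binomial_mean_sd(n, p)
exact = exact_at_most(55, n, p)
approx = approx_at_most(55, n, p)

print(f"mean = {mean}, sd = {sd}, rule satisfied: {normal_approx_ok(n, p)}")
print(f"p(X <= 55): exact = {exact:.4f}, normal approximation = {approx:.4f}")
```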