COMPARING FREQUENCIES USING CHI SQUARE

With chi square, we analyze frequency counts (how many cases fall into each category), not the scores we worked with in previous chapters.

Chi Square

·       A statistic that measures the discrepancy between the observed frequencies and the expected frequencies in a frequency table or contingency table.

·       Used for analyzing frequencies, so your data may be NOMINAL in nature.

·       A nonparametric statistic, also used for data that violate the assumption of a normal distribution. It is therefore a "distribution-free" test: no assumptions are made about population parameters.

Ex. The following require frequency data:

  • brand preference
  • Gallup Poll
  • "How many people preferred this one or that one?"

Ex. Toss a coin 100 times.
The frequencies observed as a result of the 100 tosses:
H/57, T/43

In contrast to observed frequencies, expected (theoretical) frequencies are what you think you will obtain when you conduct the experiment. They are the frequencies we would observe on average if the null hypothesis is true.

Ho: The coin is fair.
P(h) = .5
P(h) = P(t)
The proportion of heads = the proportion of tails

H1: The coin is not fair.
P(h) not = .5
P(h) not = P(t)

The theoretical or expected frequencies if the coin is in fact fair:
H/50, T/50
These expected frequencies are based on 100 tosses.

If the observed frequencies are close to the expected frequencies, we conclude that there is insufficient evidence of an unfair coin. This does not prove the coin is fair, and more information (for example, more tosses) may be needed.

The test statistic for this problem:

χ² = Σ (O - E)² / E, summed over all categories

O = Observed Frequency

E = Expected Frequency

df = k - 1, where k is the number of categories

1.  Take the observed frequency for each category and subtract the expected frequency.

2.  Square this difference.

3.  Divide by the expected frequency.

4.  Add all the numbers up.
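The four steps above can be sketched in Python as a small function. This is a minimal illustration, not code from the course materials. The name `chi_square_statistic` is hypothetical, and the function assumes the observed and expected frequencies are listed in the same category order.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)**2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e          # step 1: observed minus expected
        squared = difference ** 2   # step 2: square the difference
        total += squared / e        # step 3: divide by E; step 4: add to the running sum
    return total


# Coin example: 57 heads and 43 tails observed; 50 and 50 expected if Ho is true
statistic = chi_square_statistic([57, 43], [50, 50])
print(round(statistic, 2))  # 1.96
```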

Decision Rule:

·        alpha = .05

·        df = # of categories - 1 = "K - 1", where K is equal to the number of categories.

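·        Use the table to find the CRITICAL CHI-SQUARE VALUE.

·        Like the F test, the chi-square test is treated as a one-tailed (right-tailed) test.

·        A chi-square value is NEVER negative, because each term (O - E)² / E is a squared difference divided by a positive expected frequency.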
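Here is a sketch of this decision rule for a goodness-of-fit test. It assumes alpha = .05 and includes only the first few df rows of the critical-value table in these notes. The dictionary `CRITICAL_05` and the function `decide` are hypothetical names used for illustration.

```python
# Critical chi-square values at alpha = .05 for df = 1..5, copied from the table in these notes
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}


def decide(chi_square, num_categories):
    """Apply the right-tailed decision rule with df = K - 1."""
    if chi_square < 0:
        raise ValueError("a chi-square value can never be negative")
    df = num_categories - 1
    critical = CRITICAL_05[df]
    # Only a large chi-square value (observed far from expected) leads to rejection
    if chi_square > critical:
        return "reject Ho"
    return "do not reject Ho"


print(decide(1.96, 2))  # coin example (df = 1); prints: do not reject Ho
```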


For df = 1 and alpha = .05, the critical value is 3.84.

Table of Chi-square statistics

df     P = 0.05   P = 0.01   P = 0.001
1      3.84       6.64       10.83
2      5.99       9.21       13.82
3      7.82       11.35      16.27
4      9.49       13.28      18.47
5      11.07      15.09      20.52
6      12.59      16.81      22.46
7      14.07      18.48      24.32
8      15.51      20.09      26.13
9      16.92      21.67      27.88
10     18.31      23.21      29.59
11     19.68      24.73      31.26
12     21.03      26.22      32.91
13     22.36      27.69      34.53
14     23.69      29.14      36.12
15     25.00      30.58      37.70
16     26.30      32.00      39.25
17     27.59      33.41      40.79
18     28.87      34.81      42.31
19     30.14      36.19      43.82
20     31.41      37.57      45.32
21     32.67      38.93      46.80
22     33.92      40.29      48.27
23     35.17      41.64      49.73
24     36.42      42.98      51.18
25     37.65      44.31      52.62
26     38.89      45.64      54.05
27     40.11      46.96      55.48
28     41.34      48.28      56.89
29     42.56      49.59      58.30
30     43.77      50.89      59.70
31     44.99      52.19      61.10
32     46.19      53.49      62.49
33     47.40      54.78      63.87
34     48.60      56.06      65.25
35     49.80      57.34      66.62
36     51.00      58.62      67.99
37     52.19      59.89      69.35
38     53.38      61.16      70.71
39     54.57      62.43      72.06
40     55.76      63.69      73.41
41     56.94      64.95      74.75
42     58.12      66.21      76.09
43     59.30      67.46      77.42
44     60.48      68.71      78.75
45     61.66      69.96      80.08
46     62.83      71.20      81.40
47     64.00      72.44      82.72
48     65.17      73.68      84.03
49     66.34      74.92      85.35
50     67.51      76.15      86.66
51     68.67      77.39      87.97
52     69.83      78.62      89.27
53     70.99      79.84      90.57
54     72.15      81.07      91.88
55     73.31      82.29      93.17
56     74.47      83.52      94.47
57     75.62      84.73      95.75
58     76.78      85.95      97.03
59     77.93      87.17      98.34
60     79.08      88.38      99.62
61     80.23      89.59      100.88
62     81.38      90.80      102.15
63     82.53      92.01      103.46
64     83.68      93.22      104.72
65     84.82      94.42      105.97
66     85.97      95.63      107.26
67     87.11      96.83      108.54
68     88.25      98.03      109.79
69     89.39      99.23      111.06
70     90.53      100.42     112.31
71     91.67      101.62     113.56
72     92.81      102.82     114.84
73     93.95      104.01     116.08
74     95.08      105.20     117.35
75     96.22      106.39     118.60
76     97.35      107.58     119.85
77     98.49      108.77     121.11
78     99.62      109.96     122.36
79     100.75     111.15     123.60
80     101.88     112.33     124.84
81     103.01     113.51     126.09
82     104.14     114.70     127.33
83     105.27     115.88     128.57
84     106.40     117.06     129.80
85     107.52     118.24     131.04
86     108.65     119.41     132.28
87     109.77     120.59     133.51
88     110.90     121.77     134.74
89     112.02     122.94     135.96
90     113.15     124.12     137.19
91     114.27     125.29     138.45
92     115.39     126.46     139.66
93     116.51     127.63     140.90
94     117.63     128.80     142.12
95     118.75     129.97     143.32
96     119.87     131.14     144.55
97     120.99     132.31     145.78
98     122.11     133.47     146.99
99     123.23     134.64     148.21
100    124.34     135.81     149.48


For the coin, df = 2 - 1 = 1, so the decision rule is: reject Ho if the chi-square test statistic is greater than 3.84; otherwise, do not reject Ho.

Chi square value: (57 - 50)²/50 + (43 - 50)²/50 = 0.98 + 0.98 = 1.96

Decision: Since 1.96 is less than 3.84, do not reject Ho.
Conclusion: There is insufficient evidence of an unfair coin.
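You can also cross-check the table lookup by computing the right-tail probability (p-value) directly. The sketch below is an illustration under stated assumptions, not part of the method in these notes. It replaces the printed table with a standard series expansion of the regularized incomplete gamma function, and the name `chi2_sf` is hypothetical. In practice you would normally use a statistics library routine such as SciPy's `scipy.stats.chi2.sf`.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(chi-square with df degrees of freedom > x).

    Sums the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:     # stop once new terms no longer change the sum
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


p_value = chi2_sf(1.96, 1)  # coin example, df = 1
print(round(p_value, 3))
```

For the coin, the p-value comes out to about 0.16. That is larger than alpha = .05, which matches the decision above. `chi2_sf(3.84, 1)` gives roughly 0.05, consistent with the df = 1 row of the table.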

Used this way, chi square is called a "Goodness of Fit" test, because it checks whether the observed frequencies fit the theoretical (expected) frequencies.

Ex. Given N = 200 consumers:

Category    Theoretical frequencies (percentages)    Observed frequencies
A           38%                                      80
B           27%                                      50
C           35%                                      70
Total       100%                                     200

Question: Does the observed data fit the theoretical data?

IF YOU ARE GIVEN PERCENTAGES, THE PERCENTAGES MUST BE CONVERTED TO FREQUENCIES BY MULTIPLYING EACH PERCENTAGE BY THE TOTAL NUMBER OF CONSUMERS.

EXPECTED FREQUENCY = (EXPECTED PROPORTION) (N)

Expected (theoretical) frequencies:
A = 38% of 200 = 76
B = 27% of 200 = 54
C = 35% of 200 = 70
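Converting percentages to expected frequencies takes one line of arithmetic per category. Here is a minimal sketch using the claimed percentages from this example. The function name `expected_from_percentages` is hypothetical.

```python
def expected_from_percentages(percentages, n):
    """Expected frequency = (expected proportion)(N), with proportion = percentage / 100."""
    total_percent = sum(percentages.values())
    if total_percent != 100:
        raise ValueError(f"percentages must sum to 100, got {total_percent}")
    # Multiply before dividing so whole-number percentages give exact results
    return {category: pct * n / 100 for category, pct in percentages.items()}


claimed = {"A": 38, "B": 27, "C": 35}
expected = expected_from_percentages(claimed, 200)
print(expected)  # {'A': 76.0, 'B': 54.0, 'C': 70.0}
```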

Ho: Proportion of A = .38
Ho: Proportion of B = .27
Ho: Proportion of C = .35

H1: Proportion of A does not = .38
H1: Proportion of B does not = .27
H1: Proportion of C does not = .35

Chi square value: (80 - 76)²/76 + (50 - 54)²/54 + (70 - 70)²/70 = 0.2105 + 0.2963 + 0 = 0.5068

Decision rule:

·      alpha = .05

·      df = K - 1 = 3 - 1 = 2 (there are 3 categories)

·      From the chi-square table, the critical value is 5.99.

·      Reject Ho if the chi-square test statistic > 5.99; otherwise, do not reject Ho.

Decision: Since 0.5068 < 5.99, do not reject Ho.

Conclusion: There is insufficient evidence of a lack of fit; the data do not refute the researcher's claimed proportions.
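Here is a sketch that puts the pieces of this example together into the whole goodness-of-fit test. It assumes alpha = .05 and hard-codes the critical value 5.99 from the table for df = 2. The function name `goodness_of_fit` is hypothetical.

```python
def goodness_of_fit(observed, percentages, critical):
    """Chi-square goodness-of-fit test against claimed percentages."""
    n = sum(observed.values())
    # Convert the claimed percentages into expected frequencies
    expected = {k: percentages[k] * n / 100 for k in observed}
    chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
    df = len(observed) - 1  # K - 1
    decision = "reject Ho" if chi_square > critical else "do not reject Ho"
    return chi_square, df, decision


observed = {"A": 80, "B": 50, "C": 70}
claimed = {"A": 38, "B": 27, "C": 35}
chi_square, df, decision = goodness_of_fit(observed, claimed, critical=5.99)
print(round(chi_square, 4), df, decision)  # 0.5068 2 do not reject Ho
```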

Two-Way Chi Square

When both variables are categorical, we use chi square to determine whether they are related.

Ex. Is gender related to political affiliation?

Gender     Political
Male       D
Female     R

These are categorical data, so you must use chi square for a contingency table.


Data Table:

Person    Gender    Political
1         M         D
2         F         D
3         F         D
4         M         R
5         M         R
6         M         D
7         F         D
8         F         R
9         F         D
10        F         R

Gender     Demo    Repub    Total
Male       2       2        4
Female     4       2        6
Total      6       4        10

4 fold Contingency Table
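Building the contingency table from the raw data only requires counting each (gender, political) pair. Here is a minimal sketch using Python's `collections.Counter`, with the ten people from the data table typed in by hand.

```python
from collections import Counter

# (gender, political) for persons 1-10 in the data table
people = [("M", "D"), ("F", "D"), ("F", "D"), ("M", "R"), ("M", "R"),
          ("M", "D"), ("F", "D"), ("F", "R"), ("F", "D"), ("F", "R")]

counts = Counter(people)  # counts each (gender, political) combination
rows, cols = ["M", "F"], ["D", "R"]

table = [[counts[(g, p)] for p in cols] for g in rows]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

for g, row, total in zip(rows, table, row_totals):
    print(g, row, total)
print("Total", col_totals, sum(row_totals))
```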


Expected frequencies in a contingency table are computed from the row and column totals of the observed frequency data:

Expected frequency = (Column Total)(Row Total) / (Overall Total)


OBSERVED VALUES

                 Political
Gender     Demo           Repub          Total
Male       2              2              Row1 = 4
Female     4              2              Row2 = 6
Total      Column1 = 6    Column2 = 4    N = 10

EXPECTED VALUES

                 Political
Gender     Demo             Repub            Total
Male       C1xR1/N = 2.4    C2xR1/N = 1.6    Row1 = 4
Female     C1xR2/N = 3.6    C2xR2/N = 2.4    Row2 = 6
Total      Column1 = 6      Column2 = 4      N = 10

In the expected frequency table, each cell holds the count one would expect if the two categorical variables were completely unrelated.

If the observed frequencies fit the expected frequencies closely, the chi-square value is small and we have no evidence that the variables are related. The data are then consistent with the variables being independent of one another.
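The expected-frequency rule, (Column Total)(Row Total) / (Overall Total) for every cell, can be sketched as follows for the gender example. The function name `expected_table` is hypothetical.

```python
def expected_table(observed):
    """Expected count for each cell = (column total)(row total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[c * r / n for c in col_totals] for r in row_totals]


observed = [[2, 2],   # Male: Demo, Repub
            [4, 2]]   # Female: Demo, Repub
for row in expected_table(observed):
    print(row)  # [2.4, 1.6] then [3.6, 2.4]
```

Each row of the expected table still adds up to the observed row total (4 and 6), and each column adds up to the observed column total (6 and 4).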

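Decision rule at alpha = .05

·      df = (# rows - 1)(# columns - 1) = (2 - 1)(2 - 1) = 1

·      From the chi-square table, the critical value is 3.84.

·      Reject Ho if the chi-square test statistic > 3.84; otherwise, do not reject Ho.

Chi square value: (2 - 2.4)²/2.4 + (2 - 1.6)²/1.6 + (4 - 3.6)²/3.6 + (2 - 2.4)²/2.4 = 0.0667 + 0.1000 + 0.0444 + 0.0667 = 0.2778

Decision: Since 0.2778 < 3.84, do not reject Ho.
Conclusion: There is insufficient evidence that gender and political affiliation are related.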
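The gender chi-square value can be reproduced in a few lines, using the observed counts and the expected values from the tables (typed in by hand here).

```python
observed = [[2, 2], [4, 2]]          # rows: Male, Female; columns: Demo, Repub
expected = [[2.4, 1.6], [3.6, 2.4]]  # C x R / N for each cell

# Sum (O - E)^2 / E over all four cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)(columns - 1)
print(round(chi_square, 4), df)  # 0.2778 1
```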


Ex. Are therapy and improvement related?

Ho: Therapy and improvement are not related (independent).
H1: Therapy and improvement are related (dependent).

Observed data:

OBSERVED VALUES

                 Improvement
Type       YES         NO         Total
Therapy    75          25         R1 = 100
Placebo    58          42         R2 = 100
Total      C1 = 133    C2 = 67    N = 200

Expected data:

EXPECTED VALUES

                 Improvement
Type       YES               NO                Total
Therapy    C1xR1/N = 66.5    C2xR1/N = 33.5    R1 = 100
Placebo    C1xR2/N = 66.5    C2xR2/N = 33.5    R2 = 100
Total      C1 = 133          C2 = 67           N = 200

Decision rule:

·      alpha = .05

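·      df = (# rows - 1)(# columns - 1) = (2 - 1)(2 - 1) = 1

·      From the chi-square table, the critical value is 3.84.

·      Reject Ho if the chi-square statistic is greater than 3.84; otherwise, do not reject Ho.

Chi square value: (75 - 66.5)²/66.5 + (25 - 33.5)²/33.5 + (58 - 66.5)²/66.5 + (42 - 33.5)²/33.5 = 1.0865 + 2.1567 + 1.0865 + 2.1567 = 6.49

Decision: Reject Ho, since 6.49 > 3.84.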

Conclusion: There is evidence that therapy is related to improvement.
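As a closing sketch, the whole test of independence for the therapy data can be run end to end. It covers the expected values from the margins, the chi-square statistic, df, and the decision at alpha = .05. The function name `independence_test` is hypothetical. The p-value line relies on a fact that holds for df = 1 only: the right-tail probability equals `math.erfc(math.sqrt(chi_square / 2))`. It is a cross-check on the table, not a step in the method these notes teach.

```python
import math


def independence_test(observed):
    """Chi-square test of independence for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count = (column total)(row total) / N for each cell
    expected = [[c * r / n for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected


observed = [[75, 25],   # Therapy: improvement YES, NO
            [58, 42]]   # Placebo: improvement YES, NO
chi_square, df, expected = independence_test(observed)

decision = "reject Ho" if chi_square > 3.84 else "do not reject Ho"  # 3.84: df = 1, alpha = .05
p_value = math.erfc(math.sqrt(chi_square / 2))  # valid for df = 1 only
print(round(chi_square, 2), df, decision)  # 6.49 1 reject Ho
print(round(p_value, 4))
```

The p-value is roughly 0.011, below alpha = .05. This agrees with the decision to reject Ho.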