Research Methods in Social Psychology, 323
 

                    Data Analysis for the Minor Project
 

Chi-Squares
When and why do you use it?

To answer questions about how frequently individuals are distributed into categories.

Are women more likely than men to take psychology courses?

Are Jews more likely than Protestants to vote Democratic?

Are African-Americans under-represented in advertising?



 

How do you compute the chi-square?

One-Sample Chi-Square

Use this for comparing your results to the proportion of the
U.S. population that is and is not African-American.


Example: Are women more likely than men to take psychology courses?

Let's say that the answer is "no."  In a class of 150 students, how many would
you expect to be male, and how many female?

(answer: 75 and 75)

So, the question is, "Did the observed distribution differ
from a 50-50 split (75 males and 75 females)?"

More formally, "if you expect a 50-50 distribution by
chance, how likely is any observed deviation from 50-50
to have occurred purely by chance?"

Let's say there really were 75 males and 75 females.
What would your conclusion be?

(Answer: Obviously not different from a 50-50 split).
 

Let's say there were

74M and 76F?  "Different" but barely.  Probably just chance.
72M and 78F?  "A little more different than 50-50."
70M and 80F?  "A little *more* different than 50-50."
60M and 90F?  "This would not likely have come about if men and women were equally likely to take the class."

But at what point does the discrepancy from 50-50 count?  At what point do we believe
that men and women are probably *not* equally likely to take the class?

CHI-SQUARE!
X^2 = SUM((fo-fe)^2/fe)
X^2 = Chi-square.  Note that the X is usually written as a sort of curvier capital X,
which is the Greek letter chi.  I could not figure out how to get a real chi into a web page.
The "^2" means "squared."

SUM is usually written as the Greek letter sigma, sort of like an angled capital E.  I could not
figure out how to get that onto a web page either.

SUM (or sigma) = "sum of all"; in this context, it means summing over all cells.

f = frequency
o = observed, so fo means "observed frequency"
e = expected, so fe means "expected frequency"
 

Women:     Observed = 90, Expected=75
Men:          Observed = 60, Expected=75

X^2 = (90-75)^2/75 + (60-75)^2/75 = 225/75 + 225/75 = 3 + 3 = 6

X^2 = 6
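
If you would rather let a computer do the arithmetic, the formula can be sketched in a few lines of Python. This is only an illustration, not part of the assignment; the function name `chi_square` is made up for this example.

```python
def chi_square(observed, expected):
    """Sum (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Cells in the order: women, men
observed = [90, 60]
expected = [75, 75]

print(chi_square(observed, expected))  # 6.0
```

Note that a perfect 75-75 split would give a chi-square of 0, and the value grows as the observed counts move away from the expected ones.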
 

Is this a high or low chi-square?  Is it statistically significant?

1. You need the Degrees of Freedom (df).

For a one-sample chi-square, df = # of cells -1.
In this example, there are 2 cells, so df = 2-1 = 1.

2. Then, you need to look it up in Table B.4.

Table B.4.
df on left.  Probability on the top.
 

So, a chi-square of 6, with 1 df has a p-value less than
.02, but greater than .01.
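
If you do not have the table handy, the p-value for a chi-square with 1 df can also be computed directly. The sketch below works for df = 1 only, where the p-value equals erfc(sqrt(X^2/2)) (a 1-df chi-square is a squared standard normal score). The function name `p_value_df1` is made up for this example; for other df you would still need the table or a statistics package.

```python
import math

def p_value_df1(chi_sq):
    """Chance probability of a chi-square at least this large, for df = 1 ONLY."""
    return math.erfc(math.sqrt(chi_sq / 2))

p = p_value_df1(6)
print(round(p, 4))      # 0.0143
print(0.01 < p < 0.02)  # True, which agrees with the table lookup
```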

What does that mean?
Remember the question your chi-square addresses:

"If you expect a 50-50 distribution by chance, how likely is any
observed deviation from 50-50 to have occurred purely by chance?"

So, does a high probability or a low probability indicate
that the distribution of men and women in my class was
most likely random deviation from 50-50?

What does a high probability indicate?

What does a low probability indicate?
 

Why is the traditional cutoff for considering a result
"statistically significant" p=.05?

What does p=.05 mean?


Example 2:

About two thirds of all Psych Majors are women.

Does the distribution in my class differ from that among
Psych Majors?
 
 

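Men:          Observed = 60, Expected = 50   (one third of 150)
Women:     Observed = 90, Expected = 100  (two thirds of 150)

X^2 = (60-50)^2/50 + (90-100)^2/100 = 100/50 + 100/100 = 2 + 1 = 3

X^2(1) = 3, .05 < p < .10.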
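
The only new step in Example 2 is that the expected frequencies are no longer equal: each one is the expected proportion times the total number of students. A minimal sketch of that step follows; the variable names are made up for this example, and exact fractions are used so that one third of 150 comes out to exactly 50.

```python
from fractions import Fraction

total = 150
observed = [60, 90]                              # men, women in the class
proportions = [Fraction(1, 3), Fraction(2, 3)]   # men, women among Psych Majors

# Expected frequency = expected proportion x total number of students
expected = [p * total for p in proportions]

chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

print([int(fe) for fe in expected])  # [50, 100]
print(float(chi_sq))                 # 3.0
```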
 

.05<p<.10  This is usually called "marginally significant" meaning that it is
a borderline case.  It is not significant at p<.05, but it is close.  In this
example, this means that the distribution of men and women in my class
may differ from that of Psych Majors, but we can't be too sure.
 

USE THIS ONE-SAMPLE CHI-SQUARE TO ANSWER
YOUR SECOND QUESTION:

Are African-Americans under-represented in advertisements?
 

AFRICAN-AMERICANS:    Observed = ????
                      Expected = .13(total number of all characters)
----------------------------------------------------------------------
OTHERS:               Observed = ????
                      Expected = .87(total number of all characters)

Note: .13(total number of all characters) means ".13 times the total number of all characters."
      Same for .87(total number of all characters).
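
Once you have coded your ads, the layout above could be filled in along the following lines. This is a rough sketch, not a required procedure: the function name is made up, and the counts in the usage lines (20 and 180) are placeholders for illustration only, not real data. Only the .13 figure comes from these notes.

```python
def one_sample_chi_square(n_african_american, n_other, expected_prop=0.13):
    """Return (expected frequencies, chi-square) for the two-cell layout above."""
    total = n_african_american + n_other
    observed = [n_african_american, n_other]
    # Expected = .13(total) for African-Americans and .87(total) for others
    expected = [expected_prop * total, (1 - expected_prop) * total]
    chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return expected, chi_sq

# PLACEHOLDER counts, for illustration only; substitute your own coded data.
expected, chi_sq = one_sample_chi_square(n_african_american=20, n_other=180)
print([round(fe, 1) for fe in expected])
print(round(chi_sq, 2))
```

Remember that the chi-square itself does not tell you the direction of a difference. To talk about under-representation, also check whether the observed African-American count is below the expected one.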
 



TWO-SAMPLE CHI-SQUARE

This is the chi-square discussed in your textbook.  It is also called the
"chi-square test for contingency tables."

It determines whether two groups are differentially distributed on some
nominal variable.

For example, are professors more liberal than policemen?

Specifically, are professors more likely to vote for the
Democratic presidential candidate than are cops?

Note: I made this data up.

                    Pro-Bush         Pro-Gore         Row Sums
100 Cops            50 (cell a)      50 (cell b)      100
100 Profs           40 (cell c)      60 (cell d)      100

Column Sums         90               110              200 (Grand Total)

Chi-square is still:

X^2 = SUM((fo-fe)^2/fe)

Observed is easy.

Expected (for each cell) = (Column total)(Row total) / Grand Total

Ea = (90 x 100)/200 = 45
Eb = (110 x 100)/200 = 55
Ec = (90 x 100)/200 = 45
Ed = (110 x 100)/200 = 55

(Ea is expected value for cell a, Eb for cell b, and so on).

So X^2 = SUM((fo-fe)^2/fe) =

(50-45)^2/45 + (50-55)^2/55 + (40-45)^2/45 + (60-55)^2/55

= 25/45 + 25/55 + 25/45 + 25/55 = .56 + .45 + .56 + .45

= 2.02
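
The same calculation can be sketched in Python, working from nothing but the table of observed counts. The function names are made up for this example; the data are the made-up cops and profs numbers from above.

```python
def expected_frequencies(table):
    """Expected count for each cell = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_table(table):
    """Sum (fo - fe)^2 / fe over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

#          Pro-Bush  Pro-Gore
table = [[50, 50],   # cops
         [40, 60]]   # profs

print(expected_frequencies(table))        # [[45.0, 55.0], [45.0, 55.0]]
print(round(chi_square_table(table), 2))  # 2.02
```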



Degrees of freedom for the two-sample chi-square:

(r-1)(c-1)

r = # of rows
c= # of columns



So, df = (2-1)(2-1) = 1 in this example.

X^2(1) = 2.02, p > .10, ns
ns = nonsignificant.
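
As a rough sketch, the df rule and the table lookup can be written out as follows. The names are made up for this example, and the critical values listed are the standard ones for df = 1 found in any chi-square table.

```python
def degrees_of_freedom(table):
    """df = (rows - 1) x (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Critical values of chi-square for df = 1, keyed by probability level
CRITICAL_DF1 = {0.10: 2.706, 0.05: 3.841, 0.01: 6.635}

table = [[50, 50],   # cops:  Pro-Bush, Pro-Gore
         [40, 60]]   # profs: Pro-Bush, Pro-Gore
chi_sq = 2.02        # the value computed in the notes above

print(degrees_of_freedom(table))    # 1
print(chi_sq > CRITICAL_DF1[0.10])  # False, so p > .10 (ns)
```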

What does a nonsignificant result mean?

In general, it means that the data do not show a reliable difference between
groups or a reliable relationship between variables; whatever difference was
observed could easily have occurred by chance.

In this particular case, it means that we cannot conclude that cops' and
profs' presidential preferences differ.





USE THE TWO-SAMPLE CHI-SQUARE FOR
ADDRESSING QUESTIONS 1 AND 3:

1. Does the representation of African-Americans in Ads
since 1989 appear to be increasing, decreasing, or
remaining the same?

Compare your results to those of Wilkes & Valencia:
 

                    Ads w/o AA's     Ads w/AA's       Row Totals
W&V                 664              240              904
You                 ????             ????             ????

Column Totals       664 + ????       240 + ????       904 + ????  (Grand Total)
 

Note: The ????'s mark the places where your own data belong.
 

3. Are African-Americans disproportionately relegated to
insignificant roles in advertisements?

                              Minor Roles      Major Roles      Row Totals
Other Characters              ????             ????             ????
African-American Characters   ????             ????             ????

Column Totals                 ????             ????             ????  (Grand Total)
 
 
 

Return to Lee Jussim's Home Page
