Data Analysis for the Minor Project
Chi-Squares
When and why do you use it?
To answer questions about how frequently individuals are distributed into categories.
Are women more likely than men to take psychology courses?
Are Jews more likely than Protestants to vote Democratic?
Are African-Americans under-represented in advertising?
How do you compute the chi-square?
One-Sample Chi-Square
Use this for comparing your results to the proportion of the
U.S. population that is and is not African-American.
Example: Are women more likely than men to take psychology courses?
Let's say that the answer is "no." In a class of 150 students,
how many would you expect to be male, and how many female?
(Answer: 75 and 75)
So, the question is, "Did the observed distribution differ
from a 50-50 split (75 males and 75 females)?"
More formally, "if you expect a 50-50 distribution by
chance, how likely is any observed deviation from 50-50
to have occurred purely by chance?"
Let's say there really were 75 males and 75 females.
What would your conclusion be?
(Answer: Obviously not different from a 50-50 split).
Let's say there were:
74M and 76F? "Different" but barely. Probably just chance.
72M and 78F? "A little more different than 50-50."
70M and 80F? "A little *more* different than 50-50."
60M and 90F? "This would not likely have come about if men and
women were equally likely to take the class."
But at what point does the discrepancy from 50-50 count? At what
point do we believe that men and women are probably *not* equally
likely to take the class?
The formula for chi-square is:
X^2 = SUM((fo-fe)^2/fe)
SUM is usually written as the Greek letter sigma, which looks sort of
like an angular capital E. I could not figure out how to get that onto
a web page.
SUM (or sigma) = "sum of all"; in this context, it means "summing over all cells"
f = frequency
o = observed, so fo means "observed frequency"
e = expected, so fe means "expected frequency"
Women: Observed = 90, Expected = 75
Men: Observed = 60, Expected = 75
X^2 = (90-75)^2/75 + (60-75)^2/75 = 225/75 + 225/75 = 3 + 3 = 6
X^2 = 6
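The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name chi_square is made up for illustration and does not come from your textbook.

```python
def chi_square(observed, expected):
    """Sum (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Women: observed 90, expected 75. Men: observed 60, expected 75.
x2 = chi_square([90, 60], [75, 75])
print(x2)  # 6.0
```

The two lists must list the cells in the same order (here, women first and men second).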
Is this a high or low chi-square? Is it statistically significant?
1. You need the Degrees of Freedom (df).
For a one-sample chi-square, df = # of cells -1.
In this example, there are 2 cells, so df = 2-1 = 1.
2. Then, you need to look it up in Table B.4.
In Table B.4, df is listed on the left and probability across the top.
So, a chi-square of 6, with 1 df has a p-value less than
.02, but greater than .01.
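If you do not have the table handy, the p-value for 1 df can also be computed directly. The sketch below uses only Python's standard library and relies on a fact that holds for df = 1 only: a chi-square with 1 df is the square of a standard normal score, so p = erfc(sqrt(X^2/2)). The function name p_value_df1 is hypothetical. For more than 1 df, stick with the table or a statistics package.

```python
import math

def p_value_df1(x2):
    """Upper-tail probability of a chi-square with 1 df (valid for df = 1 only)."""
    return math.erfc(math.sqrt(x2 / 2))

p = p_value_df1(6)
print(round(p, 4))
print(0.01 < p < 0.02)  # True, which agrees with the table lookup
```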
What does that mean?
Remember the question your chi-square addresses:
"If you expect a 50-50 distribution by chance, how likely is any
observed deviation from 50-50 to have occurred purely by chance?"
So, does a high probability or a low probability indicate
that the distribution of men and women in my class was
most likely random deviation from 50-50?
What does a high probability indicate?
What does a low probability indicate?
Why is the traditional cutoff for considering a result
"statistically significant" p=.05?
What does p=.05 mean?
Example 2:
About two thirds of all Psych Majors are women.
Does the distribution in my class differ from that among
Psych Majors?
Men: Observed = 60, Expected = 50
Women: Observed = 90, Expected = 100
X^2 = (60-50)^2/50 + (90-100)^2/100 = 100/50 + 100/100 = 2 + 1 = 3
X^2(1) = 3, .05<p<.10.
A p-value between .05 and .10 is usually called "marginally
significant," meaning that it is a borderline case. It is not
significant at p<.05, but it is close. In this example, this means
that the distribution of men and women in my class may differ from
that of Psych Majors, but we can't be too sure.
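Example 2 differs from Example 1 only in where the expected frequencies come from: a known proportion (two thirds women) instead of a 50-50 split. Here is a rough sketch of that step in Python. The variable names are illustrative, and the p-value line uses a shortcut that is valid for 1 df only.

```python
import math

total = 150
observed = {"men": 60, "women": 90}
proportions = {"men": 1 / 3, "women": 2 / 3}  # two thirds of Psych Majors are women

# Expected frequency = proportion times the total number of students.
expected = {group: total * p for group, p in proportions.items()}

x2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
p = math.erfc(math.sqrt(x2 / 2))  # shortcut that works for df = 1 only

print(round(x2, 2))     # 3.0
print(0.05 < p < 0.10)  # True: "marginally significant"
```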
USE THIS ONE-SAMPLE CHI-SQUARE TO ANSWER
YOUR SECOND QUESTION:
Are African-Americans under-represented in advertisements?
AFRICAN-AMERICANS: Observed=????, Expected=.13(total number of all characters)
OTHERS: Observed=????, Expected=.87(total number of all characters)
Note: .13(total number of all characters) means ".13 times
the total number of all characters."
Same for .87(total number of all characters).
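To make the setup concrete, here is a sketch of how the expected counts and the chi-square could be computed once you have your own counts. The observed numbers below (20 and 180) are HYPOTHETICAL, invented purely for illustration, and the function name expected_counts is also made up. Substitute your own data.

```python
def expected_counts(total_characters, proportion=0.13):
    """Expected African-American and other counts under the .13 / .87 split."""
    expected_aa = proportion * total_characters
    return expected_aa, total_characters - expected_aa

# HYPOTHETICAL observed counts, for illustration only. Use your own data.
observed_aa, observed_other = 20, 180

exp_aa, exp_other = expected_counts(observed_aa + observed_other)
x2 = ((observed_aa - exp_aa) ** 2 / exp_aa
      + (observed_other - exp_other) ** 2 / exp_other)
print(exp_aa, exp_other, round(x2, 2))
```

As in the examples above, there are 2 cells, so df = 2-1 = 1.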
The second kind of chi-square compares two groups.
This is the chi-square discussed in your textbook. It is also called
the "chi-square test for contingency tables."
It determines whether two groups are differentially distributed on
some nominal variable.
For example, are professors more liberal than policemen?
Specifically, are professors more likely to vote for the
Democratic presidential candidate than are cops?
Note: I made this data up.
              Pro-Bush       Pro-Gore       Row Sums
Cops          50 (cell a)    50 (cell b)    100
Profs         40 (cell c)    60 (cell d)    100
Column Sums   90             110            200
Chi-square is still:
X^2 = SUM((fo-fe)^2/fe)
Observed is easy.
Expected = (Column total)(Row total) / Grand Total   (for each cell)
Ea = (90 x 100)/200 = 45
Eb = (110 x 100)/200 = 55
Ec = (90 x 100)/200 = 45
Ed = (110 x 100)/200 = 55
(Ea is expected value for cell a, Eb for cell b, and so on).
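The four expected values can be computed the same way in code. This is a minimal sketch using the made-up cops and profs data; the variable names are just for illustration.

```python
observed = [[50, 50],   # Cops:  Pro-Bush, Pro-Gore
            [40, 60]]   # Profs: Pro-Bush, Pro-Gore

row_totals = [sum(row) for row in observed]        # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]  # [90, 110]
grand_total = sum(row_totals)                      # 200

# Expected = (column total)(row total) / grand total, for each cell.
expected = [[c * r / grand_total for c in col_totals] for r in row_totals]
print(expected)  # [[45.0, 55.0], [45.0, 55.0]]
```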
So X^2 = SUM((fo-fe)^2/fe) =
(50-45)^2/45 + (50-55)^2/55 + (40-45)^2/45 + (60-55)^2/55
= 25/45 + 25/55 + 25/45 + 25/55 = .56 + .45 + .56 + .45
= 2.02
df = (r-1)(c-1)
r = # of rows
c = # of columns
In this example, df = (2-1)(2-1) = 1.
X^2(1) = 2.02, p>.10, ns
ns = nonsignificant.
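Putting the steps together, the whole contingency-table test can be sketched as one small function. The name contingency_chi_square is hypothetical, and the p-value line again uses the shortcut that is valid for 1 df only (which is what any 2 x 2 table has).

```python
import math

def contingency_chi_square(table):
    """Return (chi-square, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for r, row in enumerate(table):
        for c, fo in enumerate(row):
            fe = col_totals[c] * row_totals[r] / grand_total
            x2 += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

# Rows: Cops, Profs. Columns: Pro-Bush, Pro-Gore (made-up data from above).
x2, df = contingency_chi_square([[50, 50], [40, 60]])
p = math.erfc(math.sqrt(x2 / 2))  # valid only because df == 1 here
print(round(x2, 2), df, p > 0.10)  # 2.02 1 True
```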
What does a nonsignificant result mean?
In general, it means that the data do not show a reliable difference
between groups or a reliable relationship between variables.
In this particular case, it means that we have no good evidence of a
difference in cops' and profs' presidential preferences.
1. Does the representation of African-Americans in Ads
since 1989 appear to be increasing, decreasing, or
remaining the same?
Compare your results to those of Wilkes & Valencia:
        Ads w/o AA's    Ads w/AA's    Row Totals
W&V     664             240           904
You     ????            ????          ????
Note: the ????'s mark the places where your data belong.
3. Are African-Americans disproportionately relegated to
insignificant roles in advertisements?
                              Minor Roles    Major Roles    Row Totals
Other Characters              ????           ????           ????
African-American Characters   ????           ????           ????