Analyzing the Data
E-Data Aid step
| Column to keep | What it contains |
| --- | --- |
| Subject | Your subject number |
| Procedure(Block) | Whether the trial was from the categorization block or the typicality rating block |
| CatDisplay.RT | Reaction time to make the categorization decision. |
| Category | Stimulus category (e.g., beverage, sport) |
| InCategory | The right answer about whether the item is really a member of the category (1=yes, 2=no) |
| Item | Stimulus item rated (e.g., necklace, hawk) |
| SubjectChoice | Your response. It is a "Yes" or "No" for the categorization block and a number from 1 to 7 for the typicality rating block. |
| Variable | What it is | What to do with it |
| --- | --- | --- |
| Category | Stimulus category | Keep |
| Item | Stimulus item | Keep |
| subject | Your subject number | Keep |
| CatDisplay.rt | Categorization reaction time | Keep but rename variable to "rt" |
| Incatego | Whether the item is a member of the category | Keep |
| V1 or Procedure(Block).1 | Categorization block | Delete |
| V2 or Procedure(Block).2 | Rating block | Delete |
| V3 | Your categorization decision | Keep but rename to "choice" |
| V4 | Your typicality rating | Keep but rename to "rating" |
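The keep, rename, and delete decisions in the table above can also be written down as a small mapping. The sketch below is only an illustration in Python, not the procedure used in the lab. The `clean_row` helper and the sample row values are hypothetical, and the variable names are taken from the table (using the `V1` to `V4` forms).

```python
# Mapping from each exported variable name to the name it should have
# afterwards. None means "delete this variable". Names follow the table above.
ACTIONS = {
    "Category": "Category",
    "Item": "Item",
    "subject": "subject",
    "CatDisplay.rt": "rt",
    "Incatego": "Incatego",
    "V1": None,
    "V2": None,
    "V3": "choice",
    "V4": "rating",
}


def clean_row(row):
    """Return a copy of one data row with variables kept, renamed, or dropped."""
    cleaned = {}
    for name, value in row.items():
        new_name = ACTIONS.get(name)  # variables not in the table are dropped too
        if new_name is not None:
            cleaned[new_name] = value
    return cleaned


# Hypothetical row from the categorization block (all values made up).
example = {
    "Category": "sport", "Item": "golf", "subject": 7,
    "CatDisplay.rt": 812, "Incatego": 1,
    "V1": "CatBlock", "V2": "", "V3": "Yes", "V4": "",
}
print(clean_row(example))
```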
In this experiment, we predicted that more typical examples of categories would be classified faster than less typical ones. That is, we predicted that there would be a relationship between each word's typicality and the time subjects took to classify that word. Such a relationship is called a "correlation".
To test this, we computed for each of the 150 words (10 examples in each of 15 categories) two numbers: a typicality rating, which was the mean of the typicality ratings given that word by all the subjects in the rating task, and a reaction time, which was the mean of the reaction times for that word by all the subjects, taking the positive trials only. The very laborious process we used to extract and compute these means was intended to make it perfectly clear to you exactly where the final data really came from, and to give you a hands-on sense of how much calculation goes into analyzing even a simple experiment.
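The two per-word means described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the spreadsheet procedure used in the lab: the trial records, field names, and the `mean_by_item` helper are hypothetical, and "positive trials" is taken to mean trials where the item really is a member of the category (InCategory = 1).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records (all values made up).
# in_category follows the coding in the table: 1 = yes, 2 = no.
cat_trials = [
    {"subject": 1, "item": "robin", "in_category": 1, "rt": 600},
    {"subject": 2, "item": "robin", "in_category": 1, "rt": 640},
    {"subject": 1, "item": "penguin", "in_category": 1, "rt": 900},
    {"subject": 2, "item": "penguin", "in_category": 1, "rt": 860},
    {"subject": 2, "item": "robin", "in_category": 2, "rt": 750},  # negative trial
]
rating_trials = [
    {"subject": 1, "item": "robin", "rating": 7},
    {"subject": 2, "item": "robin", "rating": 6},
    {"subject": 1, "item": "penguin", "rating": 3},
    {"subject": 2, "item": "penguin", "rating": 2},
]


def mean_by_item(trials, field):
    """Average one field over all subjects, separately for each item."""
    grouped = defaultdict(list)
    for trial in trials:
        grouped[trial["item"]].append(trial[field])
    return {item: mean(values) for item, values in grouped.items()}


# Reaction times use the positive trials only; ratings use every rating trial.
positive_trials = [t for t in cat_trials if t["in_category"] == 1]
mean_rt = mean_by_item(positive_trials, "rt")
mean_rating = mean_by_item(rating_trials, "rating")

# One (typicality, reaction time) pair per word: these are the data points.
for item in mean_rating:
    print(item, mean_rating[item], mean_rt[item])
```

Each word ends up as a single point, which is why the final analysis has 150 data points rather than one per trial.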
The r statistic. The statistic we use to measure the degree of correlation between the two variables is called (Pearson's) r. It is a number between -1 and 1, and measures the degree to which one variable can be (linearly) predicted from the other. An r = 0 means that there is no linear relationship at all (the two variables are uncorrelated); r = 1 means that there is a perfect linear relationship between them; and r = -1 means that there is a perfect inverse linear relationship between them, so that as one increases the other decreases in a completely predictable fashion.
Any other value of r means that there is some relationship between the variables, but that it is not perfect. In such a case you can predict the value of one variable from the other, but not precisely; there is also an unpredictable component.
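To make the definition concrete, here is a minimal Python sketch of Pearson's r computed from its standard formula. The function name `pearson_r` and the small data sets are illustrative assumptions, not the lab's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: the co-variation of x and y, scaled to lie in [-1, 1]."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
r_perfect = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises in step with x
r_inverse = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls in step with x

# A curved relationship (y = x squared) around zero: y is fully determined
# by x, yet no straight line helps predict it, so r comes out as 0.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_perfect, r_inverse, r_curved)
```

The last case is a reminder that r only captures the linear part of a relationship.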
The "regression equation" is the equation of the line that best approximates the relationship between the two variables. The data form a "cloud" around this line (unless r = 1 or r = -1, in which case they fall right on it). The line normally has an equation of the form
Dependent Variable = Slope × Independent Variable + Intercept.
Note that the slope will always have the same sign as r.
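The slope and intercept of the best-fitting line can be computed directly by least squares. The sketch below is a minimal illustration; `fit_line` and the five made-up (typicality, reaction time) pairs are hypothetical. It also shows why the slope shares its sign with r: both have the same numerator (the summed co-variation of the two variables), divided by a quantity that is always positive.

```python
def fit_line(xs, ys):
    """Least-squares line: returns (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # sxx > 0, so the sign comes from sxy
    intercept = mean_y - slope * mean_x  # the line passes through the two means
    return slope, intercept


# Made-up data: typicality rating (independent) and mean reaction time in ms
# (dependent). Reaction time falls as the rating rises.
typicality = [1, 2, 3, 4, 5]
reaction_time = [900, 850, 800, 700, 650]

slope, intercept = fit_line(typicality, reaction_time)
print(slope, intercept)  # slope = -65.0, intercept = 975.0

# Predicted reaction time for a rating of 3, using the regression equation.
predicted = slope * 3 + intercept
print(predicted)
```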
Computing significance. Just as with the t statistic, we can also ask, for each r, whether the correlation it measures in the data is large enough to be considered statistically significant. As with t, we do this by associating the r with a p value, which is the probability of obtaining a correlation at least as strong as the one we observed if there were really no relationship between the variables (the null hypothesis).
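One way to see what a p value means is a permutation test: shuffle one variable many times, which destroys any real relationship, and count how often the shuffled data give a correlation at least as strong as the observed one. This is a swapped-in technique chosen for illustration; it is not necessarily how the p value reported in class is computed. The function names, the data, and the choice of 2000 shuffles are all assumptions.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the standard formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided p value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement itself so that p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Made-up data with a strong negative relationship.
ratings = list(range(1, 11))
times = [930, 905, 880, 870, 820, 815, 770, 760, 720, 690]

r = pearson_r(ratings, times)
p = permutation_p_value(ratings, times)
print(r, p)
```

A small p says that shuffled (unrelated) data almost never produce a correlation this strong, which is the sense in which the result is called statistically significant.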
The summary statistics for this experiment, r and p, will be given to you in class. You will also be given a plot of the 150 data points with the regression line plotted.