Categorization Lab

Analyzing the Data

E-Data Aid Step

  1. Double click on the E-Data Aid icon.
  2. Open your data file. (It's in the c:\Psy306\YOURSECTION folder).
  3. From the "Tools" menu, select "arrange columns"
  4. Hide all of the columns except the following:
    Column to keep     What it contains
    Subject            Your subject number
    Procedure(Block)   Whether the trial was from the categorization block or the typicality rating block
    CatDisplay.RT      Reaction time to make the categorization decision
    Category           Stimulus category (e.g., beverage, sport)
    InCategory         The correct answer about whether the item is really a member of the category (1=yes, 2=no)
    Item               Stimulus item rated (e.g., necklace, hawk)
    SubjectChoice      Your response: "Yes" or "No" for the categorization block, and a number from 1 to 7 for the typicality rating block
  5. After hiding the columns, click "OK." You should now see the data spreadsheet showing only the columns you did not hide.
  6. Save the file
  7. From the "File" menu, select "Export"
  8. Select file type "Excel"
  9. As the file name use "typicality" + subj# + ".xls" (e.g., typicality0001.xls)
  10. Click save
Excel Step
  1. Double click on the Excel icon to open Excel
  2. Open the file you just created.
  3. Look at it to make sure it looks OK.
  4. Delete the first row of the file (the one that comes before the variable names).
  5. Save the file, making sure that it is being saved as an Excel file. When asked whether you want to change the file format, select “no”.
SPSS Step
  1. Double click on the SPSS icon to open it
  2. Select "open new data file"
  3. Select your Excel file name
  4. For each category/item pair, you made a categorization choice (which had an RT) and a typicality rating. We want to show both the RT and the rating on the same line of the data file. To do so we will restructure the data.
  5. Look at the data spreadsheet and notice that there are now 300 lines instead of 450. The data file should now have the following variables:

    Variable                   What it is                                     What to do with it
    Category                   Stimulus category                              Keep
    Item                       Stimulus item                                  Keep
    subject                    Your subject number                            Keep
    CatDisplay.rt              Categorization reaction time                   Keep, but rename to "rt"
    Incatego                   Whether the item is a member of the category   Keep
    V1 or Procedure(Block).1   Categorization block                           Delete
    V2 or Procedure(Block).2   Rating block                                   Delete
    V3                         Your categorization decision                   Keep, but rename to "choice"
    V4                         Your typicality rating                         Keep, but rename to "rating"
  6. Note: to rename variables, look at the data spreadsheet and notice that at the bottom left there are tabs for "Data view" and "Variable view." Select variable view. You now see the list of variable names and can change them. Change both the variable name and the variable label (two separate columns). Click on the "data view" tab to get back to the spreadsheet.
  7. Make sure that both the rt variable and the rating variable are defined as numeric variables. You can do this from the “Variable view” screen.
  8. We want to ignore categorization trials where the item was not a member of the category and trials where you made an incorrect categorization decision. So we’ll filter those trials out.
  9. Now we are finally ready to compute a correlation between reaction time and typicality rating. From the "Analyze" menu, select "correlate – bivariate". Move "rt" and "rating" into the variables box. Click OK. The correlation coefficient will appear in the output.
  10. Also make a scatter plot of rt by rating. From the "Graphs" menu select "scatter – simple." Click "define" and make rt the Y-axis and rating the X-axis. Click OK and your graph will appear. Print out the graph to include in your lab report.
  11. The lab instructor will compute the correlation and construct a similar graph for item means based on the data from the entire class.
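
For readers who want to see what the restructure and filter steps accomplish, here is a minimal Python sketch of the same idea. It is not the SPSS procedure itself. The dictionary layout, the block labels "CatBlock" and "RateBlock", and the function names restructure and keep_correct_members are hypothetical, and the sample trials are made-up values for illustration only.

```python
def restructure(trials):
    """Merge each category/item pair's two trials into a single row."""
    rows = {}
    for t in trials:
        key = (t["Category"], t["Item"])
        row = rows.setdefault(key, {"Category": t["Category"], "Item": t["Item"]})
        if t["Block"] == "CatBlock":       # hypothetical label for the categorization block
            row["rt"] = t["RT"]
            row["InCategory"] = t["InCategory"]
            row["choice"] = t["SubjectChoice"]
        else:                              # hypothetical label "RateBlock"
            row["rating"] = int(t["SubjectChoice"])
    return list(rows.values())


def keep_correct_members(rows):
    """Keep rows where the item is a member (InCategory == 1) and the answer was "Yes"."""
    return [r for r in rows
            if r["InCategory"] == 1 and r["choice"] == "Yes" and "rating" in r]


# Made-up trials for illustration (not real lab data).
trials = [
    {"Block": "CatBlock", "Category": "bird", "Item": "hawk",
     "InCategory": 1, "SubjectChoice": "Yes", "RT": 612},
    {"Block": "CatBlock", "Category": "bird", "Item": "necklace",
     "InCategory": 2, "SubjectChoice": "No", "RT": 700},
    {"Block": "CatBlock", "Category": "sport", "Item": "bowling",
     "InCategory": 1, "SubjectChoice": "No", "RT": 950},
    {"Block": "RateBlock", "Category": "bird", "Item": "hawk", "SubjectChoice": "6"},
    {"Block": "RateBlock", "Category": "sport", "Item": "bowling", "SubjectChoice": "2"},
]

rows = restructure(trials)          # one row per category/item pair
kept = keep_correct_members(rows)   # only correct "Yes" answers to true members
print(kept)
```

In this toy example the five trials collapse to three rows, and only the "hawk" row survives the filter, because "necklace" is not a member of the category and "bowling" was answered incorrectly.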
Categorization Lab Results

In this experiment, we predicted that more typical examples of categories would be classified faster than less typical ones. That is, we predicted that there would be a relationship between each word's typicality and the time subjects took to classify that word. Such a relationship is called a "correlation".

To test this, we computed for each of the 150 words (10 examples in each of 15 categories) two numbers: a typicality rating, which was the mean of the typicality ratings given that word by all the subjects in the rating task, and a reaction time, which was the mean of the reaction times for that word by all the subjects, taking the positive trials only. The very laborious process we used to extract and compute these means was intended to make it perfectly clear to you exactly where the final data really came from, and to give you a hands-on sense of how much calculation goes into analyzing even a simple experiment.
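
The item means described above can be sketched in a few lines of Python. This is only an illustration of the averaging step, assuming the ratings and positive-trial reaction times have already been collected as (item, value) pairs pooled over subjects. The function name mean_by_item and all of the numbers are made up for this example.

```python
from collections import defaultdict
from statistics import mean


def mean_by_item(pairs):
    """Group (item, value) pairs by item and return {item: mean value}."""
    groups = defaultdict(list)
    for item, value in pairs:
        groups[item].append(value)
    return {item: mean(values) for item, values in groups.items()}


# Made-up values from two imaginary subjects (not real lab data).
ratings = [("hawk", 6), ("hawk", 7), ("penguin", 2), ("penguin", 3)]
rts = [("hawk", 600), ("hawk", 640), ("penguin", 900), ("penguin", 860)]

mean_rating = mean_by_item(ratings)
mean_rt = mean_by_item(rts)

# One (mean rating, mean RT) data point per item, in alphabetical item order.
points = [(mean_rating[item], mean_rt[item]) for item in sorted(mean_rating)]
print(points)
```

Each item contributes exactly one point to the final analysis, no matter how many subjects contributed to its two means.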

The r statistic. The statistic we use to measure the degree of correlation between the two variables is called (Pearson's) r. It is a number between -1 and 1, and measures the degree to which one variable can be (linearly) predicted from the other. An r = 0 means that there is no linear relationship at all (knowing one variable does not help you predict the other with a straight line); r = 1 means that there is a perfect linear relationship between them; and r = -1 means that there is a perfect inverse linear relationship between them, so that as one increases the other decreases, in a completely predictable fashion.

Any other value of r means that there is some relationship between the variables, but that it is not perfect. In such a case you can predict the value of one variable from the other, but not precisely; there is also an unpredictable component.
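
To make the meaning of r concrete, the following Python sketch computes Pearson's r directly from its standard definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the small data sets are illustrative assumptions, not part of the lab software.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4]
r_perfect = pearson_r(xs, [2, 4, 6, 8])   # y rises exactly in step with x
r_inverse = pearson_r(xs, [8, 6, 4, 2])   # y falls exactly in step with x
r_partial = pearson_r(xs, [1, 3, 2, 4])   # related, but not perfectly
print(r_perfect, r_inverse, r_partial)
```

The three calls illustrate the three cases in the text: a perfect linear relationship, a perfect inverse relationship, and an imperfect one whose r falls strictly between 0 and 1.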

The "regression equation" is the equation of the line that best approximates the relationship between the two variables. The data form a "cloud" around this line (unless r = 1 or r = -1, in which case they fall right on it). The line normally has an equation of the form

Dependent Variable = Slope × Independent Variable + Intercept.

Note that the slope will always have the same sign as r.
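
As a rough illustration of the regression equation, the Python sketch below fits the least-squares line to a handful of points and uses it to predict a reaction time. The function name fit_line and the three data points are made up for this example. The negative slope here goes with a negative r, as the note above says it must.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up item means (not real lab data): typicality rating and RT in ms.
ratings = [2, 4, 6]
rts = [900, 800, 650]

slope, intercept = fit_line(ratings, rts)

# Dependent Variable = Slope x Independent Variable + Intercept
predicted_rt = slope * 5 + intercept
print(slope, intercept, predicted_rt)
```

Here the rating plays the role of the independent variable and the reaction time the dependent variable, matching the axes used for the scatter plot.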

Computing significance. Just as with the t statistic, we can ask whether the correlation measured by r is large enough to be considered statistically significant. As with t, we do this by associating r with a p value, which gives the probability of obtaining a correlation at least as strong as the one we observed if there were really no relationship between the variables (the null hypothesis).
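
One way to get a feel for what a p value means is a permutation test, sketched below in Python. Note that this is a substitute technique chosen for illustration, not the t-based test that statistical packages typically report, so its p values will not match theirs exactly. The idea is to shuffle one variable many times, which destroys any real relationship, and count how often the shuffled data give a correlation at least as strong as the observed one. The function names, the number of shuffles, the seed, and the data are all assumptions made for this example.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p(xs, ys, n_shuffles=2000, seed=0):
    """Estimate a two-tailed p value by shuffling ys relative to xs."""
    rng = random.Random(seed)            # fixed seed so the result is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point rounding.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Made-up item means (not real lab data).
ratings = [1, 2, 3, 4, 5, 6, 7, 7]
rts = [980, 940, 900, 850, 790, 760, 700, 690]

p = permutation_p(ratings, rts)
print(p)
```

A small p means that shuffled (unrelated) data almost never produce a correlation as strong as the real one, which is the sense in which the observed correlation is called significant.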

The summary statistics for this experiment (r and p) will be given to you in class. You will also be given a plot of the 150 data points with the regression line plotted.