STAT 586: INTERPRETATION OF DATA. Fall 2009.

ROOM: Hill 552
TIME: W 7,8 (6:40-7:50, 8:10-9:30)
WEB PAGE: http://www.rci.rutgers.edu/~cabrera
EMAIL: xavier.cabrera@gmail.com
INSTRUCTOR: Javier Cabrera, 471 Hill Center.
OFFICE HOURS: Wed. 4:30pm-6:10pm, or by appointment.

OBJECTIVES:
This is a three-credit course designed as an introduction to statistical computing and data analysis. Topics of interest are Exploratory Data Analysis (EDA), data mining, and statistical computing with R and with SAS. Students are also required to develop good communication skills by working in groups, writing reports, and giving presentations.

GRADING:
Report 1 (15% of grade). This can be a group report, in groups of 4 or fewer.
Report 2 (30% of grade). This can be a group report, in groups of 4 or fewer.
Class participation and group presentations from Report 2 (10% of grade).
Report 3 (45% of grade).

PROJECTS:
There will be one individual final project and two group projects, each involving the analysis of practical questions using statistical data analysis. Prepare each report following the format provided in the report instructions. The data sets will be uploaded to the course web site shortly.

SYLLABUS:

1. Introduction. Interpretation of Data. (2-3 weeks, Ch. 1-2, Appendix B, Notes)
   Statistical software. Introduction to the SAS statistical system and basic syntax. R, S-PLUS: Modern statistical computing with R.

2. Exploratory Data Analysis. (2 weeks, Ch. 3, Notes)
   Summary statistics. Graphical summaries: Stem-and-leaf displays, letter values. Checking the shape of distributions with QQ-plots. Power transformations. Comparing batches with box-plots. Power transformations for variance stabilization. Spread vs. level plots.

3. Simple and Multiple Linear Regression. (2 weeks, Ch. 7 Sec. 3, Notes)
   Plots of relationship. Least squares versus resistant line fitting. The three-group procedure and the L1 method. Fitting equations to data and scientific discovery. Power transformations and non-linear fits. Fitting two-way tables. Median polish. Examining residuals.

4. Multivariate Analysis: concepts, computation, data visualization. (4 weeks, Ch. 8 Secs. 2, 3, 4, Notes)
   Principal Components Analysis (PCA). Selecting the number of PCs. Factor Analysis (FA). The FA model. Selecting the number of factors. Rotations. Discriminant analysis and supervised classification methods. Cluster analysis and unsupervised classification methods.

5. Data Mining. (3 weeks, Ch. 8, Notes)
   Using multivariate analysis methods for variable reduction. Segmentation and subsetting of large databases. Extracting information from large datasets. Recursive partitioning and trees.

6. Additional topics (if time permits).
   Time-dependent processes (from textbook and ref. 1): Smoothing techniques, running medians, splitting and hanning. Lag-dependent plots for autocorrelation analysis. AR, ARMA, ARIMA models for time series.
   Robustness (from textbook): Introduction to more refined estimation techniques. L-estimators, R-estimators, M-estimators.

TEXT BOOK:
Statistical Consulting. J. Cabrera, A. McDougall. Springer Verlag, 2002.

IMPORTANT REFERENCES:
1. Understanding Robust and Exploratory Data Analysis. Hoaglin, Mosteller, and Tukey. Wiley, New York 1985.
2. The Basics of S and S-PLUS. Andreas Krause, Melvin Olson. Springer Verlag 1999.
3. Modern Applied Statistics with S-PLUS. W.N. Venables, B.D. Ripley. Third Edition. Springer Verlag 1999.
4. SAS Applications Programming: A Gentle Introduction. Frank C. DiIorio. PWS-Kent. (Or any other SAS book.)

OTHER SUGGESTED REFERENCES:
5. Practical Regression and Anova using R. Julian J. Faraway. PDF document from CRAN: http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
6. Exploratory Data Analysis. Tukey. Addison Wesley.
7. Data Analysis and Regression. Mosteller and Tukey. Addison Wesley.
8. Elements of Statistical Computing. Ronald A. Thisted. Chapman and Hall, New York 1985.

TENTATIVE CLASS SCHEDULE:
Sept. 2    Review. Introduction to the SAS statistical system. (ARC lab)
Sept. 9    Introduction to the SAS statistical system (cont'd), R. (ARC lab)
Sept. 16   Introduction to R (cont'd). Introduction to EDA. (ARC lab)
Sept. 23   EDA: Graphical and numerical summaries. Intro to Project I.
Sept. 30   EDA: Power transformations, spread vs. level, symmetries. Project I (due around Oct. 6).
Oct. 7     EDA: Linear and nonlinear fits. Robust fits. (ARC lab)
Oct. 14    EDA: Linear fits (cont'd). Two-way designs. Median polish.
Oct. 21    DM: Logistic regression. Intro to Project II. (ARC lab)
Oct. 27    Project II (cont'd). (Due Nov. 3.)
Nov. 4     DM: Dimension reduction with Principal Components Analysis (PCA). (ARC lab)
Nov. 11    DM: Factor Analysis (FA). Cluster analysis and data segmentation. Intro to Project III.
Nov. 18    DM: Project III. Discriminant analysis. (ARC lab)
Nov. 25    No class: Thanksgiving week.
Dec. 3     DM: Classification, pattern recognition. Project III (cont'd).
Dec. 10    DM: Recursive partitioning. Project III. Due Dec. 12.