Ph.D., University of California, 2001
Department of Statistics, School of Arts and Sciences, Rutgers University, New Brunswick
Areas of Interest
Model Selection in Clustering and Classification, Non-parametric Methods for Clustering and Classification, Data Depth, Computational Biology, Information Theory, Minimum Description Length (MDL), and Image Processing.
Statistics, Data Interpretation, and Regression.
Memberships and Professional Service
Member, New Researchers’ Conference Committee, 2006-2007; Consultant, Bell Labs, Lucent Technologies, 2000; Member, Steering Committee, The Barcode of Life Data Analysis Working Group; Consultant, Statistics Consulting Service, University of California at Berkeley, 1998 and 2000.
Grants, Honors, and Awards
Principal Investigator, National Science Foundation, “Clustering: visualization, validation and response oriented methods,” 2003-2006; Bioinformatics Center Grant, Environmental Protection Agency, Environmental and Occupational Health Sciences Institute; National Institute of Health-Alcohol and Drug Program Administration Grant, Tobacco Dependency.
Academic Interests and Plans
I am interested in developing new statistical methodology for clustering and model selection. Modern data often have a complex structure, and the clustering techniques we apply should reflect this. Recently, I have focused on multi-level mixture models, which can incorporate multiple distance metrics into clustering simultaneously and can be used to analyze multi-factor experiments. In addition, I am working on subset selection problems in clustering. Some cluster profiles have a simple structure, and a full parameterization of these clusters "wastes parameters." Simultaneous subset selection via rate-distortion theory lets us select sparse representations for all clusters. This makes interpretation of the clusters more objective and allows us to detect more distinct clusters. I am currently working on extending the multi-level models and simultaneous selection to regularized estimation schemes and more complex experimental designs.
My research is often motivated by my collaborative projects. I have a long-standing collaboration with the Hart Lab, W.M. Keck Center for Collaborative Neuroscience. The Hart Lab is particularly interested in the genomic regulation of neuronal development.
I am working on the development of computational tools for the analysis of genomic data, including missing value imputation and significance analysis of microarray data.
I am involved in several projects related to the notion of Data Depth. Data depth allows for the generalization of the median and rank-order to a multivariate setting. I am currently working on robust analysis of functional data via data depth, applicable to the analysis of time-course gene expression data.