Ryan Womack, for the IASSIST 2011 Workshop, May 31, 2011
When working with the script files, remember to uncomment the install.packages commands for any packages you don't already have on your system.
One of R's known limitations is that base R must hold the data it works on in memory (RAM). In practice this limits the datasets you can comfortably work with to about half of the computer's memory, since R needs extra space for the copies it makes while manipulating the original data. Fortunately, there are several ways to work with larger datasets. Here are a couple:
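To get a feel for the "about half of memory" rule of thumb, a quick back-of-the-envelope check helps. The sketch below is a minimal illustration, assuming all-numeric data stored as doubles (8 bytes per value); character columns cost more, and the helper names `estimate_mb()` and `max_rows()` are hypothetical, not part of any package. It compares the estimate with base R's `object.size()` on a real data frame.

```r
# Rough size, in MB, of an all-numeric table: 8 bytes per double value.
estimate_mb <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1024^2
}

# Rule of thumb from the text: keep the data to about half of RAM,
# leaving room for the copies R makes while manipulating it.
max_rows <- function(ram_gb, n_cols, bytes_per_value = 8) {
  floor(ram_gb * 1024^3 / 2 / (n_cols * bytes_per_value))
}

# 100,000 rows x 10 numeric columns, built in memory.
df <- as.data.frame(matrix(runif(1e6), ncol = 10))
actual_mb <- as.numeric(object.size(df)) / 1024^2
predicted_mb <- estimate_mb(nrow(df), ncol(df))
print(c(actual = actual_mb, predicted = predicted_mb))

# Hypothetical machine: 4 GB of RAM, a table with 29 numeric columns.
print(max_rows(ram_gb = 4, n_cols = 29))
```

The estimate ignores the small per-object overhead R adds (column names, attributes), which is why `object.size()` comes out slightly larger.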
The bigmemory project's packages (bigmemory, biganalytics, etc.), together with related packages such as biglm, provide tools for manipulating large matrices, including file-backed matrices that do not have to fit in RAM.
The bigmemory.R script file gives a short example of how this works, using the 2008 airline on-time data from the American Statistical Association's Data Expo '09. For further details, see the Overview and other documentation on the bigmemory site.
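The core idea can be sketched without downloading the Data Expo files. This is a minimal sketch, not the contents of bigmemory.R: it assumes the bigmemory package is installed, and the toy columns `Month` and `ArrDelay` are filled with simulated values rather than the real 2008 airline data. The `big.matrix()` call with `backingfile`/`backingpath` stores the values in a binary file, so the R object itself is only a pointer; columns are pulled into RAM only when you index them.

```r
# install.packages("bigmemory")
library(bigmemory)

n <- 100000
# File-backed big.matrix: values live in a binary file in tempdir(),
# and the R object is only a pointer to that storage.
flights <- big.matrix(nrow = n, ncol = 2, type = "integer",
                      dimnames = list(NULL, c("Month", "ArrDelay")),
                      backingfile = "flights_demo.bin",
                      backingpath = tempdir(),
                      descriptorfile = "flights_demo.desc")

# Simulated (hypothetical) data standing in for the airline file.
set.seed(1)
flights[, "Month"] <- sample(1:12, n, replace = TRUE)
flights[, "ArrDelay"] <- as.integer(round(rnorm(n, mean = 5, sd = 30)))

# Only the two columns needed for this summary are copied into RAM.
month_means <- tapply(flights[, "ArrDelay"], flights[, "Month"], mean)
print(round(month_means, 1))
```

For the real data, `read.big.matrix()` reads a CSV directly into a file-backed matrix; note that a big.matrix holds a single numeric type, so text columns such as carrier codes need recoding first.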
R can also connect to data stored in databases and extract only the parts needed for the immediate analysis. For most large data stores, this will be the most convenient way to work. This Revolutions blog post provides a starting point, including a link to a useful presentation by Jeffrey Breen.
For the syntax for connecting to a MySQL database and extracting data from it, see RMySQL.R. The script uses a local database on my system, so to replicate it you will need to adapt the connection details to your own database.
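If you don't have a MySQL server handy, the same pattern can be tried with an in-memory SQLite database. This sketch swaps in the DBI and RSQLite packages for RMySQL (both use the same DBI functions, `dbConnect()`, `dbWriteTable()`, `dbGetQuery()`, `dbDisconnect()`), and the `flights` table is simulated, hypothetical data. The point is that the database does the filtering and aggregation, so only the small result comes back into R's memory.

```r
# install.packages(c("DBI", "RSQLite"))
library(DBI)

# For MySQL you would instead use something like:
# con <- dbConnect(RMySQL::MySQL(), dbname = "mydb", user = "me",
#                  password = "secret", host = "localhost")
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Simulated stand-in for a large table already in the database.
set.seed(42)
flights <- data.frame(
  Month = sample(1:12, 10000, replace = TRUE),
  ArrDelay = round(rnorm(10000, mean = 5, sd = 30))
)
dbWriteTable(con, "flights", flights)

# Let the database aggregate: only 12 rows come back into R.
monthly <- dbGetQuery(con,
  "SELECT Month, AVG(ArrDelay) AS mean_delay, COUNT(*) AS n
   FROM flights GROUP BY Month ORDER BY Month")

# Extract only the rows needed, using a parameterised query.
late <- dbGetQuery(con, "SELECT * FROM flights WHERE ArrDelay > ?",
                   params = list(60))

dbDisconnect(con)
print(monthly)
```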
New R packages that connect to web services for data are constantly being developed. Check out these two quick examples:
This R-bloggers post describes a quick and easy way to use the googleVis package, an R interface to the Google Visualization API, to create your own maps.
The googleVis.R script lets you try this yourself. Note that as of May 28 the map function was not working (although it had worked previously); the Data Motion app still works. This is a copy of the data used in the blog post.
Here's another example, drawn from a blog post explaining how to produce a choropleth map. Try the chloropleth.R file to replicate it.
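A minimal googleVis sketch, assuming a current version of the package is installed; it uses the `Exports` and `Fruits` example datasets that ship with googleVis rather than the data from the blog post, and `gvisGeoChart()` rather than whichever map function googleVis.R calls. Each `gvis*` function returns an object containing HTML/JavaScript; `plot()` opens it in a browser, while `print(..., file = )` writes it to disk.

```r
# install.packages("googleVis")
library(googleVis)

# Map: one value per country, shaded by Profit (Exports ships with googleVis).
geo <- gvisGeoChart(Exports, locationvar = "Country", colorvar = "Profit")

# Motion chart: one bubble per fruit, animated over Year.
motion <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")

# plot(geo) would open the chart in a web browser; here we save the HTML.
html_file <- file.path(tempdir(), "geo.html")
print(geo, file = html_file)
```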
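The essential steps of a choropleth are: bin a variable into a few classes, assign each class a color, and match colors to map polygons. The sketch below assumes the maps package and uses 1974 per capita income from base R's `state.x77` dataset, not the data from the blog post or chloropleth.R; the five-color palette is an arbitrary choice. The one subtle step is that `map("state")` draws some states as several polygons (for example `"michigan:north"`), so the suffix is stripped before matching.

```r
# install.packages("maps")
library(maps)

# Per capita income (1974) for the 50 states, from base R's state.x77.
income <- state.x77[, "Income"]
names(income) <- tolower(names(income))

# Five classes with roughly equal numbers of states (quintiles).
breaks <- quantile(income, probs = seq(0, 1, by = 0.2))
income_class <- cut(income, breaks = breaks, include.lowest = TRUE, dig.lab = 4)
palette5 <- c("#EFF3FF", "#BDD7E7", "#6BAED6", "#3182BD", "#08519C")
state_col <- palette5[as.integer(income_class)]
names(state_col) <- names(income)

# Match each polygon to its state; unmatched polygons get NA (no fill).
polys <- map("state", fill = TRUE, plot = FALSE)$names
poly_col <- unname(state_col[sub(":.*$", "", polys)])

pdf(file.path(tempdir(), "income_choropleth.pdf"), width = 9, height = 6)
map("state", fill = TRUE, col = poly_col)
legend("bottomright", legend = levels(income_class), fill = palette5, cex = 0.8)
dev.off()
```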
Another blog post describes how to do text data mining and cluster analysis using Twitter feeds.
The twitter.R script lets you try some of this out for IASSIST tweets.
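The text-mining workflow can be sketched offline in base R: tokenize the text, build a document-term matrix, then cluster the documents. This sketch is not the method of the blog post or twitter.R; it swaps in plain base R for any text-mining package, the six "tweets" are hypothetical strings rather than downloaded IASSIST tweets, and the short stop-word list is an arbitrary assumption. Clustering uses `hclust()` on a binary (Jaccard-type) distance, so tweets sharing more terms end up closer together.

```r
# Hypothetical stand-ins for downloaded tweets.
tweets <- c(
  "Big data in R: bigmemory and biglm at #iassist11",
  "Handling big data in R with bigmemory #iassist11",
  "biglm regression on big data, nice R workshop",
  "Mapping census data with googleVis maps #iassist11",
  "Choropleth maps in R for census data",
  "googleVis maps look great for census tables"
)
stop_words <- c("and", "at", "in", "on", "for", "with", "the", "look")

# Tokenize: lower-case, turn punctuation (except # and spaces) into spaces, split.
tokens <- strsplit(tolower(gsub("[^[:alnum:]# ]", " ", tweets)), "\\s+")
tokens <- lapply(tokens, function(w) w[nchar(w) > 0 & !(w %in% stop_words)])

# Document-term matrix: one row per tweet, one column per term (counts).
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(w) table(factor(w, levels = vocab))))
rownames(dtm) <- paste0("tweet", seq_along(tweets))

# Average-linkage hierarchical clustering, cut into two groups.
hc <- hclust(dist(dtm, method = "binary"), method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

With real tweets you would typically also drop URLs and @-mentions before building the matrix, and `plot(hc)` draws the dendrogram.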