**Week #30 - The R Project for Statistical Computing**

Challenge submitted by Paul-.

**CHALLENGE:**

Learn the basics of the R language and environment, and use them to perform some simple data manipulation and analysis tasks.

**INTRODUCING THE LANGUAGE/TECHNOLOGY:**

R is both a language and a programming environment, used primarily for data analysis and statistical applications. As a language, it is the GNU implementation of S, a high-level, functional programming language. One of its more interesting and important features is the extensive use of vectors and matrices as data types. This paradigm takes some getting used to if you are encountering it for the first time. Although the typical looping structures of procedural programming (for, while) are available, it is usually more efficient to avoid them and operate on whole vectors or matrices instead. For example, the squares of a list of numbers can be computed in a single step. The code

    n = 1:20
    n * n

will output the squares of the numbers from 1 to 20.
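
To make the contrast with explicit loops concrete, here is a small sketch that computes the same squares both ways. The variable names (`squares_loop`, `squares_vec`) are only illustrative.

```r
n <- 1:20

# Loop version: fill a preallocated vector one element at a time
squares_loop <- numeric(length(n))
for (i in seq_along(n)) {
  squares_loop[i] <- n[i] * n[i]
}

# Vectorized version: arithmetic operators work element by element
squares_vec <- n * n

# Check that both approaches give the same values
all(squares_loop == squares_vec)
```

The vectorized form is shorter, and on large inputs it is typically much faster because the element-wise work happens inside R's internals rather than in interpreted loop code.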

Here are a few more descriptive words from the R project web site:

*"R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes*

- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hardcopy, and
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities."
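
As a rough illustration of two items from that list (operators on matrices and user-defined recursive functions), here is a minimal sketch. The names `m` and `factorial_rec` are made up for this example; base R already provides `factorial()`.

```r
# A 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2)

# Operators act on whole arrays
m * 2          # element-wise multiplication
t(m)           # transpose, a 3 x 2 matrix
m %*% t(m)     # matrix product, a 2 x 2 matrix

# A user-defined recursive function with a conditional
factorial_rec <- function(k) {
  if (k <= 1) {
    return(1)
  }
  k * factorial_rec(k - 1)
}

factorial_rec(5)
```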

**IDEAS:**

- Do one or more exercises from "Using R for Data Analysis and Graphics" by John Maindonald (http://cran.r-projec...trib/usingR.pdf).
- Write a function which takes an array of numbers and a single value x. Return the subset of the array with values greater than x. Compare the running time of implementations which use or do not use a for loop.
- Find magic squares of order n. A magic square of order n has n*n cells in which the numbers from 1 to n^2 are placed. The sums of the numbers across each row, each column, and the 2 diagonals have to be the same. Try minimizing the use of explicit loops.
- Explore the dataset "florida". It records the number of votes each candidate received by county in the 2000 United States presidential election in the state of Florida. Make a plot that shows the relationship between the number of votes for Bush and the number of votes for Buchanan. Look for a trend, and find any outliers. Consider the data for Palm Beach County, home of the infamous butterfly ballot. It has been suggested that many votes intended for Gore mistakenly went to Buchanan in this county. Based on the general trend of the data in other counties, predict the number of votes that Gore supposedly lost in Palm Beach County.
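
As a possible starting point for the second and third ideas, here is a hedged sketch, not a reference solution. The function names (`greater_loop`, `greater_vec`, `is_magic`) are invented for illustration, and the sample size of 20000 is an arbitrary choice. The magic-square helper only checks a candidate square; finding one is left as the challenge.

```r
# Loop version: grow the result one matching element at a time
greater_loop <- function(values, x) {
  result <- c()
  for (v in values) {
    if (v > x) {
      result <- c(result, v)
    }
  }
  result
}

# Vectorized version: logical indexing keeps elements where the test is TRUE
greater_vec <- function(values, x) {
  values[values > x]
}

set.seed(1)                 # make the random sample reproducible
values <- runif(20000)      # uniform random numbers between 0 and 1

# Compare running times; the exact numbers depend on your machine
system.time(r1 <- greater_loop(values, 0.5))
system.time(r2 <- greater_vec(values, 0.5))
identical(r1, r2)

# Check a candidate magic square without explicit loops
is_magic <- function(m) {
  n <- nrow(m)
  target <- n * (n^2 + 1) / 2                    # common sum for entries 1..n^2
  sums <- c(rowSums(m), colSums(m),
            sum(diag(m)),                        # main diagonal
            sum(diag(m[, n:1, drop = FALSE])))   # anti-diagonal
  all(sort(as.vector(m)) == 1:(n^2)) && all(sums == target)
}

is_magic(matrix(c(2, 7, 6, 9, 5, 1, 4, 3, 8), nrow = 3))
```

Note that the loop version returns `NULL` rather than an empty numeric vector when nothing matches, which is one of the small differences worth exploring when you compare the two implementations.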

**RESOURCES:**

There is a large number of tutorials available for R, at different levels of depth. The following two are taken directly from the R project web site. I think they are clear and to the point.

"An Introduction to R" (http://cran.r-projec...g/manuals.html/) by Venables and Smith gives an overview of the main features of both the language and the environment. Appendix A, "A sample session", is a must for any beginner. Look at specific chapters for the topics that interest you most.

The first 2 chapters of "Using R for Data Analysis and Graphics - Introduction, Examples and Commentary" by John Maindonald (http://cran.r-projec...trib/usingR.pdf) are perhaps an even better introduction to R. The rest of the chapters are probably beyond the scope of a one-week challenge.

You may also benefit from the extensive online help in the R software, and an active forum at http://n4.nabble.com/R-f789695.html.
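
For reference, these are a few of the standard ways to reach the built-in help from the R prompt. `mean` and `"regression"` are just example topics.

```r
help(mean)                 # open the documentation page for mean()
?mean                      # shorthand for the same page
help.search("regression")  # search the installed documentation by keyword
example(mean)              # run the examples from the help page
apropos("mean")            # list objects whose names contain "mean"
```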

**HOW TO GET STARTED:**

R is straightforward to install. From the project's main web site (http://www.r-project.org/) you can access CRAN, the R software repository. Binary distributions are available for Windows, Mac, and Linux.

CRAN also holds a long list of contributed packages. For the one-week challenge you are unlikely to need any, so stick to the base distribution.

After installation, fire up R and you will get an interactive window with a ">" prompt. Type in your commands and see what happens. You can start with:

    > "Hello R!"
    > demo("graphics")

Enjoy!