CA2 Data Analytics with R

Intro: What is “R”?

R sounds like a character from James Bond, and in some ways it kind of is! R is a free software environment for statistics and graphics: number crunching in graphical form! R runs on most platforms, including UNIX, Windows and macOS.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland in 1992, and it is named partly after the first names of its two designers. It grew out of S, an earlier statistics language that was developed at AT&T's Bell Labs. R’s ability to produce quality static, dynamic and interactive graphics makes it a much-used tool in the world of data graphing.

Me and “R”

Coding does not come naturally to me and I struggle with it most of the time. However, R is relatively easy to use and you get lots back in return for a little code.

I struggled with understanding how to load data into R, which it uses to build the graphs, but once I got the links to data sources I looked for an interesting one. I picked the TITANIC passenger list. It is based on the different classes of the passengers and their survival rates. Most of us are aware that your chances of being on a lifeboat depended on your place in society, but once you graph the figures and look at them in image form they are quite stark. The graph really showed the enormity of the disaster and the consequences for those of the lower classes.

Below I list the sites that helped me and the YouTube videos I found helpful.

First things first: complete the “Try R” course from Code School: http://tryr.codeschool.com/

r-blog-image-1

Once I had completed it, I downloaded R to my laptop from this site: https://cran.r-project.org/bin/windows/base/

In order to complete my own R data analysis I needed to have R on my machine.

Here is an example of the R console interface:

r-blog-console

I used some other sites to help me with understanding R, data types, importing data, exporting data and viewing data. Here is a list of useful sites:

http://www.statmethods.net/input/contents.htm

r-blog-handy-sites

http://www.r-tutor.com/r-introduction/data-frame/data-import

r-blog-handy-sites1

https://www.datacamp.com/community/tutorials/r-data-import-tutorial#gs.H5qvyoU

r-blog-handy-sites2

Next I loaded some CSV files. A CSV is a comma-separated values file, which allows data to be saved in a table-structured format. Data can be imported from the web or locally from your machine. This video on YouTube is really good!

https://youtu.be/I1K3ZijJ3LM

r-blog-you-tube
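To make the CSV idea concrete, here is a minimal sketch in R. It writes a tiny table out as a CSV file and reads it back in. The passenger rows are made up purely for illustration (they are not from the real Titanic file), and the names `passengers`, `csv_path` and `reloaded` are just placeholders I have chosen.

```r
# Build a tiny, made-up passenger table (illustrative values only).
passengers <- data.frame(
  Name     = c("Passenger A", "Passenger B", "Passenger C"),
  PClass   = c("1st", "2nd", "3rd"),
  Survived = c(1, 0, 0)
)

# Save it as a CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
write.csv(passengers, csv_path, row.names = FALSE)

# The raw file is plain text: one header line, then one line per row.
print(readLines(csv_path))

# Read the CSV back into a data frame and look at its structure.
reloaded <- read.csv(csv_path, stringsAsFactors = FALSE)
str(reloaded)
```

read.csv() also accepts a web address in place of a local path, which is how data can be pulled straight from the web.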

Entering my data into R, the commands:

titanic-load

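Enter data

R's load() function only reads R's own saved workspace files (.RData), so it cannot be used on a CSV file. A CSV is brought in with read.table() or read.csv() instead.

Read Data

>titanic <- read.table("C:\\Users\\gradunne\\Downloads\\Titanic.csv", header = TRUE, sep = ",")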
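Once a file has been read in, it helps to check what R is actually holding before plotting anything. Here is a minimal sketch, assuming the Titanic table that ships with R rather than my downloaded Titanic.csv. That built-in table stores counts, so its columns are Class, Sex, Age, Survived and Freq, not PClass. The name `titanic_counts` is just a placeholder.

```r
# R's built-in Titanic table: passenger counts by Class, Sex, Age and Survived.
# Note: this is NOT the downloaded Titanic.csv, so column names differ.
titanic_counts <- as.data.frame(Titanic)

str(titanic_counts)        # column names and data types
head(titanic_counts)       # first six rows
nrow(titanic_counts)       # how many rows the data frame has
sum(titanic_counts$Freq)   # total number of people counted
```

The same str() and head() calls work on a data frame read from a CSV, such as the `titanic` object above.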

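Plot my data

The plot uses the ggplot2 add-on package, which is installed once with install.packages("ggplot2") and then loaded with library().

>library(ggplot2)

>ggplot(titanic, aes(x = PClass, fill = factor(Survived))) + geom_bar(width = 0.9) + xlab("PClass") + ylab("total") + labs(fill = "Survived")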

r-blog-histo

In my graph the X axis shows first, second and third class passengers and the Y axis shows the total number of passengers in each class. To make the data stand out more I have filled the bars with a colour code for survival. The picture is quite clear: the wealthy survived where the poor in the lower classes did not. Green is those who survived and red is those who died. Being a working class male on the Titanic was a death sentence.

passengers-by-cat
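The same story can be put into numbers. Here is a rough sketch, again assuming R's built-in Titanic table rather than my Titanic.csv (the built-in table also includes the crew as a fourth class). It works out the share of each class that survived and draws a similar stacked bar chart with base R, so no extra packages are needed. The names `by_class` and `survival_rate` are placeholders.

```r
# Collapse the built-in Titanic table to Class x Survived counts.
by_class <- margin.table(Titanic, c(1, 4))
print(by_class)

# Share of each class that survived (each row of proportions sums to 1).
survival_rate <- prop.table(by_class, margin = 1)[, "Yes"]
print(round(survival_rate, 2))

# When run as a script (not interactively), draw to a throwaway device.
if (!interactive()) pdf(NULL)

# Stacked bars: one bar per class, red = died, green = survived.
barplot(t(by_class),
        col = c("red", "green"),
        legend.text = c("Died", "Survived"),
        xlab = "Class", ylab = "Total")
```

Comparing the survival_rate values for first and third class shows the gap that the bars only hint at.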

Graphs really do paint a picture that words and numbers often never will.

In his book The Visual Miscellaneum, David McCandless says, “We’re all visual now, every day every hour maybe even every minute we’re seeing and absorbing information, we’re steeped in it, maybe even lost in it, so perhaps what we need are well-designed colorful and hopefully – useful charts to help us navigate”

visual