last: 25-Nov-2012 by Craig Shallahamer, craig@orapub.com

------------------------------------------------
Using R for Oracle Performance Analysis
------------------------------------------------

You may have noticed I used Mathematica for most of my statistical analysis. While it's an amazing product, it is not free. R is a fantastic way to perform statistical analysis... and it's free. What's R?

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. 

http://www.r-project.org

My objective is to provide a fast, inexpensive, yet solid way to analyze sets of Oracle performance data. To keep things simple, the data sets must already be nicely formatted.

In particular, these are my assumptions:

1. Each sample data set resides in its own data file. For example:
$ cat sample1.dat
1042.35379
1041.156993
...

2. You want to better understand each sample set, both numerically and visually.
3. You may want to compare two sample sets, both numerically and visually.
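
If you don't have a data file handy yet, here is a minimal sketch that builds one in the format assumed above (one numeric value per line, no header) and loads it back. The file name, the temporary directory and the simulated values are my placeholders, not real Oracle measurements.

```r
# Hypothetical example: simulate 100 samples and write them one per line.
set.seed(42)                                   # reproducible simulated data
fakeSamples <- rnorm(100, mean = 1040, sd = 5)
dataFile <- file.path(tempdir(), "sample1.dat")
write.table(fakeSamples, dataFile, row.names = FALSE, col.names = FALSE)

# read.table names the single unnamed column V1
rawData <- read.table(dataFile, header = FALSE)
ds <- rawData$V1
length(ds)
```

Once this works, point "dataFile" at your own file and the rest of the steps apply unchanged.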

To get started:

First, download the R software at www.r-project.org. It takes seconds. The easiest software install ever!

Second, be aware you need to modify the "read.table" directory paths below for your environment. Also, the lines starting with "--" are my notes, not R code, so paste only the R statements.

Third, have fun!

------------------------------------------------------
-- Basic Sample Set Statistical Analysis
--
-- Load "ds" with data.
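rawData<-read.table("/Users/cshallah/Desktop/R_Intro/cbc8192.dat",header=FALSE)
-- read.table names the single column V1. Using rawData$V1 avoids attach(),
-- which produces "masked" warnings when called repeatedly.
ds<-rawData$V1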

-- Learn about your data
--
fivenum(ds)
summary(ds)
hist(ds)
plot(density(ds))
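-- Normality test (Shapiro-Wilk). If p > 0.05, normality is not rejected, so the data is likely normal.
-- Note that shapiro.test accepts between 3 and 5000 values.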
shapiro.test(ds)
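
-- Since the same steps get repeated for every sample set, they can be wrapped
-- in a small function. This is only a sketch: the name "describeSample" and
-- its "label" argument are mine, and the data passed in is simulated.

```r
# Hypothetical helper: run the "Learn about your data" steps in one call.
describeSample <- function(ds, label = "ds") {
  print(fivenum(ds))
  print(summary(ds))
  hist(ds, main = paste("Histogram of", label))
  plot(density(ds), main = paste("Density of", label))
  # shapiro.test() accepts between 3 and 5000 values
  normality <- shapiro.test(ds)
  print(normality)
  invisible(normality$p.value)   # hand back the p-value for later use
}

set.seed(1)
p <- describeSample(rnorm(200, mean = 1040, sd = 5), "simulated data")
```

-- The returned p-value is handy later, when deciding which comparison test applies.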

------------------------------------------------------
-- Comparing Two Sample Sets
--
-- Load ds1 with your first sample set
--
rawData<-read.table("/Users/cshallah/Desktop/R_Intro/cbc8192.dat",header=FALSE)
ds1<-rawData$V1
dends1<-density(ds1)

-- Do basic analysis on sample set one, ds1
-- set ds and run the "Learn about your data" steps in the prior section
ds<-ds1

--
-- Load ds2 with your second sample set
--
rawData<-read.table("/Users/cshallah/Desktop/R_Intro/cbc32768.dat",header=FALSE)
ds2<-rawData$V1
dends2<-density(ds2)

-- Do basic analysis on sample set two, ds2
-- set ds and run the "Learn about your data" steps in the prior section
ds<-ds2

-- Now compare the two sample sets
--   If p > 0.05, there is no statistically significant difference; the sets likely came from the same population.
--
-- First, the t-test. Both data sets must be normal for a valid t-test.
t.test(ds1,ds2)

-- Second, if one or both of the data sets are not normal, use a location equivalence test instead (here, the Wilcoxon rank-sum test)... no problem...
wilcox.test(ds1,ds2)
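
-- The choice between the two tests can also be scripted. A minimal sketch,
-- assuming the same 0.05 threshold used above; the function name
-- "compareSamples" and the simulated samples are mine.

```r
# Hypothetical helper: pick the comparison test from the Shapiro-Wilk results.
compareSamples <- function(ds1, ds2, alpha = 0.05) {
  bothNormal <- shapiro.test(ds1)$p.value > alpha &&
                shapiro.test(ds2)$p.value > alpha
  # t-test only when both sets look normal, otherwise Wilcoxon rank-sum
  if (bothNormal) t.test(ds1, ds2) else wilcox.test(ds1, ds2)
}

set.seed(7)
a <- rexp(150, rate = 1 / 1040)   # simulated, clearly skewed (not normal)
b <- rexp(150, rate = 1 / 1040)
result <- compareSamples(a, b)
result$method                     # which test was actually run
result$p.value
```

-- Treat this as a convenience only. I'd still look at the histogram and
-- density plots before trusting either p-value.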

-- Let's look at both data sets visually.
--   Copy/paste the three lines below.
plot(range(dends1$x, dends2$x), range(dends1$y, dends2$y), type = "n", xlab = "Sample Values", ylab = "Density", main="Probability Density Curve\nds1:red, ds2:blue")
lines(dends1, col = "red")
lines(dends2, col = "blue")
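
-- To keep the picture for a report, the same plot can be written to a file.
-- A sketch only: the samples are simulated stand-ins for ds1 and ds2, the
-- output file name is hypothetical, and the legend is my addition.

```r
# Simulated stand-ins for ds1 and ds2
set.seed(3)
ds1 <- rnorm(200, mean = 1040, sd = 5)
ds2 <- rnorm(200, mean = 1045, sd = 8)
dends1 <- density(ds1)
dends2 <- density(ds2)

plotFile <- file.path(tempdir(), "density_compare.pdf")  # hypothetical name
pdf(plotFile)                      # send the plot to a PDF file, not the screen
plot(range(dends1$x, dends2$x), range(dends1$y, dends2$y), type = "n",
     xlab = "Sample Values", ylab = "Density",
     main = "Probability Density Curves")
lines(dends1, col = "red")
lines(dends2, col = "blue")
legend("topright", legend = c("ds1", "ds2"),
       col = c("red", "blue"), lty = 1)
dev.off()                          # close the file so it gets written
```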

quit()

-- END