last: 25-Nov-2012 by Craig Shallahamer, craig@orapub.com

------------------------------------------------
Using R for Oracle Performance Analysis
------------------------------------------------

You may have noticed I used Mathematica for most of my statistical analysis. While it's an amazing product, it is not free. Using "R" is a fantastic way to perform statistical analysis...and it's free.

What's R? R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
http://www.r-project.org

My objective is to provide a fast, inexpensive, yet solid way to analyze sets of Oracle performance data. To keep things simple, the data sets must already be nicely formatted. In particular, these are my assumptions:

1. Each sample data set resides in its own data file, one value per line. For example:

   $ cat sample1.dat
   1042.35379
   1041.156993
   ...

2. You want to better understand each sample set, numerically and visually.
3. You may want to compare two sample sets, numerically and visually.

To get started:

First, download the R software at www.r-project.org. It takes seconds. The easiest software install ever!
Second, be aware that you need to modify the read.table() file paths below for your environment.
Third, have fun!

------------------------------------------------------
-- Basic Sample Set Statistical Analysis --

# Load "ds" with data. Edit the path for your environment.
# rawData$V1 is used instead of attach(rawData) so that loading several
# files later does not leave masked copies of V1 on the search path.
rawData <- read.table("/Users/cshallah/Desktop/R_Intro/cbc8192.dat", header = FALSE)
ds <- rawData$V1

# Learn about your data
fivenum(ds)
summary(ds)
hist(ds)
plot(density(ds))
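To try the loading and summary steps before you have a data file of your own, here is a small sketch. It assumes synthetic data: 200 values drawn with rnorm() using an arbitrary mean of 1040 and standard deviation of 5, written to a temporary file in the one-value-per-line format described in the assumptions. The helper name load_sample is hypothetical; it simply wraps read.table() and returns the V1 column.

```r
# Hypothetical helper (illustration only): read a one-value-per-line file,
# such as sample1.dat, into a numeric vector without using attach().
load_sample <- function(path) {
  rawData <- read.table(path, header = FALSE)
  rawData$V1
}

# Synthetic data (assumption): 200 normally distributed values with an
# arbitrary mean and standard deviation, written to a temporary file.
set.seed(42)
simulated <- rnorm(200, mean = 1040, sd = 5)
path <- tempfile(fileext = ".dat")
write.table(simulated, path, row.names = FALSE, col.names = FALSE)

ds <- load_sample(path)

fivenum(ds)   # Tukey's five numbers: min, lower hinge, median, upper hinge, max
summary(ds)   # min, 1st quartile, median, mean, 3rd quartile, max
```

hist(ds) and plot(density(ds)) work on this ds exactly as they do on your own data.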
# Normality test (Shapiro-Wilk). p > 0.05 means no significant departure
# from normality, so the sample can be treated as plausibly normal.
shapiro.test(ds)

------------------------------------------------------
-- Comparing Two Sample Sets --

# Load ds1 with your first sample set.
rawData <- read.table("/Users/cshallah/Desktop/R_Intro/cbc8192.dat", header = FALSE)
ds1 <- rawData$V1
dends1 <- density(ds1)

# Do basic analysis on sample set one, ds1:
# set ds and run the "Learn about your data" steps in the prior section.
ds <- ds1

# Load ds2 with your second sample set.
rawData <- read.table("/Users/cshallah/Desktop/R_Intro/cbc32768.dat", header = FALSE)
ds2 <- rawData$V1
dends2 <- density(ds2)

# Do basic analysis on sample set two, ds2:
# set ds and run the "Learn about your data" steps in the prior section.
ds <- ds2

# Now compare the two sample sets.
# If p > 0.05, there is no significant evidence (at the 5% level) that the
# two sets differ, so they plausibly came from the same population.

# First, the t-test. Both data sets should be normal for the t-test to be valid.
# (R's t.test() runs Welch's unequal-variance version by default.)
t.test(ds1, ds2)

# Second, if one or both data sets are not normal, use a nonparametric
# location test instead, the Wilcoxon rank-sum test... no problem...
wilcox.test(ds1, ds2)

# Let's visually look at both data sets.
# Copy and paste the three lines below. The y-axis is a density, not a probability.
plot(range(dends1$x, dends2$x), range(dends1$y, dends2$y), type = "n", xlab = "Sample Values", ylab = "Density", main = "Probability Density Curve\nss1:red, ss2:blue")
lines(dends1, col = "red")
lines(dends2, col = "blue")

quit()
-- END
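To rehearse the comparison without your own files, the sketch below wraps the normality check and the choice between t.test() and wilcox.test() into one function. It is an illustration under assumptions: compare_samples is a hypothetical name, the two samples are synthetic (rnorm() with an arbitrary 10-unit shift between them), and alpha = 0.05 mirrors the p > 0.05 rule of thumb used above. Note that shapiro.test() only accepts samples of 3 to 5000 values.

```r
# Hypothetical helper (illustration only): if both samples pass Shapiro-Wilk
# at level alpha, run the t-test; otherwise use the Wilcoxon rank-sum test.
# shapiro.test() requires between 3 and 5000 values per sample.
compare_samples <- function(ds1, ds2, alpha = 0.05) {
  both_normal <- shapiro.test(ds1)$p.value > alpha &&
    shapiro.test(ds2)$p.value > alpha
  if (both_normal) {
    t.test(ds1, ds2)        # R's default is Welch's unequal-variance t-test
  } else {
    wilcox.test(ds1, ds2)   # two-sample Wilcoxon rank-sum (Mann-Whitney) test
  }
}

# Synthetic samples (assumption): the second is shifted up by 10 units.
set.seed(1)
ds1 <- rnorm(60, mean = 1040, sd = 5)
ds2 <- rnorm(60, mean = 1050, sd = 5)

result <- compare_samples(ds1, ds2)
cat("Test used:", result$method, "\n")
cat("p-value:  ", format.pval(result$p.value), "\n")

# A p-value below alpha is evidence that the two sets differ in location.
```

The density overlay lines from the walkthrough work unchanged on these ds1 and ds2 once dends1 and dends2 are recomputed with density().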