Statistical Computing Blog

Wednesday, March 28, 2012

The Julia Language

This post is a quick pointer to the Julia language, a new language for technical computing. Its main strength is speed: it runs faster than R, MATLAB, etc. The code is compiled just-in-time (JIT). On the back end it uses, among other things, LAPACK and ARPACK.

So check out http://julialang.org/

Saturday, March 24, 2012

R Programming Syntax Quickstart

If you have ANY programming experience in other languages, this guide will get you started in R very quickly.

Logic Operators

a == b	a equals b
a != b	a is not equal to b
a > b	a is greater than b
a < b	a is less than b
a >= b	a is greater than OR equal to b
a <= b	a is less than OR equal to b
(condition 1) & (condition 2)	(condition 1) AND (condition 2)
(condition 1) | (condition 2)	(condition 1) OR (condition 2)

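Also, try the following to understand "&&" and "||". Unlike "&" and "|", which compare element by element, "&&" and "||" return a single TRUE or FALSE. The "&&" and "||" output below is from an older version of R, which looked only at the first element of each vector; since R 4.3.0, passing "&&" or "||" a vector longer than one element is an error.

> a<-c(1:10)

> b<-a

> c<-b

> c[1:4]<-.5

> (a == b) && (a > c)

[1] TRUE

> (a == b) & (a > c)

[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

> (a == b) | (a > c)

[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

> (a == b) || (a > c)

[1] TRUE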
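In current R versions, "&&" and "||" expect a single value on each side. A common way to reduce an element-wise comparison to one TRUE or FALSE is all() or any(). The following is only a minimal sketch that reuses the vectors from the example above; the result variable names are made up for illustration.

```r
a <- 1:10
b <- a
c <- b
c[1:4] <- .5

# Element-wise AND: one result per position
elementwise <- (a == b) & (a > c)
print(elementwise)

# Collapse the element-wise result to a single value
all.true <- all(elementwise)   # TRUE only if every element is TRUE
any.true <- any(elementwise)   # TRUE if at least one element is TRUE
print(all.true)
print(any.true)

# "&&" is safe once each side is a single value
first.only <- (a[1] == b[1]) && (a[1] > c[1])
print(first.only)
```

Use all() when every position has to satisfy the condition, and any() when one match is enough.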

IF statements

The general example:



if( condition ) {



} else if(  other condition ){



} else {



}

The specific example:



a<-55 

if( a <= 54.9 ) {

   print("a is less than or equal to 54.9") 

} else if( a == 55 ){

   print("a equals 55")

} else {

   print("a is greater than 54.9 and not 55") 

}

For Loops

The general example:



for(variable in vector) {



}

Specific examples:



#example 1

for(i in 1:10) {

   print(i)

}



#example 2

index.vector<-c(4,3,7,5)

numberz<-runif(10)

print(numberz)



for(i in index.vector) {

   print(numberz[i])

}



#example 3

for(i in 1:10) {

   if(i == 3) {

      next

   } else if(i == 7) {

      break

   }

   print(i)

}



#example 4

mat<-matrix(0,3,4)

print(mat)



for(i in 1:3) {

   for(j in 1:4) {

      mat[i,j]<-rnorm(1)

   }

}
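As a side note on example 4, R can often fill a whole matrix in one call instead of looping over every element. The sketch below assumes R's default random number generator settings and uses an arbitrary seed (42) only so that both versions draw the same numbers; mat.loop and mat.vec are names made up for the comparison.

```r
set.seed(42)   # arbitrary seed, only so both versions draw the same numbers
mat.loop <- matrix(0, 3, 4)
for (i in 1:3) {
   for (j in 1:4) {
      mat.loop[i, j] <- rnorm(1)
   }
}

set.seed(42)
# byrow = TRUE fills row by row, matching the order of the nested loop
mat.vec <- matrix(rnorm(12), 3, 4, byrow = TRUE)

print(mat.vec)
print(all.equal(mat.loop, mat.vec))
```

The loop version is still the right tool when each element depends on earlier ones; the one-call version is shorter when the elements are independent.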

While Loops

General Example:



while(condition) {



}

Note that you must write something within the while loop that updates at least one of the variables in the condition. Otherwise, you could have a perpetual loop.

Specific Example:



i<- -1

while( i < 10) {

   print(i)
   i<-i+1
}
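One hedged way to protect yourself against a perpetual loop is to add an iteration cap to the condition. This is only a sketch: max.iter and the halving step are made up for illustration and are not part of the example above.

```r
x <- 100
iter <- 0
max.iter <- 1000   # hypothetical safety cap

while (x > 1 && iter < max.iter) {
   x <- x / 2            # updates the variable used in the condition
   iter <- iter + 1      # counts iterations so the cap can stop the loop
}

print(x)
print(iter)
```

If the update to x were accidentally left out, the loop would still stop once iter reached max.iter.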

Repeat Loop

In a repeat loop, you must not only update the variables explicitly, you must also test the exit condition explicitly and call break yourself.

Specific Example:



i<- -1

repeat{

   print(i)

   i<-i+1

   if( i == 10) {

      break

   }

}

Functions

For example, you could have a function that evaluates a formula. A function can call other functions.

General Example:



function_name<-function(parameters) {





   return(return_variable)

}

Specific Example:



calcQuadratic<-function(a, b, c, x) {

   y<-a*x*x+b*x+c

   return(y)

}



calcQuadratic(2,3,5,.07)



my.var<-calcQuadratic(3.32,7.6,5.999,3.2)

print(my.var)
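To illustrate a function calling another function, here is a small sketch built on calcQuadratic. The helper findSmallest is hypothetical: it simply evaluates the quadratic on a grid of x values and returns the x that gives the smallest y.

```r
calcQuadratic <- function(a, b, c, x) {
   y <- a*x*x + b*x + c
   return(y)
}

# Hypothetical helper: calls calcQuadratic on a vector of x values
# and returns the x with the smallest y
findSmallest <- function(a, b, c, x.values) {
   y.values <- calcQuadratic(a, b, c, x.values)   # evaluated element-wise
   return(x.values[which.min(y.values)])
}

grid <- seq(-2, 2, by = 0.5)
best.x <- findSmallest(1, 0, -1, grid)
print(best.x)
```

Note that calcQuadratic needed no changes to accept a vector, because the arithmetic operators in R work element by element.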

BANG!!

Testing for seasonal unit roots in R

I will explain seasonal unit root testing in R. Briefly, R is a language for statistical computing. It is very similar to MATLAB, SAS, etc. The website is http://www.r-project.org

Suppose that our dataset is seasonal and that we intend to use a seasonal ARIMA model. We need to test our time series to see if it is seasonally integrated.

Version 3.x of the "forecast" R package has a new function for testing for seasonal unit roots. The function is nsdiffs().

R also comes with a US Accidental Deaths dataset.

So to follow along, open up R and type the following:



>USAccDeaths

You will then see the US Accidental Deaths dataset. You can see that it is monthly.

Now install the "forecast" R package from CRAN. Then load it.

To view the help file for nsdiffs(), type:



>?nsdiffs

It will bring up a page that is for both nsdiffs and ndiffs.

There are two tests that have been implemented in nsdiffs: the OCSB test (default) and the Canova-Hansen test. You can also specify the seasonal period of your dataset. USAccDeaths is a ts object, and its seasonal period, or "frequency", is stored as an attribute of that object.
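If you want to check the seasonal period yourself, base R has accessor functions for time series objects. A minimal sketch using only base R and the built-in dataset:

```r
print(class(USAccDeaths))       # class of the object
print(frequency(USAccDeaths))   # seasonal period stored with the series
print(start(USAccDeaths))       # time of the first observation (year, month)
print(end(USAccDeaths))         # time of the last observation (year, month)
```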

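To perform the OCSB test (the default in version 3.x of forecast; if your version uses a different default, you can request it explicitly with test="ocsb"):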



>nsdiffs(USAccDeaths)

To perform the Canova-Hansen test:



>nsdiffs(USAccDeaths, test="ch")

The output: "1" means that there is a seasonal unit root and "0" means that there is no seasonal unit root.

You will notice that the two different tests give two different answers. This is because the Canova-Hansen test is less likely to decide in favour of a seasonal unit root than the OCSB test. Unlike the Canova-Hansen test, the OCSB test has a null hypothesis of a unit root. The USAccDeaths dataset is "on the edge". Osborn (1990) writes that when in doubt, it's better to seasonally difference.
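If you decide to seasonally difference, base R's diff() can do it with the lag set to the seasonal period. This is only a sketch using base R; m and deaths.sdiff are names chosen for illustration.

```r
m <- frequency(USAccDeaths)                  # seasonal period of the series
deaths.sdiff <- diff(USAccDeaths, lag = m)   # y[t] - y[t-m]

print(length(USAccDeaths))
print(length(deaths.sdiff))   # shorter by one seasonal period
print(head(deaths.sdiff))
```

The differenced series loses its first m observations, since each of them has no value one season earlier to subtract.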

Bibliography:

Osborn DR (1990) "A survey of seasonality in UK macroeconomic variables", International Journal of Forecasting 6(3):327-336.

Osborn DR, Chui APL, Smith J, and Birchenhall CR (1988) "Seasonality and the order of integration for consumption", Oxford Bulletin of Economics and Statistics 50(4):361-377.

Canova F and Hansen BE (1995) "Are Seasonal Patterns Constant over Time? A Test for Seasonal Stability", Journal of Business and Economic Statistics 13(3):237-252.

Analytics Blog

I started with the "Insurance Blog", which began as keyword-laden drivel. However, there is a limited amount of drivel that I can produce. Eventually, good statistical computing info started to flow - interspersed with keyword-laden drivel. According to some research by some start-up analytics company, "insurance" is one of the highest-paying keywords. ;-)

When I saw in Google Analytics that my drivel blog was coming up in searches and actually helping people, I became proud of my content. So here is a blog that I am completely proud of: all the info, without the drivel.