Quandl and R

I haven’t taught econometrics for over a year now, but the next time I do, I’ll be using Quandl!  Quandl is a repository of data: “when a user clicks on a dataset on Quandl, the Quandl engine goes to the original publisher of that data, retrieves the freshest version of that data, and presents it to the user.”  There’s a lot of data online, but it’s really nice that they’ve aggregated so much here and made it extremely easy to access.

They have a nice R package that hooks into the Quandl API, allowing you (once you have your authentication token) to import data directly from their servers. This circumvents one of the major issues for new R users: importing the data and getting it structured correctly.

They’ve provided an extremely brief “econometrics” tutorial and their own R cheat sheet.


By | 2016-10-15T05:47:43+00:00 November 7th, 2013|R, Teaching|0 Comments

Stats jobs for undergraduates

Update December 2015: I’ve made a new page dedicated to the various kinds of jobs people with statistical training can apply for (including job descriptions). Check it out here: garthtarr.com/jobs-for-statisticians

I regularly get asked for advice about what undergraduate stats majors can do after their degree (particularly if they don’t want to end up in a bank or consulting company). The standard response is that statisticians can do anything, but if you want to use your stats skills specifically, here are some resources:


Lots of government departments take undergraduate and honours level statisticians, not just the ABS but also ATO, DEEWR, Defence (and specifically DSTO), ABARES, RBA, Treasury, Bureau of Crime Statistics & Research, Statistics NZ … keep an eye out early in the year for grad programs. Also look into summer internships (e.g. ABS cadetships, RBA cadetships and the ABARES Summer Vacation program).

You could always become a teacher – there are not enough maths teachers at the high school level (or at the primary school level).  See, for example, the Teach for Australia program.

Private sector

Most (if not all) companies will appreciate a person with solid quantitative skills.  You could consider (to name just a few):

Within banks there are ways to use your statistics without doing financial work or trading.  For example the ANZ Bank has the Central Customer Analytics department and NAB has its Analytics and Research Operations department.

Further study

If you want to specialise further in statistics (without doing a PhD) you might consider a Masters in Statistics or Biostatistics. For example, UNSW has a decent Master of Statistics and the School of Public Health here at the University of Sydney has a Master of Biostatistics.  There’s a program with NSW Health called the NSW Biostatistical Officer Training Program which recruits trainee biostatisticians every year (applications are usually due in November). While in the program, trainees work full-time in a variety of placements and undertake a Masters of Biostatistics part-time. NSW Health pays university and associated fees and study leave is given.  See also this blog post by Jerzy Wieczorek, mathematical statistician at the U.S. Census Bureau, for some thoughts on Masters.

Job listings

You might want to subscribe to the ANZstat mailing list (make sure you set up a filter in your email program of choice so your inbox doesn’t get inundated with messages).  The jobs on this mailing list are often for people with a PhD but not always (for example, those NSW Health trainee biostats jobs get advertised on this mailing list).

There’s also the StatSci joblist and a page with more general information.

The Australian Mathematical Society (Aust MS) has a page on jobs for people with quantitative skills.

Sport statistics jobs

  1. Keep an eye on StatsJobs for potential openings. These are likely to be mostly higher level stats jobs (e.g. requiring a masters or higher) but there may be grad level positions. You could also keep an eye on the Sports Management Australia and New Zealand site.
  2. Go for positions in sports companies/relevant government agencies without a focus on stats, then (after a period of time) transfer into a more stats based job (if you go for a government job, they’re often really good about supporting further study, e.g. masters in stats). E.g. Department of Sports and Recreation 
  3. If you’re planning on heading overseas, the Royal Statistical Society (UK based organisation) has a Statistics in Sport section and the American Statistical Association has this advice. Unfortunately, there’s no equivalent in the Statistical Society of Australia Inc (SSAI).
  4. You could also look at companies like atass sports (UK based) or Statistical Sports Consulting (USA based). A dedicated stats company like this would give you the extra training in the appropriate areas that you’d need. But there doesn’t seem to be anything comparable in Australia (that I’ve been able to find). The next best would be to look for jobs with the Australian Institute of Sport, AFL, NRL, etc. directly.

City jobs

Most people know about the standard jobs in the city: investment banking, derivatives trading, management consulting, human resource consulting, other forms of consulting, … Those companies do a good job of getting the word out on campus about internships and grad positions.

By | 2016-10-15T05:47:46+00:00 August 28th, 2013|Statistics, Teaching|0 Comments

Recommended Reading

A student today asked if there were any books on statistics that I could recommend. He was after more generalist type books. I ended up sending him this list:

  1. The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (USYD, Amazon) is a great (generalist) read on the progression of Bayesian statistics. It’s a really fun read (for a book about statistics).
  2. The lady tasting tea : how statistics revolutionized science in the twentieth century (USYD, Amazon) I quite enjoyed this one, it’s a nicely written history of some key stats players.
  3. Statistics on the table : the history of statistical concepts and methods (USYD, Amazon) is a bit drier than the above two – I haven’t made it all the way through yet – waiting for a rainy day!
  4. Mostly harmless econometrics : an empiricist’s companion (USYD, Amazon) is quite a bit more technical than the above books and more focussed on econometrics (statistics for economics).
  5. The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t (Amazon) I’ve never read this but Nate Silver’s pretty hot right now.
  6. Probabilities : the little numbers that rule our lives (USYD, Amazon) I’m reading this on and off at the moment – it has some interesting observations.
By | 2016-10-15T05:47:47+00:00 April 11th, 2013|Statistics, Teaching|0 Comments

Hans Rosling’s 200 Countries, 200 Years, 4 Minutes

Hans Rosling from The Joy of Stats on BBC Four. Another excellent example of data communication. I use it in first year lectures to elicit discussion on the issues with aggregating data, in particular how a summary statistic can hide differences between subgroups. We also talk about how many variables are being plotted. It’s something different for them – it puts what they’re learning in a global context and shows statistics as being more than just calculating means and variances.

Pretty neat, eh?

By | 2016-10-15T05:47:47+00:00 December 15th, 2012|Statistics, Teaching|0 Comments

Missing the FUN

During my undergraduate (and now postgraduate) years, I often spent my evenings and weekends toiling over statistics assignments. I was always amused when R seemed to know and would sometimes return my favourite error, reminding me that I was missing the fun:

Error in match.fun(FUN) : argument "FUN" is missing, with no default

Of course, I had just forgotten to supply a function name to a command like apply(). The apply() function is a really useful way of extracting summary statistics from data sets. The basic format is

apply(array, margin, function, ...)
  • An array in R is a generic data type. A zero dimensional array is a scalar or a point; a one dimensional array is a vector; and a two dimensional array is a matrix…
  • The margin argument is used to specify which margin we want to apply the function to. If the array we are using is a matrix then we can specify the margin to be either 1 (apply the function to the rows of the matrix) or 2 (apply the function to the columns of the matrix).
  • The function can be any function that is built in or user defined (this is what I was missing when I got the error above).
  • The ... after the function refers to any other arguments that need to be passed to the function being applied to the data.
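
As a minimal sketch of these arguments in action, the snippet below uses a small made-up matrix (m and the other variable names are purely illustrative) to apply sum over each margin, pass an extra argument through the ..., and reproduce the error above by leaving out the function:

```r
# A small 2 x 3 matrix (illustrative data) so the margins are easy to see
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # margin 1: one result per row
col_sums <- apply(m, 2, sum)  # margin 2: one result per column

# Arguments after the function are handed on to it, here na.rm = TRUE for mean()
m_na <- m
m_na[1, 1] <- NA
col_means <- apply(m_na, 2, mean, na.rm = TRUE)

# Leaving out the function triggers the "missing FUN" error; capture its message
msg <- tryCatch(apply(m, 2), error = function(e) conditionMessage(e))

print(row_sums)
print(col_sums)
print(col_means)
print(msg)
```

Wrapping the faulty call in tryCatch() just keeps the script running so the message can be inspected; at the console you would see the error directly.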

The apply function internally uses a loop, so if time and efficiency are very important, one of the other apply functions such as

lapply(list, function, ...)

would be a better choice. The lapply command is designed for lists. It is particularly useful for data frames as each data frame is considered a list and the variables in the data frame are the elements of the list. Note that lapply doesn’t have a margin argument as it simply applies the function to each of the variables in the data frame.

You can see the difference in the example below.  The data set cars is a data frame that comes with R.

mode(cars) # what data type is cars?
[1] "list"
head(cars) # output the first six entries in the data set
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
apply(cars,2,mean) # calculate column means treating cars as a matrix (2D array)
speed  dist
15.40 42.98
lapply(cars,mean) # same thing treating cars as a data frame (list)
$speed
[1] 15.4

$dist
[1] 42.98
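
If you would rather get a named vector back than a list, sapply() is a close relative of lapply() that simplifies the result where it can. A quick sketch on the same cars data (the variable names are just for illustration):

```r
means_list <- lapply(cars, mean)  # a list with one element per variable
means_vec  <- sapply(cars, mean)  # the same values simplified to a named numeric vector

str(means_list)
print(means_vec)
```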

To show how much faster lapply is than apply, consider the following simulation:

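X = matrix(rnorm(10000000),ncol=2)
system.time(apply(X,2,mean)) # time the column means using apply
   user  system elapsed 
  0.573   0.394   0.965 
Xdf = as.data.frame(X) # lapply works on lists, so convert the matrix to a data frame
system.time(lapply(Xdf,mean)) # time the column means using lapply
   user  system elapsed 
  0.072   0.049   0.121 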

To perform the same operation, the lapply function was nearly 8 times faster than the apply function. You need a reasonably large data set for this to make a noticeable difference, but it’s worth keeping in mind regardless.
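
If you want to try this comparison yourself, here is a smaller-scale sketch. The seed, the matrix size and the variable names are arbitrary choices for illustration, and the timings will differ from machine to machine, so none are quoted here. It also checks that the two approaches return the same column means.

```r
set.seed(1)                            # arbitrary seed so the run is repeatable
X <- matrix(rnorm(200000), ncol = 2)   # smaller than the example above so it runs quickly
X_df <- as.data.frame(X)               # lapply needs a list, so use a data frame

# Time each approach and keep the results for comparison
t_apply  <- system.time(m_apply  <- apply(X, 2, mean))
t_lapply <- system.time(m_lapply <- lapply(X_df, mean))

print(t_apply)
print(t_lapply)

# Both approaches should give the same column means
same_answer <- isTRUE(all.equal(unname(m_apply), unname(unlist(m_lapply))))
print(same_answer)
```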

To find out more about any of these functions or datasets use the help:
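
For example, a minimal sketch using the functions and data set mentioned in this post (a question mark in front of a name is shorthand for calling help() on it):

```r
?apply    # help page for apply()
?lapply   # help page for lapply() and its relatives
?cars     # documentation for the cars data set
```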

By | 2013-08-28T11:59:40+00:00 December 14th, 2012|R, Teaching|0 Comments

Selling Statistics

This video clip does a great job of selling statistics to a general audience (despite being created by SAS). It’s only 2:30 mins – a good length for adding some interest at the start of a first year statistics unit.

“Statisticians help researchers keep children healthy”

Statistics: saving children’s lives since 1850.

By | 2016-10-15T05:47:50+00:00 December 12th, 2012|Statistics, Teaching|0 Comments

Law of large numbers

Watch the first two minutes of this video for a graphical representation of the law of large numbers (the physicist’s center of gravity is the statistician’s mean).  It’s worth a look if only for the awesome 80’s styling and soundtrack.
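
For a quick do-it-yourself version of the same idea in R, here is a minimal sketch. The fair six-sided die, the seed and the number of rolls are all arbitrary assumptions chosen for illustration. It tracks the running mean of the rolls, which should settle near the expected value of 3.5 as the number of rolls grows.

```r
set.seed(2012)                            # arbitrary seed so the run is repeatable
n <- 10000                                # number of simulated die rolls
rolls <- sample(1:6, n, replace = TRUE)   # fair six-sided die

# Mean of the first k rolls, for k = 1, ..., n
running_mean <- cumsum(rolls) / seq_len(n)

# Plot the running mean against the expected value of 3.5
plot(running_mean, type = "l", xlab = "Number of rolls", ylab = "Running mean")
abline(h = 3.5, lty = 2)
```

Rerunning with a different seed gives a different path early on, but the right-hand end of the line should still hug the dashed line.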

By | 2016-10-15T05:47:51+00:00 December 10th, 2012|Statistics, Teaching|0 Comments