Writing your own functions in R
Abstracting your code into many small functions is key to writing nice R code. In our experience, biologists are initially reluctant to use functions in their code. Where people do use functions, they don’t use them enough, or they try to make their functions do too much at once.
R has many built-in functions, and you can access many more by installing new packages. So there’s no doubt you already use functions. This guide will show how to write your own functions, and explain why this is helpful for writing nice R code.
Many of the benefits of using functions are more obvious by demonstration than by description. The first exhibit is a script that does not use functions. We think that this is typical of the sort of scripts that ecologists end up with when analysing data.
This script can be simplified considerably by using functions, as this second script shows.
Note that while we’ve called these “before” and “after”, more typically the process of moving code into functions happens incrementally while writing code.
Writing your own R functions
Below we briefly introduce function syntax, and then look at how functions help you to write nice R code. Nice coders with more experience may want to skip the first section.
Writing functions is simple. Paste the following code into your console:
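A minimal sketch of such a function, assuming it simply squares each of its two arguments and adds the results:

```r
# Takes two numbers and returns the sum of their squares
sum.of.squares <- function(x, y) {
  x^2 + y^2
}
```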
You have now created a function called sum.of.squares, which requires two arguments and returns the sum of the squares of these arguments. Since you ran the code through the console, the function is now available, like any of the other built-in functions within R. Running sum.of.squares(3,4) will give you the answer 25.
The procedure for writing any other function is similar, involving three key steps:
- Define the function,
- Load the function into the R session,
- Use the function.
Defining a function
Functions are defined by code with a specific format:
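The general shape looks something like the template below. This is a sketch: function.name, arg1, arg2, arg3, new.var and the body are placeholders, and the default value of 2 for arg3 is an arbitrary choice for illustration.

```r
# Hypothetical template: every name here is a placeholder
function.name <- function(arg1, arg2, arg3 = 2, ...) {
  # Function body: any R code can go here
  # ('...' is accepted by this template but not used by this body)
  new.var <- sin(arg1) + sin(arg2)
  # The value of the last expression is what the function returns
  new.var / arg3
}
```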
function.name: the function’s name. This can be any valid variable name, but you should avoid names that are already used elsewhere in R, such as dir, function, plot, etc.
arg1, arg2, arg3: the arguments of the function, also called formals. You can write a function with any number of arguments. These can be any R object: numbers, strings, arrays, data frames, or even other functions; anything that is needed for the function.name function to run.
Some arguments have default values specified, such as arg3 in our example. Arguments without a default must have a value supplied for the function to run. You do not need to provide a value for those arguments with a default, as the function will use the default value.
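For example, in the hypothetical function below, digits has a default and total does not, so total must always be supplied while digits can be left out:

```r
# Hypothetical helper: expresses x as a percentage of total
percent <- function(x, total, digits = 1) {
  round(100 * x / total, digits)
}

percent(1, 3)              # digits defaults to 1: 33.3
percent(1, 3, digits = 3)  # 33.333
# percent(1) fails with an error because 'total' has no default
```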
The ‘...’ argument: the ..., or ellipsis, element in the function definition allows other arguments to be passed into the function and then passed on to another function. This technique is often used in plotting, but has uses in many other places.
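A small sketch of the idea, using a hypothetical wrapper around mean(): any extra named arguments, such as na.rm, travel through ... to mean() without the wrapper having to list them.

```r
# Hypothetical wrapper: forwards any extra arguments to mean()
column.mean <- function(x, ...) {
  mean(x, ...)
}

column.mean(c(1, NA, 3))                # NA
column.mean(c(1, NA, 3), na.rm = TRUE)  # 2
```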
Function body: the code within the curly braces is run every time the function is called. This code might be very long or very short. Ideally, functions are short and do just one thing; problems are rarely too small to benefit from some abstraction. Sometimes a large function is unavoidable, but it can usually in turn be constructed from a set of small functions.
More on that below.
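As an illustration, here is a hypothetical larger function assembled from two one-line helpers; drop.missing, cv and summarise.site are names made up for this sketch.

```r
# Small helpers, each doing one job
drop.missing <- function(x) x[!is.na(x)]
cv <- function(x) sd(x) / mean(x)  # coefficient of variation

# A larger function built from the small ones
summarise.site <- function(x) {
  x <- drop.missing(x)
  c(n = length(x), mean = mean(x), cv = cv(x))
}

summarise.site(c(2, 4, NA, 6))  # n = 3, mean = 4, cv = 0.5
```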
Return value: the value of the last expression evaluated in the function body is what the function returns, unless return() is called earlier. A function does not have to return anything useful: a function that makes a plot might be called only for its side effect, whereas a function that does a mathematical operation might return a number, or a list.
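The sketch below shows three hypothetical functions: one returning a list through its last expression, one using return() to exit early, and one called only for its side effect.

```r
# Returns a list: the value of the last expression is returned
describe.values <- function(x) {
  list(mean = mean(x), range = range(x))
}

# return() exits early with a value
safe.log <- function(x) {
  if (any(x <= 0)) return(NA)
  log(x)
}

# Called only for its side effect; cat() returns NULL, so this does too
say.hello <- function() {
  cat("hello\n")
}

describe.values(c(1, 5, 3))$range  # 1 5
safe.log(-1)                       # NA
is.null(say.hello())               # prints "hello", then TRUE
```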
Load the function into the R session
For R to be able to execute your function, it first needs to be read into memory. This is just like loading a library: until you do so, the functions it contains cannot be called.
There are two methods for loading functions into memory:
- Copy the function text and paste it into the console
- Use the source() function to load your functions from file.
Our recommendation for writing nice R code is that, in most cases, you should use the second of these options. Put your functions into a file with an intuitive name, like plotting-fun.R, and save this file within the R folder in your project. You can then read the functions into memory by calling:
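In a project laid out as described, the call would be source("R/plotting-fun.R"). To keep this sketch runnable anywhere, it instead writes a small functions file to a temporary directory and sources that; the scatter function inside it is hypothetical.

```r
# Write a tiny functions file; in a real project this already exists as R/plotting-fun.R
fun.file <- file.path(tempdir(), "plotting-fun.R")
writeLines(c(
  "# Hypothetical plotting helper",
  "scatter <- function(x, y, ...) {",
  "  plot(x, y, pch = 19, ...)",
  "}"
), fun.file)

# Read every function in the file into the current session
source(fun.file)
exists("scatter")  # TRUE
```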
From the point of view of writing nice code, this approach helps because it leaves you with an uncluttered analysis script and a repository of useful functions that can be loaded into any analysis script in your project. It also lets you group related functions together easily.
Using your function
You can now use the function anywhere in your analysis. In thinking about how you use functions, consider the following:
- Functions in R can be treated much like any other R object.
- Functions can be passed as arguments to other functions or returned from other functions.
- You can define a function inside of another function.
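A short sketch of each point; apply.twice, make.power and cube are hypothetical names chosen for illustration.

```r
# A function is an object: it can be assigned to another name
f <- sqrt
f(16)  # 4

# Functions can be passed as arguments to other functions
apply.twice <- function(fun, x) fun(fun(x))
apply.twice(sqrt, 16)  # 2

# A function can be defined inside, and returned from, another function
make.power <- function(n) {
  function(x) x^n
}
cube <- make.power(3)
cube(2)            # 8
sapply(1:3, cube)  # 1 8 27
```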
A little more on the ellipsis argument
The ellipsis argument, ..., is a powerful way of passing an arbitrary number of arguments to a lower-level function. This is how
and our code would still work.
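As a hypothetical illustration, the function below accepts any number of arguments and forwards all of them to sum(), so callers can add new arguments such as na.rm without the function itself changing.

```r
# Hypothetical function: accepts any number of arguments via ...
total <- function(...) {
  extras <- list(...)  # capture the arguments to inspect them
  message("Received ", length(extras), " argument(s)")
  sum(...)             # pass them all on to sum()
}

total(1, 2, 3)                 # 6
total(1, NA, 3, na.rm = TRUE)  # 4
```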
Avoiding coding errors
By using functions, you limit the scope of variables. In the logit function, the p variable is only valid within the body of the logit function – it is unaffected by any other variable called p and it does not affect any other variable called p. This means when you read code you don’t have to look elsewhere to reason about what values variables might take.
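A minimal sketch of a logit function, assuming the standard definition log(p / (1 - p)), shows the scoping rule in action:

```r
logit <- function(p) {
  p <- p / (1 - p)  # reassigns only the local copy of p
  log(p)
}

p <- 0.9    # a global variable that happens to share the name
logit(0.5)  # 0
p           # still 0.9: the function did not touch it
```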
Along similar lines, as much as possible, functions should be self-contained and not depend on things like global variables (variables you’ve defined in the main workspace, which would show up in RStudio’s object list).
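A hypothetical contrast: the first function below silently relies on a global variable called threshold, while the second receives everything it needs as arguments.

```r
threshold <- 5  # a global variable in the workspace

# Fragile: the result depends on whatever 'threshold' happens to be globally
count.above.global <- function(x) sum(x > threshold)

# Self-contained: the threshold is passed in explicitly
count.above <- function(x, threshold) sum(x > threshold)

count.above.global(c(2, 6, 9))          # 2, but only while threshold is 5
count.above(c(2, 6, 9), threshold = 5)  # 2
```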
Becoming more productive
Functions enable easy reuse within a project, helping you not to repeat yourself. If you see blocks of similar lines of code through your project, those are usually candidates for being moved into functions.
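For example, standardising several columns of a data frame by hand repeats the same expression; moving it into a hypothetical standardise() function keeps the calculation in one place. The sites data frame is made-up illustrative data.

```r
sites <- data.frame(height = c(1, 2, 3), mass = c(10, 20, 30))

# Before: the same calculation copied for each column
sites$height.std <- (sites$height - mean(sites$height)) / sd(sites$height)
sites$mass.std   <- (sites$mass - mean(sites$mass)) / sd(sites$mass)

# After: the calculation lives in one place
standardise <- function(x) {
  (x - mean(x)) / sd(x)
}
sites$height.std <- standardise(sites$height)
sites$mass.std   <- standardise(sites$mass)
```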
If your calculations are performed through a series of functions, then the project becomes more modular and easier to change. This is especially true for functions where a particular input always gives the same output.
How long is a piece of string?
In our experience, people seem to think that functions are only needed when you need to use a piece of code multiple times, or when you have a really large problem. However, many functions are actually very small. This post looks at the distribution of function length among R packages, and finds that long functions are the exception, rather than the norm.
This material was written for coders with limited experience. Program design is a bigger topic than could be covered in a whole course, and we haven’t even begun to scratch the surface here. Using functions is just one tool for ensuring that your code will be easy for you to read in future, but it is an essential one.
The more I write code, the more abstract it gets. And with more abstractions, the apps are easier to maintain. Been working for years…
Justin Kimbrell (@justin_kimbrell) April 30, 2013
If you want to read more about function syntax, check out the following: