There are hundreds of functions in Excel and Calc; here are the ones that I find most useful for statistics and general data handling. Note that where the argument (the part in parentheses) of a function is "\(Y\)", it means a single number or a single cell in the spreadsheet. Where the argument says "\(Ys\)", it means more than one number or cell. See AVERAGE(Ys) for an example.
All of the examples here are given in Excel format. Calc uses a semicolon instead of a comma to separate multiple parameters; for example, Excel would use "=ROUND(A1, 2)" to return the value in cell \(A1\) rounded to \(2\) decimal places, while Calc would use "=ROUND(A1; 2)". If you import an Excel file into Calc or export a Calc file to Excel format, Calc automatically converts between commas and semicolons. However, if you type a formula into Calc with a comma instead of a semicolon, Calc acts like it has no idea what you're talking about; all it says is "#NAME?".
I've typed the function names in all capital letters to make them stand out, but you can use lower case letters.
Math functions
ABS(Y) Returns the absolute value of a number.
EXP(Y) Returns \(e\) to the \(y^{th}\) power. This is the inverse of LN, meaning that "=EXP(LN(Y))" equals \(Y\).
LN(Y) Returns the natural logarithm (logarithm to the base e) of \(Y\).
LOG10(Y) Returns the base-\(10\) logarithm of \(Y\). The inverse of LOG is raising \(10\) to the \(Y^{th}\) power, meaning "=10^(LOG10(Y))" returns \(Y\).
RAND() Returns a pseudorandom number, equal to or greater than zero and less than one. You must use empty parentheses so the spreadsheet knows that RAND is a function. For a pseudorandom number in some other range, just multiply; thus "=RAND()*79" would give you a number greater than or equal to \(0\) and less than \(79\). The value will change every time you enter something in any cell. One use of random numbers is for randomly assigning individuals to different treatments; you could enter "=RAND()" next to each individual, Copy and Paste Special the random numbers, Sort the individuals based on the column of random numbers, then assign the first \(10\) individuals to the placebo, the next \(10\) individuals to \(10 mg\) of the trial drug, etc.
A "pseudorandom" number is generated by a mathematical function; if you started with the same starting number (the "seed"), you'd get the same series of numbers. Excel's pseudorandom number generator bases its seed on the time given by the computer's internal clock, so you won't get the same seed twice. There are problems with Excel's pseudorandom number generator that make it inappropriate for serious Monte Carlo simulations, but the numbers it produces are random enough for anything you're likely to do as an experimental biologist.
ROUND(Y,digits) Returns \(Y\) rounded to the specified number of digits. For example, if cell \(A1\) contains the number \(37.38\), "=ROUND(A1, 1)" returns \(37.4\), "=ROUND(A1, 0)" returns \(37\), and "=ROUND(A1, -1)" returns \(40\). Numbers ending in \(5\) are rounded up (away from zero), so "=ROUND(37.35,1)" returns \(37.4\) and "=ROUND(-37.35)" returns \(-37.4\).
SQRT(Y) Returns the square root of \(Y\).
SUM(Ys) Returns the sum of a set of numbers.
Logical functions
AND(logical_test1, logical_test2,...) Returns TRUE if logical_test1, logical_test2... are all true, otherwise returns FALSE. As an example, let's say that cells \(A1,\; B1\; \text {and}\; C1\) all contain numbers, and you want to know whether they're all greater than \(100\). One way to find out would be with the statement "=AND(A1>100, B1>100, C1>100)", which would return TRUE if all three were greater than \(100\) and FALSE if any one were not greater than \(100\).
IF(logical_test, A, B) Returns \(A\) if the logical test is true, \(B\) if it is false. As an example, let's say you have \(1000\) rows of data in columns \(A\) through \(E\), with a unique ID number in column \(A\), and you want to check for duplicates. Sort the data by column \(A\), so if there are any duplicate ID numbers, they'll be adjacent. Then in cell \(F1\), enter "=IF(A1=A2, "duplicate","ok"). This will enter the word "duplicate" if the number in \(A1\) equals the number in \(A2\); otherwise, it will enter the word "ok". Then copy this into cells \(F2\) through \(F999\). Now you can quickly scan through the rows and see where the duplicates are.
ISNUMBER(Y) Returns TRUE if \(Y\) is a number, otherwise returns FALSE. This can be useful for identifying cells with missing values. If you want to check the values in cells \(A1\) to \(A1000\) for missing data, you could enter "=IF(ISNUMBER(A1), "OK", "MISSING")" into cell \(B1\), copy it into cells \(B2\) to \(B1000\), and then every cell in \(A1\) that didn't contain a number would have "MISSING" next to it in column \(B\).
OR(logical_test1, logical_test2,...) Returns TRUE if one or more of logical_test1, logical_test2... are true, otherwise returns FALSE. As an example, let's say that cells \(A1,\; B1\; \text{and}\; C1\) all contain numbers, and you want to know whether any is greater than \(100\). One way to find out would be with the statement "=OR(A1>100, B1>100, C1>100)", which would return TRUE if one or more were greater than 100 and FALSE if all three were not greater than 100.
Statistical functions
AVERAGE(Ys) Returns the arithmetic mean of a set of numbers. For example, "=AVERAGE(B1..B17)" would give the mean of the numbers in cells \(B1..B17\), and "=AVERAGE(7, A1, B1..C17)" would give the mean of \(7\), the number in cell \(A1\), and the numbers in the cells \(B1..C17\). Note that Excel only counts those cells that have numbers in them; you could enter "=AVERAGE(A1:A100)", put numbers in cells \(A1\) to \(A9\), and Excel would correctly compute the arithmetic mean of those \(9\) numbers. This is true for other functions that operate on a range of cells.
BINOMDIST(S, K, P, cumulative_probability) Returns the binomial probability of getting \(S\) "successes" in \(K\) trials, under the null hypothesis that the probability of a success is \(P\). The argument "cumulative_probability" should be TRUE if you want the cumulative probability of getting \(S\) or fewer successes, while it should be FALSE if you want the probability of getting exactly \(S\) successes. (Calc uses \(1\) and \(0\) instead of TRUE and FALSE.) This has been renamed "BINOM.DIST" in newer versions of Excel, but you can still use "BINOMDIST".
CHIDIST(Y, df) Returns the probability associated with a variable, \(Y\), that is chi-square distributed with \(df\) degrees of freedom. If you use SAS or some other program and it gives the result as "Chi-sq=78.34, 1 d.f., P<0.0001", you can use the CHIDIST function to figure out just how small your \(P\) value is; in this case, "=CHIDIST(78.34, 1)" yields \(8.67\times 10^{-19}\). This has been renamed CHISQ.DIST.RT in newer versions of Excel, but you can still use CHIDIST.
CONFIDENCE(alpha, standard-deviation, sample-size) Returns the confidence interval of a mean, assuming you know the population standard deviation. Because you don't know the population standard deviation, you should never use this function; instead, see the web page on confidence intervals for instructions on how to calculate the confidence interval correctly.
COUNT(Ys)Counts the number of cells in a range that contain numbers; if you've entered data into cells \(A1\) through \(A9,\; A11,\; \text{and}\; A17\), "=COUNT(A1:A100)" will yield \(11\).
COUNTIF(Ys, criterion)Counts the number of cells in a range that meet the given criterion.
"=COUNTIF(D1:E1100,50)" would count the number of cells in the range \(D1:E100\) that were equal to \(50\);
"=COUNTIF(D1:E1100,">50")" would count the number of cells that had numbers greater than \(50\) (note the quotation marks around ">50");
"=COUNTIF(D1:E1100,F3)" would count the number of cells that had the same contents as cell \(F3\);
"=COUNTIF(D1:E1100,"Bob")" would count the number of cells that contained just the word "Bob". You can use wildcards; "?" stands for exactly one character, so "Bo?" would count "Bob" or "Boo" but not "Bobble", while "Bo*" would count "Bob", "Boo", "Bobble" or "Bodacious".
DEVSQ(Ys)Returns the sum of squares of deviations of data points from the mean. This is what statisticians refer to as the "sum of squares." I use this in setting up spreadsheets to do anova, but you'll probably never need this.
FDIST(Y, df1, df2) Returns the probability value associated with a variable, \(Y\), that is \(F\)-distributed with \(df1\) degrees of freedom in the numerator and \(df2\) degrees of freedom in the denominator. If you use SAS or some other program and it gives the result as "F=78.34, 1, 19 d.f., P<0.0001", you can use the FDIST function to figure out just how small your \(P\) value is; in this case, "=FDIST(78.34, 1, 19)" yields \(3.62\times 10^{-8}\). Newer versions of Excel call this function F.DIST.RT, but you can still use FDIST.
MEDIAN(Ys) Returns the median of a set of numbers. If the sample size is even, this returns the mean of the two middle numbers.
MIN(Ys) Returns the minimum of a set of numbers. Useful for finding the range, which is MAX(Ys)-MIN(Ys).
MAX(Ys) Returns the maximum of a set of numbers.
NORMINV(probability, mean, standard_deviation) Returns the inverse of the normal distribution for a given mean and standard deviation. This is useful for creating a set of random numbers that are normally distributed, which you can use for simulations and teaching demonstrations; if you paste "=NORMINV(RAND(),5,1.5)" into a range of cells, you'll get a set of random numbers that are normally distributed with a mean of \(5\) and a standard deviation of \(1.5\).
RANK.AVG(X, Ys, type) Returns the rank of \(X\) in the set of \(Ys\). If type is set to \(0\), the largest number has a rank of \(1\); if type is set to \(1\), the smallest number has a rank of \(1\). For example, if cells \(A1:A8\) contain the numbers \(10,\; 12,\; 14,\; 14,\; 16,\; 17,\; 20,\; 21\) "=RANK(A2, A$1:A$8, 0)" returns \(7\) (the number \(12\) is the \(7^{th}\) largest in that list), and "=RANK(A2, A$1:A$8, 1)" returns \(2\) (it's the 2nd smallest).
The function "RANK.AVG" gives average ranks to ties; for the above set of numbers, "=RANK.AVG(A3, A$1:A$8, 0)" would return \(5.5\), because the two values of \(14\) are tied for fifth largest. Older versions of Excel and Calc don't have RANK.AVG; they have RANK, which handled ties incorrectly for statistical purposes. If you're using Calc or an older version of Excel, this formula shows how to get ranks with ties handled correctly:
\[=\text{AVERAGE}(\text{RANK}(A1,\; A\$1:A\$8,\; 0),\; 1+\text{COUNT}(A\$1:A\$8)-\text{RANK}(A\$1,\; A\$1:A\$8,\; 1))\]
STDEV(Ys) Returns an estimate of the standard deviation based on a population sample. This is the function you should use for standard deviation.
STDEVP(Ys) Returns the standard deviation of values from an entire population, not just a sample. You should never use this function.
SUM(Ys) Returns the sum of the \(Ys\).
SUMSQ(Ys) Returns the sum of the squared values. Note that statisticians use "sum of squares" as a shorthand term for the sum of the squared deviations from the mean. SUMSQ does not give you the sum of squares in this statistical sense; for the statistical sum of squares, use DEVSQ. You will probably never use SUMSQ.
TDIST(Y, df, tails) Returns the probability value associated with a variable, \(Y\), that is \(t\)-distributed with \(df\) degrees of freedom and tails equal to one or two (you'll almost always want the two-tailed test). If you use SAS or some other program and it gives the result as "t=78.34, 19 d.f., P<0.0001", you can use the TDIST function to figure out just how small your \(P\) value is; in this case, "=TDIST(78.34, 19, 2)" yields \(2.55\times 10^{-25}\). Newer versions of Excel have renamed this function T.DIST.2T, but you can still use TDIST.
VAR(Ys) Returns an estimate of the variance based on a population sample. This is the function you should use for variance.
VARP(Ys) Returns the variance of values from an entire population, not just a sample. You should never use this function.