A brief history of mathematics

Do I know how mathematics developed? No, I don't, but I have some ideas that are shared by most of the commentators on the subject. So, just sit back while I tell you a story.

There was a time when there were no numbers. It wasn't a time in recorded history because, where we find literacy in ancient civilizations, we also find numeracy. Writing civilizations had numbers.

I will assume that counting came first. You don't really need numbers to count; you just need one-to-one correspondence (the ability to line up the elements of two groups of things so that, for every item in one group, there is one and only one in the other group). At first, there would have been only "one" and "many." A shepherd looks out one day and sees a bunch of sheep. The next day he sees only one sheep and says, "I only have one sheep," or more likely, "Where are all my sheep?" With one-to-one correspondence, he could carry around a bag and, as his sheep file by, drop one pebble in the bag for each sheep. Later, as his sheep file by again, he can compare the pebbles to the sheep. If he has too many pebbles, he can say, "I'm missing some sheep." If he has more sheep, he can say, "My sheep have been messing around."
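The pebble trick can be sketched in a few lines of code (a modern anachronism, of course; the function name and the messages are my own invention): pair the items off one-to-one and see which collection runs out first, without ever counting either one.

```python
def compare_by_pairing(sheep, pebbles):
    """Compare two collections using only one-to-one correspondence."""
    sheep, pebbles = iter(sheep), iter(pebbles)
    while True:
        s = next(sheep, None)
        p = next(pebbles, None)
        if s is None and p is None:
            return "every pebble matches a sheep"
        if s is None:
            return "pebbles left over - some sheep are missing"
        if p is None:
            return "sheep left over - the flock has grown"

print(compare_by_pairing(["sheep"] * 5, ["pebble"] * 7))
```

No number ever appears in the comparison; the loop only asks "is anything left?"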

But once numbers were invented, he could say specifically, "I have two more pebbles than I have sheep - I'm missing two sheep." Numbers gave names to counts. If you set up a one-to-one correspondence between a set of items and the set of counting (whole, natural) numbers - that is, the ordered whole numbers from 1 (or zero, depending on whom you talk to) to infinity - then the largest number you use is the size of the set (the number of elements contained in the set).

Two big milestones in the development of mathematics were the inclusion of zero in the number system, thereby extending the counting numbers from none to however many; and place-value number systems. Not all number systems were alike. Many represented a number by throwing together a bunch of symbols, each with its own value. Add the values of all the symbols and you have the value of the number, but that was too disorderly for any serious calculation. (Try multiplying with Roman numerals!) One of the earliest place-value systems was the Babylonian sexagesimal (base 60) system, but it didn't have a zero. If a place didn't have a value, they just left a gap, and if you mistook the gap for just a little more space between digits, you could make a very serious error.
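Here's a little sketch of how any place-value system assigns meaning to position (the function name is my own); the same routine reads base-60 and base-10 numerals, and it shows how a missing zero invites exactly the Babylonian gap problem:

```python
def place_value(digits, base):
    """Read a numeral given as a list of digits, most significant first."""
    total = 0
    for d in digits:
        total = total * base + d   # each step shifts everything up one place
    return total

print(place_value([3, 5], 10))      # 3*10 + 5 = 35, our familiar decimal
print(place_value([1, 30], 60))     # 1*60 + 30 = 90 in Babylonian base 60
# Without a zero digit, [1, 30] and [1, 0, 30] are easy to confuse,
# yet they name very different numbers:
print(place_value([1, 0, 30], 60))  # 1*3600 + 0*60 + 30 = 3630
```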

With a place-value number system with zero (like our Arabic system) you can calculate. If you have 35 sheep and you gain 15 more, you don't have to recount the whole group to know how many you now have; you can add the sizes of the groups, 35 + 15, to get the size of your new, larger herd: 50. If you sell off some of your sheep, you can subtract the number you sold to get the current size of your herd.

Another tool that developed was multiplication. If you had five lots of a particular item, each lot containing the same number of items, you could easily determine how many you have in all by multiplying the number in each lot by the number of lots.

As long as you are adding and multiplying with counting numbers, things go smoothly, but once you start subtracting and dividing, you run into problems.

Subtraction and division are necessary. You need them to undo the effects of adding and multiplying. Subtraction and division are the inverse operations for addition and multiplication. The shepherd above needed subtraction to tell him how many sheep he had after he sold a few. He also needed division to tell him how many sheep he would have if he placed the same number in two separate pastures.

And you are okay as long as you subtract a smaller number from a larger, but what happens when you subtract a larger from a smaller? You no longer have a counting number. But it does make sense - you end up in debt by a specific amount. So negative numbers, and with them the integers, were developed.

Some divisions didn't work out very well either. We say that one number divides another evenly if the process ends with a whole number, but that doesn't always happen, and when it doesn't, you have fragments left over - remainders. To make those work better, fractions were invented.

An extension of multiplication is exponentiation. If you multiply 2 by 2 you get 4. If you keep multiplying the results by 2, you end up with powers of 2; two to the 5th power, for instance, is 32. But what happens if you go in the other direction? For instance, 3 to the second power (3 squared) is 9, so the square root of 9 is 3. But the Pythagorean Theorem (for a right triangle, the sum of the squares of the two shorter sides is equal to the square of the longest side; the length of the longest side, then, is the square root of the sum of the squares of the shorter sides) demanded a square root of two: the longest side of a right triangle whose other two sides have length 1 is the square root of two. At first glance, that seems to make sense, but nobody could figure out what the square root of two was! In fact, the square root of two is not a number you can write down as a whole number, a fraction, or the sum of the two. It was irrational, in that it could not be represented as a fraction with whole numbers in the numerator (top number) and denominator (bottom number). So inverse powers led to the birth of irrational numbers. Inverse operations seem to be trouble all around.
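You can get a feel for this with a brute-force search (my own sketch, not a proof - the classical proof works by contradiction): no fraction p/q, even with a very large denominator, ever satisfies p² = 2q² exactly, though some come remarkably close.

```python
def has_exact_sqrt2_fraction(max_q):
    """Search every denominator up to max_q for a p with (p/q)**2 == 2."""
    for q in range(1, max_q + 1):
        p = round(2 ** 0.5 * q)    # the best candidate numerator for this q
        if p * p == 2 * q * q:     # would mean (p/q)**2 == 2 exactly
            return True
    return False

print(has_exact_sqrt2_fraction(100_000))   # False - no exact fraction found
# Yet fractions get arbitrarily close: 7/5, 41/29, 239/169, ...
print(7/5, 41/29, 239/169, 2 ** 0.5)
```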

It "feels" like irrational numbers should be rare, but it turns out (look up a guy named Cantor) that, despite the fact that there are an infinite number of rational numbers (whole numbers and fractions), there are a lot more irrational numbers.

Was the trouble over? Well, no. Inverse powers - roots - lead to another ponderable: what is the square root of a negative number? With the advent of algebra, people started running into these negative roots when solving equations of degree greater than one. Soon, people found uses for these unreal numbers, and the imaginary numbers were born. So did that complete the kinds of numbers? Not by a long shot! In quick succession came transfinites, hyperreals, matrices, tensors, quaternions, and they're still coming.

All of these "weird" numbers turn out to be very useful, so we keep them. But now let's turn back to statistics. The basic arithmetic operations are often taken for granted, but you find them in all the more advanced statistics and they play their parts as important statistics in their own right.

Counting: The humble count, the most basic arithmetic operation, is still one of the most important statistics. It is the primary statistic for nominal data - counts are the only numbers you can actually apply to nominal data.

The data count (the number of individuals in a data set) is usually symbolized by N. Most formulas for statistics will contain an N somewhere. An average is the sum of the data values divided by N. Almost all statistical tests will need something called "degrees of freedom," which is ultimately a count.

Orders: Counts are what mathematicians call "cardinal numbers"; there is another kind of number called the "ordinal number." Ordinal numbers designate where an item is in an ordered set - is it the first item, the second, the third, ...? They serve as statistics, also.

There are different ways to order items. See if you can figure out how the following numbers are organized:

8,5,4,9,1,7,6,10,3,2

Give up? Okay, they are in alphabetical order. Data are most commonly sorted in ascending or descending order.

The first challenge in programming courses has traditionally been a sort routine, and there are many different kinds. And they still give me headaches.
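If you'd like to check the puzzle above, here is a short sketch that sorts the numbers 1 through 10 by their English names - no headaches required, since Python's built-in sort does the work:

```python
names = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
         6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}

# Sort the numbers by their names rather than their values:
alphabetical = sorted(names, key=lambda n: names[n])
print(alphabetical)   # [8, 5, 4, 9, 1, 7, 6, 10, 3, 2]
```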

Traditional statistics are usually based on the parameters of a specific frequency distribution; the most popular distribution, because it is so often found in nature, is the normal distribution. Only two values (parameters) are required to describe a pure normal distribution - the mean and standard deviation - and you will find that most of the statistics in common use are based on the mean and standard deviation. Those "traditional" statistics are called "parametric" statistics.

There are other kinds of statistics that are not based on parameters, and many of them are based on order statistics. They are often called "nonparametric" because they generally ignore the shape of the underlying data distribution. The term "nonparametric" is often used interchangeably with the term "robust," but the two terms really mean different things.

"Nonparametric" means "not related to parameters". "Robust" means pretty much the same thing that it means when talking about taste and health. A robust flavor competes well with other flavors. If you throw cucumbers into a pot with a bunch of really spicy seasonings, you won't be able to taste the cucumber; cucumber isn't a robust flavor. If you mix tomatoes and red wine and beef and garlic together, you'll be able to figure out what went into the sauce. All those flavors are robust. A robust person can take just about whatever nature has to throw at them without a lot of effect. You can throw a bunch of extreme data values into a data set and a robust statistic will just laugh them off. Means are not robust - an extremely large value will pull a mean off to the right of a frequency distribution. A median will stay stock still.

What kind of statistic is the robust median? Why, an order statistic. If you sort all the values in a data set, the median is the one right in the middle. Half of the data values will be above the median and half will be below.
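A quick sketch with Python's standard statistics module (the data are made up) shows the contrast: one extreme value drags the mean far off, while the median barely moves.

```python
import statistics

data = [1, 2, 3, 4, 5]
print(statistics.mean(data), statistics.median(data))   # both equal 3

with_outlier = data + [1000]
print(statistics.mean(with_outlier))     # about 169.2 - dragged far off
print(statistics.median(with_outlier))   # 3.5 - it barely moves
```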

Other very popular order statistics are the quantiles and percentiles.

But not all order statistics are robust; take, for instance, the minimum and maximum values. Those are the extreme values in the data set, so, if there are any outliers at all, the minimum, the maximum, or both will be outliers.

The sum: The primary arithmetic operations are not usually considered statistical procedures, but they are - not only are they at the basis of all statistical procedures (what could we do without them?), they are the most basic statistical procedures in their own right. Sums are necessary for dealing with nominal data. Once you have the counts for each category you're working with, almost without thinking you add them together to get marginal sums (sums of the counts within like categories) and a grand total (the sum of all the counts together).

Sums give you a first idea of the size of the effects of your variables.

There are a few rules that people miss in grade school that are important and I would like to review them here.

1. You can only add like quantities. You can't add apples and oranges - though you can call them both fruit and add fruit. You always (!) have to consider your units.

2. Subtraction is the inverse of addition. In other words, subtraction undoes addition. If a+b=c, then c-a=b. You can't subtract apples from oranges.

3. Multiplication is repeated addition. 2 times 3 is 2 added to itself 2 more times: 2+2+2=6. But you can multiply unlike quantities. If you multiply 2 apples times 3 oranges, what you are saying is that, for every apple you have, you have three oranges: how many oranges do you have if you have two apples? But you still have to consider your units to know what your result means.

4. Division is the inverse of multiplication. If a/b=c, then c*b=a. Again, you can divide unlike terms, but you still have to watch your units.

There is no reason to ever make a mistake in mathematics if you know what you're doing and you check your results, and there's always a way to check your results. For addition, if you add a series of numbers, add them in the other direction and you should come up with the same sum. If you don't, you made a mistake. You can even use a calculator to check your results.

Subtraction is a measure of the difference, or the distance, of one quantity from another. Subtracting 3 from 5 gives you 2; if you look at a number line, you can see that 3 is 2 unit distances from 5. Differences can also be used to compare two values. If you subtract a from b and the result is 0, then you know that the values are the same - any number subtracted from itself is 0. If the result is positive, then you know that b is larger than a. If the result is negative, then you know that a is larger than b.

The quickest check for subtraction is to simply turn it around. Subtraction undoes addition so, if you subtract a from b to get c, then you should be able to add c to a to get b. If not, you made a mistake.

Multiplication is commonly used to weight values. For example, if a set of data contains repeated values, each value can be listed once with the number of times it appears in the set. To get a sum or average of the set, each value is just multiplied by the number of times it appeared. That is approximately how frequency tables work. Instead of individual values, the middle value for each frequency interval "speaks" for that interval. For grouped data, then, the frequency of each interval can be multiplied by the middle value to get the sum for each interval.
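Here's a minimal sketch of grouped-data weighting, with made-up intervals: each interval's midpoint "speaks" for it, weighted by the interval's frequency.

```python
# (midpoint, frequency) pairs - hypothetical grouped data
intervals = [
    (5, 3),    # e.g. 3 values fell in the 0-10 interval
    (15, 7),   # 7 values in the 10-20 interval
    (25, 2),   # 2 values in the 20-30 interval
]

n = sum(freq for _, freq in intervals)            # total count
total = sum(mid * freq for mid, freq in intervals)  # weighted sum
print(n, total, total / n)   # count, sum, and mean of the grouped data
```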

Weighting is also useful when you want to give certain values more importance (more "weight"). For example, if you have outliers in a data set, you might want the other values to influence statistics more. You can accomplish that by giving the outliers weights less than 1 or by giving the other values weights more than 1.

Multiplication can be checked like addition by changing the order of the numbers being multiplied. If you end up with the same product, you are probably correct. You can also round the numbers being multiplied and the product will give you an idea if you are close.

Primarily, division tells you how many elements each of the smaller sets has when you've divided a larger set into a number of equally sized smaller sets; but this can lead to some very interesting things.

For instance, a fraction has a numerator on top and a denominator on the bottom. A fraction is primarily a division problem - the numerator is divided by the denominator. But look what happens as the numerator and the denominator change. If both are the same, the fraction is equal to one regardless of what the two numbers are. If the top number is larger than the bottom number, the result is larger than one. If the top number is smaller than the bottom number, the result is between zero and one. So fractions can be used to compare numbers, and the nice thing (contrary to differences) is that you don't have to deal with negative numbers. When a fraction is used to compare numbers, it is called a ratio.

Say that you have 10 red marbles in a sack of 100 different colored marbles and you reach in and pull out a random marble. What is the probability that you get a red marble? Put the 10 red marbles on top of the fraction and the total 100 marbles on the bottom and you get a probability of 10/100, or 0.1. A probability is a fraction - the number of specified outcomes over the total number of outcomes. Statistics is rife with probability.
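The marble example works out like this, using Python's exact Fraction type to keep the arithmetic in fraction form:

```python
from fractions import Fraction

red, total = 10, 100
p_red = Fraction(red, total)     # specified outcomes over total outcomes
print(p_red, float(p_red))       # 1/10 0.1
```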

There are many "specialty" operations in mathematics. Two that arise often in statistics are the absolute value and the square. The absolute value of a number is its positive value regardless of sign; therefore, the absolute values of 3 and -3 are the same: 3. The square of a number is the number multiplied by itself. The inverse function for the square is the square root.

Both can show up when you're trying to determine how "spread out" a set of values is. For instance, you can subtract the smallest value in the set from the largest - that's called the range of the values - but it really doesn't tell you a lot. For instance, it doesn't tell you how the values are distributed within the set: are they evenly spread along the whole range or do they clump around certain values? It's often useful to know how data values behave around the central values of the set. How would you do that?

Perhaps you could simply subtract the central value (say, the average of the data values) from each value. That gives a set of values - as many differences as there were original data points. They are called "residuals" and they can be useful. For instance, you could graph the residuals for a visual representation of how the data points gather around the average; but it would be nice to have a single number.

Maybe the sum of the differences would be useful. The big problem here is that, in symmetric distributions (such as the normal distribution - one of the most common data distributions in statistics), for every positive residual there would be an equal sized negative residual and the end result would be zero. Surely the "spread" isn't zero!

The different signs could be eliminated in a couple of ways, and you've probably already guessed at least one. You could use the absolute values of the differences, and that is, indeed, what is done for some measures of dispersion, such as the mean of the absolute deviations (or MAD). A more common way to deal with this problem is to take the square of each difference: the square of any number, positive or negative, is positive. But either of these sums is difficult to interpret. What does a sum over all the residuals mean?

If you divided the sum (of either the absolute or squared residuals) by the number of data points, that would be meaningful. It would tell you how far, on the average, each data point differed from the average of the data set. Now we're getting somewhere, but there's one more problem.

Let's say we're actually talking about differences: say the data points are measurements in inches. If you're using MAD, you end up with a number of inches, and that would be okay, but if you use the square to get rid of the chance of a zero dispersion when dispersion is obviously not zero, you end up with a value in square inches. That would be an area - are we really talking about areas?

That's really not much of a problem. If you take the square root of the result, you end up with a perfectly good measure of dispersion and it's in the same units as the original data values. This square root, in fact, is the very common standard deviation.
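The whole chain of adjustments above - residuals, MAD, squared deviations, variance, standard deviation - can be sketched in a few lines (the data are made up):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                       # 5.0
residuals = [x - mean for x in data]               # distances from the mean

mad = sum(abs(r) for r in residuals) / len(data)   # mean absolute deviation
ss = sum(r * r for r in residuals)                 # sum of squared deviations
variance = ss / len(data)                          # mean squared deviation
sd = variance ** 0.5                               # back to the data's units

print(mad, variance, sd)        # 1.5 4.0 2.0
print(statistics.pstdev(data))  # 2.0 - the library agrees
```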

You've actually seen five common measures of dispersion here and seen why some are generally favored over the others. The sum of squared deviations is a simple measure of dispersion used in a common statistical technique called "analysis of variance." It is simple to calculate over and over and, since it is usually an intermediate value that isn't reported or interpreted as an end value, it doesn't really matter that it's hard to make sense of on its own.

The mean of the squared deviations is called the variance, and the variance of data is sometimes reported, but it is more commonly used as an intermediate value in statistical procedures. It's easy to see why people would just take the square root to get a standard deviation. And the MAD is also popular for the same reason - it's easy to interpret and it eliminates the problem with symmetrical distributions.

We will talk about these measures of dispersion later when we consider descriptive statistics, but I just wanted to show you how statistics develop - as a series of adjustments to measures that aren't quite satisfactory.

Another special arithmetic operator is the factorial. To find a factorial, you start at a number, multiply it by one less than itself, then multiply again by one less, and so on until you get to one. For example, to find the factorial of 5, you calculate 5 x 4 x 3 x 2 x 1 = 120. Add the rule that the factorials of 1 and 0 are both 1 and you have factorials. So, why are factorials important?

Let's say you wanted to count all the ways to order the four elements of a set (let's label them 1, 2, 3, and 4). Once the first element is set, there are 6 ways for the other elements to be arranged. If the first element is 1, the arrangements are 1234, 1243, 1324, 1342, 1423, and 1432. Any of the four elements can be first, so there are 4 x 6 or 24 different arrangements of all four elements. The factorial of 4 (written "4!") is 4 x 3 x 2 x 1 = 24. Does this always work? Well, if you have two elements, you have only two different arrangements. Add a third element: any of the three elements can come first, followed by either arrangement of the other two, giving 3 x 2 or 6 possible arrangements. Add a fourth: any of the four can come first, followed by any of the 6 arrangements of the other three, giving 4 x 3 x 2 or 24 arrangements. Keep adding elements and you can see that there are n! ways to arrange n different elements. The branch of mathematics called "combinatorics" deals with shortcuts for counting the number of ways of arranging the elements of sets, and the factorial operator figures prominently.
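A quick sketch confirms the count (the use of itertools here is my own choice):

```python
import math
from itertools import permutations

print(math.factorial(4))    # 24
arrangements = list(permutations([1, 2, 3, 4]))
print(len(arrangements))    # also 24 - one per distinct ordering
# The six arrangements that start with 1:
print([p for p in arrangements if p[0] == 1])
```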

Many statistics are based on the probability of drawing a value out of a set of values given a particular probability distribution. These are called "parametric statistics" because they rely on the shape of the distribution. Other "nonparametric" statistics do not rely on the shape of the particular distribution and many of those are based on the number of ways data values can be arranged or ranked, so factorials find their place in statistics.

At its simplest, exponentiation is the raising of a number to a power. When you raise a number, a, to a power, n, you simply multiply a by itself n-1 times. So 2 to the second power (written 2^{2}) is 2 x 2 = 4. 2 to the third power (2^{3}) is 2 x 2 x 2 = 8. Add the rules a^{0}=1 and a^{1}=a and you know all you need to know about whole number powers; but exponents have been generalized to other kinds of numbers. For instance, a negative exponent gives the reciprocal of the base raised to the absolute value of the exponent: 2^{-2}=1/(2x2)=1/4. An exponent of 1/2 is a square root. Fractional exponents are a thing. 2^{0.25} is interpreted as the 100th root of 2 to the 25th power. Since 0.25 is 25/100, or 1/4, it's simpler to just take the 4th root of 2 (which is about 1.189). Exponentiation extends to complex numbers, matrices, and so forth.
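All of those generalizations can be checked directly; Python's ** operator happily accepts zero, negative, and fractional exponents:

```python
print(2 ** 5)     # 32 - repeated multiplication
print(2 ** 0)     # 1 - the zeroth-power rule
print(2 ** -2)    # 0.25 - reciprocal of 2**2
print(2 ** 0.5)   # about 1.414 - the square root of 2
print(2 ** 0.25)  # about 1.189 - the fourth root of 2
```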

There are some interesting rules for working with exponents:

First, some terminology:

For the term 3^{2}, 3 is the base and 2 is the exponent. It means 3*3 (or two threes multiplied together.)

Going back to the basics, you can only add and subtract like terms. For exponential terms, both the bases and the exponents must be the same for two terms to be alike. So 5x^{2} + 7x^{2} = 12x^{2}, but you can't add 5x^{2} + 7x^{3} or 5x^{2} + 5y^{2}.

More advanced rules are easy to understand when exponential terms are represented as series of products.

Multiplying and dividing exponential terms into a single exponentiated term requires that the bases be the same.

1. To multiply two exponential terms with like bases, add the exponents. It's easy to see why. For instance, to multiply x^{2} * x^{5} = x*x * x*x*x*x*x, count all the xs. The result is x^{7} or x^{2+5}.

2. To divide two exponential terms, subtract the denominator exponent from the numerator exponent. It comes down to cancelling terms. x^{5}/x^{2} = (x*x*x*x*x)/(x*x). Cancel the like denominator terms from the numerator and you have x*x*x = x^{3} = x^{5-2}.

3. To exponentiate an exponential term, multiply the exponential of the term by the "outer" exponential. To make that clear, take the example (x^{3})^{2}, which means to square x^{3}. That is exactly the same as x^{3} * x^{3}, or (x*x*x)(x*x*x) which is the same as x^{6} or x^{3*2}.

Rule 2 above also explains why any number to the zeroth power is 1. X^{n}/X^{n} is 1 (because any number divided by itself results in a quotient of 1), and X^{n}/X^{n} = X^{n-n} = X^{0}.
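All three rules (and the zeroth-power consequence of rule 2) can be verified numerically for any nonzero base; the choice of x = 7 here is arbitrary:

```python
x = 7  # any nonzero number works

assert x**2 * x**5 == x**(2 + 5)   # rule 1: multiplying adds exponents
assert x**5 / x**2 == x**(5 - 2)   # rule 2: dividing subtracts exponents
assert (x**3) ** 2 == x**(3 * 2)   # rule 3: a power of a power multiplies
assert x**0 == 1                   # follows from rule 2 with n - n = 0
print("all three exponent rules check out")
```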

Moments are very basic statistics, usually used to build more advanced statistics; they are basically averages of powers of the distances of data points from the average of the data set. I will talk more about all this later, but you might find it interesting that the mean is based on the first moment of a data set; the variance is based on the second moment (remember the standard deviation and the squares of the distances between data points and the mean); measures of skewness are based on the third moment and use third powers; and measures of kurtosis are based on the fourth moment and use fourth powers. Rarely does statistics use more than powers of two, except.....

Processes that involve growth or decay, including many economic functions, such as interest, and population growth, usually increase or decrease according to exponential functions, so many statistical techniques that deal with those processes use complicated exponents.

The logarithm is often called the inverse function for exponentiation, and that statement hides a subtlety that trips people up (I've even seen it glossed over in mathematics textbooks). Exponentiation has two inverses, depending on which part you want to recover. If you want the base back (what number raised to the 3rd power gives 1000?), the inverse is a root - in other words, a term with a fractional exponent. If you want the exponent back (10 raised to what power gives 1000?), the inverse is the logarithm. A logarithm is really just another way to write an exponential relationship, and you can convert back and forth between the two formats.

Since logarithms are simply another way of writing exponential terms, the rules for exponents apply equally to them. To see how that works, look at how to convert from exponential format to logarithmic format.

We'll start with what has been called "common logarithms" because they are very intuitive for users of a decimal number system. Until fairly recently, they were the most popular logarithms. They convert like this:

10^{3}=1000 is the same thing as Log_{10} 1000=3

Notice that 10 to a whole-number power is a one followed by the same number of zeros as the power. You can always know the common logarithm of a power of 10 - it's the number of zeros. That's what I meant when I said that common logarithms are intuitive for users of a decimal number system.

Now, notice that, in both formats, 10 is the base. Common logarithms are distinguished from other kinds in that they use 10 as the base. The same numbers appear in both examples. To be more general, a^{b}=c is the same as Log_{a} c=b.

When converting from one format to the other, I first think of the above example:

10^{3}=1000 is the same thing as Log_{10} 1000=3

to figure out where each number is supposed to go. So, to convert the common log of 75, which is 1.875, to exponential notation, I write the base (10) to the exponent, 1.875, and make that equal to the result, 75:

10^{1.875}=75

Logarithms can have other bases. The weird number e (~2.718) arises naturally (a lot!) in mathematics, economics, and the physical and biological sciences, often as the base of an exponential expression; therefore, logarithms that use e as the base have superseded common logarithms in popularity and are known as "natural logarithms." 2 is also a popular base for logarithms, often in information theory, computer science, and acoustics.

Converting from one base to another is fairly easy.

log_{a} x=log_{b} x/log_{b} a

For instance, you know that the common log of 1000 is 3. The common log of 2 is about 0.301. So the binary log of 1000 is 3/0.301 or about 9.966.
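That change-of-base calculation, checked in code (using base-10 logs to produce a base-2 log):

```python
import math

# Binary log of 1000 via the change-of-base rule, using common logs:
via_common = math.log10(1000) / math.log10(2)
direct = math.log2(1000)
print(via_common, direct)   # both about 9.966
```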

Here's how the rules for exponents work out when you're dealing with logarithms.

To multiply two values, add their logarithms. Keep in mind that the result is a logarithm, so you will have to look up its value in a table, or exponentiate the base by the logarithm, to get the product.

For a very simple example, to multiply 5*3, add the logarithms (as long as you use the same base, it doesn't matter what the base is - let's use natural logarithms.)

Log_{e} 5 = 1.609

Log_{e} 3 = 1.099

The sum is 2.708. If you exponentiate e to the power of 2.708, you get 15. Actually, if you do this on a calculator or spreadsheet (the usual function for natural logs is LN and the one for exponentiating e is EXP), you will have to deal with a little round-off error.
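Here is that multiplication-by-logarithms trick in code, using the math module's log and exp (the counterparts of the LN and EXP functions mentioned above):

```python
import math

a, b = 5, 3
log_sum = math.log(a) + math.log(b)   # about 1.609 + 1.099 = 2.708
product = math.exp(log_sum)           # very close to 15, up to round-off
print(log_sum, product)
```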

To divide two numbers, subtract the log of the divisor from the log of the dividend and exponentiate the result.

To raise a number to a power (any power, negative, fractional, whatever), multiply the log of the number by the power (not the log of the power).

Before calculators, these tricks were used to calculate with very large, very small, or very precise values. Logarithm tables were part of everyone's high school education. Logarithms were also the basis of those battery-free calculators - slide rules.

If you play around with a couple of rulers, you will find that you can use them to add. Place one ruler on top of the other so that one scale is just below the other. If you place the zero of the bottom scale directly under the one on the top scale, then above each number on the bottom scale you will see that number plus one. Try it with the zero under the two on the top scale: you get each number on the bottom scale increased by two.

Now imagine that the numbers on the scales are not values but the logarithms of values. The results will be sums of logarithms, or logarithms of products. And if you label the logarithms with the values instead, you can read off products....or quotients.

Slide rules are cool because, as I said above, they never need batteries (or solar energy to charge batteries) and, with some understanding of logarithms, you can actually make your own from index cards (see my CardCalc excursion).

Exponents are still used as the basis of scientific notation, which is used to express and calculate with very large or very small numbers.

Logarithms are used in several ways for statistics. We've talked about outliers. Let's say that there are several outliers in a data set and they are determined to be legitimate data values and need to be kept in the analysis. One way to bring data into a narrower range and even make it look more normal (i.e., like a normal distribution so that traditional statistics may be used) is to take the logarithm of the data points and work with those. The distance between 10 and 1000 is 990, but the distance between the common logarithms, 1 and 3, is only 2. Remember, though, that after analyses are done, the results have to be transformed back into the original form by taking the antilogarithms (exponentials).
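A tiny sketch of that transformation, with made-up wide-ranging data:

```python
import math

data = [10, 100, 1000, 100000]          # wide-ranging, skewed (made-up) data
logged = [math.log10(x) for x in data]  # roughly [1.0, 2.0, 3.0, 5.0]
print(logged)

# After the analysis, transform results back with the antilogarithm:
back = [10 ** v for v in logged]
print(back)   # recovers (essentially) the original values
```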

This kind of transformation is commonly used in graphical examination of data (since an unwieldy, wide-ranging data set can be squashed into a more convenient graph), in regression analysis (to reduce variation in data, since range is variation), and in contingency-table data in the form of odds ratios and relative risks (which we will come to later).

Another form of advanced mathematical operation is the derivative, the result of differentiation. Here we are no longer working with data values but with functions. The derivative answers the questions "How fast does a value change?" and "How fast does the rate of change of a value increase or decrease?" Derivatives are about rates of change.

It's easy enough to calculate the rate of change for a linear equation. Just remember "rise over run". That's the same formula carpenters use to describe the pitch of a roof or the grade of stairs. Divide the vertical distance between two points on the line (the rise) by the corresponding horizontal distance (the run). That slope will be the same anywhere along the line, regardless of the points chosen.

Curved lines like parabolas and exponential curves are a different matter. The slope at a given point is the slope of the line tangent to the curve at that point, and that slope changes as you move along the curve.

Calculus is about three things: differentiation, integration, and limits. Everything else in all those huge calculus texts is either preparation or application of those three concepts.

Actually, practical statisticians rarely deal directly with derivatives, but the theoretical statisticians who developed the techniques they use had to deal with them plenty.

On the other hand, people who perform other kinds of practical analyses may well run into calculus from time to time. For instance, people who want to know the largest or smallest values of functions will use derivatives.

In economics and business, processes with diminishing returns are fairly common. Let's say someone wants to know the best price for a product they are manufacturing. If they place too small a price on it, people will think that it's poor quality and won't buy it. If it's too expensive, they will look for cheaper alternatives. If there is some data available, perhaps from the sales of similar products or from a consumer study, the manufacturer can graph the data and look for the price where sales stop increasing and begin decreasing. The curve will look like an upside-down parabola or semicircle, and what they want is the highest point on the curve - the maximum. But it's hard to identify the exact maximum on a graph.

There is a way to find the exact maximum point on the curve. If you imagine such a curve and visualize tangents along it, you will notice that, where the curve changes from increasing values to decreasing values at the top, the tangent line is perfectly horizontal - in other words, at the maximum value, the slope is zero. At that point, then, the value of the derivative is zero. The same is true of minimum values. So derivatives can be very useful for finding maximum and minimum values of functions.
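To make that concrete, here is a small Python sketch. The "sales vs. price" curve is invented for illustration; the slope is estimated numerically, and it hits zero right at the peak.

```python
# Hypothetical sales-vs-price curve: an upside-down parabola peaking
# at a price of 30 (numbers invented for illustration).
def f(price):
    return -(price - 30) ** 2 + 900

# Numerical estimate of the derivative: rise over run around a point
def slope(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

best = max(range(0, 61), key=f)   # highest point on the curve
print(best, round(slope(best), 6))
```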

The inverse operation of differentiation is integration and the integral is sometimes called the antiderivative. In fact, one way to find an integral is to work backwards and figure out what function a derivative is the derivative of.

A major importance of the integral is that you can use it to calculate the area under any curve. If you have a function that describes the curve, you can integrate it to find the area under it. If you just have data points, there are procedures you can use to find the integral (I'll get back to that below).
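If all you have are data points, one such procedure is the trapezoidal rule: treat each pair of neighboring points as the corners of a trapezoid and add up the areas. A quick Python sketch, sampling y = x^2 as stand-in data:

```python
# Trapezoidal rule: approximate the area under a curve given only
# data points, by summing the areas of trapezoids between neighbors.
def trapezoid(xs, ys):
    area = 0.0
    for i in range(len(xs) - 1):
        area += (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
    return area

# Sample y = x^2 at 21 points on [0, 2]; the exact area is 8/3
xs = [i / 10 for i in range(21)]
ys = [x ** 2 for x in xs]
print(trapezoid(xs, ys))   # close to 2.667
```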

A probability distribution is a function that describes the probability that a variable (or value in a set of data) will take any particular value. There are common types of probability distributions that arise frequently in statistics. A probability distribution can be graphed as a curve with values on the horizontal axis, and frequencies of those values as they occur in the data set on the vertical axis. The probability that the variable will take on a specific range of values is precisely the area under the curve between the end point values of the range.

Differentiation and integration can both be checked by graphing the function (or the data values) and by comparing the values you have for the derivative and/or integral with approximations from the graph.

Again, for most practical statisticians, there are tables (and, now, functions in spreadsheets) that will return such probabilities, but someone had to calculate them in the first place, and to do so required integration.

Statisticians today have it good. There is a lot (!) of statistical software out there - both commercial and free. Some of the more popular costly packages are SPSS, SAS, and Stata. The core programs are extensive and there are generally packages available that will do just about everything from optimization to data mining.

On the free side, there are the packages offered in this website:

http://freestatistics.altervista.org/

plus the add-ons for various spreadsheet programs like OpenOffice, LibreOffice and Google Sheets. A search for statistics programs on SourceForge will also turn up a considerable list. Some items you might want to look at are PAST, R (a programming language especially set up for statistics), MicrOsiris, and my own software package for OpenOffice or LibreOffice Calc - DANSYS (which is available on the Excursions>LabBooks page of the Therian Timeline).

Most spreadsheets, both free and commercial, offer a broad range of statistical functions. You can usually do hypothesis tests, frequency and contingency table tests, and simple regression. It's pretty impressive the things you can do with Google Sheets extensions.

Caveat: if you just want to play around with some stat packages, download a couple and have fun, but if you want to do serious work with them, try them out on some test data first and run some other tests. You should at least test the randomizer and see if the resulting distributions are what they should be.

For instance, for a long time, people complained that the OpenOffice RAND function did not give a true uniform distribution. A histogram would quickly reveal that small and large values in the interval between 0 and 1 had less chance of appearing than middle values. The RAND function has since improved considerably. Don't assume that a statistics program, even the expensive ones, will give you reliable and valid results.

People who are not mathematically savvy often assume that our decimal number system is the only kind of numbers there are. Not only is that false, but there are other useful number systems. For instance, computers use a variety of number systems.

Our usual number system is called "decimal" because it is said to have "base 10". (The dec- root means 10.) The telltale sign of the decimal system is that there are 10 digits: 0 through 9. The chief characteristic of a decimal system is that each place is a power of 10 multiplied by one of the digits. Let's look at an example.

Read from right to left, the values of the digits in 123,456 are:

6*10^{0}

5*10^{1}

4*10^{2}

3*10^{3}

2*10^{4}

1*10^{5}

Adding all those values together, you get 6 + 50 + 400 + 3000 + 20000 + 100000 = 123,456.

One of the common non-decimal systems used by computer scientists is the binary (base-2) system. Since the base is 2, you only have 2 digits to work with: 1 and 0. You can still represent any value that can be represented in a decimal system using a binary system. The value of binary mathematics in computers is that electronic components can only recognize two states (actually, there are exceptions, but most components in computers can only recognize two states) - on and off, or two different voltages. How does that work?

We still work with powers of the base, but in this case, the place values are powers of 2: 1, 2, 4, 8, 16, 32, 64, etc. The number 101011, then, in our decimal system is 1*1 + 1*2 + 0*4 + 1*8 + 0*16 + 1*32 = 43.

We can also translate from decimal to binary. Let's do that with 123,456. First, we need to find the largest power of two just less than 123,456. Let's see:

1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072.

Looks like 65536 = 2^{16} is our leftmost place value, so the 17th place has a 1. What about the 16th place? 123,456 - 65536 = 57920. There's a 32768 in 57920, so the 16th place has a 1. 57920 - 32768 = 25152 and there's room for 16384, so there's a 1 in the 15th place. 25152 - 16384 = 8768, and the next smaller power of 2, 8192, is the 14th place value. So 8768 - 8192 = 576, and the next place with a 1 is the 10th (512). 576 - 512 = 64, and there's a 1 in the 7th place, and that's it since 64 - 64 = 0. So there are 1s in the 17th, 16th, 15th, 14th, 10th, and 7th places and our binary number is 11110001001000000.
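The whole procedure is easy to automate. A short Python sketch (repeatedly dividing by 2 and keeping the remainders gives the same answer as subtracting out powers of 2):

```python
# Convert a decimal number to binary by repeated division by 2;
# the remainders are the binary digits, rightmost first.
def to_binary(n):
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits
        n //= 2
    return bits or "0"

print(to_binary(123456))             # 11110001001000000
print(int("11110001001000000", 2))   # back to 123456
```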

Computer scientists also use hexadecimal (base-16) and octal (base-8) systems as short forms of binary numbers since three bits (binary digits) can be represented with one octal digit and four bits can be represented by a hexadecimal digit.

By the way, the hexadecimal digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.

Are there other number systems commonly in use? Why, yes. One was used by the ancient Babylonians. For regular numbers they used a decimal system (without the 0), but for really big numbers they used a sexagesimal (base-60) system, and we also use a sexagesimal system for time and angles: 60 seconds make up a minute, and 60 minutes make up an hour if you are measuring time, but a degree if you are measuring angles.

How about the column labelling system used by many spreadsheets? The first column is labelled A, the second B, the third C, and so on to column Z, at which point double letters are brought in, starting at AA and going to AZ, then BA and so on. This is a base-26 system with no zero.
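Here is a small Python sketch of that labelling scheme; the "subtract 1" step is what makes up for the missing zero:

```python
# Spreadsheet column labels as a base-26 system with no zero:
# 1 -> A, 26 -> Z, 27 -> AA, 52 -> AZ, 53 -> BA, ...
def column_label(n):
    label = ""
    while n > 0:
        n, r = divmod(n - 1, 26)   # the -1 makes up for the missing zero
        label = chr(ord("A") + r) + label
    return label

print(column_label(1), column_label(26), column_label(27), column_label(53))
```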

Units and conversion

Say that you want to paint a surface 6 inches by 27 feet. How do you determine the area you want to paint? Well, we've talked about multiplication and that's obviously what we do now. 6 * 27 = 162. 162 what? Inches? Feet? Foot inches?

Okay, I'm being silly. To end up with anything that makes sense, we have to be working with the same units of measurement. We can either convert the inches to feet, or the feet to inches. We can do it either way, but let's go with inches since that will eliminate the need to work with fractions. Each foot is 12 inches, so 27 feet is 27 * 12 or 324 inches. Now we have 6 inches * 324 inches = 1944... what? Inches? No, when you multiply units by the same units, you end up with those units squared, so this is 1944 square inches.

Now, how much paint do you need to buy? You check around and you find a good paint for the purpose; 5 gallons will cover 2000 square feet of surface. Do you need 5 gallons? No, you only have 1944 square inches, which is quite a lot less than 2000 square feet. So how do you figure this out?

When you calculate with measurements, you include the units along with the amounts, and the units add, subtract, multiply, and divide (and exponentiate, and differentiate, etc.) just like the amounts. For instance, you will have to convert the square inches back to square feet, or the square feet of coverage to square inches. With units, we can just throw everything together and, as long as the units work out right, we'll be okay. We have:

1944 square inches

144 square inches/1 square foot (each square foot is the same as 144 square inches because 12 inches * 12 inches = 144 square inches)

5 gallons/2000 square feet

We want the answer in gallons of paint, so we can arrange our factors so that everything but gallons cancel out:

5 gallons/2000 square feet * 1 square foot/144 square inches * 1944 square inches =

All the units in the denominators cancel all the units in the numerators except the gallons, so that should work. We get:

0.03375 gallons.

Does that make sense? Well, if we had calculated using feet instead of inches to get the area, we would have 27 feet times half a foot, or 13.5 square feet, to cover. Compared to 2000 square feet, we can expect the result to be pretty small, so our result looks reasonable.
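The whole calculation can be laid out in a few lines of Python, with the units tracked by hand in the comments:

```python
# The paint problem, with units carried along in the comments
area_sq_in = 6 * (27 * 12)        # 6 in * 324 in = 1944 square inches
area_sq_ft = area_sq_in / 144     # 144 sq in per sq ft -> 13.5 sq ft
gallons = area_sq_ft * 5 / 2000   # 5 gallons cover 2000 sq ft
print(area_sq_ft, gallons)        # 13.5 0.03375
```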

This example illustrates several things.

There are at least two important parts to a measurement. Actually, there are three important parts, but many people consistently ignore the third part completely - in precise measurement, you can't!

Every measurement consists of an amount, a specification of units, and an indication of how precise the measurement is. No measurement is perfect, and there needs to be some indication of how imperfect the measurement is. In technical reports, precision is usually indicated by phrases like "accurate to within x units", or "give or take x units", or, more often, "±x units". That means that the perfect measurement, if it could be made, would be somewhere within the range from the reported measurement minus x units to the reported measurement plus x units.

Most people come up through school in math classes where they do "exercises" with endless lists of dimensionless numbers, so they feel like the units are superfluous. Actually, in real life, you rarely run into numbers that don't have units, and the units are important. For instance, in the example above, I might have figured out the result by throwing all the numbers together and ended up with 0.03375, but 0.03375 what? And how would I know whether I was correct?

First, the number "looked right". But it looked right for gallons of paint. If the result were in ounces, it would have been much larger. I could only know if it looked right if I knew the unit of the result. And notice that I figured out how to combine the numbers by figuring out which arrangement would allow me to cancel out all the units I didn't want.

A common real world problem is conversion. To convert from feet to inches, you multiply the number of feet by 12. 12 is called the conversion factor. Tables of conversion factors might provide a "To convert from" column, a "To" column, and a "Multiply by" column. Often you will get "From", "To", and "Conversion factor" columns. In the latter case, it can be confusing as to what you multiply by what to get what. Not all tables are equally helpful. There are two useful guides.

Say you want to convert 3.2 feet to inches and your conversion factor is 12 inches per foot. You may have to multiply or divide by the conversion factor according to which direction you are going. Your first clue is that you will have more inches than feet in the same length, so you know that you should multiply the number of feet, giving you 38.4 inches.

Another way to set up the problem, since saying "12 inches per foot" is the same as saying "12 inches/1 foot": You want your result to be in inches, so you want the feet to cancel out, therefore: 3.2 feet * 12 inches/1 foot = 38.4 inches.

The "setting up factors so that, by cancelling units, you end up with the units you want" is called dimensional analysis and, the more complicated problems become, the more useful dimensional analysis is. For instance, chemical reaction problems can be bears to figure out but dimensional analysis can provide invaluable clues as to how problems should be set up.

Another facet of measurement that must be considered in real world problems is precision and error. To the point: every measurement has error. There are two levels at which error can be considered, and for most real life situations, the first is sufficient. You have to know the limits of your measuring instrument. For precision work, for instance fine technical machining and statistics, the precision of measurements must be quantified and reported.

It is unreasonable to try to estimate measurements of length from a ruler to the thousandth of a centimeter. In fact, you should limit your measurements to half of the smallest interval provided by the instrument. If the smallest units on a ruler are millimeters, you should not try to make measurements finer than half a millimeter. If you try to be more precise, several kinds of measurement error will muddy up your reading.

There are four sources of error in measurements and two broad types. There are four elements involved in a measurement and, since nothing is perfect, they all inject error into measurements. The four elements are the person making the measurement, the instrument they use, the thing being measured, and the environment.

Sensory organs are incredibly precise but not perfect, and people make mistakes, however slight. Ditto instruments; in fact, measurement instruments usually come with a specification of their precision (for instance, a weight scale instruction manual will say something like "will measure to 1 milligram ±0.5 mg", which means that the true measurement will be somewhere within 0.5 milligrams of the indicated mass). Things being measured may change while they are being measured; for instance, a metal object will expand or shrink with changes in the temperature of the environment. And, of course, drafts, changes in lighting, and electromagnetic fluctuations might change measurements slightly.

Mind you, all this is not terrible, since error can be kept to manageable levels by good measurement practices. Normally, an instrument's documentation will outline best practices for its use. Elemental instruments like clocks, rulers, and graduated cylinders do well with simple common sense. Basic instructions are usually covered in any laboratory manual.

The two broad types of error are random error and systematic error. By far, systematic error is the more aggravating.

Random error is the result of tiny, irrelevant variations in the system that nevertheless affect the measurement. With multiple measurements, they tend to average out.

Systematic errors, on the other hand, do not average out. They exist as biases in the system. Biases show distinct, and not always predictable patterns. An example is parallax error. The hands on analog clocks rotate above the dial of the clock, so a reading from such a clock is going to vary according to the angle the clock is viewed at. The hour hand at 1:00 is going to look closer to the 12 on the clock face when viewed far to the right of the clock than when viewed straight on or to the left.

If an electronic measuring device is not calibrated correctly, it will consistently display measurements that are larger or smaller than the true values.

Error in measurement can be reported as absolute (apparent) or relative error. The ± notation usually indicates absolute error, so a measurement of 2.5 centimeters ± 0.5 millimeters means that the maximum possible error in the measurement is 0.5 millimeters above or below the reported length (or, the true measurement is somewhere between 2.45 and 2.55 centimeters).

The smaller the absolute error, the more precise a measurement is, so 2.5 cm ± 0.5 mm is more precise than 2.5 cm ± 1 mm. A tolerance of 1 mm in a design indicates that a measurement can be off by up to 1 millimeter and still be acceptable.

Relative error indicates the significance of error in a measurement. For instance, an error of 0.5 mm in a measurement of 1 centimeter is much more serious than the same error in a measurement of 1 kilometer. Relative error compares the maximum error in a measurement with the reported measurement. It can be reported as a fraction or, more often as percent error. The relative error in a measurement of 1 cm ± 0.5 mm is 0.5 mm/10 mm or 5% error. Notice that, since the units cancel out, percent error is a dimensionless amount.
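As a quick Python sketch of that calculation:

```python
# Relative (percent) error for a measurement of 1 cm +/- 0.5 mm
measurement_mm = 10.0        # 1 cm expressed in millimeters
absolute_error_mm = 0.5
relative_error = absolute_error_mm / measurement_mm
print(f"{relative_error:.0%}")   # 5%
```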

The smaller the relative error is, the more accurate the measurement is.

Another important issue for measurements is that of significant digits. Digits that are not significant are just placeholders. Any nonzero digit is significant. If a zero is merely used to place the decimal point, it is not significant; all other zeros are significant. The numbers 307, 2.05, 27.0 and 0.0000000321 each have three significant digits. In the last case, none of the zeros are significant - they only place the decimal point. In 27.0, the zero is not just a placeholder; it indicates that the number is accurate to the nearest tenth.

In general, a measurement with more significant digits is more accurate. If absolute error isn't specified, it is usually assumed to be half a unit in the position just to the right of the last significant digit. So the absolute error of 307 is assumed to be 0.5. The absolute error of 27.0 is assumed to be 0.05.

When calculating with measurements, you can't claim that a result is more accurate than it actually is. There are rules...

When adding or subtracting measurements, convert them to decimal forms of the same unit and round them all to within one position of the precision of the least precise number; then you can do the calculation. The result should be rounded to the lowest degree of precision. In other words, the result cannot be more precise than the least precise measurement being added or subtracted.

To multiply or divide, bring each number to the same number of significant digits as the least accurate one. The result will have the same degree of accuracy.

If you have a spreadsheet (and, if you're reading this, you probably do), you will notice that there are many different functions for rounding off numbers. You can truncate numbers by just throwing away significant digits. You can round up or round down. There are things called ceilings and floors. As far as accuracy is concerned, there is only one kind of rounding.

When you round to a specified number of significant digits, look at the first digit to be dropped (the one just to the right of the last digit you keep). If it is greater than 5, you round up; if it's less than 5, you round down. If it's equal to 5, look at the next digit to the right. If all you have there are zeros, well, you have to figure out who to listen to. Some people say to round up; some say to round down; some say to alternate. Maybe flip a coin.

If you want to round 25307 to two significant digits, that would give you 25000 since the third significant digit from the left is less than 5. If you want to round to four significant digits, the result would be 25310. I would round to 30000 for one significant digit since 5307 is larger than 5000.
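Here's a small Python sketch of rounding to significant digits (note that Python's built-in round settles exact ties by rounding to the nearest even digit, one of the conventions mentioned above):

```python
import math

# Round x to a given number of significant digits.
def round_sig(x, digits):
    if x == 0:
        return 0
    leading = math.floor(math.log10(abs(x)))   # place of the leading digit
    return round(x, digits - 1 - leading)

print(round_sig(25307, 2))   # 25000
print(round_sig(25307, 4))   # 25310
print(round_sig(25307, 1))   # 30000
```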

Algebra

Algebra throws in the concept of variables. Variables are numbers that do not have a fixed value. You can think of them as placeholders for values you don't know (in problem solving, they are generally called "unknowns"). In algebraic statements, they are generally represented as lowercase letters at the end of the alphabet: x, y, z, or lowercase letters with subscripts: x_{1}, x_{2}, x_{3}, etc. Subscripts are used because there are only 26 letters in the English alphabet and you quickly run out of letters when there are a lot of variables to label.

Numbers that have only one value each are called constants. The letters at the beginning of the alphabet, with or without subscripts, are used for those if their values are not known. If a variable is multiplied by a constant, as in 4x, the constant is called a coefficient. (Many statistical measures are also called coefficients, but they are not directly related to algebraic coefficients, so don't let that confuse you.)

Algebraic expressions are made up of terms. Terms are smaller statements that are connected by operators to form the expressions. Here is an algebraic expression:

4x^{2}+5x-3=2

The terms that make up this expression are 4x^{2}, 5x, 3, and 2. The rules of arithmetic apply in exactly the same way to variable expressions as they do to regular numbers. You can only add and subtract like terms, and you can multiply or divide unlike terms, but you have to multiply all the parts by all the other parts and then add the partial products. Like terms, in this case, are terms that have the same variable and the same exponent. You can add 5x^{2} and 3x^{2}, but you can't add 4x^{2} and 5x. 5x^{2} plus 3x^{2} is equal to 8x^{2}.

Mathematics is a language and algebra adds the idea of variables to the language. Like all written languages, algebra uses strings of characters to convey meaning. We've already seen the characters used in arithmetic and now we've added letters to denote variables and constants. There are also special strings.

A statement is generally a string of characters that makes sense in the language of mathematics. A formula is a statement that is constructed so as to calculate a value given other values. Perhaps the most famous formula is E=mc^{2}. There are three quantities in this formula: E is a measure of energy, m is a measure of mass, and c is a constant that stands for the speed of light. Given a value for one of the two variables, the value of the other can be calculated.

An equation is a statement that specifically indicates the equivalence of two statements. The statement, 4x^{2}+5x-3=2, says that 4x^{2}+5x-3 has the same value (means the same thing) as 2. An equation can be seen as a definition.

A relation does something - it maps the value of one or more variables onto one or more other variables. The variables do not have to be values. In the relation "a is b's son", a and b are people. The set of all persons is mapped back onto the set of all persons in a way that, if a is a son, b is his parent.

Relations can be one-to-one if every instance of a variable is mapped onto (related to) one and only one instance of another variable. y=2x is a one-to-one relationship because each value of y is related to only one value of x and vice versa.

The relation "sonship" above is many-to-two, since each parent can have many sons but a son can have only two parents (in the biological sense).

Mathematical relations can be positive or direct if, as one variable increases in value, the other one also increases; or negative (indirect) where, when one variable increases, the other decreases. If the changes are consistent, for example, if any increase in one variable always occurs with an increase in the other variable, the relation is said to be monotonic.

A function is a special kind of relation in which each value of one variable is related to only one value of another variable (the reverse doesn't have to be true). When the relation is graphed, each value of the variable graphed on the x axis has only one value on the y axis, although each value of the y variable can be paired with more than one x value. The y variable is often said to be dependent on the x variable, and the x variable is called the independent variable.

Functions are graphed using pictures consisting of lines divided into equal intervals (to provide length measures), at right angles to each other. Of course, only two or three such lines can be accommodated on a flat piece of paper. The lines are called "axes". Usually the independent variable is graphed on the horizontal, or x, axis. The dependent variable is graphed on the vertical, or y, axis. Each point relates a value of x to a value of y. Any point relating two variables can be positioned using two axes and can be described by an ordered pair of values. If a third variable is in the mix, a z axis is added at right angles to the x and y axes.

Here is a graph of the point (2,3):

The "order" of an ordered pair is x, y, and z (if there is a z). So the point is placed 2 units out the x axis and 3 units up the y axis.

A function generally has several points strung along a line or curve. A way to see if a graphed relation is a function is to draw a vertical line on the graph and slide it back and forth along the x axis. If it ever crosses the curve formed by the relation more than once, it's not a function. Is the curve y=4x^{2}+5x-3 a function? Here's the graph:

If you slide the thick black line left or right, it will never cross the curve more than once, so this is a function.

This part of the Timeline is not intended to be a course in algebra but algebra is a prerequisite of statistical mathematics, thus the brief overview. I will, however, go over the two most important tools of algebra because there can be some confusion there and they really are important.

I have heard people who should know better, educators and mathematicians, say, when simplifying an equation, "Now, we move the x variable to the other side of the equation." You can get away with that kind of thinking in elementary algebra but as you get deeper into algebra, it will cause you problems. You don't move variables from one side of an equation to another unless you're simply flipping the equation around the equal sign (for example, you can flip 3X + 5 = X - 2 to form X - 2 = 3X + 5.)

The way you isolate variables to one side of an equation is a balancing trick. The basic rule is that you can do anything you want to one side of an equation as long as you do it to both sides of the equation. For example, you can subtract X from both sides of the above equation to get 2X + 5 = -2; then you can subtract 5 from both sides to get 2X = -7. Finally, you can divide both sides by 2 to end up with X alone on the left side of the equation: X = -7/2.

The second indispensable tool is cancelling. If you have a product as the numerator of a fraction and another product as the denominator, you can cancel out any common factors. So, if you have 2xy^{5}/y^{3}, you can cancel three y factors from both the numerator and denominator giving you 2xy^{2}.

There are a lot (!) of tools for manipulating the terms of an equation, such as factoring, completing the square, polynomial division, and such. If you want to build an algebraic toolbox, any algebra textbook will carry you far.

The most basic and arguably the most important function is a linear function. You'll see the word "linear" all through mathematics and statistics. It is simply a function whose graph is a straight line. There may be any number of variables, but they never have a power of more than one. The simplest linear function is y=ax+b. Here, a is the coefficient of the variable x and b is a constant. If you graph this function, you'll find that the slope of the line (a carpenter would say, "rise over run") is a and the line crosses the y axis at b (or to put it another way, b is the y intercept of the function).

The term "solving an equation" can commonly mean a couple of different things. First, it might mean finding the value of the dependent variable for a set of values for the independent variable. That just involves plugging the values of the independent variable into the equation and doing the math. So, for y=3x-5, if x=1 then y=-2, if x=2 then y=1, and so forth.

The other way of "solving an equation" is a little more difficult. It involves finding out what the values of x are when y=0. These are the roots of the function and, on the graph, they are the x intercepts. For the above function, you have to find the value (or values) of x that make 3x-5 equal to 0.

For linear equations, it's easy. It's just a matter of balance. 3x-5=0, so 3x=5, and x=5/3, and that's the answer. But, when you throw in just one more term with an exponent of 2, it suddenly gets much more difficult. Just to give you a taste of how that works, let's look at x^{2}+2x-15=0. One way to solve that is to split the equation into two factors: (x+5)(x-3)=0. If you multiply the factors out, you'll see that the two are the same. Since one or the other factor must equal zero, x can be either -5 or 3. There's also a quadratic formula you can use to solve equations having a term with a largest exponent of 2. It looks like:

x = (-b ± √(b^{2} - 4ac))/2a

a is the coefficient of the squared term, b is the coefficient of the linear term (the one with an exponent of 1), and c is the constant. In this case, a is 1, b is 2, and c is -15. If you do the math, you will come up with x=3 (when you add the square root in the numerator to -b) and x=-5 (when you subtract it).
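The quadratic formula is easy to turn into code. A Python sketch, checked against the factoring example above:

```python
import math

# Quadratic formula: x = (-b +/- sqrt(b^2 - 4ac)) / 2a
def quadratic_roots(a, b, c):
    root = math.sqrt(b ** 2 - 4 * a * c)   # assumes real roots exist
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

# x^2 + 2x - 15 = 0, so a=1, b=2, c=-15
print(quadratic_roots(1, 2, -15))   # (3.0, -5.0)
```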

It gets really complicated when you have exponents greater than 2, but that algebra textbook has ways of dealing with all that.

A linear equation can have more than one variable as long as no variable has an exponent larger than 1. In general, if you have the same number of equations as you have independent variables, you can solve for the values of the variables. For instance:

3x+7y=4

6x-3y=2

The x terms can be eliminated by multiplying all the terms of the first equation by -2 to get:

-6x-14y=-8

6x-3y=2

When you add the terms of the equations together you get:

-17y=-6, or y=6/17

Now you can solve for x by plugging this value for y into the second equation:

6x-3(6/17)=2

6x-18/17=2

6x=3 1/17

which makes x about one half (or 0.509803922)

As a check, you can plug that value for x into the first equation and find y again:

7y=4-3(0.50....)

7y is about 2.5

y is about 0.35

There are some rounding errors here but, if you plug these values into the equations you will find that they work pretty closely.
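The rounding errors disappear entirely if you do the elimination with exact fractions. Here is the same pair of steps sketched in Python using the standard fractions module:

```python
from fractions import Fraction as F

# 3x + 7y = 4 and 6x - 3y = 2, solved by elimination with no rounding error
y = F(6, 17)                  # from adding -2*(first equation) to the second
x = (F(2) + 3 * y) / 6        # back-substitute into 6x - 3y = 2
print(x, y)                   # -> 26/51 6/17

# both original equations check out exactly
print(3 * x + 7 * y, 6 * x - 3 * y)   # -> 4 2
```

Note that 26/51 is the exact value behind the 0.509803922 above.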

Many statistical procedures such as linear regression rely on systems of linear equations.

There are equations, but are there not-equations? There sure are, except they're called inequalities. They work pretty much like equations except the inequality sign (<) has direction (it can point left to denote "less than" or right to denote "greater than") and you have to keep up with the direction it's pointing. People get confused about when you have to flip the sign, but there are only two situations where it happens. If you multiply or divide both sides of an inequality by a negative number, you have to flip the sign (< to >, or > to <). And, if you flip the whole inequality around, swapping the right hand side with the left side, the sign flips with it; so, x+5y<3y is the same as 3y>x+5y.

Analytic, Graphic, and Numerical solutions

As with computers, in mathematics there are usually several ways to approach a problem. There are three very broad forms of solutions.

Analytic solutions begin with equations, formulas, functions, etc., and juggle variables until there is an easy solution. For instance, solving a quadratic equation by factoring is an analytic solution.

You can also find out a lot about a function or data by just looking at a graph. For instance, finding the roots of a function is easy once you have it graphed; you just look for places where the curve crosses the x axis and read off the values of x there. You don't usually get terribly accurate results that way, but the results may be accurate enough for your purposes. Also, the accuracy depends a lot on the way the curve crosses the x axis. If it crosses at a very shallow slope, you may have serious difficulty figuring out exactly what the value of x is. And how do you know that your graph contains all the x crossings?

Another kind of graphical solution is a geometric solution. For instance, if you want to find the square root of two, you know that the length of the hypotenuse of a right triangle is the square root of the sum of the squares of the other sides, so all you have to do is draw a right triangle with two sides that are one unit in length and measure the hypotenuse. It will be the square root of 2 units in length.
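You can check that construction numerically; Python's math.hypot computes the hypotenuse directly:

```python
import math

# hypotenuse of a right triangle with two legs of length 1
hyp = math.hypot(1, 1)
print(hyp)            # -> 1.4142135623730951, the square root of 2
```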

If you get into integral calculus and differential equations, you will find that there are many problems that have no known analytical or graphical solutions, but there are often indirect methods that you can use that will give you a solution to any specified level of precision. Those are called numerical solutions. Such solutions have been worked out for a lot of different kinds of problems and they're usually not exact solutions because they depend on how a certain function converges on a number as other values approach zero or infinity.
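As a tiny taste of a numerical method, here's bisection sketched in Python: it squeezes in on a root to any precision you specify, assuming you can bracket the root between a point where the function is negative and one where it's positive.

```python
def bisect(f, lo, hi, tol=1e-9):
    """Find a root of f between lo and hi, assuming f changes sign there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:    # the sign change is in the left half
            hi = mid
        else:                      # otherwise it's in the right half
            lo = mid
    return (lo + hi) / 2

# root of 3x - 5 = 0 (the linear function from earlier); exact answer is 5/3
print(bisect(lambda x: 3 * x - 5, 0, 10))
```

Each pass through the loop halves the interval, so the answer converges quickly no matter how messy the function is.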

Visualizing

It's hard to get much from a table of data. Even if the data is sorted, it will be hard to see trends. There are a lot of ways to visualize data. You can get a lot of ideas from a table that has been summarized. Crosstabulations, stub and banner tables and pivot tables summarize data.

A crosstabulation records joint counts and probabilities. For instance, look at the following data:

Subject | Hair color | Gender
1 | Red | M
2 | Black | F
3 | Blonde | F
: | : | :
50 | Brown | F

Now, if we counted the number of male subjects with brown, red, black, etc. hair, and female subjects with brown, black, red, etc. hair and recorded the counts, they would be joint counts of the different subsets.

Hair color

Gender | Black | Brown | Blonde | Red
Male | 15 | 7 | 6 | 2
Female | 8 | 2 | 0 | 10

If we divided each number by the total number of subjects, we would have joint probabilities. If we multiplied those numbers by 100, we would have joint percents. Compared to the raw data, it's pretty easy to see from the joint frequencies that, in this sample, black hair predominates among males and red hair predominates among females. For nominal and ordinal data, it is pretty standard practice to organize the counts into cross tables very early in the analytical game. Much of the analysis of categorical data is based on crosstabulations.
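Crosstabulating is easy to do by machine. A minimal Python sketch (the little data set here is made up just for illustration):

```python
from collections import Counter

# raw records of (gender, hair color), one per subject - invented sample data
records = [("M", "Red"), ("F", "Black"), ("F", "Blonde"), ("F", "Brown"),
           ("M", "Black"), ("M", "Black"), ("F", "Red")]

counts = Counter(records)          # joint counts of each (gender, hair) pair
total = len(records)
for (gender, hair), n in sorted(counts.items()):
    print(gender, hair, n, n / total)   # joint count and joint probability
```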

For data with more than two or three categorical variables, crosstabulation becomes awkward very quickly. At that point, stub and banner tables can be useful.

You've probably seen them in newspapers and magazines, tables with column and row headings but where the columns and/or rows are grouped into categories. By nesting groups of row and columns, you can conveniently represent up to six sets of categories. For more than six dimensions, things get sticky for even stub and banner tables and you have to get creative to present the information.

Pivot tables are one of the greatest tools created by spreadsheet designers. With them, you can interactively create crosstabulations, stub and banner tables, and summary tables and just keep working with them until you end up with something you like. Today, most spreadsheets have pivot table capabilities.

To really get a point across, use graphs. Visual media make concepts intuitive. To say, "As temperature increases, plant growth increases...to a point," gives an idea, but to graph the growth of a plant in relation to the temperature of its environment makes it immediate and obvious.

Today, the first thing a statistician does with data is to graph it. A graph of unfamiliar data gives a first impression of what's going on and how the statistician might appropriately continue an analysis of the data. A histogram gives an idea of what kind of process generated the data - what family of distribution the numbers follow. A scatterplot will usually make clear how the variables relate - positively, inversely, linearly, nonlinearly. Many (many!) other visualization tools are available to mine all the relevant information out of data before research questions are formulated and tested.

Often, a fruitful problem solving technique is to just make some sketches. Drawing brings the visual parts of the brain into play. Relationships that are not immediately obvious can be clarified by seeing them. The first step in solving any geometrical problem is to sketch it, but even other mathematical problems can be tamed by writing out the relationship and juggling the symbols until a useful representation is gained.

Transforming data

There are a lot of reasons why a statistician might want to make data values the argument of a function. For instance, if the data is stretched out over several orders of magnitude, a logarithm function will contract the whole range to a manageable size. Think about it: the logarithms (base 10) of 1, 10, 100, and 1000 are 0, 1, 2, and 3. Using some mathematical function to modify all data values in a data set is called "transforming the data" and the function is called a transformation. You can transform data, perform a statistical analysis, and then get the original sense of the data back by using the inverse function.
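Here's the log transform in action in Python, including the round trip back through the inverse function:

```python
import math

data = [1, 10, 100, 1000]                  # spread over four orders of magnitude
logged = [math.log10(x) for x in data]     # transformed: now a tidy 0-to-3 range
print(logged)                              # close to [0, 1, 2, 3]
back = [10 ** y for y in logged]           # the inverse function recovers the data
print(back)
```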

Series and sequences

Most computer programming aptitude tests contain items in which you have to look at a sequence of numbers, letters, or symbols and figure out the next item in the sequence. There's a reason for that. A common problem in computer programming is to make the computer generate a specific sequence. To do so, the programmer has to recognize what the sequence is.

Perhaps the most common sequence is the sequence of counting numbers: 1, 2, 3, 4, 5, 6, ... . That is also what is called an arithmetic sequence, a sequence of numbers where any number is the sum of the number before it plus a constant (in this case, the constant is 1.) In the instance where each number is the product of the preceding number times a constant, the sequence is called a geometric sequence.

Sequences and the related series are basic to mathematics. A sequence is an ordered list of numbers, usually infinite in length; and a series is the sum of such numbers. At first thought, an infinite sequence of numbers would seem to sum to infinity, but that isn't always the case. For instance, the sum of fractions of the form 1/2^{n} (for n = 1, 2, 3, ...) is equal to 1. Such a series is said to be convergent; it converges on the value 1. Some sequences diverge to infinity.
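You can watch the convergence happen with a few lines of Python:

```python
# partial sums of 1/2 + 1/4 + 1/8 + ... creep up on 1 without ever passing it
total = 0.0
for n in range(1, 31):
    total += 1 / 2 ** n
print(total)       # very close to 1 after thirty terms
```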

Series are so important to mathematicians that they have their own language of symbols. The Greek capital letter sigma (Σ) is used to denote a summation. If there is a note like k=1 below the sigma, that means that the summation starts with the element in the series that has k equal to 1. The end value of k is placed above the sigma. There are a lot of other symbols that might be used with the sigma, and then there is the capital pi symbol that indicates a serial product instead of a sum.

But just to show you how important a series is to a mathematician, think about the way you might calculate the value of pi. Of course, since pi is an infinite, nonrepeating decimal fraction, you can't calculate pi exactly, but you can calculate it to any specified degree of precision. Since pi is the ratio of the circumference of a circle (any circle) to its diameter, you could just draw a circle very accurately and measure its diameter and circumference, then calculate the ratio. Try it and compare your result with the value of pi to 10 decimal places: 3.1415926535. Of course, you can buy (or download) books that give the value of pi to thousands of decimal places and those numbers are precise. How do they do that?

Well, it just so happens that

pi = 3 + 4/(2·3·4) - 4/(4·5·6) + 4/(6·7·8) - 4/(8·9·10) + ...

Here are the first ten items in the sequence and the summation is in the last column.

Term | Value | Sum
1 | 3 | 3
2 | 0.1666666667 | 3.1666666667
3 | 0.0333333333 | 3.1333333333
4 | 0.0119047619 | 3.1452380952
5 | 0.0055555556 | 3.1396825397
6 | 0.0030303030 | 3.1427128427
7 | 0.0018315018 | 3.1408813409
8 | 0.0011904762 | 3.1420718171
9 | 0.0008169935 | 3.1412548236
10 | 0.0005847953 | 3.1418396189

Notice how the sums bracket the value of pi, getting closer and closer. Notice also that each successive term is progressively smaller than the one before it. That indicates that the series might be converging (but not necessarily). After 10 terms, the sum already matches pi to three decimal places.
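The table above comes from Nilakantha's series, pi = 3 + 4/(2·3·4) - 4/(4·5·6) + 4/(6·7·8) - ..., and it takes only a few lines of Python to reproduce it:

```python
# Nilakantha's series for pi: alternately add and subtract 4/(a*(a+1)*(a+2))
total = 3.0
sign = 1
for k in range(1, 10):           # nine correction terms (ten terms counting the 3)
    a = 2 * k
    total += sign * 4 / (a * (a + 1) * (a + 2))
    sign = -sign
print(total)                     # -> about 3.14184, matching row 10 of the table
```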

There are series that converge more rapidly.

There are series representations for many otherwise very-difficult-to-calculate constants and functions.

By the way, the plural of "series" is "series". They're like deer.

Series also appear in combinatorics, calculus, and many other branches of mathematics.

Differential equations

An equation in which the dependent variable is a rate of change (that is, the statement equated to the dependent variable involves a derivative) is a differential equation. A derivative is the rate of change of a variable in relation to another variable as the change in that other variable becomes very small (infinitesimal). In other words, the derivative is an instantaneous rate of change. The statement describing the dependent variable can also contain derivative terms.

A course about differential equations leads you into deep waters. There are many differential equations that have no known analytical solutions but there will always be, at least, a numerical or graphic solution. Yet, differential equations arise often in the sciences - in fact, any time that a quantity is changing in relation to some other quantity or quantities.

For instance, say that the rate of change of a variable increases as another variable decreases. In other words the change of the dependent variable is inhibited by the independent variable. This would generate one of the simplest differential equations:

y'=1/x

Balancing works for differential equations just like any other equation so, to solve for y, all you have to do is apply the inverse of differentiation to both sides of the equation. The inverse of differentiation is integration, and the integral (antiderivative) of a reciprocal is the natural logarithm, so you end up with:

y=ln x + C

(The C is an arbitrary constant of integration. Differentiation wipes out constant terms, so the solution can only be pinned down completely with extra information, like an initial value.)
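You can check a solution like this numerically. Euler's method, the simplest numerical differential-equation solver, just steps the solution forward by its slope; starting from the known point y(1)=ln(1)=0, it should land close to the natural logarithm:

```python
import math

# Euler's method on y' = 1/x, starting from the known point y(1) = 0
x, y, h = 1.0, 0.0, 1e-5      # h is the (small) step size
while x < 2.0:
    y += h * (1 / x)          # step the solution forward by the slope
    x += h
print(y, math.log(2))         # the numerical and analytic answers agree closely
```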

The solution of many differential equations is much more complicated. Quite a few of those cryptic functions available in spreadsheets such as Bessel functions and Legendre functions are solutions to involved differential equations that occur in science.

So we've covered the standard mathematics curriculum - arithmetic, algebra, geometry, pre-calculus, calculus, and differential equations - but **wait**! There's more.

Complexities

The kinds of numbers that mathematicians work with are many. Each new kind appeared to deal with glitches in a prior number system. For instance, negative numbers arose from the fact that you can subtract yourself right out of the set of counting numbers. Fractions appeared because the result of division is often not a whole number. Things got really weird when people contemplated the square root of -1.

You see, any number, positive or negative, when multiplied by itself, gives a positive number, so how can there be a square root of a negative number? But it turned out that such a weird concept is actually useful in many contexts, and it does arise in the real world.

Before Leonhard Euler connected the square root of negative 1 with the real world, mathematicians figured that it was just a useful convention - an imaginary number that couldn't exist in the real world. In fact, it and numbers like it, the square root of negative 1 multiplied by some other number, were called imaginary numbers. When an imaginary number is added to a real number, the result is called a complex number.

Complex numbers are often used to describe two-part quantities, such as the amplitude and phase of electronic signals.

Vectors

A complex number can be thought of as an ordered pair. The first value represents the real part and the second represents the imaginary part. There are other forms of imaginary-real number entities such as quaternions, which can be represented by four values, one real and three imaginary.

These ordered sets of numbers are called vectors. Vectors can represent imaginary/real numbers. They can also represent physical directed quantities such as velocities, which describe the speed and direction of an object in space. Geometrically, these quantities are represented by arrows pointing in the direction of motion. The length of an arrow represents the magnitude of the motion - its speed. Many physical quantities are directed quantities - acceleration, force, work, current, fields.

The statistical quantities of individual cases can also be considered vectors. For instance, consider a study of the relationship between general health and calories consumed per day, B vitamins consumed per day, and minutes of exercise per day. General health is related, maybe by a regression procedure, to a vector consisting of three values: calories per day, B vitamins per day, and minutes of exercise per day.

Vector analysis is a branch of mathematics that treats vectors as numerical entities in their own right. For instance, you can add, subtract, and scale vectors, and multiply them in more than one way. Functions and equations can have vector values, and matrices can contain vectors as elements.

Matrices

Oh, yes - matrices.

If you don't want to do one calculation at a time for a big set of numbers, it turns out you can work with all the numbers at the same time as though they were a single number.

A matrix is a rectangular, ordered table of numbers. Each position in the table is labelled by the row and column it's in. The upper left position of matrix A is labelled a_{1,1}. The next position to the right is labelled a_{1,2}. One position down from that is a_{2,2}. The first number of the subscript is the row position and the second is the column position. Matrices can be added, subtracted, and multiplied much like regular numbers (and "division" amounts to multiplying by an inverse matrix), and that saves a lot of work.

For instance, when solving a system of equations, say:

2x_{1}-5x_{2}+3x_{3}=12

-2x_{1}+x_{2}-x_{3}=6

2.5x_{1}+0.5x_{2}-x_{3}=10

You can juggle the equations around until you get an expression for one of the three variables, then back solve for another variable, then plug those two variables into the last equation and solve for the third. That's a lot of work.

Or.....you can just place the coefficients into two matrices:

A = |  2    -5     3 |
    | -2     1    -1 |
    |  2.5   0.5  -1 |

B = | 12 |
    |  6 |
    | 10 |

And solve the equation:

AX=B

for X and you'll get a matrix with the values of the variables.

Okay, this isn't a tutorial on matrix algebra so I'm not going to explain it here. We'll actually get into it later, especially when we talk about regression analysis, because it uses a lot of matrix algebra to turn a very complicated routine into a simple juggling of matrices. Right now, just trust me that matrix algebra (also called linear algebra) is a shorthand for juggling a large number of values.
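Just to show it can be done, here's a bare-bones Gaussian elimination in Python applied to the system above - a sketch, not a library-grade solver; using exact fractions sidesteps rounding trouble:

```python
from fractions import Fraction as F

def solve(A, B):
    """Solve AX = B by Gaussian elimination with back-substitution."""
    n = len(A)
    # build the augmented matrix [A | B] using exact rational arithmetic
    M = [[F(v) for v in row] + [F(b)] for row, b in zip(A, B)]
    for i in range(n):
        # pivot: swap in a row with a nonzero leading entry
        p = next(r for r in range(i, n) if M[r][i] != 0)
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):       # clear the entries below the pivot
            f = M[r][i] / M[i][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    X = [F(0)] * n
    for i in range(n - 1, -1, -1):      # back-substitute from the last row up
        s = sum(M[i][j] * X[j] for j in range(i + 1, n))
        X[i] = (M[i][n] - s) / M[i][i]
    return X

A = [[2, -5, 3], [-2, 1, -1], [F(5, 2), F(1, 2), -1]]
B = [12, 6, 10]
print(solve(A, B))   # -> [Fraction(-7, 11), Fraction(-151, 11), Fraction(-203, 11)]
```

So x_{1}=-7/11, x_{2}=-151/11, and x_{3}=-203/11, and you can confirm by plugging them back into the three equations.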

Logic

Mathematics and logic are how scientists think about the world. As is oft stated, mathematics and logic are the languages of science.

There are two kinds of logic and I will, explicitly or implicitly, be using them throughout everything that follows in this series.

Deductive logic is concerned with valid arguments. In a valid argument, true conclusions follow necessarily from true premises, but there is a cost to such certainty. You can't get anything new from deductive logic; you can only get insights into what's already there. Detectives use a lot of deductive logic. For instance:

If the door was locked from the inside, and there is no other way into the room, then the murderer is still in the room.

The door was locked from the inside, and there is no other way into the room;

therefore, the murderer is still in the room.

"The murderer is still in the room," is information that is already included in the rest of the argument. The purpose of the argument is to point out implicit information that might not be immediately obvious.

So, how do you discover new information? Inductive logic is the answer. Inductive logic looks at what information is available, makes correlational and causal connections and then makes acceptable guesses about what they imply. The cost? Inductive arguments can only give probabilistic conclusions. Inductive conclusions always come with some amount of uncertainty. The good news is that the amount of uncertainty can be determined, often quantitatively. That is, in fact, what inferential statistics is for.

If you want to go deep into logic, there are many good books and websites out there, and I'm working (albeit, slowly) on one of the LabBooks on this site that deals with logic.

Research

Statistics and research are intimately connected. The scientific method consists of observing a phenomenon, looking up all the relevant information extant on the topic, coming up with a hypothesis to explain it, and then testing the hypothesis. Replication is important to validate the results of the research: the hypothesis is tested over and over and from many different angles. Statistics comes in as a tool to test hypotheses.

Most statistical methods are designed in order to disconfirm an hypothesis if possible. It's much easier to disconfirm than it is to confirm - all you need is one counter-example. Also, most researchers have a bias toward confirming their pet hypothesis and that bias may be even more virulent because it is usually subconscious.

Many research designs have been worked out to deal with studies that range in complexity from simple case observations to mind-numbingly complex situations with many variables, sources of error, controls, and layers upon layers of chaotic processes.

At the very least, a good research design will include some way to control error and irrelevant conditions. One of the most important controls is randomization of sample selection. This helps to control for researcher bias and will also tend to cause random errors in measurement to cancel out.

Control groups are also important. Researchers try to compare the phenomenon they are observing with some neutral cases that do not display the phenomenon. For instance, in a test of a new drug, the people who take the drug will be compared with a group that does not take the drug. It's easy, then, to test whether there is any difference between the groups. If there is no difference, then what is observed is not due to the drug.

Significance

Now, we can start looking at some of the fundamental concepts of statistics and we will start at one of the most difficult and least well grasped concepts. Let's open it up and look under the hood.

Research studies have significance. Specifically, the relationships we are looking at are either significant or insignificant. And, here, I am talking about a very specific kind of significance, statistical significance. It is symbolized by the letter p; p is a probability and, therefore, will be positive and will never exceed 1.

In the early days of real science (think the Bacons - Roger and Francis), the goal was to come up with ideas about what was happening in an observed phenomenon (hypotheses) and to test these ideas. In other words, if I were to hypothesize that water always flows downhill, I would set up a series of situations and test whether, in each situation, water flowed downhill. Soon, scientists began discussing this method and finding flaws.

First, it's pretty obvious that, if a researcher's pet hypothesis is that water always flows downhill, he will likely see water flowing downhill in all situations, whether it does or not. In other words, researchers are biased in the direction of their pet hypotheses.

Another factor that might not be so obvious at first is that it's much, much easier to prove that something is not the case than to prove that it is the case. All you need is one counter-example.

So, beginning in the 19th century and culminating in the 20th century, the goal of research shifted from validating hypotheses to refuting hypotheses. If you honestly try to destroy your own hypothesis and you can't, there's a really good chance that your hypothesis is a correct one.

Once you start really looking at statistical tests - inferential statistics - you will notice that all of them test the null hypothesis - the hypothesis that what you're seeing isn't what you think you're seeing. In the example above, perhaps what you think you're seeing, that water always goes downhill, is just an anomaly of your locality. Perhaps you need to look at a lot of cases and check the probability that, sometimes, water might decide to go uphill.

That's what statistical tests do. They test the hypothesis that there is no difference between groups exposed to what you're studying and groups that are not exposed to it, or that the effect goes in the opposite direction than you expect. In a way, modern research uses reductio ad absurdum.

In a purely qualitative manner, you may notice that, when water hits a barrier with enough force, it sometimes goes uphill; or, you get on a space ship and notice that, in space, no one can hear you scream when water goes in all directions, including uphill.

We will be going into the statistical methods in much more detail later but, in brief, most classical statistical procedures look at the averages of the study groups and check to see if the differences could possibly be due to simple random error. To summarize this effect, the statistician comes up with a value, p.

A lot of people, even statisticians, think that p is the probability that the null hypothesis is true, and it usually doesn't hurt to think of it that way. It's close enough for government work, but it's not what p really means.

What p actually is, is the probability of seeing a result at least as extreme as yours if there were actually no difference between the groups - more generally, if the null hypothesis holds. If p=.25, then, in a world where the two groups really are the same, a difference like the one you observed would show up about 25 times out of 100 just from random variation. 25 out of 100, or 1 out of 4, is really not a very good p if you want to validate your alternate hypothesis.

What would be a good outcome for validating your hypothesis is a p of less than 0.05; p=0.01 is even more stringent and is a very good outcome. That means that a difference as large as the one you saw would turn up by chance only 1 time out of 100 if there were really nothing going on. That one time could easily be written off as caused by irrelevant conditions.
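You can see where a p value comes from with a toy simulation (my own example, not from any particular study). Suppose you flipped a coin 10 times, got 8 heads, and the null hypothesis is that the coin is fair. Simulating many sets of 10 fair flips shows how often a result at least that extreme turns up by pure chance:

```python
import random

# toy null simulation: if a coin is fair, how often do 10 flips give 8+ heads?
random.seed(3)                       # fixed seed so the run is repeatable
trials = 100000
extreme = sum(sum(random.randint(0, 1) for _ in range(10)) >= 8
              for _ in range(trials))
print(extreme / trials)              # about 0.055 - the p value for 8 heads
```

Since that p is above 0.05, 8 heads out of 10 isn't quite enough to convict the coin of being unfair.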

But should a researcher ignore a study with a p of 0.25? Weeeell, not really. The mere fact that they expected to see one thing and found something else should be of interest to everyone. Science often disconfirms common sense and, when it does, there's very good cause to ask "why?". On another level, the fact that the researcher saw what they saw indicates that there may be something there anyway. They may decide to take a cautionary course and say, "This study does not confirm our hypothesis, but we recommend further study. It may be that our research design was flawed."

There is a more technically accurate interpretation of p. How much can we trust what we have seen in a study? p is actually a measure of reliability of the result of a study. A large p value indicates that you can't put a lot of faith that the observed relationship between groups is a reliable indicator of the true state of affairs. It is the probability of error in accepting your observed relationship as valid.

Understand firmly that research results are always probabilistic. Even with a p of 0.001, there is still that 1 chance in a thousand that what you observed was a fluke. It's not very likely, but the possibility is there.

Validity and reliability

I mentioned above that p is a measure of reliability, so this might be a good place to eliminate confusion about reliability and validity. That's easy enough. Validity answers the question, "Does it measure what it's supposed to measure?" and reliability answers the question, "If I repeat this measure over and over, will it always give me the same answer, or how close will the answers be?"

We'll get around to how these things are measured, but for now, let's take an intuitive look at what they mean.

A research study measures something - it measures how well you can trust that a hypothesis is correct. Let's say that you perform the same study 10 times and, each time, you get answers that are very similar. That would indicate that the study design is reliable, but is it valid? Not necessarily.

Perhaps you want to measure the effects of light on ripening of fruits in your greenhouse but, unbeknownst to you, as your greenhouse warms up, ethylene is produced from some spilt chemicals. Ethylene is known to speed the ripening of fruit. Your experiment is measuring something, but it's not measuring what you want it to, so the experiment is not valid.

How statistical tests work

I've said that a ratio can be used to compare two values. Most statistical tests use the ratio of how much variation in the dependent variable is explained by the independent variable (the variation common to both variables) to the total amount of variation in the dependent variable. Obviously, if this ratio is 1, all the variation in the dependent variable is shared with the independent variable and the relationship between the two is very strong. On the other hand, if the ratio is zero, then what the independent variable does has very little to do with what the dependent variable does and the relationship is very weak or nonexistent.

But the ratio isn't enough to quantify the relationship between the two variables. As we'll see later, "significance" depends on the size of the sample the researcher is looking at. In a very large sample, even a very weak relationship between two variables looks significant. In very small samples, even a very strong relationship will look insignificant. Statistical tests have to take sample size into consideration.

It would be nice if the significance of the differences in variation increased regularly, in a linear fashion, with the differences in variation; then significance for different sized samples would vary only by the slope of the relationships. Unfortunately, it's a little more complex. The probability function is usually not linear - in fact, it is usually normal, or nearly normal; but we understand normal distributions, they're common in nature, and we can deal with them.

Why the normal distribution is so important

I go very deep into the normal distribution here:

http://www.theriantimeline.com/excursions/standards

so I won't retrace my steps, but I will explain why it is so common in nature and why so much in statistics is based on it.

If you think about it, you can probably convince yourself that many, if not most events in nature are composed of many small events that have two or a small number of possible states - What do I want to do? - exercise or eat.....hmmmm, eat it is. Where do I want to eat? Burger Haven is close by. Do I want my usual or something else? Usual...and then, as you sit down, you see an old friend walk in. Well, they've made a similar series of small decisions that have brought you two together.

The popular game show, The Price Is Right, had a frequent game called Plinko. The equipment consisted of a big board with pegs sticking out of the face. The board was tilted so that contestants could slide Frisbee-sized pucks down between the pegs, and they had to guess which bin at the bottom the pucks would fall into.

If a large number of pucks were slid down the board, most would end up in the middle bins and fewest would be out at the sides, producing a humped curve.

As more and more Plinko chips pile up, the picture would look more and more like the classical curve called a "bell curve" which is the graphical representation of....yes, a normal distribution.

This process is so common in nature that classical statistics automatically assume that data is distributed in a normal fashion, with most data near the middle and fewer and fewer occurring at large and small values.
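The Plinko board is easy to simulate - a sketch, with made-up numbers of pegs and pucks. At each peg the puck goes left or right at random, and the bin it lands in is just the count of rightward bounces:

```python
import random

# a toy Plinko board: 12 rows of pegs, each bounce goes left (0) or right (1)
random.seed(1)                    # fixed seed so the run is repeatable
bins = [0] * 13                   # a puck's bin = its number of right bounces
for _ in range(10000):
    bins[sum(random.randint(0, 1) for _ in range(12))] += 1
print(bins)                       # the counts hump up in the middle bins
```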

The problem is that not all data follow a normal distribution.

Probability distributions

A probability distribution shows graphically how many data points fall within equal-sized ranges across the full set of data points. In the normal distribution, most values are around the middle. The values can be any fractional value between the maximum and minimum values, making the normal distribution a continuous distribution.

Discrete probability distributions can only take certain values. For instance, the probability distribution for the values of consecutive throws of a die can only take the values 1, 2, 3, 4, 5, or 6. Since each value has a probability of 1/6, this would be a discrete uniform distribution.

On the other hand, the probability distribution for the number of heads in six tosses of a coin is not uniform - three heads is much more likely than zero or six - and describes a binomial distribution.
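That distribution is easy to compute exactly: the probability of k heads in 6 fair tosses is C(6,k)/2^{6}. In Python:

```python
from math import comb

# binomial probabilities for the number of heads in 6 fair coin tosses
probs = [comb(6, k) / 64 for k in range(7)]
for k, p in enumerate(probs):
    print(k, "heads:", p)

# three heads is the most likely outcome; zero or six heads are the rarest
```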

There are many different probability distributions encountered by statisticians and they have to be careful what procedures they use to analyze data - one procedure does not fit all!

Expectation

If you had a bag with many slips of paper, each paper having a value on it, and you had to guess the value on a random slip of paper pulled from the bag, what would your guess be? What would you expect the value to be?

The answer should be "the mean" of all the values in the bag. A useful definition for the mean is "expected value". Often "mean" and "arithmetic average" are used interchangeably, but there are many different kinds of means and they serve as expected values in different situations, especially related to different probability distributions. The arithmetic average is the appropriate mean for a normal distribution. For instance, if you had to guess the height of a student randomly drawn from a classroom (height is usually normally distributed), your best guess would be the arithmetic average of the heights of all the students in the classroom.
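In Python terms, with a made-up classroom of ten students:

```python
# heights (in cm) of students in a hypothetical classroom
heights = [150, 152, 155, 158, 160, 162, 165, 168, 170, 175]
expected = sum(heights) / len(heights)   # the arithmetic average as expected value
print(expected)                          # -> 161.5, your best guess for a random student
```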

Samples and populations

If your data set contains data on every subject relevant to a study, the group of subjects is called the "population". If it contains only a subset of the relevant subjects, it is called a "sample". Rarely does a researcher have access to all the subjects relevant to a study; for instance, a researcher studying cardiac disease would not have access to everyone who has cardiac disease. In such a case, the researcher will want to draw a sample of people with cardiac disease that is representative of the whole population. Sampling is a major concern of statisticians, and we will be looking at it in much detail later.

One useful fact is that, the larger a sample is, the more it will look like the population. For instance, the average of a large random sample will be closer to the average of the population than the average of a small sample will be.
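A quick demonstration of that fact, using a hypothetical population of normally distributed values (think of the classroom heights from before):

```python
import random

random.seed(0)  # reproducible demo

# A hypothetical population of 10,000 values, centered near 100.
population = [random.gauss(100, 15) for _ in range(10_000)]
pop_mean = sum(population) / len(population)

def sample_mean(n):
    """Average of one random sample of size n."""
    s = random.sample(population, n)
    return sum(s) / n

# The typical error of the sample mean shrinks as the sample grows.
avg_error = {}
for n in (10, 100, 1000):
    errors = [abs(sample_mean(n) - pop_mean) for _ in range(200)]
    avg_error[n] = sum(errors) / len(errors)
    print(n, round(avg_error[n], 2))
```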

Computer formats

Statistical analyses are usually performed on computers today. In fact, many popular statistical procedures are so involved that it is very inconvenient or impossible to do them without a computer. Accordingly, some common conventions are in place for dealing with data on a computer.

Most data files are set up such that individual rows contain individual cases (subjects, files) and columns contain variables (parameters, fields). For instance, in that cardiac disease study, a data set will have a row for each subject and there might be a column for number of heart attacks in the last ten years, one for whether the subject smokes or not, a column that tells whether they drink or not, etc.

Most programs that process data assume this layout.
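A minimal sketch of that layout in Python, with made-up subjects and variables from the hypothetical cardiac study:

```python
# Hypothetical records from the cardiac study: one row per subject
# (case), one column per variable.
rows = [
    {"subject": 1, "heart_attacks": 0, "smokes": "yes", "drinks": "no"},
    {"subject": 2, "heart_attacks": 2, "smokes": "no",  "drinks": "yes"},
    {"subject": 3, "heart_attacks": 1, "smokes": "yes", "drinks": "yes"},
]

# With this layout, a whole variable (a column) can be pulled out
# by name in one step.
smokes_column = [row["smokes"] for row in rows]
print(smokes_column)  # ['yes', 'no', 'yes']
```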

The most common exception would be arrangements used for nominal or ordinal data. Counts for such data are usually arranged in crosstabulations, where cells contain joint frequencies - counts of subjects in overlapping categories. For instance, the numbers of male and female cardiac patients who smoke or do not smoke might be arranged that way.

Such a (fictional) crosstabulation might indicate, for example, that 32 of the 49 female cardiac patients smoke.
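As a sketch, here is how such a table could be built in Python from raw records. The female counts match the fictional figures above (32 smokers out of 49 female patients); the male counts are invented purely for the illustration:

```python
from collections import Counter

# Hypothetical (fictional) patient records as (sex, smokes) pairs.
# Female counts match the text: 32 smokers, 17 nonsmokers (49 total).
# Male counts are invented for the demo.
records = (
    [("female", "yes")] * 32 + [("female", "no")] * 17
    + [("male", "yes")] * 40 + [("male", "no")] * 21
)

# Build the crosstabulation: each cell holds a joint frequency.
table = Counter(records)
for sex in ("female", "male"):
    print(sex, "smokes:", table[(sex, "yes")], "doesn't:", table[(sex, "no")])
```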

Randomization

Computers are often called on to provide random numbers for processes such as simulations and statistical procedures (especially nonparametric ones).

That brings up the question of whether anything a computer does is actually random. The difference between most computer algorithms for generating random numbers and the roll of a die is usually only one of degree of complexity. The roll of a die is certainly not random; it is unpredictable because so many effective causes go into which face lands up - the spin of the hand, the speed of the die, air drafts, etc., etc. The die itself might even be biased, showing a marked preference for certain faces.

An early incarnation of OpenOffice Calc brought many complaints from users who tested the randomizer algorithm behind the spreadsheet's RAND function. It was supposed to provide a uniform deviate - every value between 0 and 1 was supposed to have the same chance of appearing as every other value - yet values near 0 and 1 were very much underrepresented. OpenOffice eventually adopted another algorithm and, today, if you generate, say, one thousand values with the RAND function and create a histogram of them, you will find a nice flat curve that terminates right at 0 and 1.

The algorithm that generates the numbers does quite well at simulating a random number generator but, if you open the macro for the function and look at it, you will find a Rube Goldbergian sequence of operations on large numbers that relies more on complexity than on true randomness; therefore, the process should be called "pseudorandom". But that's okay. For most purposes, a pseudorandom number generator will give acceptable results.
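To see how "complexity rather than randomness" works, here is a classic pseudorandom generator, the linear congruential generator. This is not the algorithm behind RAND - just a sketch of the principle, using well-known constants from Numerical Recipes:

```python
# A minimal linear congruential generator (LCG): pure deterministic
# arithmetic on large numbers that nevertheless looks random.
class LCG:
    def __init__(self, seed):
        self.state = seed

    def rand(self):
        """Return a pseudorandom float in [0, 1)."""
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state / 2**32

gen = LCG(seed=12345)
values = [gen.rand() for _ in range(1000)]

# The same seed always reproduces the same sequence -- exactly what
# makes the generator "pseudo" random.
gen2 = LCG(seed=12345)
print(values[0] == gen2.rand())  # True
```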

If you need truly random numbers, you will have to pay the price. You can get hardware that uses quantum processes to generate values. The equipment is coming down in price, can be fairly affordable, and can be small - about the size of a memory stick. Such a randomizer typically uses an unstable electronic flip-flop circuit to generate a string of bits that can be translated into a decimal fraction. Some more expensive ones may even use the decay of a radioactive isotope. A software algorithm, on the other hand, is always going to be pseudorandom because, one way or another, the result is mechanically determined by the logic of the code.

One way randomization is used in statistics is in picking individuals out of a large, unwieldy population to serve as a sample. Samples chosen by hand, no matter how hard the researcher tries to choose randomly, will carry some bias and cannot be considered representative of the population.
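Letting the computer draw the sample is straightforward; here is a sketch with a made-up roster of patient IDs:

```python
import random

random.seed(7)  # reproducible demo

# A hypothetical roster of 500 patient IDs -- the unwieldy population.
roster = list(range(1, 501))

# Let the computer pick: every subject has the same chance of being
# chosen, sidestepping the researcher's unconscious biases.
sample = random.sample(roster, 25)
print(sorted(sample))
```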

Another way randomization is used is in the Monte Carlo method of generating statistics. The averages of a large number of random samples drawn from a data set follow the Central Limit Theorem: they tend to be distributed normally, even if the original data set is not. Other statistics behave similarly. Therefore, Monte Carlo methods can be used to generate statistics from non-normal data, and even from data for which statistics cannot be calculated using straightforward formulas!
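A small Monte Carlo demonstration of the Central Limit Theorem at work on decidedly non-normal data (the data here are simulated, strongly right-skewed values):

```python
import random

random.seed(1)  # reproducible demo

# A decidedly non-normal data set: 5,000 strongly right-skewed values.
data = [random.expovariate(1.0) for _ in range(5000)]

# Monte Carlo: draw many random samples and record each sample's mean.
means = [sum(random.sample(data, 50)) / 50 for _ in range(2000)]

# Per the Central Limit Theorem, the sample means cluster roughly
# symmetrically around the average of the data, even though the data
# themselves are skewed.
grand = sum(means) / len(means)
below = sum(1 for m in means if m < grand)
print(round(grand, 2), below, len(means) - below)
```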

Simulation

In nature, many processes are loaded with complexity and chaos and it's just plain hard to get a handle on them directly - try to study a hurricane!

Often, what researchers will do is create models that include most of the relevant factors that affect the phenomenon. Then they can study the simulation instead of the actual phenomenon. Creation of such a model is front-end intensive. A lot of work goes into making sure that the phenomenon is accurately represented and the model has to be tested and tweaked to make sure that it behaves like the real thing.

But if you track hurricanes, you will have noticed that the weather services do not broadcast exactly what a storm is going to do; instead, they give a number of probable tracks generated by several models. That is, honestly, the best they can do, but the collected forecasts usually give a fairly useful envelope of tracks and strengths specifying a range that the storm will likely fall into. As the storm gets closer, the less likely tracks drop out of the scenario.

Statistics and geography

And, speaking of hurricanes, a whole field of geostatistics has been developed to deal with data that includes geographic information; and, of course, GPS has opened up a broad new world of geographic data.

Geographical statistics, in a way, is an offshoot of the older field of time series analysis: geographic series work much like time series, but involve two-dimensional (and, sometimes, three-dimensional) coordinates instead of the single time coordinate.

For instance, instead of tracking how a disease spreads over time, it is instructive to use the same principles to track how it spreads across geographic regions, perhaps over time.

Dealing with text

Statisticians and researchers occasionally have to deal with text, but words, by their very nature, are not quantitative, so what do they do?

Well, in a very basic way, words are quantitative in that they are represented by binary numbers in the computer's memory, so they can be sorted and listed like numbers.

Also, spreadsheets and statistics programs usually have text functions to do simple things like return the length of words and change word formats. Often you can even count words and calculate things like the average word length in a passage. DANSYS, for instance, even has functions to estimate the reading level of a passage and to calculate the "distance" between two passages. In this context, the distance between two words or passages is the minimum number of editorial changes you have to make to one to turn it into the other.
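That edit-counting distance is usually the Levenshtein distance. DANSYS's own implementation isn't shown here, but the standard dynamic-programming version, working character by character, looks like this:

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b
    (the Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```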

And, speaking of "context", the real fun comes in what's called "context analysis". Interesting patterns often emerge when a researcher starts counting words, comparing word counts in different passages or documents, and correlating which words occur with other words. It becomes possible to make statements about whether a particular author is really responsible for a specific document. And the prevalence of specific words in a document can often tell things about the meaning of an ambiguous passage or the mental state of the author.
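A small taste of that kind of analysis - counting and comparing word frequencies in two made-up passages:

```python
import re
from collections import Counter

# Two short (made-up) passages to compare.
passage_a = "The wolf ran through the dark forest while the moon rose."
passage_b = "The moon rose over the forest and the wolf howled."

def word_counts(text):
    """Lower-case the text and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

a, b = word_counts(passage_a), word_counts(passage_b)

# Words the two passages share, with their counts in each.
shared = sorted(set(a) & set(b))
for w in shared:
    print(w, a[w], b[w])
```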

Problem solving

I think I mentioned in the introduction that statistics is not primarily mathematics. What it is, is problem solving. A person has data and they have questions. How will they best get from the data to the answers? Statistics (and other forms of analysis) provide the path, but what is required is an agile mind and one that is knowledgeable in the tools that are available to tease out the significant patterns in the data. Both grow from experience.

The rest of the Stat Files will deal with the tools...and, of course, the stories.

Copyright 2010 The Therian Timeline. All rights reserved.
