Newton's Method

Imagine that you were tasked with solving the following equation:

x² + 2x - 8 = 0

Easy peasy. You'd just look up the quadratic formula that you learned in school, apply it and come up with the two solutions (or roots) of the equation: x = 2 and x = -4. Now imagine that you were tasked with solving the equation:

x⁸ + 3x⁴ - 4 = 0

With a lot of head-scratching and tedious pen-and-paper calculation, it is possible to solve cubic and even quartic equations analytically. However, equations of 5th order and higher have no general algebraic solution. By using root-finding methods you can solve these equations numerically, eventually coming up with all the roots you need to whatever degree of precision you require.

The Newton-Raphson method (Newton's method for short; after all, who the heck is Raphson? Answer: English mathematician Joseph Raphson, who derived the method independently of Newton) is just one of many root-finding methods but is often the method of choice due to its simplicity and speed. Here it is in its generalized form for real values x:

              f(xk)
xk+1 = xk - R·------
              f'(xk)

Ignore R for the moment by pretending that it's 1. In case the notation isn't familiar to you, f'(x) means the derivative of f, where f is any old mathematical function. If you don't know what "derivative" means, check out the five minute guide to calculus.
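The generalized formula translates almost directly into code. Here's a minimal sketch (the function name `newton`, the tolerance, and the iteration cap are my own choices, not from the text):

```python
def newton(f, df, x0, R=1.0, tol=1e-10, max_iter=100):
    """Find a root of f by Newton-Raphson iteration.

    f  -- the function whose root we want
    df -- its derivative
    x0 -- the initial guess
    R  -- the relaxation parameter (pretend it's 1 for now)
    """
    x = x0
    for _ in range(max_iter):
        step = R * f(x) / df(x)
        x = x - step
        # stop when the difference between two steps falls below a threshold
        if abs(step) < tol:
            break
    return x

# The quadratic from the introduction: x² + 2x - 8 = 0
root = newton(lambda x: x**2 + 2*x - 8, lambda x: 2*x + 2, x0=1.0)
# starting from 1, the iteration homes in on the root at 2
```

Starting from a guess on the other side of the parabola's vertex, say -3, the same call lands on the other root, -4.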

Applying Newton's Method

Imagine that you were tasked with solving the following equation:

x3 - 1 = 0

(Imagine also that you are too stupid to be able to see instantly that at least one solution is 1). The iterative function for Newton's method will look like this:

            xk³ - 1
xk+1 = xk - -------
              3xk²

Now imagine that you are stupid enough to figure that 2 might be a sensible guess to the solution. Here, including the initial dumb guess, is what pops out: 2, 1.4167, 1.1105, 1.0106, 1.0001. So after just four iterations, we end up with a solution correct to three decimal places which should be sufficient for most applications. The rate of convergence is, on average, quadratic which means that the number of accurate digits roughly doubles with every step (0, 1, 1, 2, 4 in our simple example). In practice, since we don't know the answer ahead of time, we stop calculating when the difference between two steps falls below a certain threshold and figure that the value of x at that point is the value we're after.
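That run is easy to reproduce in a few lines of code (the function name is my own invention):

```python
def newton_cube_root_of_unity():
    # f(x) = x³ - 1, f'(x) = 3x², starting from the dumb guess x0 = 2
    x = 2.0
    iterates = [x]
    for _ in range(4):
        x = x - (x**3 - 1) / (3 * x**2)
        iterates.append(x)
    return iterates

# rounded to 4 decimal places: 2.0, 1.4167, 1.1105, 1.0106, 1.0001
```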

How It Works

No matter how sharply curved the graph of a function may be, if you get in close enough it looks like a straight line (unless, of course, said curve is fractal). This is what the derivative of a function gives you: a straight line, called the tangent, whose slope you can measure. If we follow this line up or down to the x-axis, so that y = 0, then the value of x at this point is assumed to be a better guess and fed into the equation for the next iteration. (Concretely: the tangent at xk is y = f(xk) + f'(xk)·(x - xk); setting y = 0 and solving for x gives xk - f(xk)/f'(xk), which is exactly the iteration formula.)

The Relaxation Parameter

Remember R, the thing that I told you to pretend was 1? It is called the relaxation parameter. It's generally written as A, but I call it R because I'm a programmer and "R" for "relaxation parameter" is a lot more meaningful than "A" for nothing at all. It is used to tune the rate of convergence.

Imagine that you had a problem function that overshot the root you were seeking, then undershot it, then overshot it and... With R < 1 you can reduce the step size of each iteration and cure the overshooting. This is called under-relaxation. Or imagine that you had a function that converged on a solution at a linear rate rather than a quadratic one. This happens when two roots are extremely close together or when one of the roots is a solution more than once (in mathematical terms, its multiplicity is > 1). In this case R > 1 will increase the step size and improve the rate of convergence. This is called over-relaxation.

Randomly tinkering with R is far more likely to make the method less effective than more effective, so when it is required, its value is carefully tuned for the specific problem domain. For our purposes, since we are more interested in generating fractals than in solving equations, it's just another parameter to play with (and a quite effective one at that).
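To see over-relaxation in action, consider a root of multiplicity 2, where plain Newton's method slows to a linear crawl; matching R to the multiplicity restores the speed. A small sketch (the example function and names are my own, not from the text):

```python
def newton_step(x, f, df, R):
    # one relaxed Newton-Raphson step
    return x - R * f(x) / df(x)

# f(x) = (x - 1)² has a double root at x = 1 (multiplicity 2)
f = lambda x: (x - 1) ** 2
df = lambda x: 2 * (x - 1)

# Plain Newton (R = 1) only halves the error each step: linear convergence
x = 3.0
for _ in range(5):
    x = newton_step(x, f, df, R=1.0)
# after five steps x = 1.0625, still some way from the root

# Over-relaxation with R = 2 (matching the multiplicity) lands on it at once
y = newton_step(3.0, f, df, R=2.0)
# y is exactly 1.0
```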

Applying Newton's Method to the Complex Plane

The 19th-century British mathematician Arthur Cayley investigated the application of Newton's method to values in the complex plane. Specifically, he asked: given any starting value, is it possible to predict which root the value will converge to? Analyzing the family of quadratic polynomials, he found that it was. Note that we can transform any quadratic function into the form z² + c, so the simple case can be generalized to more complex cases. Here's the result for the function:

f(z) = z² - 1

           z² - 1   z² + 1
N(z) = z - ------ = ------
             2z       2z
[Figure: basins of attraction for z² - 1 = 0]

The two bright spots are the roots of f lying, as expected, at 1 and -1, the square roots of 1. The basins of attraction for 1 and -1, written as A(+1) and A(-1) are:

A(+1) = {z : Re(z) > 0}
A(-1) = {z : Re(z) < 0}

In English, the basin of attraction for +1 is all values of z whose real component is greater than 0, while that for -1 is all values whose real component is less than 0. The common boundary of the basins, denoted by ∂A, is given by

∂A(+1) = ∂A(-1) = J

where J represents the imaginary axis. Furthermore,

N(J) = J = N⁻¹(J)

meaning that the set of values J is invariant under N, i.e. numbers that lie on the imaginary axis move up and down the imaginary line but never away from it under iteration, so that J stays the same.
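The same iteration runs unchanged on complex numbers, and a few seeds illustrate Cayley's answer for f(z) = z² - 1. A sketch under my own naming (not from the text):

```python
def newton_map(z, steps=50):
    # Newton's iteration for f(z) = z² - 1 is N(z) = (z² + 1) / (2z)
    for _ in range(steps):
        z = (z * z + 1) / (2 * z)
    return z

# a seed with Re(z) > 0 falls into the basin A(+1)...
r_plus = newton_map(0.3 + 2.0j)
# ...and a seed with Re(z) < 0 into A(-1)
r_minus = newton_map(-0.3 + 2.0j)
# a purely imaginary seed wanders along the imaginary axis but never leaves it
j5 = newton_map(1.7j, steps=5)
```

Python's built-in `complex` type makes the extension from the real line a one-line change: the update formula is identical, only the inputs differ.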

Cayley left the analysis of cubic functions as an open problem. As we'll see, that turned out to be rather tricky.
