18. Computing Square Roots#

18.1. Introduction#

Chapter 24 of [Russell, 2004] about early Greek mathematics and astronomy contains this fascinating passage:

The square root of 2, which was the first irrational to be discovered, was known to the early Pythagoreans, and ingenious methods of approximating to its value were discovered. The best was as follows: Form two columns of numbers, which we will call the $a$'s and the $b$'s; each starts with a $1$. The next $a$, at each stage, is formed by adding the last $a$ and $b$ already obtained; the next $b$ is formed by adding twice the previous $a$ to the previous $b$. The first 6 pairs so obtained are $(1,1), (2,3), (5,7), (12,17), (29,41), (70,99)$. In each pair, $2a^2 - b^2$ is $1$ or $-1$. Thus $b/a$ is nearly the square root of two, and at each fresh step it gets nearer. For instance, the reader may satisfy himself that the square of $99/70$ is very nearly equal to $2$.
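As a warm-up, here is a minimal Python sketch (ours, not Russell's) of the two-column recursion he describes:

# Russell's two-column recursion: a' = a + b, b' = 2a + b, starting from (1, 1)
a, b = 1, 1
pairs = [(a, b)]
for _ in range(5):
    a, b = a + b, 2 * a + b
    pairs.append((a, b))

print(pairs)            # [(1, 1), (2, 3), (5, 7), (12, 17), (29, 41), (70, 99)]
print(b / a)            # 99/70 ≈ 1.41429, close to sqrt(2)
print(2 * a**2 - b**2)  # always 1 or -1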

This lecture drills down and studies this ancient method for computing square roots by using some of the matrix algebra that we’ve learned in earlier quantecon lectures.

In particular, this lecture can be viewed as a sequel to Eigenvalues and Eigenvectors.

It provides an example of how eigenvectors isolate invariant subspaces that help construct and analyze solutions of linear difference equations.

When vector $x_t$ starts in an invariant subspace, iterating the difference equation keeps $x_{t+j}$ in that subspace for all $j \geq 1$.

Invariant subspace methods are used throughout applied economic dynamics, for example, in the lecture Money Financed Government Deficits and Price Levels.

Our approach here is to illustrate the method with an ancient example, one that ancient Greek mathematicians used to compute square roots of positive integers.

18.2. Perfect squares and irrational numbers#

An integer is called a perfect square if its square root is also an integer.

An ordered sequence of perfect squares starts with

$4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, \ldots$

If an integer is not a perfect square, then its square root is an irrational number – i.e., it cannot be expressed as a ratio of two integers, and its decimal expansion neither terminates nor repeats.
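As an aside, here is a small sketch (ours) of how one might test whether an integer is a perfect square, using Python's integer square root to avoid floating-point error:

import math

def is_perfect_square(n):
    # n is a perfect square iff its integer square root squares back to n
    r = math.isqrt(n)
    return r * r == n

print([n for n in range(2, 50) if is_perfect_square(n)])  # [4, 9, 16, 25, 36, 49]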

The ancient Greeks invented an algorithm to compute square roots of integers, including integers that are not perfect squares.

Their method involved

  • computing a particular sequence of integers $\{y_t\}_{t=0}^\infty$;

  • computing $\lim_{t \to \infty} \left( \frac{y_{t+1}}{y_t} \right) = \bar r$;

  • deducing the desired square root from $\bar r$.

In this lecture, we’ll describe this method.

We’ll also use invariant subspaces to describe variations on this method that are faster.

18.3. Second-order linear difference equations#

Before telling how the ancient Greeks computed square roots, we’ll provide a quick introduction to second-order linear difference equations.

We’ll study the following second-order linear difference equation

(18.1)# $y_t = a_1 y_{t-1} + a_2 y_{t-2}, \quad t \geq 0$

where $(y_{-1}, y_{-2})$ is a pair of given initial conditions.

Equation (18.1) is actually an infinite number of linear equations in the sequence $\{y_t\}_{t=0}^\infty$.

There is one equation each for $t = 0, 1, 2, \ldots$.

We could follow an approach taken in the lecture on present values and stack all of these equations into a single matrix equation that we would then solve by using matrix inversion.

Note

In the present instance, the matrix equation would multiply a countably infinite dimensional square matrix by a countably infinite dimensional vector. With some qualifications, matrix multiplication and inversion tools apply to such an equation.

But we won’t pursue that approach here.

Instead, we’ll seek to find a time-invariant function that solves our difference equation, meaning that it provides a formula for a $\{y_t\}_{t=0}^\infty$ sequence that satisfies equation (18.1) for each $t \geq 0$.

We seek an expression for $y_t, t \geq 0$ as a function of the initial conditions $(y_{-1}, y_{-2})$:

(18.2)# $y_t = g((y_{-1}, y_{-2}); t), \quad t \geq 0.$

We call such a function g a solution of the difference equation (18.1).

One way to discover a solution is to use a guess and verify method.

We shall begin by considering a special pair of initial conditions that satisfy

(18.3)# $y_{-1} = \delta y_{-2}$

where δ is a scalar to be determined.

For initial conditions that satisfy (18.3), equation (18.1) implies that

(18.4)# $y_0 = \left(a_1 + \frac{a_2}{\delta}\right) y_{-1}.$

We want

(18.5)# $\left(a_1 + \frac{a_2}{\delta}\right) = \delta$

which we can rewrite as the characteristic equation

(18.6)# $\delta^2 - a_1 \delta - a_2 = 0.$

Applying the quadratic formula to solve for the roots of (18.6) we find that

(18.7)# $\delta = \frac{a_1 \pm \sqrt{a_1^2 + 4 a_2}}{2}.$

For either of the two $\delta$’s that satisfy equation (18.7), a solution of difference equation (18.1) is

(18.8)# $y_t = \delta^t y_0, \quad t \geq 0$

provided that we set

$y_0 = \delta y_{-1}.$
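To verify, substitute $y_t = \delta^t y_0$ into equation (18.1):

$\delta^t y_0 = a_1 \delta^{t-1} y_0 + a_2 \delta^{t-2} y_0$

Dividing both sides by $\delta^{t-2} y_0$ gives $\delta^2 = a_1 \delta + a_2$, which is exactly the characteristic equation (18.6).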

The general solution of difference equation (18.1) takes the form

(18.9)# $y_t = \eta_1 \delta_1^t + \eta_2 \delta_2^t$

where $\delta_1, \delta_2$ are the two solutions (18.7) of the characteristic equation (18.6), and $\eta_1, \eta_2$ are two constants chosen to satisfy

(18.10)# $\begin{bmatrix} y_{-1} \\ y_{-2} \end{bmatrix} = \begin{bmatrix} \delta_1^{-1} & \delta_2^{-1} \\ \delta_1^{-2} & \delta_2^{-2} \end{bmatrix} \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}$

or

(18.11)# $\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} = \begin{bmatrix} \delta_1^{-1} & \delta_2^{-1} \\ \delta_1^{-2} & \delta_2^{-2} \end{bmatrix}^{-1} \begin{bmatrix} y_{-1} \\ y_{-2} \end{bmatrix}$

Sometimes we are free to choose the initial conditions $(y_{-1}, y_{-2})$, in which case we use system (18.10) to find the associated $(\eta_1, \eta_2)$.

If we choose $(y_{-1}, y_{-2})$ to set $(\eta_1, \eta_2) = (1, 0)$, then $y_t = \delta_1^t$ for all $t \geq 0$.

If we choose $(y_{-1}, y_{-2})$ to set $(\eta_1, \eta_2) = (0, 1)$, then $y_t = \delta_2^t$ for all $t \geq 0$.
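To make this concrete, here is a small numerical sketch (ours, using the illustrative values $a_1 = 2, a_2 = 1$, so that $\delta = 1 \pm \sqrt{2}$ by (18.7)) that uses system (18.10) to find the initial conditions that set $(\eta_1, \eta_2) = (1, 0)$:

import numpy as np

# Illustrative values: a1 = 2, a2 = 1, so δ = 1 ± sqrt(2) by (18.7)
a1, a2 = 2, 1
δ_1 = (a1 + np.sqrt(a1**2 + 4 * a2)) / 2
δ_2 = (a1 - np.sqrt(a1**2 + 4 * a2)) / 2

# System (18.10): map (η_1, η_2) = (1, 0) into initial conditions (y_{-1}, y_{-2})
D = np.array([[δ_1**(-1), δ_2**(-1)],
              [δ_1**(-2), δ_2**(-2)]])
y_init = D @ np.array([1, 0])
print(y_init)  # [δ_1^{-1}, δ_1^{-2}]: starting here gives y_t = δ_1^t for all t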

Soon we’ll relate the preceding calculations to components of an eigen decomposition of a transition matrix that represents difference equation (18.1) in a very convenient way.

We’ll turn to that after we describe how Ancient Greeks figured out how to compute square roots of positive integers that are not perfect squares.

18.4. Algorithm of the Ancient Greeks#

Let $\sigma$ be a positive integer greater than $1$.

So $\sigma \in {\mathcal I} \equiv \{2, 3, \ldots\}$.

We want an algorithm to compute the square root of $\sigma \in {\mathcal I}$.

If $\sqrt{\sigma} \in {\mathcal I}$, $\sigma$ is said to be a perfect square.

If $\sqrt{\sigma} \notin {\mathcal I}$, it turns out that it is irrational.

Ancient Greeks used a recursive algorithm to compute square roots of integers that are not perfect squares.

The algorithm iterates on a second-order linear difference equation in the sequence $\{y_t\}_{t=0}^\infty$:

(18.12)# $y_t = 2 y_{t-1} - (1 - \sigma) y_{t-2}, \quad t \geq 0$

together with a pair of integers that are initial conditions for $y_{-1}, y_{-2}$.
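Before solving equation (18.12) formally, it is instructive to iterate it directly; here is a minimal sketch (ours) for $\sigma = 2$, with $y_{-2} = y_{-1} = 1$ as illustrative initial conditions:

# Iterate y_t = 2 y_{t-1} - (1 - σ) y_{t-2} for σ = 2
σ = 2
y = [1, 1]                  # [y_{-2}, y_{-1}]
for _ in range(10):
    y.append(2 * y[-1] - (1 - σ) * y[-2])

print(y[2:])                # 3, 7, 17, 41, 99, ...: Russell's b-sequence
print(y[-1] / y[-2] - 1)    # ≈ 1.41421 ≈ sqrt(2)

Notice that the iterates reproduce the $b$ column from the passage quoted by Russell.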

First, we’ll deploy some techniques for solving the difference equations that are also deployed in Samuelson Multiplier-Accelerator.

The characteristic equation associated with difference equation (18.12) is

(18.13)# $c(x) \equiv x^2 - 2 x + (1 - \sigma) = 0$

(Notice how this is an instance of equation (18.6) above.)

Factoring the right side of equation (18.13), we obtain

(18.14)# $c(x) = (x - \lambda_1)(x - \lambda_2) = 0$

where

$c(x) = 0$

for $x = \lambda_1$ or $x = \lambda_2$.

These two special values of x are sometimes called zeros or roots of c(x).

By applying the quadratic formula to solve for the roots of the characteristic equation (18.13), we find that

(18.15)# $\lambda_1 = 1 + \sqrt{\sigma}, \quad \lambda_2 = 1 - \sqrt{\sigma}.$
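In more detail, the quadratic formula gives

$x = \frac{2 \pm \sqrt{4 - 4(1 - \sigma)}}{2} = \frac{2 \pm 2\sqrt{\sigma}}{2} = 1 \pm \sqrt{\sigma}.$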

Formulas (18.15) indicate that $\lambda_1$ and $\lambda_2$ are each functions of a single variable, namely, $\sqrt{\sigma}$, the object that we, along with some Ancient Greeks, want to compute.

Ancient Greeks had an indirect way of exploiting this fact to compute the square root of a positive integer.

They did this by starting from particular initial conditions $y_{-1}, y_{-2}$ and iterating on the difference equation (18.12).

Solutions of difference equation (18.12) take the form

$y_t = \lambda_1^t \eta_1 + \lambda_2^t \eta_2$

where $\eta_1$ and $\eta_2$ are chosen to satisfy prescribed initial conditions $y_{-1}, y_{-2}$:

(18.16)# $\begin{aligned} \lambda_1^{-1} \eta_1 + \lambda_2^{-1} \eta_2 &= y_{-1} \\ \lambda_1^{-2} \eta_1 + \lambda_2^{-2} \eta_2 &= y_{-2} \end{aligned}$

System (18.16) of simultaneous linear equations will play a big role in the remainder of this lecture.

Since $\lambda_1 = 1 + \sqrt{\sigma} > 1 > \lambda_2 = 1 - \sqrt{\sigma}$ and $|\lambda_1| > |\lambda_2|$, the $\eta_1 \lambda_1^t$ term eventually dominates the $\eta_2 \lambda_2^t$ term whenever $\eta_1 \neq 0$, so it follows that for almost all (but not all) initial conditions

$\lim_{t \to \infty} \left( \frac{y_{t+1}}{y_t} \right) = 1 + \sqrt{\sigma}.$

Thus,

$\sqrt{\sigma} = \lim_{t \to \infty} \left( \frac{y_{t+1}}{y_t} \right) - 1.$

However, notice that if $\eta_1 = 0$, then

$\lim_{t \to \infty} \left( \frac{y_{t+1}}{y_t} \right) = 1 - \sqrt{\sigma}$

so that

$\sqrt{\sigma} = 1 - \lim_{t \to \infty} \left( \frac{y_{t+1}}{y_t} \right).$

Actually, if $\eta_1 = 0$, it follows that

$\sqrt{\sigma} = 1 - \frac{y_{t+1}}{y_t} \quad \forall t \geq 0,$

so that convergence is immediate and there is no need to take limits.

Symmetrically, if $\eta_2 = 0$, it follows that

$\sqrt{\sigma} = \frac{y_{t+1}}{y_t} - 1 \quad \forall t \geq 0$

so again, convergence is immediate, and we have no need to compute a limit.

System (18.16) of simultaneous linear equations can be used in various ways.

  • we can take $y_{-1}, y_{-2}$ as given initial conditions and solve for $\eta_1, \eta_2$;

  • we can instead take $\eta_1, \eta_2$ as given and solve for initial conditions $y_{-1}, y_{-2}$.

Notice how we used the second approach above when we set $(\eta_1, \eta_2)$ either to $(0, 1)$ or to $(1, 0)$.

In taking this second approach, we constructed an invariant subspace of ${\bf R}^2$.

Here is what is going on.

For $t \geq 0$ and for most pairs of initial conditions $(y_{-1}, y_{-2}) \in {\bf R}^2$ for equation (18.12), $y_t$ can be expressed as a linear combination of $y_{t-1}$ and $y_{t-2}$.

But for some special initial conditions $(y_{-1}, y_{-2}) \in {\bf R}^2$, $y_t$ can be expressed as a linear function of $y_{t-1}$ only.

These special initial conditions require that $y_{-1}$ be a linear function of $y_{-2}$.

We’ll study these special initial conditions soon.

But first let’s write some Python code to iterate on equation (18.12) starting from an arbitrary $(y_{-1}, y_{-2}) \in {\bf R}^2$.

18.5. Implementation#

We now implement the above algorithm to compute the square root of $\sigma$.

In this lecture, we use the following imports:

import numpy as np
import matplotlib.pyplot as plt

def solve_λs(coefs):
    # Calculate the roots using numpy.roots
    λs = np.roots(coefs)
    
    # Sort the roots for consistency
    return sorted(λs, reverse=True)

def solve_η(λ_1, λ_2, y_neg1, y_neg2):
    # Solve the linear system (18.16) for η_1 and η_2
    A = np.array([
        [1/λ_1, 1/λ_2],
        [1/(λ_1**2), 1/(λ_2**2)]
    ])
    b = np.array((y_neg1, y_neg2))
    ηs = np.linalg.solve(A, b)
    
    return ηs

def solve_sqrt(σ, coefs, y_neg1, y_neg2, t_max=100):
    # Ensure σ is greater than 1
    if σ <= 1:
        raise ValueError("σ must be greater than 1")
        
    # Characteristic roots
    λ_1, λ_2 = solve_λs(coefs)
    
    # Solve for η_1 and η_2
    η_1, η_2 = solve_η(λ_1, λ_2, y_neg1, y_neg2)

    # Compute the sequence up to t_max
    t = np.arange(t_max + 1)
    y = (λ_1 ** t) * η_1 + (λ_2 ** t) * η_2
    
    # Estimate sqrt(σ) as y_{t+1} / y_t - 1 for large t
    sqrt_σ_estimate = (y[-1] / y[-2]) - 1
    
    return sqrt_σ_estimate

# Use σ = 2 as an example
σ = 2

# Encode characteristic equation
coefs = (1, -2, (1 - σ))

# Solve for the square root of σ
sqrt_σ = solve_sqrt(σ, coefs, y_neg1=2, y_neg2=1)

# Calculate the deviation
dev = abs(sqrt_σ-np.sqrt(σ))
print(f"sqrt({σ}) is approximately {sqrt_σ:.5f} (error: {dev:.5f})")
sqrt(2) is approximately 1.41421 (error: 0.00000)

Now we consider cases where $(\eta_1, \eta_2) = (0, 1)$ and $(\eta_1, \eta_2) = (1, 0)$.

# Compute λ_1, λ_2
λ_1, λ_2 = solve_λs(coefs)
print(f'Roots for the characteristic equation are ({λ_1:.5f}, {λ_2:.5f})')
Roots for the characteristic equation are (2.41421, -0.41421)
# Case 1: η_1, η_2 = (0, 1)
ηs = (0, 1)

# Compute y_{t} and y_{t-1} with t >= 0
y = lambda t, ηs: (λ_1 ** t) * ηs[0] + (λ_2 ** t) * ηs[1]
sqrt_σ = 1 - y(1, ηs) / y(0, ηs)

print(f"For η_1, η_2 = (0, 1), sqrt_σ = {sqrt_σ:.5f}")
For η_1, η_2 = (0, 1), sqrt_σ = 1.41421
# Case 2: η_1, η_2 = (1, 0)
ηs = (1, 0)
sqrt_σ = y(1, ηs) / y(0, ηs) - 1

print(f"For η_1, η_2 = (1, 0), sqrt_σ = {sqrt_σ:.5f}")
For η_1, η_2 = (1, 0), sqrt_σ = 1.41421

We find that convergence is immediate.

Next, we’ll represent the preceding analysis by first vectorizing our second-order difference equation (18.12) and then using eigendecompositions of an associated state transition matrix.

18.6. Vectorizing the difference equation#

Represent (18.12) with the first-order matrix difference equation

$\begin{bmatrix} y_{t+1} \\ y_t \end{bmatrix} = \begin{bmatrix} 2 & -(1 - \sigma) \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix}$

or

$x_{t+1} = M x_t$

where

$M = \begin{bmatrix} 2 & -(1 - \sigma) \\ 1 & 0 \end{bmatrix}, \quad x_t = \begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix}$

Construct an eigendecomposition of M:

(18.17)# $M = V \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} V^{-1}$

where columns of $V$ are eigenvectors corresponding to eigenvalues $\lambda_1$ and $\lambda_2$.

The eigenvalues can be ordered so that $\lambda_1 > 1 > \lambda_2$.

Write equation (18.12) as

$x_{t+1} = V \Lambda V^{-1} x_t$

Now we implement the algorithm above.

First we write a function that iterates $M$:

def iterate_M(x_0, M, num_steps, dtype=np.float64):
    
    # Eigendecomposition of M
    Λ, V = np.linalg.eig(M)
    V_inv = np.linalg.inv(V)
    
    # Initialize the array to store results
    xs = np.zeros((x_0.shape[0], 
                   num_steps + 1), dtype=dtype)
    
    # Perform the iterations
    xs[:, 0] = x_0
    for t in range(num_steps):
        xs[:, t + 1] = M @ xs[:, t]
    
    return xs, Λ, V, V_inv

# Define the state transition matrix M
M = np.array([
      [2, -(1 - σ)],
      [1, 0]])

# Initial condition vector x_0
x_0 = np.array([2, 2])

# Perform the iteration
xs, Λ, V, V_inv = iterate_M(x_0, M, num_steps=100)

print(f"eigenvalues:\n{Λ}")
print(f"eigenvectors:\n{V}")
print(f"inverse eigenvectors:\n{V_inv}")
eigenvalues:
[ 2.41421356 -0.41421356]
eigenvectors:
[[ 0.92387953 -0.38268343]
 [ 0.38268343  0.92387953]]
inverse eigenvectors:
[[ 0.92387953  0.38268343]
 [-0.38268343  0.92387953]]

Let’s compare the eigenvalues to the roots (18.15) of equation (18.13) that we computed above.

roots = solve_λs((1, -2, (1 - σ)))
print(f"roots: {np.round(roots, 8)}")
roots: [ 2.41421356 -0.41421356]

Hence we confirmed (18.17).
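The eigendecomposition also delivers a loop-free formula for iterates, since $M^t = V \Lambda^t V^{-1}$; here is a quick check (ours) against the iterated trajectory computed above:

# x_t = V Λ^t V^{-1} x_0, computed without iterating M
t = 10
x_t_closed = V @ np.diag(Λ**t) @ V_inv @ x_0
print(np.allclose(x_t_closed, xs[:, t]))  # True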

Information about the square root we are after is also contained in the two eigenvectors.

Indeed, each eigenvector spans a one-dimensional invariant subspace of ${\bf R}^2$ pinned down by dynamics of the form

(18.18)# $y_t = \lambda_i y_{t-1}, \quad i = 1, 2$

that we encountered in equation (18.8) above.

In equation (18.18), the $i$th eigenvalue $\lambda_i$ equals the ratio $V_{1,i}/V_{2,i}$ of the components of the $i$th eigenvector (the $i$th column of $V$).

The following graph verifies this for our example.

# Plotting the eigenvectors
plt.figure(figsize=(8, 8))

plt.quiver(0, 0, V[0, 0], V[1, 0], angles='xy', scale_units='xy', 
           scale=1, color='C0', label=fr'$\lambda_1={np.round(Λ[0], 4)}$')
plt.quiver(0, 0, V[0, 1], V[1, 1], angles='xy', scale_units='xy', 
           scale=1, color='C1', label=fr'$\lambda_2={np.round(Λ[1], 4)}$')

# Annotating the slopes
plt.text(V[0, 0]-0.5, V[1, 0]*1.2, 
         r'slope=$\frac{V_{1,1}}{V_{2,1}}=$'+f'{np.round(V[0, 0] / V[1, 0], 4)}', 
         fontsize=12, color='C0')
plt.text(V[0, 1]-0.5, V[1, 1]*1.2, 
         r'slope=$\frac{V_{1,2}}{V_{2,2}}=$'+f'{np.round(V[0, 1] / V[1, 1], 4)}', 
         fontsize=12, color='C1')

# Adding labels
plt.axhline(0, color='grey', linewidth=0.5, alpha=0.4)
plt.axvline(0, color='grey', linewidth=0.5, alpha=0.4)
plt.legend()

plt.xlim(-1.5, 1.5)
plt.ylim(-1.5, 1.5)
plt.show()

18.7. Invariant subspace approach#

The preceding calculation indicates that we can use the eigenvectors $V$ to construct one-dimensional invariant subspaces of ${\bf R}^2$.

We’ll pursue that possibility now.

Define the transformed variables

$x_t^* = V^{-1} x_t$

Evidently, we can recover $x_t$ from $x_t^*$:

$x_t = V x_t^*$
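As a quick numerical check, we can use the V, V_inv, and x_0 computed above to confirm that the two transformations are mutual inverses:

# Transform x_0 to x_0^* and back again
x_star = V_inv @ x_0
print(np.allclose(V @ x_star, x_0))  # True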

The following notations and equations will help us.

Let

$V = \begin{bmatrix} V_{1,1} & V_{1,2} \\ V_{2,1} & V_{2,2} \end{bmatrix}, \quad V^{-1} = \begin{bmatrix} V^{1,1} & V^{1,2} \\ V^{2,1} & V^{2,2} \end{bmatrix}$

Notice that it follows from

$\begin{bmatrix} V^{1,1} & V^{1,2} \\ V^{2,1} & V^{2,2} \end{bmatrix} \begin{bmatrix} V_{1,1} & V_{1,2} \\ V_{2,1} & V_{2,2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

that

$V^{2,1} V_{1,1} + V^{2,2} V_{2,1} = 0$

and

$V^{1,1} V_{1,2} + V^{1,2} V_{2,2} = 0.$

These equations will be very useful soon.
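We can confirm both identities numerically with the V and V_inv computed above (recall that superscripts index entries of $V^{-1}$):

# (V^{-1} V)_{2,1} = V^{2,1} V_{1,1} + V^{2,2} V_{2,1} = 0
print(np.isclose(V_inv[1, 0] * V[0, 0] + V_inv[1, 1] * V[1, 0], 0))  # True

# (V^{-1} V)_{1,2} = V^{1,1} V_{1,2} + V^{1,2} V_{2,2} = 0
print(np.isclose(V_inv[0, 0] * V[0, 1] + V_inv[0, 1] * V[1, 1], 0))  # True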

Notice that

$\begin{bmatrix} x_{1,t+1}^* \\ x_{2,t+1}^* \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} x_{1,t}^* \\ x_{2,t}^* \end{bmatrix}$

To deactivate $\lambda_1$ we want to set

$x_{1,0}^* = 0.$

This can be achieved by setting

(18.19)# $x_{2,0} = -(V^{1,2})^{-1} V^{1,1} x_{1,0} = V_{2,2} V_{1,2}^{-1} x_{1,0}.$

To deactivate $\lambda_2$, we want to set

$x_{2,0}^* = 0$

This can be achieved by setting

(18.20)# $x_{2,0} = -(V^{2,2})^{-1} V^{2,1} x_{1,0} = V_{2,1} V_{1,1}^{-1} x_{1,0}.$

Let’s verify (18.19) and (18.20) below.

To deactivate $\lambda_1$ we use (18.19):

xd_1 = np.array((x_0[0], 
                 V[1,1]/V[0,1] * x_0[0]),
                dtype=np.float64)

# Compute x_0^* = V^{-1} x_0; its first entry is x_{1,0}^*
np.round(V_inv @ xd_1, 8)
array([-0.        , -5.22625186])

We find x1,0=0.

Now we deactivate $\lambda_2$ using (18.20):

xd_2 = np.array((x_0[0], 
                 V[1,0]/V[0,0] * x_0[0]), 
                 dtype=np.float64)

# Compute x_0^* = V^{-1} x_0; its second entry is x_{2,0}^*
np.round(V_inv @ xd_2, 8)
array([2.1647844, 0.       ])

We find $x_{2,0}^* = 0$.

# Simulate trajectories with λ1 muted (xd_1) and with λ2 muted (xd_2)
num_steps = 10
xs_λ1 = iterate_M(xd_1, M, num_steps)[0]
xs_λ2 = iterate_M(xd_2, M, num_steps)[0]

# Compute ratios y_t / y_{t-1}
ratios_λ1 = xs_λ1[1, 1:] / xs_λ1[1, :-1]
ratios_λ2 = xs_λ2[1, 1:] / xs_λ2[1, :-1] 

The following graph shows the ratios $y_t / y_{t-1}$ for the two cases.

We find that the ratios converge to $\lambda_2$ in the first case and to $\lambda_1$ in the second case.

# Plot the ratios for y_t / y_{t-1}
fig, axs = plt.subplots(1, 2, figsize=(12, 6), dpi=500)

# First subplot
axs[0].plot(np.round(ratios_λ1, 6), 
            label=r'$\frac{y_t}{y_{t-1}}$', linewidth=3)
axs[0].axhline(y=Λ[1], color='red', linestyle='--', 
               label=r'$\lambda_2$', alpha=0.5)
axs[0].set_xlabel('t', size=18)
axs[0].set_ylabel(r'$\frac{y_t}{y_{t-1}}$', size=18)
axs[0].set_title(r'$\frac{y_t}{y_{t-1}}$ after Muting $\lambda_1$', 
                 size=13)
axs[0].legend()

# Second subplot
axs[1].plot(ratios_λ2, label=r'$\frac{y_t}{y_{t-1}}$', 
            linewidth=3)
axs[1].axhline(y=Λ[0], color='green', linestyle='--', 
               label=r'$\lambda_1$', alpha=0.5)
axs[1].set_xlabel('t', size=18)
axs[1].set_ylabel(r'$\frac{y_t}{y_{t-1}}$', size=18)
axs[1].set_title(r'$\frac{y_t}{y_{t-1}}$ after Muting $\lambda_2$', 
                 size=13)
axs[1].legend()

plt.tight_layout()
plt.show()

18.8. Concluding remarks#

This lecture sets the stage for many other applications of invariant subspace methods.

All of these exploit very similar equations based on eigen decompositions.

We shall encounter equations very similar to (18.19) and (18.20) in Money Financed Government Deficits and Price Levels and in many other places in dynamic economic theory.

18.9. Exercise#

Exercise 18.1

Please use matrix algebra to formulate the method described by Bertrand Russell at the beginning of this lecture.

  1. Define a state vector $x_t = \begin{bmatrix} a_t \\ b_t \end{bmatrix}$.

  2. Formulate a first-order vector difference equation for $x_t$ of the form $x_{t+1} = A x_t$ and compute the matrix $A$.

  3. Use the system $x_{t+1} = A x_t$ to replicate the sequence of $a_t$’s and $b_t$’s described by Bertrand Russell.

  4. Compute the eigenvectors and eigenvalues of $A$ and compare them to corresponding objects computed in the text of this lecture.
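As a hint for getting started, here is a minimal sketch (ours, one possible formulation rather than the definitive one) of steps 1 through 3:

# Russell's recursion a_{t+1} = a_t + b_t, b_{t+1} = 2 a_t + b_t
# written as x_{t+1} = A x_t with x_t = [a_t, b_t]'
A = np.array([[1, 1],
              [2, 1]])

x = np.array([1, 1])         # (a_0, b_0) = (1, 1)
for _ in range(5):
    x = A @ x
    print(x)                 # [2 3], [5 7], [12 17], [29 41], [70 99]

print(np.linalg.eigvals(A))  # 1 ± sqrt(2): compare with λ_1, λ_2 above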