Pricing Bermudan Options in TensorFlow – Learning an optimal Early Exercise Strategy

In the previous post we used TensorFlow to price some exotic options, such as Asian and barrier options, and used its automatic differentiation feature to calculate the Greeks of the options.

Today we will see how to price a Bermudan option in TensorFlow with the Longstaff-Schwartz (a.k.a. American Monte Carlo) algorithm. For simplicity we assume a Bermudan equity call option without dividend payments in a Black-Scholes-Merton world.

The complete Jupyter notebook can be found on GitHub.

Let’s start with a brief recap of Bermudan options and the Longstaff-Schwartz pricing method. For a more detailed and formal description, have a look at my previous posts about Bermudan swaptions and the American Monte Carlo simulation for CVA calculations, or at the book ‘Monte Carlo Methods in Financial Engineering’ by Paul Glasserman.

A Bermudan option gives the holder the right to exercise the option not only at maturity but also earlier, on pre-specified dates. At each exercise date the holder has to decide whether it’s better to exercise the option now (exercise value) or to wait for a later exercise date (continuation value). In other words, the holder has to find an optimal exercise strategy (the early exercise boundary of the option) by checking whether the exercise value is higher than the continuation value. There are some trivial cases:

1) If the option is not in the money, it’s always better to wait, and
2) if the option hasn’t been exercised before the last exercise date, the Bermudan becomes a European option.

Therefore the option price has to be at least as high as the price of the corresponding European option (because the European exercise right is included).

One approach to pricing the option is Monte Carlo simulation, but the problem is the calculation of the continuation value. On each simulated path we can easily calculate the exercise value of the option at each exercise date given the state of the path, but the continuation value is a conditional expectation given the current underlying price of the path. The naive approach is to calculate this expectation with another Monte Carlo simulation, but then we need to run a simulation for each exercise date on each path. This kind of nested simulation can become very slow.
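The easy part, simulating paths and reading off the exercise values, can be sketched in plain NumPy. This is only an illustration under the Black-Scholes-Merton assumption from above; the function names `simulate_paths` and `exercise_values` and the parameter values are made up for this sketch and are not taken from the notebook.

```python
import numpy as np

def simulate_paths(spot, r, sigma, exercise_times, n_paths, seed=0):
    """Simulate risk-neutral GBM prices at the given exercise times."""
    rng = np.random.default_rng(seed)
    times = np.asarray(exercise_times, dtype=float)
    dt = np.diff(times, prepend=0.0)              # time between exercise dates
    z = rng.standard_normal((n_paths, len(times)))
    log_steps = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return spot * np.exp(np.cumsum(log_steps, axis=1))

def exercise_values(paths, strike):
    """Undiscounted call payoff on every path at every exercise date."""
    return np.maximum(paths - strike, 0.0)

# Illustrative parameters (assumed for this sketch)
paths = simulate_paths(100.0, 0.03, 0.2, [1.0, 2.0, 3.0, 4.0], n_paths=10_000)
payoffs = exercise_values(paths, strike=120.0)
```

Each row of `payoffs` is one path and each column one exercise date. What is still missing is the continuation value in each of those cells.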

Longstaff-Schwartz method

One solution is the Longstaff-Schwartz method. The basic idea is to approximate the continuation value with a linear regression model.

The pricing consists of two phases:

In the learning phase we go backward through time, starting with the last exercise date (which is trivial to price). Then we go to the previous exercise date and fit a linear model to predict the continuation values (the option value at time t+1, which we just calculated) given the state of the paths (usually a system of polynomials of the state). We use the prediction to decide whether to exercise or not and update the value of the option at time t (either the exercise or the continuation value). We repeat these steps until we reach the first exercise date.

After we have learned the linear models, we use them to predict the continuation values in the pricing phase. We generate a new set of random paths and go forward through time, and at each exercise date we decide whether to exercise depending on the exercise value and the predicted continuation value.
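The two phases can be sketched in a few lines of NumPy. Note that this is not the TensorFlow implementation of this post: it is a minimal sketch that fits the regressions by least squares (`np.linalg.lstsq`), uses plain monomials as features, and carries the realised discounted cash flows backward as regression targets. All function names and parameter values are assumptions made for the illustration, and it assumes enough in-the-money paths at every exercise date for the regression to be well defined.

```python
import numpy as np

def simulate(spot, r, sigma, times, n_paths, rng):
    """Risk-neutral GBM prices at the exercise times."""
    dt = np.diff(times, prepend=0.0)
    z = rng.standard_normal((n_paths, len(times)))
    steps = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return spot * np.exp(np.cumsum(steps, axis=1))

def features(s, spot):
    """Monomials of the scaled price (an assumed choice of basis)."""
    x = s / spot
    return np.column_stack([np.ones_like(x), x, x**2, x**3])

def learn(paths, payoff, times, r, spot):
    """Learning phase: backward pass, one regression per early exercise date."""
    n_dates = len(times)
    value = payoff[:, -1].copy()                  # value at the last exercise date
    coefs = [None] * (n_dates - 1)
    for i in range(n_dates - 2, -1, -1):
        value = value * np.exp(-r * (times[i + 1] - times[i]))  # discount to t_i
        itm = payoff[:, i] > 0                    # fit on in-the-money paths only
        coefs[i] = np.linalg.lstsq(features(paths[itm, i], spot),
                                   value[itm], rcond=None)[0]
        cont = features(paths[:, i], spot) @ coefs[i]
        exercise = itm & (payoff[:, i] > cont)
        value[exercise] = payoff[exercise, i]     # exercised paths take the payoff
    return coefs

def price(paths, payoff, times, r, coefs, spot):
    """Pricing phase: forward pass on fresh paths with the learned models."""
    n_paths, n_dates = payoff.shape
    alive = np.ones(n_paths, dtype=bool)          # not yet exercised
    pv = np.zeros(n_paths)
    for i in range(n_dates):
        exercise = alive & (payoff[:, i] > 0)
        if i < n_dates - 1:                       # last date needs no model
            cont = features(paths[:, i], spot) @ coefs[i]
            exercise &= payoff[:, i] > cont
        pv[exercise] = np.exp(-r * times[i]) * payoff[exercise, i]
        alive &= ~exercise
    return pv.mean()

# Illustrative parameters (assumed for this sketch)
rng = np.random.default_rng(42)
times = np.array([1.0, 2.0, 3.0, 4.0])
spot, strike, r, sigma = 100.0, 120.0, 0.03, 0.2

train_paths = simulate(spot, r, sigma, times, 20_000, rng)
coefs = learn(train_paths, np.maximum(train_paths - strike, 0.0), times, r, spot)

new_paths = simulate(spot, r, sigma, times, 20_000, rng)
npv = price(new_paths, np.maximum(new_paths - strike, 0.0), times, r, coefs, spot)
```

Using a fresh set of paths in the pricing phase keeps the exercise decisions from being fitted to the same noise they are evaluated on.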

There are two ways to fit the linear model: one could solve the normal equations or use gradient descent. I decided to go for gradient descent, since it gives us the opportunity to exchange the linear model for more sophisticated models such as a deep network.
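To make the two options concrete, here is a small NumPy sketch on synthetic data (the data, learning rate and number of steps are assumptions chosen for the illustration). Both approaches minimise the same squared error, so on a well-conditioned problem they should end up at essentially the same weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)

# Two standardised polynomial features and a noisy linear target
X = np.column_stack([x, 2 * x**2 - 1])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.3 + 0.1 * rng.standard_normal(500)

# Option 1: normal equations (A^T A) theta = A^T y, with a bias column
A = np.column_stack([X, np.ones(len(X))])
theta_ne = np.linalg.solve(A.T @ A, A.T @ y)

# Option 2: full-batch gradient descent on the mean squared error
w = np.zeros(2)
b = 0.0
learning_rate = 0.1
for _ in range(500):
    residual = X @ w + b - y
    w -= learning_rate * 2.0 * (X.T @ residual) / len(y)
    b -= learning_rate * 2.0 * residual.mean()

theta_gd = np.append(w, b)
```

The normal equations give the answer in one linear solve, while gradient descent needs many passes over the data but carries over unchanged to models that are not linear in their parameters.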

I use three helper functions. The first function, get_continuation_function, creates the TensorFlow operators needed for training a linear model at an exercise date. The second function, feature_matrix_from_current_state, creates a feature matrix for the model from the path values at a given time step. I use Chebyshev polynomials up to degree 4 as features; as we can see below, the fit could be better. Feel free to play with it and adjust the code.
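As a rough idea of what such a feature matrix looks like, here is a NumPy sketch based on `numpy.polynomial.chebyshev.chebvander`. The function name `chebyshev_features` and the rescaling of the prices to [-1, 1] (the natural domain of the Chebyshev polynomials) are assumptions for this illustration and may differ from what the notebook does.

```python
import numpy as np
from numpy.polynomial import chebyshev

def chebyshev_features(prices, degree=4):
    """Chebyshev polynomials T_1..T_degree of the prices rescaled to [-1, 1]."""
    prices = np.asarray(prices, dtype=float)
    lo, hi = prices.min(), prices.max()
    scaled = 2.0 * (prices - lo) / (hi - lo) - 1.0
    # chebvander returns the columns T_0..T_degree; the constant T_0 is
    # dropped here because the linear model has its own bias term.
    return chebyshev.chebvander(scaled, degree)[:, 1:]

X = chebyshev_features([80.0, 100.0, 120.0, 140.0])
```

Each row of `X` belongs to one path and each column to one polynomial degree, which is the layout a linear model of the form X·w + b expects.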

The third helper function, pricing_function, creates the computational graph for the pricing. For each call date it generates the required linear model and training operator with the other two helper functions and stores them in the list training_functions.

    # Excerpt from pricing_function: S_t are the simulated prices and
    # E_t the exercise values, with one column per call date.
    previous_exersies = 0
    npv = 0
    for i in range(number_call_dates-1):
        # Linear continuation value model for call date i
        (input_x, input_y, train, w, b, y_hat) = get_continuation_function()
        training_functions.append((input_x, input_y, train, w, b, y_hat))
        X = feature_matrix_from_current_state(S_t[:, i])
        contValue = tf.add(tf.matmul(X, w), b)
        # Exercise only if in the money and worth more than continuing
        inMoney = tf.cast(tf.greater(E_t[:, i], 0.), tf.float32)
        exercise = tf.cast(tf.greater(E_t[:, i], contValue[:, 0]), tf.float32) * inMoney
        # The option can be exercised at most once on each path
        exercise = exercise * (1-previous_exersies)
        previous_exersies += exercise
        npv += exercise*E_t[:, i]
    # Last exercise date
    inMoney = tf.cast(tf.greater(E_t[:, -1], 0.), tf.float32)
    exercise = inMoney * (1-previous_exersies)
    npv += exercise*E_t[:, -1]
    npv = tf.reduce_mean(npv)
    # Sensitivities via automatic differentiation
    greeks = tf.gradients(npv, [S, r, sigma])

The npv operator is the sum of the optimal exercise decisions. At each exercise date we exercise if the exercise value is greater than the predicted continuation value and the option is in the money. We store the information about previous exercise decisions, since we can only exercise the option once.
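The exercise-once bookkeeping is easy to check by hand on a tiny example. The sketch below mirrors the masking logic of the graph in plain NumPy; the function name `exercise_once_npv` and the numbers are made up for the illustration, and the exercise values are assumed to be already discounted.

```python
import numpy as np

def exercise_once_npv(E_t, cont_values):
    """Per-path value when exercising at the first date where it pays off."""
    n_paths, n_dates = E_t.shape
    previous = np.zeros(n_paths)          # 1.0 once a path has been exercised
    npv = np.zeros(n_paths)
    for i in range(n_dates - 1):
        in_money = (E_t[:, i] > 0).astype(float)
        exercise = (E_t[:, i] > cont_values[:, i]).astype(float) * in_money
        exercise *= 1 - previous          # ignore paths exercised earlier
        previous += exercise
        npv += exercise * E_t[:, i]
    # Last exercise date: exercise whatever is still alive and in the money
    in_money = (E_t[:, -1] > 0).astype(float)
    npv += in_money * (1 - previous) * E_t[:, -1]
    return npv

# Four paths, three exercise dates (made-up exercise values)
E_t = np.array([[0.0, 5.0, 8.0],
                [3.0, 0.0, 1.0],
                [0.0, 0.0, 0.0],
                [2.0, 6.0, 0.0]])
# Predicted continuation values at the first two dates (made up)
cont_values = np.array([[1.0, 4.0],
                        [4.0, 2.0],
                        [1.0, 1.0],
                        [1.0, 9.0]])

per_path = exercise_once_npv(E_t, cont_values)
price = per_path.mean()
```

The first path is exercised at the second date even though the third date would have paid more, because the decision only compares the exercise value with the predicted continuation value at that time.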

In the actual pricing function bermudanMC_tensorFlow we execute the graph to create the paths and the exercise values for the training paths. Then we iterate backward through the training_functions and learn the linear model for each exercise date.

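        # Backward iteration to learn the continuation value approximation for each call date
        for i in range(n_excerises-1)[::-1]:
            (input_x, input_y, train, w, b, y_hat) = training_functions[i]
            # Target: option value at the next call date
            y = exercise_values[:, i+1:i+2]
            # Features: Chebyshev polynomials of the underlying price, standardised
            X = paths[:, i]
            X = np.c_[X**1, 2*X**2-1, 4*X**3 - 3 * X]
            X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
            # Train on the in-the-money paths only.
            # NOTE: `sess` (the active tf.Session) is an assumed name.
            for epoch in range(80):
                _ = sess.run(train, {input_x: X[exercise_values[:, i] > 0],
                                     input_y: y[exercise_values[:, i] > 0]})
            cont_value = sess.run(y_hat, {input_x: X,
                                          input_y: y})
            # Option value at call date i: exercise or continue
            exercise_values[:, i:i+1] = np.maximum(exercise_values[:, i:i+1], cont_value)
            plt.scatter(paths[:, i], y)
            plt.scatter(paths[:, i], cont_value, color='red')
            plt.title('Continuation Value approx')
            plt.ylabel('NPV t_%i' % i)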

We take the prices of the underlying at time i, calculate the Chebyshev polynomials and store them in the predictor matrix X. The option value at the next exercise date i+1, which we computed in the previous step of the backward iteration, is our target value y. We normalise our features and train our linear model with gradient descent over 80 epochs. We exclude all samples where the option is not in the money (there is no decision to make). Once we have our prediction for the continuation value, we update the value of the option at time i to be the maximum of the exercise value and the predicted continuation value.

After we have learned the weights of our models, we can execute the npv and greeks operators to calculate the npv and its derivatives.

Pricing Example

Assume the spot is at 100, the risk-free interest rate is 3% and the implied Black vol is 20%. Let’s price a Bermudan call with a strike of 120, yearly exercise dates and a maturity of 4 years.
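A useful sanity check for this example is the closed-form Black-Scholes price of the European call with the same parameters. Without dividends, early exercise of a call is never optimal, so the Bermudan price should coincide with the European one up to Monte Carlo and regression error. The sketch below uses only the standard library; the function names are chosen for the illustration.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot, strike, r, sigma, maturity):
    """Black-Scholes price of a European call without dividends."""
    d1 = (log(spot / strike) + (r + 0.5 * sigma**2) * maturity) / (sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-r * maturity) * norm_cdf(d2)

# Parameters of the pricing example
european = black_scholes_call(spot=100.0, strike=120.0, r=0.03, sigma=0.2, maturity=4.0)
```

A simulated Bermudan price far away from `european` would point to a problem in the simulation or in the exercise strategy rather than to genuine early exercise value.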

The fit of our continuation value approximation on the training set.

The runtime is between 7 and 10 seconds for a Bermudan option with 4 exercise dates. We could speed it up by using the normal equations instead of gradient descent.

So that’s it for today. As usual, the complete source code is available as a notebook on GitHub. I think with TensorFlow it’s very easy to try other approximation methods, since we can make use of TensorFlow’s deep learning abilities. So please feel free to play with the source code and experiment with the feature matrix (other polynomials or higher degrees), or try another model (maybe a deep network) to get a better fit of the continuation values. I will play a bit with it and come back to this in a later post to present some results of alternative approximation approaches.

So long.
