In this tutorial we will see how to speed up Monte-Carlo Simulation with GPU and Cloud Computing in Python using PyTorch and Google Cloud Platform. Motivated from my experience developing a RNN for anomaly detection in PyTorch I wanted to port the option pricing code from my previous posts from TensorFlow to PyTorch. PyTorch offers similar to TensorFlow auto-gradients, also known as algorithmic differentiation, but the programming style is quite different to TensorFlow. We will see how easy it is to run our code on a GPU with PyTorch. The calculation of a NPV and all first order Greeks of a simple Down-and-Out Option is 45 times faster (7.3 s vs. 160ms) on a Nvidia K80 compared to a calculation on the CPU of my MacBook Pro 15 from 2016 with a 2.6 GHz i7 and 16 GB Ram.

**Update 1***The purpose of this example is to illustrate how to use Algorithmic Differentiation and GPU Computing with PyTorch in Python. There are more appropriate pricing models and methods for Barrier Options. This is a very naive approach in Black Scholes setting without taking any volatility smile into account.*

The notebooks are available on GitHub https://github.com/mgroncki/IPythonScripts/PricingPyTorch

First we will look into analytical pricing of plain vanilla options and compare a pure Numpy versus a PyTorch and the previous TensorFlow implementation before we implement the Monte-Carlo Simulation in PyTorch.

## Analytical Plain Vanilla Pricing

Lets start with plain vanilla call option in a Black Scholes World.

Maturtiy: 1 year

Spot : 100

Strike : 101

Volatility: 30.0 %

Risk free rate: 1.0 %

### Numpy Implementation

We are using the same implementation as in the TensorFlow notebook

## Plain Vanilla Call in TensorFlow def blackScholes_py(S_0, strike, time_to_expiry, implied_vol, riskfree_rate): S = S_0 K = strike dt = time_to_expiry sigma = implied_vol r = riskfree_rate Phi = stats.norm.cdf d_1 = (np.log(S_0 / K) + (r+sigma**2/2)*dt) / (sigma*np.sqrt(dt)) d_2 = d_1 - sigma*np.sqrt(dt) return S*Phi(d_1) - K*np.exp(-r*dt)*Phi(d_2)

And as expected its super fast. No surprises here. Next we will implement the same formula in PyTorch.

### PyTorch Implementation

There are only minimal code changes compared to the numpy version required. In the actual pricing function we just need to replace `np`

with `torch`

and exchange the cdf function to use the PyTorch one and we have to convert our input into `torch.tensor`

.

def blackScholes_pyTorch(S_0, strike, time_to_expiry, implied_vol, riskfree_rate): S = S_0 K = strike dt = time_to_expiry sigma = implied_vol r = riskfree_rate Phi = torch.distributions.Normal(0,1).cdf d_1 = (torch.log(S_0 / K) + (r+sigma**2/2)*dt) / (sigma*torch.sqrt(dt)) d_2 = d_1 - sigma*torch.sqrt(dt) return S*Phi(d_1) - K*torch.exp(-r*dt)*Phi(d_2)

We see now significant speed difference between Numpy or PyTorch.

Seems the PyTorch version is even faster as the pure numpy version. How can we use the auto-grad function in PyTorch to get the greeks?

#### Greeks in PyTorch

We just need to call the `.backward()`

operator of the tensor which stores the prices and we can access the greeks with the `.grad`

properity.

%%timeit S_0 = torch.tensor([100.],requires_grad=True) K = torch.tensor([101.],requires_grad=True) T = torch.tensor([1.],requires_grad=True) sigma = torch.tensor([0.3],requires_grad=True) r = torch.tensor([0.01],requires_grad=True) npv_pytorch = blackScholes_pyTorch(S_0, K, T, sigma, r) npv_pytorch.backward()

639 µs ± 9.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Its almost 2.5-3 times slower but it gives us five greeks. A naive finite-difference approximation would costs us at least 6 calculations and would be only an numerical approximation. Here we have ‘exact’ derivates.

#### Second order greeks in Pytorch

The 2nd order greeks are a bit tricky and not so straight-forward. We need

to create a computational graph of the gradient. We use the function `.grad()`

from the autograd module. And can then call the backward method on it.

gradient = torch.autograd.grad(npv_pytorch, S_0, create_graph=True) delta, = gradient delta.backward(retain_graph=True) print('Delta: ', delta) print('Gamma', S_0.grad)

Now let’s recap the results from the TensorFlow implementation.

### TensorFlow implementation

Using the same code as in the original notebook (but I removed the calculation of the 2nd order greeks. There is a bit of overhead for constructing the computational graph.

def blackScholes_tf_pricer(): # Build the static computational graph S = tf.placeholder(tf.float32) K = tf.placeholder(tf.float32) dt = tf.placeholder(tf.float32) sigma = tf.placeholder(tf.float32) r = tf.placeholder(tf.float32) Phi = tf.distributions.Normal(0.,1.).cdf d_1 = (tf.log(S / K) + (r+sigma**2/2)*dt) / (sigma*tf.sqrt(dt)) d_2 = d_1 - sigma*tf.sqrt(dt) npv = S*Phi(d_1) - K*tf.exp(-r*dt)*Phi(d_2) greeks = tf.gradients(npv, [S, sigma, r, K, dt]) def execute_graph(S_0, strike, time_to_expiry, implied_vol, riskfree_rate): with tf.Session() as sess: res = sess.run([npv, greeks], { S: S_0, K : strike, r : riskfree_rate, sigma: implied_vol, dt: time_to_expiry}) return res return execute_graph

We are roughly factor 1000 times slower in TensorFlow. Maybe my implementation is just bad. Any feedback how to improve it would be very appreciated.

**First Observation**

So if anyone has already some pricing routines in Python using Numpy it seems to be much easier to port them to PyTorch instead to TensorFlow. And porting the code wouldn’t have any significant negative impacts on the speed.

Now we want to come to the core of this tutorial.

## Monte Carlo Pricing for Single Barrier Option

Our example option is a down-and-out barrier with

Maturtiy: 2 year

Spot : 100

Strike : 110

Volatility: 20.0 %

Risk free rate: 3.0 %

Barrier at 90

It’s the same option as in my previous post and we gonna use the same Numpy implementation

def monte_carlo_down_out_py(S_0, strike, time_to_expiry, implied_vol, riskfree_rate, barrier, steps, samples): stdnorm_random_variates = np.random.randn(samples, steps) S = S_0 K = strike dt = time_to_expiry / stdnorm_random_variates.shape[1] sigma = implied_vol r = riskfree_rate B = barrier # See Advanced Monte Carlo methods for barrier and related exotic options by Emmanuel Gobet B_shift = B*np.exp(0.5826*sigma*np.sqrt(dt)) S_T = S * np.cumprod(np.exp((r-sigma**2/2)*dt+sigma*np.sqrt(dt)*stdnorm_random_variates), axis=1) non_touch = (np.min(S_T, axis=1) > B_shift)*1 call_payout = np.maximum(S_T[:,-1] - K, 0) npv = np.mean(non_touch * call_payout) return np.exp(-time_to_expiry*r)*npv

For a simulation with 100.000 paths and 1.000 time steps we need approximately 4.8 sec.

And again the PyTorch implementation looks very familiar. We have to replace the random number generator and replace the numpy functions with torch functions. There are two cumberstones, there is no equivalent to `np.maximum`

, so we have to mask the negative payoff ourself and set it to zero and we have to convert the boolean tensor into a float tensor.

def monte_carlo_down_out_torch(S_0, strike, time_to_expiry, implied_vol, riskfree_rate, barrier, steps, samples): stdnorm_random_variates = torch.distributions.Normal(0,1).sample((samples, steps)) S = S_0 K = strike dt = time_to_expiry / stdnorm_random_variates.shape[1] sigma = implied_vol r = riskfree_rate B = barrier # See Advanced Monte Carlo methods for barrier and related exotic options by Emmanuel Gobet B_shift = B*torch.exp(0.5826*sigma*torch.sqrt(dt)) S_T = S * torch.cumprod(torch.exp((r-sigma**2/2)*dt+sigma*torch.sqrt(dt)*stdnorm_random_variates), dim=1) non_touch = torch.min(S_T, dim=1)[0] > B_shift call_payout = S_T[:,-1] - K call_payout[call_payout<0]=0 npv = torch.mean(non_touch.type(torch.FloatTensor) * call_payout) return torch.exp(-time_to_expiry*r)*npv

The Monte-Carlo Simulation including calculating the pathwise greeks take the same time as the pure NPV Monte-Carlo Simulation in Numpy.

**Second Observation**

Running Monte-Carlo Simulations in PyTorch on a CPU seems to be the same speed as Numpy implementation (double duration but calculate also greeks in the same time).

Now we want to make things really fast and run it on a GPU.

### Monte-Carlo Simulation of Barrier Options on GPUs

#### GCP Setup of GPU computing

Since I have no Nvidia GPU in my laptop we will run our experiment on Google Cloud Platform. You can easily signup for GCP and you can receive 300 USD free credit for 12 months. To use GPU computing you need to check in which zones GPUs are available. First time users need to request the GPU usage first, the approval takes usually less than 1 day. You can follow the tutorial here:

View story at Medium.com

Just replace the step 8 with the AISE PyTorch NVidia GPU Notebook.

The costs for instance with a Nvidia K80 is less than 2 USD per hour, and our example notebook will run only a few minutes.

Once you the virtual machine is started you can connect directly to the Jupyter notebook server and work in your browser as it would run on our local machine.

In my GitHub repository there is a dedicated notebook `BarrierOptionGPU.ipynb`

which we can upload to the server.

#### Cuda Implementation of our Monte-Carlo Simulation

Again we need only minimal changes to our code. We need to change the random number generator to run on the CUDA device `torch.cuda.FloatTensor(steps, samples).normal_()`

and after casting our boolean tensor we need to move it to the GPU `non_touch = non_touch.type(torch.cuda.FloatTensor)`

and we have to copy all our inputs to the GPU as well `torch.tensor([...],requires_grad=True, device='cuda')`

.

def monte_carlo_down_out_torch_cuda(S_0, strike, time_to_expiry, implied_vol, riskfree_rate, barrier, steps, samples): stdnorm_random_variates = torch.cuda.FloatTensor(steps, samples).normal_() S = S_0 K = strike dt = time_to_expiry / stdnorm_random_variates.shape[1] sigma = implied_vol r = riskfree_rate B = barrier # See Advanced Monte Carlo methods for barrier and related exotic options by Emmanuel Gobet B_shift = B*torch.exp(0.5826*sigma*torch.sqrt(dt)) S_T = S * torch.cumprod(torch.exp((r-sigma**2/2)*dt+sigma*torch.sqrt(dt)*stdnorm_random_variates), dim=1) non_touch = torch.min(S_T, dim=1)[0] > B_shift non_touch = non_touch.type(torch.cuda.FloatTensor) call_payout = S_T[:,-1] - K call_payout[call_payout<0]=0 npv = torch.mean(non_touch * call_payout) return torch.exp(-time_to_expiry*r)*npv

The pricing and greek calculation is 45 times faster than on CPUs.

## Conclusion

Writing and coding this tutorial was great fun for me. PyTorch feels for me much easier and cleaner to use for writing pricing algorithm compared to TensorFlow, which maybe will change with TensorFlow 2.0 which is a major redesign. With PyTorch it’s very easy to implement Monte-Carlo Simulations with Adjoint Greeks and running the code on GPUs is seamless even without experience in GPU code in C++. I wonder how fast a C++/CUDA implementation would be? Maybe something for future post.

So long…