In this tutorial we will see how to speed up Monte-Carlo simulation with GPU and cloud computing in Python, using PyTorch and the Google Cloud Platform. Motivated by my experience developing an RNN for anomaly detection in PyTorch, I wanted to port the option pricing code from my previous posts from TensorFlow to PyTorch. Like TensorFlow, PyTorch offers auto-gradients (also known as algorithmic differentiation), but the programming style is quite different. We will see how easy it is to run our code on a GPU with PyTorch. The calculation of the NPV and all first-order greeks of a simple down-and-out option is 45 times faster on a Nvidia K80 (160 ms) than on the CPU of my MacBook Pro 15 from 2016 with a 2.6 GHz i7 and 16 GB RAM (7.3 s).
- Update 1: The purpose of this example is to illustrate how to use algorithmic differentiation and GPU computing with PyTorch in Python. There are more appropriate pricing models and methods for barrier options; this is a very naive approach in a Black-Scholes setting without taking any volatility smile into account.
The notebooks are available on GitHub: https://github.com/mgroncki/IPythonScripts/PricingPyTorch
First we will look at the analytical pricing of plain vanilla options and compare a pure Numpy implementation against the PyTorch and the previous TensorFlow implementations, before we implement the Monte-Carlo simulation in PyTorch.
Analytical Plain Vanilla Pricing
Let's start with a plain vanilla call option in a Black-Scholes world.
Maturity: 1 year
Spot: 100
Strike: 101
Volatility: 30.0 %
Risk free rate: 1.0 %
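For reference, the option is priced with the standard Black-Scholes call formula, which the code below implements:

$$C = S_0\,\Phi(d_1) - K e^{-r\,\Delta t}\,\Phi(d_2), \qquad d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)\,\Delta t}{\sigma\sqrt{\Delta t}}, \qquad d_2 = d_1 - \sigma\sqrt{\Delta t}$$

where $\Phi$ is the standard normal cdf and $\Delta t$ the time to expiry.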
Numpy Implementation
We use the same implementation as in the TensorFlow notebook:
## Plain Vanilla Call in Numpy
import numpy as np
from scipy import stats

def blackScholes_py(S_0, strike, time_to_expiry, implied_vol, riskfree_rate):
    S = S_0
    K = strike
    dt = time_to_expiry
    sigma = implied_vol
    r = riskfree_rate
    Phi = stats.norm.cdf
    d_1 = (np.log(S_0 / K) + (r + sigma**2 / 2) * dt) / (sigma * np.sqrt(dt))
    d_2 = d_1 - sigma * np.sqrt(dt)
    return S * Phi(d_1) - K * np.exp(-r * dt) * Phi(d_2)
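As a quick sanity check, a minimal usage sketch with the parameters from above (the exact call is illustrative, not taken from the notebook):

# Price the example option: S_0=100, K=101, T=1y, sigma=30%, r=1%
npv = blackScholes_py(100., 101., 1., 0.3, 0.01)
print(npv)  # roughly 11.9 for these inputs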
And as expected it's super fast. No surprises here. Next we will implement the same formula in PyTorch.
PyTorch Implementation
Only minimal code changes are required compared to the Numpy version. In the actual pricing function we just replace np with torch, exchange the cdf function for the PyTorch one, and convert our inputs into torch.tensor objects.
import torch

def blackScholes_pyTorch(S_0, strike, time_to_expiry, implied_vol, riskfree_rate):
    S = S_0
    K = strike
    dt = time_to_expiry
    sigma = implied_vol
    r = riskfree_rate
    Phi = torch.distributions.Normal(0, 1).cdf
    d_1 = (torch.log(S_0 / K) + (r + sigma**2 / 2) * dt) / (sigma * torch.sqrt(dt))
    d_2 = d_1 - sigma * torch.sqrt(dt)
    return S * Phi(d_1) - K * torch.exp(-r * dt) * Phi(d_2)
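A minimal usage sketch (illustrative): the only difference to the Numpy call is that the inputs are wrapped in torch tensors first.

# Same option as before, inputs as PyTorch tensors
S_0 = torch.tensor([100.])
K = torch.tensor([101.])
T = torch.tensor([1.])
sigma = torch.tensor([0.3])
r = torch.tensor([0.01])
print(blackScholes_pyTorch(S_0, K, T, sigma, r))  # should match the Numpy NPV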
We see no significant speed difference between Numpy and PyTorch; if anything, the PyTorch version seems even slightly faster than the pure Numpy version. How can we use the auto-grad functionality in PyTorch to get the greeks?
Greeks in PyTorch
We just need to call the .backward() operator on the tensor which stores the prices, and we can then access the greeks through the .grad property of each input tensor.
%%timeit
S_0 = torch.tensor([100.], requires_grad=True)
K = torch.tensor([101.], requires_grad=True)
T = torch.tensor([1.], requires_grad=True)
sigma = torch.tensor([0.3], requires_grad=True)
r = torch.tensor([0.01], requires_grad=True)
npv_pytorch = blackScholes_pyTorch(S_0, K, T, sigma, r)
npv_pytorch.backward()
639 µs ± 9.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
It's almost 2.5-3 times slower, but it gives us five greeks. A naive finite-difference approximation would cost us at least 6 function evaluations and would only be a numerical approximation; here we have 'exact' derivatives.
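After the backward pass, each input tensor carries its sensitivity in its .grad attribute. A minimal sketch of reading out the five greeks (variable names as in the cell above; run the pricing and backward pass once outside %%timeit first):

# Each leaf tensor now holds dNPV/d(input) in .grad
print('Delta:', S_0.grad)     # dNPV/dS_0
print('Vega: ', sigma.grad)   # dNPV/dsigma
print('Rho:  ', r.grad)       # dNPV/dr
print('dNPV/dT:', T.grad)     # sensitivity to time to expiry
print('dNPV/dK:', K.grad)     # sensitivity to the strike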
Second-order greeks in PyTorch
The 2nd-order greeks are a bit tricky and not so straightforward. We need to create a computational graph of the gradient itself: we use the function .grad() from the autograd module with create_graph=True, and can then call the backward method on the resulting delta.
gradient = torch.autograd.grad(npv_pytorch, S_0, create_graph=True)
delta, = gradient
delta.backward(retain_graph=True)
print('Delta: ', delta)
print('Gamma: ', S_0.grad)
Now let’s recap the results from the TensorFlow implementation.
TensorFlow Implementation
I use the same code as in the original notebook (but I removed the calculation of the 2nd-order greeks). There is a bit of overhead for constructing the computational graph.
import tensorflow as tf

def blackScholes_tf_pricer():
    # Build the static computational graph
    S = tf.placeholder(tf.float32)
    K = tf.placeholder(tf.float32)
    dt = tf.placeholder(tf.float32)
    sigma = tf.placeholder(tf.float32)
    r = tf.placeholder(tf.float32)
    Phi = tf.distributions.Normal(0., 1.).cdf
    d_1 = (tf.log(S / K) + (r + sigma**2 / 2) * dt) / (sigma * tf.sqrt(dt))
    d_2 = d_1 - sigma * tf.sqrt(dt)
    npv = S * Phi(d_1) - K * tf.exp(-r * dt) * Phi(d_2)
    greeks = tf.gradients(npv, [S, sigma, r, K, dt])

    def execute_graph(S_0, strike, time_to_expiry, implied_vol, riskfree_rate):
        with tf.Session() as sess:
            res = sess.run([npv, greeks],
                           {S: S_0,
                            K: strike,
                            r: riskfree_rate,
                            sigma: implied_vol,
                            dt: time_to_expiry})
        return res

    return execute_graph
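blackScholes_tf_pricer() returns a closure that feeds the placeholders and runs the graph; a usage sketch:

# Build the graph once, then run it with concrete numbers
tf_pricer = blackScholes_tf_pricer()
npv, greeks = tf_pricer(100., 101., 1., 0.3, 0.01)
print(npv, greeks)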
The TensorFlow version is roughly a factor of 1000 slower; presumably much of that time is spent creating a new tf.Session for every call. Maybe my implementation is just bad. Any feedback on how to improve it would be much appreciated.
First Observation
So if you already have pricing routines in Python using Numpy, it seems much easier to port them to PyTorch than to TensorFlow, and the port does not come with any significant speed penalty.
Now we come to the core of this tutorial.
Monte-Carlo Pricing for a Single Barrier Option
Our example option is a down-and-out barrier call with
Maturity: 2 years
Spot: 100
Strike: 110
Volatility: 20.0 %
Risk free rate: 3.0 %
Barrier at 90
It’s the same option as in my previous post and we gonna use the same Numpy implementation
def monte_carlo_down_out_py(S_0, strike, time_to_expiry, implied_vol,
                            riskfree_rate, barrier, steps, samples):
    stdnorm_random_variates = np.random.randn(samples, steps)
    S = S_0
    K = strike
    dt = time_to_expiry / stdnorm_random_variates.shape[1]
    sigma = implied_vol
    r = riskfree_rate
    B = barrier
    # See 'Advanced Monte Carlo methods for barrier and related exotic options' by Emmanuel Gobet
    B_shift = B * np.exp(0.5826 * sigma * np.sqrt(dt))
    S_T = S * np.cumprod(np.exp((r - sigma**2 / 2) * dt + sigma * np.sqrt(dt) * stdnorm_random_variates), axis=1)
    non_touch = (np.min(S_T, axis=1) > B_shift) * 1
    call_payout = np.maximum(S_T[:, -1] - K, 0)
    npv = np.mean(non_touch * call_payout)
    return np.exp(-time_to_expiry * r) * npv
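A usage sketch with the barrier option parameters from above (illustrative, not verbatim from the notebook):

# Down-and-out call: S_0=100, K=110, T=2y, sigma=20%, r=3%, barrier 90,
# 1,000 time steps and 100,000 paths
npv = monte_carlo_down_out_py(100., 110., 2., 0.2, 0.03, 90., 1000, 100000)
print(npv)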
For a simulation with 100,000 paths and 1,000 time steps we need approximately 4.8 seconds.
And again the PyTorch implementation looks very familiar: we have to replace the random number generator and swap the Numpy functions for their torch counterparts. There are two stumbling blocks: there is no direct equivalent to np.maximum, so we have to mask the negative payoffs ourselves and set them to zero, and we have to convert the boolean tensor into a float tensor.
def monte_carlo_down_out_torch(S_0, strike, time_to_expiry, implied_vol,
                               riskfree_rate, barrier, steps, samples):
    stdnorm_random_variates = torch.distributions.Normal(0, 1).sample((samples, steps))
    S = S_0
    K = strike
    dt = time_to_expiry / stdnorm_random_variates.shape[1]
    sigma = implied_vol
    r = riskfree_rate
    B = barrier
    # See 'Advanced Monte Carlo methods for barrier and related exotic options' by Emmanuel Gobet
    B_shift = B * torch.exp(0.5826 * sigma * torch.sqrt(dt))
    S_T = S * torch.cumprod(torch.exp((r - sigma**2 / 2) * dt + sigma * torch.sqrt(dt) * stdnorm_random_variates), dim=1)
    non_touch = torch.min(S_T, dim=1)[0] > B_shift
    call_payout = S_T[:, -1] - K
    call_payout[call_payout < 0] = 0
    npv = torch.mean(non_touch.type(torch.FloatTensor) * call_payout)
    return torch.exp(-time_to_expiry * r) * npv
The Monte-Carlo simulation, including the calculation of the pathwise greeks, takes about the same time as the pure NPV Monte-Carlo simulation in Numpy.
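For completeness, a minimal sketch (not verbatim from the notebook) of getting the NPV and the pathwise greeks in one pass, with the inputs as leaf tensors:

S_0 = torch.tensor([100.], requires_grad=True)
K = torch.tensor([110.], requires_grad=True)
T = torch.tensor([2.], requires_grad=True)
sigma = torch.tensor([0.2], requires_grad=True)
r = torch.tensor([0.03], requires_grad=True)
B = torch.tensor([90.])  # no gradient flows through the barrier indicator
npv = monte_carlo_down_out_torch(S_0, K, T, sigma, r, B, 1000, 100000)
npv.backward()
print('NPV:  ', npv.item())
print('Delta:', S_0.grad.item())
print('Vega: ', sigma.grad.item())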
Second Observation
Running Monte-Carlo simulations in PyTorch on a CPU is effectively on par with the Numpy implementation (roughly double the duration, but it delivers all the greeks in the same run).
Now we want to make things really fast and run it on a GPU.
Monte-Carlo Simulation of Barrier Options on GPUs
GCP Setup for GPU Computing
Since I have no Nvidia GPU in my laptop, we will run our experiment on the Google Cloud Platform. You can easily sign up for GCP and receive 300 USD of free credit for 12 months. To use GPU computing you need to check in which zones GPUs are available, and first-time users need to request GPU quota first; the approval usually takes less than a day. You can follow the setup tutorial on Medium; just replace step 8 with the AISE PyTorch NVidia GPU Notebook.
The cost of an instance with a Nvidia K80 is less than 2 USD per hour, and our example notebook will run for only a few minutes. Once the virtual machine is started, you can connect directly to the Jupyter notebook server and work in your browser as if it were running on your local machine.
In my GitHub repository there is a dedicated notebook, BarrierOptionGPU.ipynb, which we can upload to the server.
CUDA Implementation of our Monte-Carlo Simulation
Again we need only minimal changes to our code. We need to change the random number generator to draw directly on the CUDA device, torch.cuda.FloatTensor(samples, steps).normal_(); we need to cast our boolean tensor to a CUDA float tensor, non_touch = non_touch.type(torch.cuda.FloatTensor); and we have to copy all our inputs to the GPU as well, torch.tensor([...], requires_grad=True, device='cuda').
def monte_carlo_down_out_torch_cuda(S_0, strike, time_to_expiry, implied_vol,
                                    riskfree_rate, barrier, steps, samples):
    # Draw the random variates directly on the GPU
    stdnorm_random_variates = torch.cuda.FloatTensor(samples, steps).normal_()
    S = S_0
    K = strike
    dt = time_to_expiry / stdnorm_random_variates.shape[1]
    sigma = implied_vol
    r = riskfree_rate
    B = barrier
    # See 'Advanced Monte Carlo methods for barrier and related exotic options' by Emmanuel Gobet
    B_shift = B * torch.exp(0.5826 * sigma * torch.sqrt(dt))
    S_T = S * torch.cumprod(torch.exp((r - sigma**2 / 2) * dt + sigma * torch.sqrt(dt) * stdnorm_random_variates), dim=1)
    non_touch = torch.min(S_T, dim=1)[0] > B_shift
    non_touch = non_touch.type(torch.cuda.FloatTensor)
    call_payout = S_T[:, -1] - K
    call_payout[call_payout < 0] = 0
    npv = torch.mean(non_touch * call_payout)
    return torch.exp(-time_to_expiry * r) * npv
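A usage sketch, assuming a CUDA-capable device is available; note that all inputs are created directly on the GPU:

# All inputs live on the GPU, so the whole simulation and the
# backward pass run on the device
S_0 = torch.tensor([100.], requires_grad=True, device='cuda')
K = torch.tensor([110.], requires_grad=True, device='cuda')
T = torch.tensor([2.], requires_grad=True, device='cuda')
sigma = torch.tensor([0.2], requires_grad=True, device='cuda')
r = torch.tensor([0.03], requires_grad=True, device='cuda')
B = torch.tensor([90.], device='cuda')
npv = monte_carlo_down_out_torch_cuda(S_0, K, T, sigma, r, B, 1000, 100000)
npv.backward()
print('NPV:  ', npv.item())
print('Delta:', S_0.grad.item())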
The pricing and greek calculation is 45 times faster than on CPUs.
Conclusion
Writing and coding this tutorial was great fun for me. PyTorch feels much easier and cleaner to use for writing pricing algorithms compared to TensorFlow, although that may change with TensorFlow 2.0, which is a major redesign. With PyTorch it's very easy to implement Monte-Carlo simulations with adjoint greeks, and running the code on GPUs is seamless even without any experience writing GPU code in C++. I wonder how fast a native C++/CUDA implementation would be? Maybe something for a future post.
So long…