In the last two posts we priced exotic derivatives with TensorFlow in Python. We implemented Monte Carlo simulations to price Asian options, barrier options and Bermudan options. In this post we use deep learning to learn an optimal hedging strategy for call options from market prices of the underlying asset. This approach is purely data-driven and ‘model free’, as it doesn't make any assumptions about the underlying stochastic process.

We follow and adapt the ideas presented in the paper ‘Deep Hedging’ (https://arxiv.org/abs/1802.03042) by Hans Bühler, Lukas Gonon, Josef Teichmann, Ben Wood. The model can easily be extended to incorporate transaction costs and trading limits.

In this part we will train a four-layer Long Short-Term Memory (LSTM) recurrent neural network (RNN) to learn an optimal hedging strategy given the individual risk aversion of the trader. We will minimize the Conditional Value at Risk (CVaR), also known as the Expected Shortfall, of the hedging strategy, and derive a lower bound for the price which a risk-averse trader should charge if the trader follows the optimal hedging strategy.

For simplicity we use synthetic market prices of the underlying generated by a Black Scholes process with a drift of zero (the risk-free interest rate is zero) and a volatility of 20%. We will train the network on these simulated market values for a single option which matures in one month and has a moneyness of one (the strike equals the current stock price, i.e. the option is at the money).
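The synthetic paths described above could be generated along the following lines. This is a minimal NumPy sketch, not the helper from the repository: the function name `simulate_bs_paths`, the choice of 21 daily steps for one month, and the maturity of 1/12 year are assumptions made for illustration.

```python
import numpy as np


def simulate_bs_paths(s0, sigma, maturity, n_steps, n_paths, drift=0.0, seed=None):
    """Simulate geometric Brownian motion paths on an equidistant time grid.

    Returns an array of shape (n_steps + 1, n_paths); row 0 holds the start price.
    """
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    # Log-returns of a geometric Brownian motion over one time step
    z = rng.standard_normal((n_steps, n_paths))
    log_returns = (drift - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    # Cumulate the log-returns and prepend a row of zeros for time 0
    log_paths = np.vstack([np.zeros((1, n_paths)), np.cumsum(log_returns, axis=0)])
    return s0 * np.exp(log_paths)


# Zero drift, 20% volatility, one month to maturity, 21 daily steps (assumed)
paths = simulate_bs_paths(s0=100.0, sigma=0.2, maturity=1.0 / 12.0,
                          n_steps=21, n_paths=10_000, seed=42)
print(paths.shape)  # (22, 10000)
```

Changing `drift` or `sigma` in the call gives the kind of shifted test sets used later in the post.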

This can easily be adapted to more sophisticated models (see the reference paper for a Heston model example) or even to real market data.

We will compare the effectiveness of the learned hedging strategy with the delta hedging strategy that uses the delta from the Black Scholes model. We will evaluate the influence of the trader's risk aversion on the hedging strategy, and we will see how well the network can generalize the hedging strategy if we change the moneyness of the option, the drift (which should have no effect in theory) and the volatility of the underlying process.

**Start Spoiler** *This simple network is very good at pricing this particular option, even on unseen market paths, but it fails to generalize the strategy to options with different levels of moneyness or volatility (although the Black Scholes strategy fails as well if we use the wrong volatility for pricing).* **End Spoiler**

In the coming parts we will try to improve the network by changing the network architecture (e.g. a simple MLP or convolutional networks), by using other and more features, by trying some approaches to generate training data from market data or transfer learning methods (since we don't have millions of training paths for one underlying), and by including transaction costs.

## Brief recap: Delta / Delta Hedge

The delta of an option is the sensitivity of the option price to a change in the price of the underlying stock.

The idea of delta hedging is to immunise a portfolio of options against changes in the market price of the underlying. If you hold a long position of delta units of the underlying stock and you are short one option, then your portfolio is not sensitive to changes in the market price. This is only a local approximation: if the price changes, the delta changes as well (second-order Greeks) and you need to adjust your position.

Black, Scholes and Merton used a (continuous time) self-financing delta hedge strategy to derive their famous pricing formula.
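For reference, the Black Scholes call price and delta can be computed with the standard closed-form formulas. The sketch below uses only the Python standard library; the function names are my own and the repository's helpers may look different.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price_and_delta(s, k, sigma, tau, r=0.0):
    """Black Scholes price and delta of a European call.

    s: stock price, k: strike, sigma: volatility,
    tau: time to maturity in years, r: risk-free rate.
    """
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)
    delta = norm_cdf(d1)  # sensitivity of the price to the stock price
    return price, delta


# At-the-money call, one month to maturity, 20% volatility, zero rates
price, delta = bs_call_price_and_delta(s=100.0, k=100.0, sigma=0.2, tau=1.0 / 12.0)
print(price, delta)
```

With these inputs the price comes out at roughly 2.30 and the delta at roughly 0.51, so the hedge for one short call starts with about half a share.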

## Setting

We are in a Black Scholes setting. The current stock price is 100 and the volatility is 20%. The risk-free interest rate is 0. We have a call option with a maturity of one month and a strike of 100. We adjust our hedge portfolio daily. Our training set will consist of 500,000 samples.

Our portfolio will consist of one short call position and $\delta$ units of the underlying stock.

The PnL of our hedging / trading strategy is:

$$PnL = \sum_{i=0}^{n} S_i \left(\delta_{i-1} - \delta_i\right)$$

with $\delta_{n}=\delta_{-1}=0$. At time $i$ we sell our previous position $\delta_{i-1}$ (cash inflow) and buy $\delta_{i}$ stocks (cash outflow). At maturity $n$ we liquidate our position in stocks. The sum of all these transactions is our profit or loss.

The final value of our portfolio is given by the PnL of our hedging strategy minus the option payout:

$$\Pi = -\max(S_n - K, 0) + PnL$$

Under particular assumptions there exists a unique trading strategy and one fair price $p_0$ of the option such that $\Pi + p_0 = 0$ almost surely. One of these assumptions is continuous-time trading.

But if we do not hedge in continuous time, the equality holds only on average.
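As a small numerical check of this bookkeeping, the sketch below computes the PnL once as the sum of the cash flows at each rebalancing date, with $\delta_{-1}=\delta_{n}=0$, and once as the sum of the holdings times the price increments, which is the form the model code uses (`dS * strategy`). The prices and deltas are made-up illustration values, and the function names are hypothetical.

```python
import numpy as np


def hedging_pnl(prices, deltas):
    """PnL as the sum of cash flows S_i * (delta_{i-1} - delta_i).

    prices: S_0 .. S_n, shape (n + 1,)
    deltas: delta_0 .. delta_{n-1}, shape (n,)
    """
    # Pad with delta_{-1} = 0 in front and delta_n = 0 at the end
    padded = np.concatenate([[0.0], deltas, [0.0]])
    # At time i we sell delta_{i-1} and buy delta_i, both at price S_i
    return np.sum(prices * (padded[:-1] - padded[1:]))


def hedging_pnl_increments(prices, deltas):
    """Equivalent form: holdings times the following price change."""
    return np.sum(deltas * np.diff(prices))


prices = np.array([100.0, 102.0, 101.0, 105.0])  # made-up price path
deltas = np.array([0.5, 0.6, 0.55])              # made-up hedge ratios
strike = 100.0

pnl = hedging_pnl(prices, deltas)
portfolio = -max(prices[-1] - strike, 0.0) + pnl  # short call plus hedge
print(pnl, hedging_pnl_increments(prices, deltas), portfolio)
```

Both forms give the same number because the sum telescopes; the increment form is convenient for a network that outputs one holding per time step.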

In this example we sampled 10,000 paths from the Black Scholes process and applied the delta hedge strategy with different trading intervals (from once to twice a day).
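A rough sketch of such an experiment is shown below, assuming zero rates and the setting from above. The rebalancing counts (1, 21 and 42 over the month) and all function names are illustrative choices, not necessarily the ones behind the post's figure.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without extra dependencies


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def bs_d1(s, k, sigma, tau):
    return (np.log(s / k) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))


def bs_price(s, k, sigma, tau):
    d1 = bs_d1(s, k, sigma, tau)
    return s * norm_cdf(d1) - k * norm_cdf(d1 - sigma * np.sqrt(tau))


def delta_hedge_error(n_steps, n_paths=10_000, s0=100.0, k=100.0,
                      sigma=0.2, maturity=1.0 / 12.0, seed=0):
    """Return Pi + p_0 per path for a delta hedge rebalanced n_steps times."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    z = rng.standard_normal((n_steps, n_paths))
    log_paths = np.cumsum(-0.5 * sigma**2 * dt + sigma * np.sqrt(dt) * z, axis=0)
    paths = s0 * np.exp(np.vstack([np.zeros((1, n_paths)), log_paths]))
    # Time to maturity at each rebalancing date t_0 .. t_{n-1}
    tau = maturity - dt * np.arange(n_steps)
    deltas = norm_cdf(bs_d1(paths[:-1], k, sigma, tau[:, None]))
    pnl = np.sum(deltas * np.diff(paths, axis=0), axis=0)
    payoff = np.maximum(paths[-1] - k, 0.0)
    p0 = float(bs_price(s0, k, sigma, maturity))
    return -payoff + pnl + p0


errors = {n: delta_hedge_error(n) for n in (1, 21, 42)}
for n, err in errors.items():
    print(n, "rebalancings: mean", err.mean(), "std", err.std())
```

The mean of $\Pi + p_0$ should stay close to zero in all cases, while its spread should shrink as the hedge is adjusted more often.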

We follow the idea of the paper Deep Hedging and try to find a hedging strategy which minimizes the CVaR given the risk aversion $\alpha$.

We will develop two trading strategies (alpha=0.5 and 0.99) and test them against the Black Scholes delta hedge strategy.
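Empirically, the CVaR at level $\alpha$ can be estimated as the average of the worst $(1-\alpha)$ share of the hedging PnLs, reported as a positive loss. The sketch below follows the same idea as the model code (sort the PnLs, keep the worst share, take the mean). The function name and the use of rounding for the sample count are my own choices; the model code truncates with `int` instead.

```python
import numpy as np


def cvar(pnl, alpha):
    """Average loss over the worst (1 - alpha) share of the PnL samples."""
    pnl = np.asarray(pnl, dtype=float)
    # Number of tail samples; rounding avoids floating point surprises
    # such as (1 - 0.8) * 10 evaluating to slightly less than 2
    k = max(1, int(round((1.0 - alpha) * pnl.size)))
    worst = np.sort(pnl)[:k]  # smallest PnLs are the largest losses
    return -worst.mean()


# Made-up PnL sample for illustration
pnl = np.array([-4.0, -2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
print(cvar(pnl, 0.5))  # mean loss over the worst half
print(cvar(pnl, 0.8))  # mean loss over the worst fifth
```

A higher $\alpha$ looks only at the more extreme losses, so the CVaR, and with it the price the trader should charge, cannot decrease as $\alpha$ grows.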

For our test set we generate 100,000 paths from the same underlying process (not in the training set). We will test the hedging strategy for 3 different options (strike K=100, 95 and 105). Additionally we test the hedging strategies on market paths from a process with a shifted drift and from a process with shifted volatility.

## Implementation

The code is, as usual, in my GitHub repository. Since the network needs approximately two and a half hours for training, I also uploaded the pre-trained model, so one can skip the training step and directly restore the models.

We have a lot of helper functions to generate the paths, calculate the Black Scholes prices and deltas, and evaluate and compare the trading strategies, but the core class is our model.

```python
import datetime as dt

import numpy as np
import tensorflow as tf  # TensorFlow 1.x (uses placeholders and tf.contrib)


class RnnModel(object):
    def __init__(self, time_steps, batch_size, features, nodes=(62, 46, 46, 1), name='model'):
        tf.reset_default_graph()
        self.batch_size = batch_size
        # Inputs have the shape [time, batch, feature]; feature 0 is the stock price
        self.S_t_input = tf.placeholder(tf.float32, [time_steps, batch_size, features])
        self.K = tf.placeholder(tf.float32, [batch_size])
        self.alpha = tf.placeholder(tf.float32)

        S_T = self.S_t_input[-1, :, 0]
        # Price change between two consecutive time steps
        dS = self.S_t_input[1:, :, 0] - self.S_t_input[0:-1, :, 0]

        # Prepare S_t for the use in the RNN: remove the last time step
        # (at T the portfolio is zero)
        S_t = tf.unstack(self.S_t_input[:-1, :, :], axis=0)

        # Build the LSTM
        lstm = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.LSTMCell(n) for n in nodes])
        self.strategy, state = tf.nn.static_rnn(
            lstm, S_t, initial_state=lstm.zero_state(batch_size, tf.float32), dtype=tf.float32)
        self.strategy = tf.reshape(self.strategy, (time_steps - 1, batch_size))

        self.option = tf.maximum(S_T - self.K, 0)
        self.Hedging_PnL = -self.option + tf.reduce_sum(dS * self.strategy, axis=0)
        self.Hedging_PnL_Paths = -self.option + dS * self.strategy

        # Calculate the CVaR for a given confidence level alpha:
        # take the 1-alpha largest losses (top 1-alpha negative PnLs) and calculate the mean
        CVaR, idx = tf.nn.top_k(-self.Hedging_PnL, tf.cast((1 - self.alpha) * batch_size, tf.int32))
        CVaR = tf.reduce_mean(CVaR)
        self.train = tf.train.AdamOptimizer().minimize(CVaR)
        self.saver = tf.train.Saver()
        self.modelname = name

    def _execute_graph_batchwise(self, paths, strikes, riskaversion, sess, epochs=1, train_flag=False):
        sample_size = paths.shape[1]
        batch_size = self.batch_size
        idx = np.arange(sample_size)
        start = dt.datetime.now()
        for epoch in range(epochs):
            # Save the hedging PnL for each batch
            pnls = []
            strategies = []
            if train_flag:
                np.random.shuffle(idx)
            for i in range(int(sample_size / batch_size)):
                indices = idx[i * batch_size: (i + 1) * batch_size]
                batch = paths[:, indices, :]
                feed = {self.S_t_input: batch, self.K: strikes[indices], self.alpha: riskaversion}
                if train_flag:
                    _, pnl, strategy = sess.run([self.train, self.Hedging_PnL, self.strategy], feed)
                else:
                    pnl, strategy = sess.run([self.Hedging_PnL, self.strategy], feed)
                pnls.append(pnl)
                strategies.append(strategy)
            # Calculate the option price given the risk aversion level alpha
            CVaR = np.mean(-np.sort(np.concatenate(pnls))[:int((1 - riskaversion) * sample_size)])
            if train_flag and epoch % 10 == 0:
                print('Time elapsed:', dt.datetime.now() - start)
                print('Epoch', epoch, 'CVaR', CVaR)
                self.saver.save(sess, r"/Users/matthiasgroncki/models/%s/model.ckpt" % self.modelname)
        # Save the final state only after training, so that predict() never overwrites a checkpoint
        if train_flag:
            self.saver.save(sess, r"/Users/matthiasgroncki/models/%s/model.ckpt" % self.modelname)
        return CVaR, np.concatenate(pnls), np.concatenate(strategies, axis=1)

    def training(self, paths, strikes, riskaversion, epochs, session, init=True):
        if init:
            # Fixed: use the session passed in (the original referenced an undefined `sess`)
            session.run(tf.global_variables_initializer())
        self._execute_graph_batchwise(paths, strikes, riskaversion, session, epochs, train_flag=True)

    def predict(self, paths, strikes, riskaversion, session):
        return self._execute_graph_batchwise(paths, strikes, riskaversion, session, 1, train_flag=False)

    def restore(self, session, checkpoint):
        self.saver.restore(session, checkpoint)
```

The constructor creates the computational graph. We use the `tf.contrib.rnn.MultiRNNCell()` cell to stack the LSTM cells `tf.contrib.rnn.LSTMCell()`.

We can pass the timesteps, batch_size and number of nodes in each layer to the constructor.

At the moment the network is quite simple: by default we use 4 layers with 62, 46, 46, and 1 nodes. For an introduction to RNNs and LSTMs I can recommend http://colah.github.io/posts/2015-08-Understanding-LSTMs/ or http://adventuresinmachinelearning.com/keras-lstm-tutorial/ but there are plenty of resources online.

Our class provides a function to train the model and to predict a trading strategy.

## Results

We compare the average PnL and the CVaR of the trading strategies assuming we can charge the Black Scholes price for the option.

For the first test set (strike 100, same drift, same vola) the results look quite good.

**alpha = 0.5**

A trader with such a risk aversion should charge 0.25 above the Black Scholes price.

**alpha = 0.99**

A trader with such a risk aversion should charge about 0.99 above the Black Scholes price.

We see that with a higher risk aversion extreme losses become less likely, and the trader needs a higher compensation to take the risk of selling the option. Both strategies have a lower CVaR and a higher average profit compared to the Black Scholes strategy, while the trading strategy of the RNN has a higher volatility than the Black Scholes one.

But what happens if we need a strategy for an option with a different strike?

Alpha = 0.5 and Strike @ 95:

We see that the PnL of the RNN strategy is significantly worse than that of the BS trading strategy. If we compare the deltas we see that the model assumes a strike of 100.

We see a similar picture for higher strikes and different alphas.

If we change the drift of the observed market values, both hedges still hold (as expected). But it's a different picture when we change the volatility.

In that case both models fail similarly badly:

## Conclusion

Overall it is a very interesting application of deep learning to option pricing and hedging and I am very curious about the future developments in this field.

The RNN is able to learn a hedging strategy for a particular option without any assumption of the underlying stochastic process. The hedging strategy outperforms the Black Scholes delta hedge strategy but the neural network fails at generalising the strategy for options at different strike levels. To be fair we have to admit that our training set consisted only of one option at one strike level. In the next part we will try to improve the model with a more diverse training set and add more features to it.

But the next post will be most likely the 3rd part of the fraud detection series (From Logistic Regression to Deep Learning – A fraud detection case study Part I, Part II).

So long…