Return Predictions From Trade Flow

1             Introduction

Here you will assess trade flow as a means of generating profit opportunities in 3 cryptotoken markets. We stress the word “opportunity” because at high data rates like these, and given the markets’ price-time priority, it is far easier to identify desirable trades in the data stream than it is to inject oneself profitably into the fray.

2             Data

We have preprocessed level 2 exchange messages from the Coinbase WebSocket API for you into a more digestible format.

2.1           Treatment

Load the data for all 3 pairs from the class website. For each pair, split the data into training and test sets, with the training set containing the first 20% of the data and the test set containing the remainder.
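The split described above can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes that "first 20%" means the first 20% of rows after a stable sort by timestamp, and that each row is a tuple whose first element is the timestamp. The function name `chronological_split` is hypothetical.

```python
def chronological_split(rows, train_fraction=0.2):
    """Return (train, test), where train is the earliest `train_fraction` of rows.

    Assumption: each row is a tuple whose first element is its timestamp.
    sorted() is stable, so rows sharing a timestamp keep their original order.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy rows: (timestamp, label). Real rows would carry price, size and side.
rows = [(t, "row%d" % t) for t in range(10)]
train, test = chronological_split(rows)
```

Splitting by time rather than at random keeps the test set strictly later than the training set, so the regression never sees data from the period it is evaluated on.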

2.2           Format

The data has the following structure[1]:

2.2.1     Trades

received_utc_nanoseconds  timestamp_utc_nanoseconds  PriceMillionths  SizeBillionths  Side
1618090137140737000       1618090137157544000        35690            1000000         -1
1618090137851379000       1618090137864544000        35700            29801980         2
1618270615253262000       1618270615358639000        35760            2926932560      -1
1618270616012160000       1618270616105583000        35760            16673940        -1

The Side is actually a sum of trade sides at the same price and time.

2.2.2     Book

Book snapshots are indexed by received_utc_nanoseconds. The table below is transposed: each row is a field and each of the four columns is one snapshot.

Ask1PriceMillionths        35700        35700        35770        35770
Bid1PriceMillionths        35690        35690        35760        35760
Ask1SizeBillionths   11872084060  11872084060   1255039420   1255039420
Bid1SizeBillionths   32957203990  32957203990  24752612680  24752612680
Ask2PriceMillionths        35710        35710        35780        35780
Bid2PriceMillionths        35680        35680        35750        35750
Ask2SizeBillionths   31032423370  30332423370  31011776970  31011776970
Bid2SizeBillionths   45284575470  45284575470  41785630850  41785630850
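Because prices are stored as integer millionths and sizes as integer billionths, a small conversion layer is useful before any analysis. The sketch below is illustrative only: the helper names are hypothetical, and treating the sign of Side as the trade direction of the aggregated size is an assumption, not something the data description states.

```python
PRICE_SCALE = 1_000_000        # prices are integers in millionths
SIZE_SCALE = 1_000_000_000     # sizes are integers in billionths


def mid_price(bid1_price_millionths, ask1_price_millionths):
    """Mid of the best bid and best ask, converted to natural price units."""
    return (bid1_price_millionths + ask1_price_millionths) / (2 * PRICE_SCALE)


def signed_size(size_billionths, side):
    """Size in natural units, signed by the direction of `side`.

    Assumption: Side is a sum of +1/-1 trade sides, so only its sign is used.
    """
    sign = (side > 0) - (side < 0)
    return sign * size_billionths / SIZE_SCALE


# Values taken from the first book snapshot and the second trade row.
mid = mid_price(35690, 35700)
buy_volume = signed_size(29801980, 2)
sell_volume = signed_size(1000000, -1)
```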

3             Exercise

Write code to find the τ-interval trade flow $F_i$ just prior[2] to each trade data point[3][4] i. Compute the T-second forward returns $r_i$. Regress the returns against the flows in your training set to find a regression coefficient β.
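One way to compute the flows and forward returns is sketched below, under several labeled assumptions: flow is the sum of signed sizes over the window from t_i − τ (inclusive) up to but not including t_i, so that trades sharing trade i's timestamp are excluded; the forward return uses the last trade price at or before t_i + T; and `times` is sorted ascending. The function names are hypothetical. The toy example uses small integers as times; with the real data, τ and T would be expressed in nanoseconds.

```python
from bisect import bisect_left, bisect_right


def trade_flow(times, signed_sizes, tau):
    """Flow for each trade: sum of signed sizes with t_i - tau <= t < t_i.

    `times` must be sorted ascending. Trades at exactly t_i are excluded,
    including other trades that share trade i's timestamp.
    """
    prefix = [0.0]
    for s in signed_sizes:
        prefix.append(prefix[-1] + s)
    flows = []
    for t in times:
        lo = bisect_left(times, t - tau)   # first trade inside the window
        hi = bisect_left(times, t)         # first trade at time t (excluded)
        flows.append(prefix[hi] - prefix[lo])
    return flows


def forward_returns(times, prices, horizon):
    """Return p(t_i + T) / p(t_i) - 1 using the last trade at or before t_i + T.

    Near the end of the sample the horizon is truncated at the last trade,
    so those returns should be treated with care or dropped.
    """
    out = []
    for t, p in zip(times, prices):
        j = bisect_right(times, t + horizon) - 1
        out.append(prices[j] / p - 1.0)
    return out


times = [0, 1, 1, 3, 6]
signed_sizes = [1, -2, 4, 1, -3]
prices = [100, 101, 101, 102, 100]

flows = trade_flow(times, signed_sizes, tau=2)
returns = forward_returns(times, prices, horizon=2)
```

The prefix sum makes each window lookup two binary searches rather than a scan, which matters at the data rates involved.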

For each data point in your test set you already have the flow $F_i$, so your return prediction is $\hat{r}_i = \beta F_i$. Define a threshold $j$ for $\hat{r}_i$ and assume you might attempt to trade whenever $j < |\hat{r}_i|$.
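The regression and threshold step might look like the following sketch. It assumes a least-squares fit with no intercept, so that the prediction is simply β times the flow; a fit with an intercept is an equally reasonable reading. The function names are hypothetical and the numbers are made up for illustration.

```python
def fit_beta(flows, returns):
    """Least-squares slope of returns on flows, with no intercept (assumption)."""
    sxx = sum(f * f for f in flows)
    if sxx == 0:
        raise ValueError("all flows are zero, so beta is undefined")
    return sum(f * r for f, r in zip(flows, returns)) / sxx


def predict(beta, flows):
    """Predicted return for each flow: beta * F_i."""
    return [beta * f for f in flows]


def trade_signals(predictions, j):
    """+1 (buy) or -1 (sell) where j < |prediction|, otherwise 0 (no trade)."""
    return [(p > 0) - (p < 0) if abs(p) > j else 0 for p in predictions]


# Made-up training data in which returns are exactly 0.002 * flow.
train_flows = [1.0, 2.0, -1.0]
train_returns = [0.002, 0.004, -0.002]
beta = fit_beta(train_flows, train_returns)

test_flows = [3.0, -0.5, 0.0, -4.0]
predictions = predict(beta, test_flows)
signals = trade_signals(predictions, j=0.003)
```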

4             Analysis

Assess the trading opportunities arising from using these return predictions in your test set. As part of this assessment, comment on the reliability of β, how you chose j, and what you might expect from using much longer training and test periods.
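As a starting point for the assessment, one could summarize the realized returns of only those test points that pass the threshold. The sketch below is one possible summary, not a required one: the function name `assess` and the chosen statistics are assumptions, and it ignores fees, spread, and queue position, which is why the result measures opportunity rather than achievable profit.

```python
def assess(predictions, realized, j):
    """Summarize realized returns for points where j < |prediction|.

    Each realized return is signed by the predicted direction, so a positive
    value means the prediction pointed the right way.
    """
    picked = [(p, r) for p, r in zip(predictions, realized) if abs(p) > j]
    if not picked:
        return {"n": 0, "mean_signed_return": None, "hit_rate": None}
    signed = [r if p > 0 else -r for p, r in picked]
    return {
        "n": len(signed),
        "mean_signed_return": sum(signed) / len(signed),
        "hit_rate": sum(s > 0 for s in signed) / len(signed),
    }


# Made-up predictions and realized forward returns.
predictions = [0.006, -0.004, 0.001, -0.005]
realized = [0.002, 0.001, -0.003, -0.004]
summary = assess(predictions, realized, j=0.003)
```

Running the same summary over a grid of thresholds shows how the number of opportunities trades off against their average quality, which is one way to motivate a choice of j.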

[1] Note that inaccuracies in clock settings, i.e. “clock skew”, can cause timestamps to appear later than the time at which they are recorded as having been received.

[2] We do not include the trade i data itself, because we are evaluating trade i in terms of the flow we would have been aware of just before it happened.

[3] NOTE: the trade data series does not necessarily have strictly increasing timestamps. Be sure not to include other trades at the same timestamp in your computation of $F_i$.

[4] It is not necessary to handle latency in your homework, but for your edification: a more careful implementation would account for lags. For a pessimistic approach we could choose L as, say, twice the 99th percentile of computational and communications lag. Then, it would use book data (not just trade data) to help compute return from time $t_i + L$ to $t_i + L + T$ and run regressions using that. The idea here is that it takes approximately time L to “do anything” about trade information.
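The lag-aware return in footnote 4 could be sketched as follows. This is an illustration under stated assumptions: it uses the book mid price as the reference price, takes the latest book snapshot at or before each target time, and uses hypothetical function names. Times in the toy example are small integers; the real data uses nanoseconds.

```python
from bisect import bisect_right


def mid_at(book_times, book_mids, t):
    """Latest mid price at or before time t, or None if no snapshot exists yet.

    `book_times` must be sorted ascending.
    """
    k = bisect_right(book_times, t) - 1
    return book_mids[k] if k >= 0 else None


def lagged_forward_return(book_times, book_mids, t_i, lag, horizon):
    """Mid-price return from t_i + L to t_i + L + T, or None if unavailable."""
    start = mid_at(book_times, book_mids, t_i + lag)
    end = mid_at(book_times, book_mids, t_i + lag + horizon)
    if start is None or end is None:
        return None
    return end / start - 1.0


book_times = [0, 2, 5, 9]
book_mids = [100.0, 101.0, 103.0, 102.0]

r = lagged_forward_return(book_times, book_mids, t_i=1, lag=1, horizon=3)
too_early = lagged_forward_return(book_times, book_mids, t_i=-5, lag=1, horizon=3)
```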