Machine Learning Models and Cryptocurrency Prediction
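Anyone who trades in the markets (stocks, crypto, and so on) knows that information can be, and usually is, the deciding factor in whether to invest in a given stock or, in this case, a cryptocurrency. So when it comes to predicting cryptocurrencies, it is worth seeing what machine learning can do for us. In this example I use Bitcoin (BTC), because it is the coin with the most available data.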
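The first challenge is the data. Even stocks are highly unpredictable and volatile. Cryptocurrencies are worse, especially over the past three years, since Bitcoin took the world by storm. In such a turbulent space, predicting how Bitcoin will behave is hard. We also have to consider how long a given coin has been in circulation. For Bitcoin, we have roughly four years of daily data.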
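Another thing to consider is that cryptocurrencies are still young. Even predicting the price of Bitcoin is difficult, because the data only goes back to 2014 (and until a few years ago the price was largely stagnant). That is why it makes sense to use a neural network designed for sequences, such as an LSTM (long short-term memory) network, to model this data.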
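An LSTM is a type of unit used in an RNN (recurrent neural network). What makes LSTM networks special is that they are very good at classifying and processing time-series data, where there can be lags of unknown duration between important events (which is especially true of Bitcoin). Compared with a plain RNN, the main advantage of an LSTM network is its relative insensitivity to the length of those gaps. Companies like Google use LSTMs in Google Translate, and Apple uses this type of network for the QuickType feature on the iPhone. LSTMs are also part of what powers Alexa today.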
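To make the idea of an LSTM unit a little more concrete, here is a minimal NumPy sketch of a single forward step of a standard LSTM cell. It is only an illustration, not the TensorFlow implementation used later in this post: the function name `lstm_step`, the stacked gate order, the random weights, and the sample prices are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell (illustrative only).

    x: (input_dim,), h_prev and c_prev: (hidden,)
    W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,)
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:hidden])                # input gate: how much new information to write
    f = sigmoid(z[hidden:2 * hidden])       # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * hidden:3 * hidden])   # candidate values for the cell state
    o = sigmoid(z[3 * hidden:4 * hidden])   # output gate: how much of the cell to expose
    c = f * c_prev + i * g                  # the cell state carries long-term memory
    h = o * np.tanh(c)                      # the hidden state is the step's output
    return h, c


# Hypothetical sizes and random weights, just to run the cell
rng = np.random.default_rng(0)
input_dim, hidden = 1, 4
W = rng.normal(scale=0.1, size=(4 * hidden, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h = np.zeros(hidden)
c = np.zeros(hidden)
prices = [0.10, 0.12, 0.11, 0.15]  # made-up normalized prices
for p in prices:
    h, c = lstm_step(np.array([p]), h, c, W, U, b)
```

The forget gate is what lets the cell state survive across many time steps, which is where the tolerance to long gaps comes from.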
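Fetching the Data
Getting the data through the Alpha Vantage API is quite simple. You need your own API key (it is free), and once you have the data it is best to save it to a local file for later use, as we do below with pandas. The Python code is as follows: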
import datetime as dt
import json
import os
import urllib.request

import pandas as pd

import config  # local module that holds your API key as config.api_key

data_source = 'alphavantage'

# Kept as an if statement so other data sources can be added in the future
if data_source == 'alphavantage':
    api_key = config.api_key
    symbol = 'BTC'
    market = 'USD'
    url_string = ("https://www.alphavantage.co/query?function=DIGITAL_CURRENCY_DAILY"
                  "&symbol=%s&market=%s&apikey=%s" % (symbol, market, api_key))
    file_to_save = 'currency_daily_BTC-%s.csv' % symbol

    # If the CSV does not exist yet, download the data, build a DataFrame and save it
    if not os.path.exists(file_to_save):
        with urllib.request.urlopen(url_string) as url:
            data = json.loads(url.read().decode())
        # Extract the daily time series
        data = data['Time Series (Digital Currency Daily)']
        df = pd.DataFrame(columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Market Cap(USD)'])
        for k, v in data.items():
            date = dt.datetime.strptime(k, '%Y-%m-%d')
            data_row = [date.date(), float(v['1a. open (USD)']), float(v['2a. high (USD)']),
                        float(v['3a. low (USD)']), float(v['4a. close (USD)']),
                        float(v['5. volume']), float(v['6. market cap (USD)'])]
            df.loc[-1, :] = data_row
            df.index = df.index + 1
        df.to_csv(file_to_save)
        print('Data saved to : %s' % file_to_save)
    # If the data is already there, load it from the CSV
    else:
        print('File already exists. Loading data from CSV')
        df = pd.read_csv(file_to_save)

# df = pandas DataFrame for local use, sorted by date
df = pd.read_csv('currency_daily_BTC-BTC.csv')
df = df.sort_values('Date')
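After pulling all the daily highs and lows for Bitcoin, the dataset comes to only about 1,500 rows. This was my first concern when I started the project, because there is not much data to work with. We could add market cap and volume to the inputs to try to get more accurate predictions, but then we would have to work out how they correlate with the highs and lows in the data and hope that this gives us better predictions.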
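As a rough illustration of what checking those correlations could look like, the sketch below computes the Pearson correlation of each column with the mid price using pandas. The DataFrame here is synthetic (the name `demo_df` and all of its values are made up for the example), so the numbers say nothing about real Bitcoin data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real DataFrame; every value here is made up
rng = np.random.default_rng(42)
n = 200
high = 100 + np.cumsum(rng.normal(size=n))  # random-walk "high" prices
demo_df = pd.DataFrame({
    'High': high,
    'Low': high - rng.uniform(1, 5, size=n),
    'Volume': rng.uniform(1e3, 1e4, size=n),
})
demo_df['Mid'] = (demo_df['High'] + demo_df['Low']) / 2.0

# Pearson correlation of every other column with the mid price
corr_with_mid = demo_df.corr()['Mid'].drop('Mid')
print(corr_with_mid)
```

On the real data, the same call on columns such as Volume and Market Cap(USD) would give a first hint of whether they carry any signal worth feeding to the model, although correlation alone does not guarantee better predictions.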
Even so, we still featurize, scale, and normalize the data to help with prediction.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Compute the high, low and mid prices
# (.as_matrix() was removed from pandas; .to_numpy() is its replacement)
high_prices = df.loc[:, 'High'].to_numpy()
low_prices = df.loc[:, 'Low'].to_numpy()
mid_prices = (high_prices + low_prices) / 2.0

# Split into train and test sets; shuffle=False keeps the rows in
# chronological order, which matters for time-series data
train_data, test_data = train_test_split(mid_prices, test_size=0.2, shuffle=False)

# Now we scale the data to be between 0 and 1
scaler = MinMaxScaler()
train_data = train_data.reshape(-1, 1)
test_data = test_data.reshape(-1, 1)

# Fit the scaler on the training data and normalize it window by window
smooth_window_size = 300
for di in range(0, 1200, smooth_window_size):
    scaler.fit(train_data[di:di+smooth_window_size, :])
    train_data[di:di+smooth_window_size, :] = scaler.transform(train_data[di:di+smooth_window_size, :])

# Normalize whatever training data remains after the last window, if any
# (fitting the scaler on an empty slice would raise an error)
if len(train_data) > di + smooth_window_size:
    scaler.fit(train_data[di+smooth_window_size:, :])
    train_data[di+smooth_window_size:, :] = scaler.transform(train_data[di+smooth_window_size:, :])

# Flatten train and test data back to 1-D arrays
train_data = train_data.reshape(-1)
test_data = scaler.transform(test_data).reshape(-1)
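The windowed scaling above can also be written as a small reusable helper. The sketch below is an assumption-laden variant, not the code used in this project: it uses plain NumPy arithmetic instead of `MinMaxScaler`, the name `normalize_in_windows` is hypothetical, and mapping a constant window to zeros is a choice made for the example.

```python
import numpy as np


def normalize_in_windows(values, window_size):
    """Scale a 1-D series to [0, 1] separately within consecutive windows."""
    values = np.asarray(values, dtype=float)
    scaled = np.empty_like(values)
    for start in range(0, len(values), window_size):
        window = values[start:start + window_size]
        lo, hi = window.min(), window.max()
        span = hi - lo
        # A constant window has no range; map it to zeros instead of dividing by zero
        scaled[start:start + window_size] = (window - lo) / span if span > 0 else 0.0
    return scaled


demo = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 7.0])
result = normalize_in_windows(demo, window_size=3)
print(result)  # -> [0.  0.5 1.  0.  0.5 1.  0. ]
```

Scaling each window on its own keeps early, low prices from being squashed toward zero by later, much higher ones, at the cost of small jumps at the window boundaries.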
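Implementing the LSTM
The best and fastest way to use an LSTM is TensorFlow's RNN API, which implements the LSTM module. After setting the sequence of test points, you run the training loop, validate each prediction, and then all that is left is to update the predictions and visualize them. The Python code is as follows: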
import numpy as np

# This loop uses the TensorFlow session API and assumes that the graph and session
# (session, optimizer, loss, train_inputs, train_outputs, sample_inputs,
# sample_prediction, sample_state_reset, inc_glstep, tf_learning_rate,
# tf_min_learning_rate), the batch generator data_gen, all_mid_data and the
# hyperparameters (epochs, batch_size, train_seq_value, num_unrolling,
# n_predict_once, valid_summary, loss_nondecrease_threshold) are defined earlier.

# Accumulators used by the loop
train_mse_ot = []           # training loss over time
test_mse_ot = []            # test loss over time
predictions_over_time = []  # predictions at each validation point
x_axis_seq = []
average_loss = 0
loss_nondecrease_count = 0

# Points to start the test predictions from
seq_test_points = np.arange(1100, 1500, 20).tolist()

for ep in range(epochs):

    # ========================== Training ==========================
    for step in range(train_seq_value // batch_size):
        u_data, u_labels = data_gen.unroll_batches()
        feed_dict = {}
        for ui, (dat, lbl) in enumerate(zip(u_data, u_labels)):
            feed_dict[train_inputs[ui]] = dat.reshape(-1, 1)
            feed_dict[train_outputs[ui]] = lbl.reshape(-1, 1)
        feed_dict.update({tf_learning_rate: 0.0001, tf_min_learning_rate: 0.00001})
        _, l = session.run([optimizer, loss], feed_dict=feed_dict)
        # Accumulate the batch loss (the original added a constant 1 here)
        average_loss += l

    # ========================== Validation ==========================
    if (ep + 1) % valid_summary == 0:
        average_loss = average_loss / (valid_summary * (train_seq_value // batch_size))
        print('Average loss at step %d: %f' % (ep + 1, average_loss))
        train_mse_ot.append(average_loss)
        average_loss = 0  # reset
        seq_predictions = []
        test_mse_loss_seq = []

        # ============ Updating State and Making Predictions ============
        for w_ix in seq_test_points:
            mse_test_loss = 0.0
            predictions = []
            if (ep + 1) - valid_summary == 0:
                # Only calculate x_axis values in the first validation epoch
                x_axis = []

            # Feed in the recent past prices to warm up the LSTM state
            for re_i in range(w_ix - num_unrolling + 1, w_ix - 1):
                now_price = all_mid_data[re_i]
                feed_dict[sample_inputs] = np.array(now_price).reshape(1, 1)
                _ = session.run(sample_prediction, feed_dict=feed_dict)

            feed_dict = {}
            now_price = all_mid_data[w_ix]
            feed_dict[sample_inputs] = np.array(now_price).reshape(1, 1)

            # Predict n_predict_once steps; each step uses the previous prediction as its input
            for pred_ix in range(n_predict_once):
                pred = session.run(sample_prediction, feed_dict=feed_dict)
                # np.asscalar was removed from NumPy; .item() does the same job
                predictions.append(np.asarray(pred).item())
                feed_dict[sample_inputs] = np.asarray(pred).reshape(-1, 1)
                if (ep + 1) - valid_summary == 0:
                    # Only calculate x_axis values in the first validation epoch
                    x_axis.append(w_ix + pred_ix)
                mse_test_loss += 0.5 * (pred - all_mid_data[w_ix + pred_ix]) ** 2

            session.run(sample_state_reset)
            seq_predictions.append(np.array(predictions))
            mse_test_loss /= n_predict_once
            test_mse_loss_seq.append(mse_test_loss)
            if (ep + 1) - valid_summary == 0:
                x_axis_seq.append(x_axis)

        current_mse_test = np.mean(test_mse_loss_seq)

        # Learning rate decay logic
        if len(test_mse_ot) > 0 and current_mse_test > min(test_mse_ot):
            loss_nondecrease_count += 1
        else:
            loss_nondecrease_count = 0
        if loss_nondecrease_count > loss_nondecrease_threshold:
            session.run(inc_glstep)
            loss_nondecrease_count = 0
            print(' Decreasing learning rate by 0.5')

        test_mse_ot.append(current_mse_test)
        print(' Test MSE %.5f' % np.mean(test_mse_loss_seq))
        predictions_over_time.append(seq_predictions)
        print(' Finished Prediction')
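The learning rate decay at the end of the loop can be hard to follow inside the TensorFlow code, so here is a framework-free sketch of the same counting logic. It is a simplified stand-in: the class name `PlateauDecay`, the threshold of 2, and the sample losses are assumptions for the example, and clamping at a minimum rate is my reading of what `tf_min_learning_rate` is for. In the loop above, the decay itself is delegated to the `inc_glstep` op.

```python
class PlateauDecay:
    """Halve the learning rate when the test loss stops improving (illustrative)."""

    def __init__(self, learning_rate, min_learning_rate, threshold, factor=0.5):
        self.learning_rate = learning_rate
        self.min_learning_rate = min_learning_rate
        self.threshold = threshold
        self.factor = factor
        self.history = []           # test losses seen so far
        self.nondecrease_count = 0

    def update(self, test_loss):
        # Count consecutive validations that are worse than the best loss so far
        if self.history and test_loss > min(self.history):
            self.nondecrease_count += 1
        else:
            self.nondecrease_count = 0
        # Once the count exceeds the threshold, decay the rate and start counting again
        if self.nondecrease_count > self.threshold:
            self.learning_rate = max(self.learning_rate * self.factor,
                                     self.min_learning_rate)
            self.nondecrease_count = 0
        self.history.append(test_loss)
        return self.learning_rate


# Made-up test losses: the loss stalls after the second validation
schedule = PlateauDecay(learning_rate=0.0001, min_learning_rate=0.00001, threshold=2)
rates = [schedule.update(loss) for loss in [0.50, 0.40, 0.45, 0.46, 0.47, 0.30]]
print(rates)
```

With these losses, the rate stays at 0.0001 for the first four validations and is halved on the fifth, after three validations in a row fail to beat the best loss of 0.40.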
Finally, after plotting how the predictions evolved and the best test prediction, we get a nice-looking chart!
Figure: LSTM network and Bitcoin
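Conclusion
Looking at the final result in the chart, we can see that the LSTM network's predictions largely followed the price's ups and downs over time.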