K-Means Clustering of STI Component Stocks

In this machine learning tutorial on algorithmic trading, we show how to perform k-means clustering of STI (Straits Times Index) component stocks.
First, we import the necessary Python packages, such as pandas, talib, and the Yahoo Finance package.
Next, we use pandas read_html to scrape the list of STI components from the Wikipedia page.
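A minimal sketch of this step follows. The Wikipedia URL and the column name "Stock symbol" are assumptions, so check them against the live page; the codes in that column may also need extra cleaning depending on how the page formats them. Yahoo Finance lists SGX stocks with a ".SI" suffix (DBS, for example, is D05.SI), so the helper appends that suffix to build tickers. pd.read_html returns one DataFrame per HTML table and needs lxml (or BeautifulSoup with html5lib) plus network access.

```python
import pandas as pd

# Assumed source page; verify the URL and table layout before relying on it.
WIKI_URL = "https://en.wikipedia.org/wiki/Straits_Times_Index"


def pick_table(tables: list[pd.DataFrame], column: str) -> pd.DataFrame:
    """Return the first table that has `column` (a hypothetical column name)."""
    for table in tables:
        if column in table.columns:
            return table
    raise ValueError(f"no table with a {column!r} column")


def to_yahoo_tickers(codes: pd.Series) -> list[str]:
    """Turn SGX stock codes (e.g. 'D05') into Yahoo Finance tickers ('D05.SI')."""
    return [f"{str(code).strip()}.SI" for code in codes.dropna()]


def fetch_sti_tickers(url: str = WIKI_URL, column: str = "Stock symbol") -> list[str]:
    # pd.read_html parses every <table> on the page into a list of DataFrames.
    tables = pd.read_html(url)
    return to_yahoo_tickers(pick_table(tables, column)[column])
```

Keeping the parsing helpers separate from the network call makes them easy to check against a hand-built table.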
Next, we use the Yahoo Finance package to extract three fundamental features for each stock: beta, earnings per share (EPS), and the price-to-earnings (P/E) ratio. You can add more features if you want a richer clustering; in general, the more features you use, the more clusters you may need.
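The sketch below assumes the Yahoo Finance package is yfinance, whose Ticker(symbol).info dictionary typically carries the keys "beta", "trailingEps", and "trailingPE". Using trailing rather than forward EPS and P/E is an assumption. Missing values become NaN so that incomplete tickers can be dropped before clustering.

```python
import math

import pandas as pd

# Our column name -> key in yfinance's Ticker.info dict (assumed mapping).
INFO_KEYS = {"beta": "beta", "eps": "trailingEps", "pe": "trailingPE"}


def features_from_info(info: dict) -> dict:
    """Pick beta, EPS and P/E out of an info dict; missing values become NaN."""
    row = {}
    for name, key in INFO_KEYS.items():
        value = info.get(key)
        row[name] = float(value) if value is not None else math.nan
    return row


def fetch_features(tickers: list[str]) -> pd.DataFrame:
    """One row per ticker, columns beta/eps/pe. Requires network access."""
    import yfinance as yf  # imported lazily so the helpers above work without it

    rows = {t: features_from_info(yf.Ticker(t).info) for t in tickers}
    return pd.DataFrame.from_dict(rows, orient="index")
```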
Next, we scale the features for K-Means clustering using scikit-learn's StandardScaler.
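K-means relies on Euclidean distance, so without scaling a feature with a large range (such as P/E) would dominate one with a small range (such as beta). StandardScaler rescales each column to zero mean and unit variance. The sketch also drops rows with missing or infinite values first, an assumption about how incomplete tickers are handled, since scikit-learn's KMeans rejects NaN input.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features(features: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, then standardize every column to z-scores."""
    clean = features.replace([np.inf, -np.inf], np.nan).dropna()
    scaled = StandardScaler().fit_transform(clean.to_numpy())
    # Keep the stock names as the index so clusters can be traced back.
    return pd.DataFrame(scaled, index=clean.index, columns=clean.columns)
```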
Next, we determine how many clusters k the data needs. The minimum appears to be 4; we choose k = 5 for this tutorial.
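One common way to pick k is the elbow method, and the sketch below assumes that approach: fit KMeans for a range of k, record the inertia (the within-cluster sum of squared distances, exposed as inertia_), and look for the k beyond which extra clusters stop reducing inertia much. The synthetic blobs only stand in for the scaled beta/EPS/P/E matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def inertia_curve(X: np.ndarray, k_values: range) -> dict[int, float]:
    """Fit K-Means for each k and return its within-cluster sum of squares."""
    curve = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        curve[k] = float(model.inertia_)
    return curve


# Synthetic stand-in data: 30 points with 3 features, not real STI values.
X, _ = make_blobs(n_samples=30, centers=4, n_features=3, random_state=0)
for k, inertia in inertia_curve(X, range(1, 9)).items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting the printed values against k shows the "elbow"; reading it off is a judgment call rather than an exact rule.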
Next, we perform K-Means clustering on the scaled features using scikit-learn's KMeans.
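A minimal version of the clustering step with k = 5. Fixing random_state makes the labels reproducible, and n_init=10 runs ten centroid initializations and keeps the best; both values are illustrative choices, not necessarily the notebook's. Label numbers are arbitrary, so label 3 in one run need not match label 3 in another.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def cluster_stocks(scaled: pd.DataFrame, k: int = 5, seed: int = 0) -> pd.Series:
    """Assign each row (one stock) of the scaled feature table to a cluster."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled.to_numpy())
    return pd.Series(labels, index=scaled.index, name="label")


# Synthetic stand-in data with hypothetical tickers S0..S29.
X, _ = make_blobs(n_samples=30, centers=5, n_features=3, random_state=1)
scaled = pd.DataFrame(X, index=[f"S{i}" for i in range(30)], columns=["beta", "eps", "pe"])
labels = cluster_stocks(scaled)
print(labels.value_counts().sort_index())  # number of stocks per cluster
```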
Next, we check the cluster assignments. Labels 3 and 4 each contain only one stock: label 4 contains only Jardine Matheson Holdings, and label 3 contains only STATS.
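Listing the members of each cluster makes singletons like these easy to spot. The helpers below are a hypothetical sketch that expects a pandas Series indexed by stock name whose values are cluster labels (an assumed layout).

```python
import pandas as pd


def members_by_label(labels: pd.Series) -> dict[int, list[str]]:
    """Group stock names by cluster label, ordered by label."""
    groups: dict[int, list[str]] = {}
    for name, label in labels.items():
        groups.setdefault(int(label), []).append(str(name))
    return dict(sorted(groups.items()))


def singleton_clusters(labels: pd.Series) -> dict[int, str]:
    """Return {label: stock} for clusters that contain exactly one stock."""
    return {lbl: names[0] for lbl, names in members_by_label(labels).items() if len(names) == 1}


# Hypothetical labels for five placeholder stocks.
example = pd.Series([0, 0, 1, 2, 1], index=["A", "B", "C", "D", "E"])
print(members_by_label(example))    # {0: ['A', 'B'], 1: ['C', 'E'], 2: ['D']}
print(singleton_clusters(example))  # {2: 'D'}
```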
Next, we visualize the clusters on a 2D plot of EPS vs beta. Broadly, the stocks fall into four quadrants: high beta/low EPS, low beta/low EPS, high beta/high EPS, and low beta/high EPS. Interestingly, OCBC is not in the same cluster as DBS and UOB.
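A sketch of the EPS-versus-beta scatter plot with matplotlib, coloring each stock by cluster and annotating it with its name. The column names "beta" and "eps" and the label Series layout (indexed by stock name, aligned with the feature table) are assumptions. Plotting the unscaled features keeps the axes in their original units.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_clusters(features: pd.DataFrame, labels: pd.Series, x: str = "beta", y: str = "eps"):
    """Scatter y against x with one color per cluster and each point labeled."""
    fig, ax = plt.subplots(figsize=(8, 6))
    for label in sorted(labels.unique()):
        members = labels.index[labels == label]
        pts = features.loc[members]
        ax.scatter(pts[x], pts[y], label=f"cluster {label}")
        for name, row in pts.iterrows():
            ax.annotate(str(name), (row[x], row[y]), fontsize=8)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return fig, ax
```

In a notebook, `fig, ax = plot_clusters(features, labels)` displays the figure inline; `fig.savefig(...)` writes it to a file.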

The code can be downloaded from:
https://github.com/tertiarycourses/Algorithmic-Trading/blob/main/k-means-clustering-sti-components.ipynb