Stock Market Analysis: Use Python with Historical Data to Identify Returns and Risks for Stock Investing

1. Aims, objectives and background

1.1 Introduction

Stock investing is a widely practiced and lucrative way to increase one’s wealth over time. However, it also entails a lot of uncertainty and volatility, as stock prices can change dramatically due to various factors, such as company performance, industry trends, economic events and investor sentiment.

Therefore, it is important for people who invest in the stock market to have a sound understanding of the returns and risks associated with their investments, and to be able to measure and manage them effectively.

I have many years of investing experience. When I started studying this module, it gave me the idea of using Python and historical stock market data for this project.

Python is a popular programming language, and its libraries are well suited to data manipulation and analysis. They make it straightforward to access and process historical stock data, and to apply the statistical and mathematical techniques that this project needs.

In this project, I will choose some stocks from the US stock market, such as Microsoft, Apple, Google, Nvidia, Amazon and others, to illustrate basic stock market knowledge, and analyze the returns and risks of a portfolio with statistical methods.

1.2 Aims and objectives

1.2.1 Aims

The aim of this article is to demonstrate how to use Python with historical data to perform stock market analysis, and to identify the returns and risks for stock investing. The objectives of this article are:

To show how to get historical data using Python and the relevant libraries.
To explain the concepts and methods of calculating and comparing the returns and risks of different stocks or portfolios.
To illustrate how to measure and evaluate the returns and risks of different stocks or portfolios in Python, using techniques such as variance, correlation, beta, Sharpe ratio, performance ratio and Parabolic SAR.
To demonstrate how to create and customize charts and graphs using Python libraries such as matplotlib and seaborn.
To show how to use portfolio investment and Parabolic SAR to optimize investment returns.

1.2.2 Analysis steps

In this project, I will use the following steps to structure the analysis:

Use the yfinance API and web scraping to fetch historical data from Yahoo Finance.
Briefly explain the data structure and the meaning of each column of data.
Calculate the stock returns using logarithms, and show the distribution of returns on a histogram.
Calculate simple cumulative returns for each stock, and show the performance.
Use statistical measures, such as variance, correlation, beta, Sharpe ratio and information ratio, to show the relationship between stock returns and risk.
Choose stocks with balanced returns and risks, apply a weighted portfolio strategy, and plot the data to show the performance.
Optimize the investment strategy, compare the returns of portfolios, and plot the results.
Draw a conclusion, state the limitations of the analysis, and plan further analysis.

1.3 Data

1.3.1 Data requirements

The project requires historical stock data for Microsoft, Apple, Google, Nvidia, Amazon and the S&P (“Standard & Poor’s”) 500, which can be obtained using Python and the yfinance library. The data should include dates, low price, high price, open price, close price and volume. Other data needed for the analysis, such as portfolio returns and risk of loss, will be calculated with statistical methods.

1.3.2 Limitations and constraints of the data

The historical stock data obtained from Yahoo Finance may have some limitations and constraints that need to be considered when conducting the analysis. Some of them are:

The data may contain missing values, outliers, or errors that could affect the accuracy and reliability of the results. Therefore, it is necessary to perform some data cleaning and preprocessing steps before the analysis, such as checking for data quality, handling missing values, removing outliers, etc.

The data may not capture the full picture of the stock market performance, as it only covers a limited period of time (5 years) and a limited number of stocks. Therefore, it is important to acknowledge the scope and limitations of the data, and how they may affect the generalizability and validity of the findings. For example, the data may not reflect the long-term trends or the diversification benefits of the stock market investing.
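The data quality checks mentioned above could look something like the following minimal sketch. The small DataFrame is made-up illustrative data rather than real prices, and forward-filling gaps is only one possible choice (an assumption, not necessarily the method used in this project).

```python
import numpy as np
import pandas as pd

# Hypothetical price data with one missing value (illustrative only)
prices = pd.DataFrame(
    {"Adj Close": [100.0, 101.5, np.nan, 102.0, 103.2]},
    index=pd.date_range("2023-01-02", periods=5, freq="B"),
)

# Count the missing values in each column
missing_counts = prices.isna().sum()

# Fill gaps with the last known price (one common choice, assumed here)
cleaned = prices.ffill()

# Confirm that the dates are unique and in chronological order
dates_ok = cleaned.index.is_unique and cleaned.index.is_monotonic_increasing
```

Checks like these are cheap to run before any returns are calculated, since a single missing price would otherwise propagate into two consecutive return values.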

1.4 Ethical considerations

1.4.1 Use of historical data

The historical data provided by yfinance is obtained from Yahoo Finance, which is a service that offers financial information, news, data, analysis, and tools to the public. Yahoo Finance is part of the Yahoo family of companies, which was formerly part of Verizon Media and has been majority-owned by Apollo Global Management since 2021. The use of historical data from yfinance requires compliance with the Yahoo Terms of Service, which apply to all Yahoo products and services. Some of the important points in the Yahoo Terms of Service are:

You must be at least the minimum age required in your region to use the Services.

You must not use the Services for any illegal or unauthorized purpose.

You must not violate any intellectual property rights or privacy rights of Yahoo or others.

You must not interfere with the proper functioning of the Services or harm the security of the Services.

You must not use the Services for commercial purposes without Yahoo’s prior written consent.

You must comply with any community guidelines and supplemental terms that apply to specific Services.

You can find more information about Yahoo’s policies and Terms of Service in the Terms Center.

You can also read the Yahoo Terms of Service, which outline the expected behavior of Yahoo users and the consequences of violating them. By using yfinance, you agree to respect the Yahoo community and follow these guidelines.

1.4.2 Onward use / reuse of data and derived data

If you want to reuse the historical data in the project, you have to follow Yahoo’s terms and conditions and yfinance’s rules. You also have to ask for permission if needed. The same applies to any data that is derived from the historical data and contains information from Yahoo Finance. I own the analysis and conclusions in this project.

1.4.3 Declaration of the project

The purpose of this report is to explore the returns and risks of stock investing based on different factors such as time period, portfolio composition, and market trends. The report also provides some visualizations and statistics to illustrate the findings. This report is only for personal study and does not intend to offer any investment advice.

2 Preparing and exploring the data

2.1 Import the relevant libraries

import pandas as pd
import numpy as np
import math
from collections import deque
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf
# Show all matplotlib graphs inline
%matplotlib inline

2.2 Obtaining historical stock data

Within this section, I will use both web scraping and the yfinance API to show how to obtain historical data and write it to local storage, and explain the meaning of each column of data.

# Load the historical stock data, using local CSV files as a cache
def read_csv_data(tickers, start, end, method="with_api"):
    stocks = {}
    for ticker in tickers:
        # Try to read each ticker from local storage first.
        # If no local file exists, download the data and cache it as CSV.
        try:
            stocks[ticker] = pd.read_csv(f"./{ticker}.csv")
        except FileNotFoundError:
            # Two ways of obtaining data from Yahoo Finance
            if method == "with_api":
                stocks[ticker] = write_csv_data(ticker, start, end)
            else:
                stocks[ticker] = get_data_with_web_scraping(ticker, start, end)

            # Write the data to local storage ("Date" is already a column)
            stocks[ticker].to_csv(f"./{ticker}.csv", index=False)
    return stocks

# Load data with yfinance
def write_csv_data(ticker, start, end):
    stock = yf.download(ticker, start, end)
    # Move the date index into a "Date" column, so that freshly downloaded
    # data has the same layout as data read back from the CSV file
    return stock.reset_index()

# Load data with web scraping (start and end params should be timestamp)
# Url: https://finance.yahoo.com/quote/AAPL/history
# Query: ?period1=1670716800&period2=1702080000&interval=1d
# &filter=history&frequency=1d&includeAdjustedClose=true
def get_data_with_web_scraping(ticker, start, end):
    mozilla = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    apple_webkit = " AppleWebKit/537.36 (KHTML, like Gecko)"
    chrome = " Chrome/58.0.3029.110"
    safari = " Safari/537.36"
    headers = {"User-Agent": mozilla + apple_webkit + chrome + safari}
    # Construct the URL for the historical data page
    base_url = "https://finance.yahoo.com/quote/"
    p1 = f"period1={start}&period2={end}&interval=1d"
    p2 = "&filter=history&frequency=1d&includeAdjustedClose=true"
    params = f"{ticker}/history?" + p1 + p2
    url = base_url + params
    # Make a GET request and parse the HTML content
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    # Find the table element that contains the data
    table = soup.find("table", class_ = "W(100%) M(0)")
    # Extract the table headers and the table rows
    table_headers = [th.text for th in table.find("tr").find_all("th")]
    rows = table.find("tbody").find_all("tr")

    # Create an empty list to store the data
    data = []

    # Loop through the rows and append the data to the list
    for row in rows:
        # Get the text values of the cells
        values = [td.text for td in row.find_all("td")]
        # Skip rows that are not price rows (e.g. dividend or split notices,
        # which span several columns and therefore have fewer cells)
        if len(values) != len(table_headers):
            continue
        # Create a dictionary with the headers and the values
        record = dict(zip(table_headers, values))
        # Append the record to the data list
        data.append(record)

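    # Convert the data list to a pandas dataframe
    stock = pd.DataFrame(data)
    stock.columns = ["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]
    stock["Date"] = pd.to_datetime(stock["Date"])
    # Scraped values are strings such as "1,234.56"; convert them to numbers
    for column in stock.columns.drop("Date"):
        stock[column] = pd.to_numeric(stock[column].str.replace(",", ""), errors="coerce")
    # Order the rows chronologically (oldest first)
    stock = stock.sort_values("Date").reset_index(drop=True)

    return stock

# Set the date column as the index of each stock's DataFrame
def setDataIndex(stocks):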
    for ticker in stocks:
        stock = stocks[ticker]
        stock.index = pd.to_datetime(stocks[ticker]["Date"])

Stocks used in the project:

MSFT: Microsoft Corporation
GOOG: Alphabet Inc.
AAPL: Apple Inc.
INTC: Intel Corporation
ADBE: Adobe Inc.
XOM: Exxon Mobil Corporation
CVX: Chevron Corporation
BP: BP p.l.c.
PG: The Procter & Gamble Company
KO: The Coca-Cola Company
WMT: Walmart Inc.
PFE: Pfizer Inc.
JNJ: Johnson & Johnson
MRK: Merck & Co., Inc.
^GSPC: S&P 500 Index

# Calling the function and obtaining the historical stock data

tickers = (
    'MSFT', 
    'GOOG', 
    'AAPL', 
    'INTC', 
    'ADBE', 
    "XOM", 
    "CVX", 
    "BP", 
    "PG", 
    "KO", 
    "WMT", 
    "PFE", 
    "JNJ", 
    "MRK", 
    "^GSPC")

# These timestamps are used for fetching data with web scraping
# start_timestamp = 1544323200
# end_timestamp = 1702540800
# stocks = read_csv_data(tickers, start_timestamp, end_timestamp, method="web_scraping")

start = datetime(2018, 12, 9)
end = datetime(2023, 12, 8)

stocks = read_csv_data(tickers, start, end)
setDataIndex(stocks)
# check if all stocks data loaded successfully
stocks.keys()
stocks["MSFT"].head()

The above is Microsoft stock data. There are six columns in the data above (excluding the index and date columns). The meaning of each column is explained below.

Open: The price at which a stock started trading when the opening bell rang.

High: The highest price at which a stock is traded during a period.

Low: The lowest price of the period.

Close: The price of an individual stock when the stock exchange closed shop for the day.

Adj Close: The adjusted Close price.

Volume: the number of shares of a security traded during a given period of time (a day in the data above).

The difference between “Close” and “Adj Close” is that “Close” is the raw price of the stock at the end of the trading day, while “Adj Close” is the price adjusted for any corporate actions that affect the stock value, such as dividends, stock splits, or rights offerings. The “Adj Close” is often used to analyze the historical performance of a stock over time, as it gives a more accurate picture of the stock’s value[1].

3 Stocks Return

3.1 Calculate logarithmic return

The logarithmic return is a way of calculating the rate of return on an investment[2].

$Logarithmic\space return = ln(\frac{P_{t}}{P_{t-1}})$

Where:

ln = Natural Logarithm

$P_{t}$ = current price

$P_{t-1}$ = previous price
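As a small worked example of the formula above, the sketch below applies it to a made-up series of four closing prices (illustrative values, not real data). It also shows a convenient property of log returns: they add up over time.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices over four days (illustrative only)
prices = pd.Series([100.0, 102.0, 99.0, 105.0])

# Log return: ln(P_t / P_{t-1}); the first value is NaN (no previous price)
log_returns = np.log(prices / prices.shift(1))

# Log returns are additive: their sum equals ln(P_last / P_first)
total_log_return = log_returns.sum()
```

Because the daily values simply add up, the total log return over any period can be read off from a running sum, which is one reason log returns are popular for time series analysis.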

# Apply fn to every ticker and store the result as a new column
def setColumnData(column_name, fn):
    for stock in stocks:
        # Compute the column and replace NaN (e.g. the first row of a
        # return series, which has no previous price) with 0
        stocks[stock][column_name] = fn(stock).fillna(0)

# Collect one column from every stock into a single DataFrame,
# with one column per ticker
def getColumnData(column_name, stocks=stocks, tickers=tickers):
    stocks_return = pd.DataFrame(
        data=np.array([stocks[ticker][column_name] for ticker in tickers]).T,
        columns=tickers)
    stocks_return.index = pd.to_datetime(stocks[tickers[0]]["Date"])
    # Fill NaN with 0
    stocks_return.fillna(0, inplace=True)
    return stocks_return
# Calculate and set log return for each stock
setColumnData(
    column_name="Log Return", 
    fn=lambda x: np.log(stocks[x]["Adj Close"] / stocks[x]["Adj Close"].shift(1)))

stocks["MSFT"]
# Get return of each stock
stocks_return = getColumnData(column_name="Log Return", stocks=stocks, tickers=tickers)
stocks_return

The data above is the daily stock returns of each company.

A return is the change in price or value of an asset, investment, or project over time. A positive return means that the asset has increased in value, while a negative return means that it has decreased in value[3].

We can also see the returns on a plot. Below are plots for each stock and the S&P 500 index, where the x-axis is the date and the y-axis is the return value. The horizontal orange line is the baseline, which indicates whether the returns are positive or negative. The lines above the baseline represent positive returns, and the lines below the baseline represent negative returns.


def plot_daily_return(stocks_return):
    returns = stocks_return.copy()
    returns["Baseline"] = 0
    # Create the figure first so that the title is attached to it
    plt.figure(figsize=(12,18))
    plt.suptitle("Stocks returns", fontsize=36)
    index = 1
    for stock_return in stocks_return:
        # One subplot per stock, with a zero baseline for reference
        plt.subplot(5,3,index)
        plt.title(stock_return)
        plt.plot(returns[stock_return])
        plt.plot(returns["Baseline"])
        index += 1

plot_daily_return(stocks_return)

3.1.2 Distribution of returns

The above plot does not show much information with the random lines. To show the returns distribution, we need to use a histogram.

In the histogram, the x-axis is the return value, and the y-axis is the frequency of that return value. The values to the left of the blue vertical line are negative returns, and the values to the right are positive returns.

def returns_hist_plot(stocks_return):
    index = 1
    plt.figure(figsize=(16, 18))
    plt.suptitle("Stocks Returns", fontsize=24)

    for stock_return in stocks_return:
        plt.subplot(5,3,index)
        plt.hist(stocks_return[stock_return], label=stock_return, alpha=0.5, bins=20)
        plt.axvline(0, color='b', linewidth=0.5, label='Baseline')
        plt.title(stock_return)
        index += 1

returns_hist_plot(stocks_return)

The above histogram shows the return distribution of the chosen 14 stocks and the S&P 500 index. The x-axis is the daily return, and the y-axis is the frequency. The blue line marks the zero return, on the left are the negative returns, and on the right are the positive returns. From the histogram, we can easily see the shape, center, and spread of the distribution for each stock.
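The shape, center, and spread that the histograms show visually can also be summarized numerically. The sketch below is one way to do that; it uses synthetic, randomly generated returns for two hypothetical tickers "A" and "B" (assumptions for illustration), not the project data.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for two hypothetical tickers (illustrative only)
rng = np.random.default_rng(0)
returns = pd.DataFrame({
    "A": rng.normal(0.0005, 0.01, 1000),  # lower volatility
    "B": rng.normal(0.0, 0.02, 1000),     # higher volatility
})

# Center (mean), spread (std) and shape (skewness, excess kurtosis)
summary = returns.agg(["mean", "std", "skew", "kurt"])
```

A larger standard deviation corresponds to a wider histogram, while skewness and kurtosis describe asymmetry and the weight of the tails.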

3.2 Calculate the cumulative returns of each stock

Stock investing involves long term investment, which means we hold the investment for a long period, not one or two short days. This requires us to calculate the long term return or cumulative return.

The formula for calculating the cumulative returns of each stock is[3]:

$CR_t = \prod_{i=1}^t (1 + r_i) - 1$

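where $CR_t$ is the cumulative return at time $t$, and $r_i$ is the simple return at time $i$. Note that the code below plugs the daily log returns into this formula as an approximation of the simple returns. The two are close for small daily moves, but a log return is never larger than the corresponding simple return, so the approximation understates the cumulative return.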

# Calculate and add Cumulative Return to each stock
setColumnData(
    column_name="Cumulative Return",
    fn=lambda x: (1 + stocks[x]["Log Return"]).cumprod() - 1)    
cumulative_return = getColumnData(column_name="Cumulative Return")
cumulative_return.tail(1)

The returns data show the different returns for each stock: AAPL had the highest return of 2.66, and PFE had the lowest of -0.31. This means that investing \$100 in AAPL would have earned us \$266, but investing in PFE would have lost us almost a third of our money. Investing in a single stock is very risky, because we can’t predict its performance.
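To make the relationship between log returns and cumulative returns concrete, here is a minimal sketch on made-up prices (illustrative values only). It compares the exact conversion, $r_i = e^{\ell_i} - 1$ for a log return $\ell_i$, with the shortcut of treating log returns as if they were simple returns.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: the asset gains 20% overall (illustrative only)
prices = pd.Series([100.0, 110.0, 99.0, 120.0])
log_returns = np.log(prices / prices.shift(1)).fillna(0)

# Exact: convert log returns back to simple returns, then compound
simple_returns = np.exp(log_returns) - 1
exact_cumulative = (1 + simple_returns).cumprod() - 1

# Equivalent: exponentiate the running sum of log returns
shortcut_cumulative = np.exp(log_returns.cumsum()) - 1

# Approximation: treat log returns as if they were simple returns
approx_cumulative = (1 + log_returns).cumprod() - 1
```

In this example the exact cumulative return is 20%, while the approximation gives a somewhat lower figure; the gap grows with larger price moves and longer periods.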

4 Value and assess the risk of stock

4.1 Covariance and Correlation

As mentioned above, investing in a single stock is high risk, since that stock may perform badly. It is better to invest in a combination of stocks to lower the risk, but this does not always work. If we choose stocks that behave alike, meaning they move in the same direction, the risk is not reduced.

4.1.1 Covariance

Covariance is a statistical measure of how two assets move in relation to each other. It can help investors diversify their portfolio and reduce their risk. A positive covariance means that two assets tend to move together, while a negative covariance means that they move in opposite directions. Covariance is calculated by averaging the products of the two assets’ deviations from their respective mean returns[4].

$Cov(X,Y) = E[(X - E[X])(Y - E[Y])]$

where E[X] and E[Y] are the expected values or means of X and Y, respectively.

stocks_return = getColumnData("Log Return")
covariance = stocks_return.cov()
covariance

In the matrix above, we can see that all the covariance values are positive, which means that they all move together. However, we cannot tell to what extent they are related to each other.
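To show what the covariance formula does step by step, the sketch below computes it by hand for two made-up return series, "X" and "Y" (illustrative values only), and compares the result with pandas. Note that pandas computes the sample covariance, which divides by $n - 1$ rather than $n$.

```python
import pandas as pd

# Hypothetical daily returns of two assets (illustrative only)
returns = pd.DataFrame({
    "X": [0.01, -0.02, 0.03, 0.00],
    "Y": [0.02, -0.01, 0.02, -0.01],
})

# Deviation of each return from its own mean
dx = returns["X"] - returns["X"].mean()
dy = returns["Y"] - returns["Y"].mean()

# Sample covariance: sum of the products of deviations, divided by n - 1
manual_cov = (dx * dy).sum() / (len(returns) - 1)

# The same value taken from the pandas covariance matrix
pandas_cov = returns.cov().loc["X", "Y"]
```

The result is positive here because the two series are above and below their means on the same days.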

4.1.2 Correlation

According to Hayes (2019), the correlation coefficient can vary from −1.0 to 1.0. A coefficient of −1.0 indicates a perfect inverse relationship between the variables, meaning that they move in opposite directions. A coefficient of 1.0 implies a perfect direct relationship between the variables, meaning that they move in the same way. A common criterion for a strong correlation is an absolute value of 0.8 or higher[5].

$Corr(R_a, R_b) = \frac{Cov(R_a, R_b)}{σ(R_a)σ(R_b)}$

stocks_return = getColumnData("Log Return")
correlation = stocks_return.corr()
correlation

Use S&P 500 as benchmark

The matrix above shows too much information, which makes it hard to analyze. Instead, we use the S&P 500 index as a benchmark and look at the correlation of each stock with it.

correlation = correlation.drop("^GSPC", axis=1).loc["^GSPC"]
correlation

MSFT    0.832454
GOOG    0.762484
AAPL    0.808666
INTC    0.668664
ADBE    0.738879
XOM     0.560119
CVX     0.617409
BP      0.538555
PG      0.591189
KO      0.647339
WMT     0.470798
PFE     0.472251
JNJ     0.558886
MRK     0.487312
Name: ^GSPC, dtype: float64

The above table shows the correlation of each stock with the S&P 500 index. We can see that:

MSFT and AAPL have strong positive correlations with the index (above 0.8), meaning that they tend to increase or decrease together with it.

GOOG, ADBE, INTC, KO, and CVX have moderate positive correlations with the index (roughly 0.6 to 0.8), meaning that they tend to move in the same direction, but not as consistently as the previous group.

The rest of the stocks have weaker positive correlations with the index (below 0.6), meaning that they still tend to move in the same direction, but less reliably.
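The correlation formula given earlier can be checked directly: dividing the covariance by the product of the two standard deviations should reproduce the value that pandas reports. The sketch below does this on two made-up return series, "a" and "b" (illustrative values only).

```python
import pandas as pd

# Hypothetical daily returns of two assets (illustrative only)
returns = pd.DataFrame({
    "a": [0.010, -0.020, 0.030, 0.000, 0.015],
    "b": [0.008, -0.015, 0.020, 0.002, 0.010],
})

# Correlation = covariance / (std of a * std of b)
cov_ab = returns["a"].cov(returns["b"])
corr_manual = cov_ab / (returns["a"].std() * returns["b"].std())

# The same value computed directly by pandas
corr_pandas = returns["a"].corr(returns["b"])
```

Dividing by the standard deviations removes the scale of the returns, which is why correlation is easier to compare across stocks than raw covariance.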

4.2 The Beta

Beta measures the volatility, or systematic risk, of a security or portfolio compared to the market as a whole. The capital asset pricing model (CAPM) uses beta to estimate the expected return of an asset, taking into account both the risk and the cost of capital. Beta also helps investors to see how a stock behaves relative to the market and to pick stocks that suit their risk tolerance.

A security with a beta of 1.0 moves the same way and amount as the market. A security with a beta higher than 1.0 is more volatile and magnifies the market movements. A security with a beta lower than 1.0 is less volatile and moderates the market movements. A security with a negative beta moves the opposite way of the market[8].

$β_e= \frac{Covariance(R_e, R_m)}{Variance(R_m)}$

where:

$R_e$ = the return on an individual stock

$R_m$ = the return on the overall market

Covariance = how changes in a stock’s returns are related to changes in the market’s returns

Variance = how far the market’s data points spread out from their average value

stocks_return = getColumnData("Log Return")
covariance = stocks_return.cov()

# Annualized covariance of each stock with the market
covariance = covariance.drop("^GSPC")["^GSPC"] * 252
covariance

market_variance = stocks_return["^GSPC"].var() * 252
market_variance

beta = covariance / market_variance

beta.sort_values(ascending=False)

The table shows how much the stocks change compared to the market.

MSFT, GOOG, AAPL, INTC, and ADBE have beta values above 1. This means they tend to move more than the market. They can give higher profits, but they also carry more market risk.

XOM, CVX, BP, PG, KO, WMT, PFE, JNJ, and MRK have beta values between 0 and 1. This means they tend to move less than the market, so they are less exposed to market swings.

Within this second group, PG, KO, WMT, PFE, JNJ, and MRK are consumer staples and healthcare companies, which are often considered defensive. They may suit investors who want stability.
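One way to build intuition for beta is to note that covariance divided by market variance is also the slope of a least-squares line fitted to stock returns against market returns. The sketch below illustrates this on synthetic data, where a hypothetical stock is constructed with a "true" beta of 1.3 (an assumption chosen for illustration).

```python
import numpy as np

# Synthetic daily market returns (illustrative only)
rng = np.random.default_rng(42)
market = rng.normal(0.0004, 0.01, 500)

# Hypothetical stock built with a "true" beta of 1.3 plus random noise
stock = 1.3 * market + rng.normal(0.0, 0.005, 500)

# Beta as covariance(stock, market) / variance(market)
cov_matrix = np.cov(stock, market)
beta_cov = cov_matrix[0, 1] / cov_matrix[1, 1]

# Beta as the slope of the least-squares line stock ~ market
slope, intercept = np.polyfit(market, stock, 1)
```

Both estimates agree and land close to 1.3. The example also shows why multiplying both covariance and variance by 252 does not change beta: the annualization factor cancels in the ratio.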

4.3 Expected return

Expected return is the average profit or loss that an investor can anticipate from an investment, based on its historical performance or future projections. In general, it is calculated by multiplying the possible outcomes by their probabilities and adding them up. In this project, it is estimated with the CAPM formula below, which derives the expected return from the stock’s beta. Expected return can be used to compare different investments and to estimate the risk-adjusted return of a portfolio. It is not a guarantee, as it is based on assumptions and uncertainties[8].

$\text{Expected return} = \text{Risk-free rate} + \beta \times (\text{Market return} - \text{Risk-free rate})$

Where:

$\text{Risk-free rate}$ is the return of a risk-free asset, such as a government bond.
$\beta$ is a measure of the systematic risk or volatility of an investment relative to the market.
$\text{Market return}$ is the return of the market portfolio, such as an index fund.

The US risk-free rate is the return of a US government issued treasury security, which is considered to have no risk of default. The most commonly used risk-free rate is the 10-year treasury rate, which is the yield of a 10-year treasury bond. According to data from YCharts, the 10-year treasury rate was 4.23% as of December 8, 2023[9].

risk_free_rate = 0.0423
# The value in the last row, which is the final return
market = stocks["^GSPC"]["Cumulative Return"].iloc[-1]

expected_return = (risk_free_rate + beta * (market / 5 - risk_free_rate)) * 5
expected_return.sort_values(ascending=False)

The expected returns of the stocks vary widely over the five-year period, from 37.36% for WMT to 63.80% for ADBE. Under the CAPM, stocks with a higher beta have a higher expected return, but they also carry more uncertainty.
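The arithmetic behind the CAPM formula can be followed with a few hand-picked numbers. In the sketch below, the helper name `capm_expected_return` and the 10% annual market return are hypothetical choices made for illustration; only the 4.23% risk-free rate comes from the text above.

```python
def capm_expected_return(risk_free_rate, beta, market_return):
    """CAPM: risk-free rate plus beta times the market risk premium."""
    return risk_free_rate + beta * (market_return - risk_free_rate)

risk_free_rate = 0.0423  # 10-year treasury rate quoted above
market_return = 0.10     # hypothetical annual market return (assumption)

# Expected annual return for three illustrative beta values
low_beta = capm_expected_return(risk_free_rate, 0.5, market_return)
market_beta = capm_expected_return(risk_free_rate, 1.0, market_return)
high_beta = capm_expected_return(risk_free_rate, 1.5, market_return)
```

A beta of 1 reproduces the market return exactly, a beta of 0.5 earns only half of the market risk premium on top of the risk-free rate, and a beta of 1.5 earns one and a half times that premium.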

4.3 Excess return

Excess return shows how well an investment performs compared to the market or a risk-free alternative. Excess return can be positive, negative, or zero. A positive excess return means that the investment beats the benchmark or the risk-free asset. A negative excess return means that the investment underperforms the benchmark or the risk-free asset. A zero excess return means that the investment matches the benchmark or the risk-free asset[10].

$Excess\space return\space =\space Total\space return\space -\space Expected\space return$

cumulative_return = getColumnData(column_name="Cumulative Return").drop("^GSPC", axis=1)

total_return = cumulative_return.iloc[-1]

excess_return = total_return - expected_return
excess_return

The results of excess return indicate the following:

MSFT and AAPL significantly outperformed the market average, with excess returns greater than 1. AAPL had the highest excess return of 2.042077, followed by MSFT with 1.254250.

GOOG, ADBE, PG, WMT, and MRK moderately outperformed the market average, with excess returns between 0 and 1.

INTC, XOM, CVX, BP, KO, PFE, and JNJ underperformed the market average, with excess returns less than 0.

4.4 The Sharpe ratio

The Sharpe ratio was created by William F. Sharpe to capture the relation between risk and return. It is a measure of risk-adjusted return that compares an investment’s excess return to the standard deviation of its returns, and it is commonly used to gauge the performance of an investment by adjusting for its risk[11].

$Sharpe\space Ratio = \frac{R_p − R_f}{σ_p}$

where:

$R_p$ = returns of the portfolio

$R_f$ = risk-free rate

$σ_p$ = standard deviation of the portfolio excess returns.

daily_risk_free_rate = risk_free_rate / 252

sharpe_ratio = (stocks_return.mean() - daily_risk_free_rate) / stocks_return.std()
sharpe_ratio.sort_values(ascending=False)

Based on these results, we can see that AAPL has the highest Sharpe ratio, followed by MSFT and GOOG. These stocks have performed well relative to their risk. On the other hand, JNJ, PFE, INTC, and BP have negative Sharpe ratios, meaning they returned less than the risk-free rate. These stocks have performed poorly relative to their risk. The rest of the stocks have positive but low Sharpe ratios, indicating moderate performance relative to their risk.
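The Sharpe ratios above are computed from daily figures. They are often quoted on an annual basis instead, and a common convention is to multiply the daily ratio by the square root of 252 trading days. The sketch below shows this on synthetic returns (illustrative only); the scaling assumes that daily returns are independent and identically distributed, which real returns only approximate.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns covering about three years (illustrative only)
rng = np.random.default_rng(1)
daily_returns = pd.Series(rng.normal(0.0008, 0.012, 252 * 3))

# Daily risk-free rate derived from the 4.23% annual rate
daily_rf = 0.0423 / 252
excess = daily_returns - daily_rf

# Daily Sharpe ratio: mean excess return / standard deviation
daily_sharpe = excess.mean() / excess.std()

# Annualized Sharpe ratio under the sqrt(252) convention
annual_sharpe = daily_sharpe * np.sqrt(252)
```

Annualizing changes the scale of every ratio by the same factor, so the ranking of the stocks is unaffected.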

4.5 Information Ratio

The information ratio evaluates the additional return that an investment portfolio generates over a benchmark, relative to the volatility of that additional return. It also indicates how consistently the portfolio differs from the benchmark (Murphy 2019). The ratio is named after the assumption that the portfolio manager has superior information and can thus outperform the benchmark[12].

$Information\space Ratio\space = \frac{(R_p − R_m)}{TE}$

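where:

$R_p$ = portfolio return

$R_m$ = return of the market (the benchmark)

TE = tracking error, the standard deviation of the difference between the portfolio and benchmark returns

$β_p$ = beta of the portfolio (not part of the formula above, but used in the calculation below)

# Note: dividing the excess return over the risk-free rate by beta is the
# Treynor ratio; it is used here as a simplified stand-in for the
# tracking-error-based information ratio defined above
information_ratio = (stocks_return.mean() - daily_risk_free_rate) / beta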
information_ratio.sort_values(ascending=False)

Based on these results, we can see that AAPL has the highest IR, followed by MSFT and WMT. These stocks have generated higher returns than the benchmark, given the risk. On the other hand, PFE, INTC, and BP have the lowest IR, meaning they have generated lower returns than the benchmark, given the risk. The rest of the stocks have negative or close to zero IR, indicating poor performance relative to the benchmark.
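For comparison, the tracking-error form of the information ratio, $(R_p − R_m) / TE$, could be computed as in the sketch below. It uses synthetic returns for a hypothetical portfolio and benchmark (illustrative only), and the variable name `ir_example` is chosen here to avoid clashing with the variables used in the project.

```python
import numpy as np
import pandas as pd

# Synthetic daily benchmark returns (illustrative only)
rng = np.random.default_rng(7)
benchmark = pd.Series(rng.normal(0.0004, 0.01, 500))

# Hypothetical portfolio: the benchmark plus a small, noisy extra return
portfolio = benchmark + pd.Series(rng.normal(0.0002, 0.004, 500))

# Active return: the daily difference between portfolio and benchmark
active = portfolio - benchmark

# Tracking error: standard deviation of the active return
tracking_error = active.std()

# Information ratio: mean active return per unit of tracking error
ir_example = active.mean() / tracking_error
```

Unlike a beta-based ratio, this version depends only on how the portfolio deviates from the benchmark, so a portfolio that simply mirrors the index has an active return near zero.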

5 Applying to investment

5.1 Calculate the portfolio

We have analysed the risks of the stocks, and now we will create a portfolio from the selected ones. There are different approaches to investing in stocks and building a portfolio. Investors could allocate more weight to the companies that they believe in and are willing to take the risk on. A simple and popular way is to weight each stock by the current market value of the company[13].

# Stock names with a positive excess return
excess_return_positive_tickers = list(excess_return[excess_return > 0].index)
# Stock names with a positive Sharpe ratio
sharpe_ratio_positive_tickers = list(sharpe_ratio[sharpe_ratio > 0].index)
# Stock names with a positive information ratio
information_ratio_positive_tickers = list(information_ratio[information_ratio > 0].index)

# Keep only the tickers that are positive on all three measures.
# np.intersect1d takes two arrays at a time, so the calls are nested.
choosed_stocks_ticker = np.intersect1d(
    np.intersect1d(excess_return_positive_tickers, sharpe_ratio_positive_tickers),
    information_ratio_positive_tickers)

choosed_stocks_ticker

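# Calculate portfolio weights from an approximate market value.
# Note: volume * price is the value traded on the last day, used here as a
# rough proxy; true market capitalization is shares outstanding * price.
def calculateWeight(tickers):
    market_values = []
    for ticker in tickers:
        volume = stocks[ticker]["Volume"].iloc[-1]
        last_price = stocks[ticker]["Adj Close"].iloc[-1]
        market_values.append(volume * last_price)
    total = sum(market_values)
    return [value / total for value in market_values]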
        
portfolio_weights = calculateWeight(choosed_stocks_ticker)
portfolio_weights

# Calculate weighted return of each stock
weighted_returns = stocks_return[choosed_stocks_ticker].mul(portfolio_weights, axis=1)
# Calculate cumulative return
cumulate_weighted_return = ((1 + weighted_returns).cumprod() - 1)
# Calculate stocks cumulative portfolio
portfolio = cumulate_weighted_return.sum(axis=1) * 10000

portfolio.tail()

The result above shows that if we had invested \$10,000 in the selected stocks at the start of the period in December 2018, our portfolio would have returned \$13,918.55 by December 2023.
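Another common way to compute a portfolio's value, shown here only as a hedged alternative sketch and not as the calculation used above, is to first combine the weighted returns of all stocks into a single daily portfolio return and then compound that one series. The example uses two hypothetical stocks "A" and "B" with made-up simple returns and weights, and it implicitly assumes the portfolio is rebalanced to the target weights every day.

```python
import numpy as np
import pandas as pd

# Hypothetical simple daily returns of two stocks (illustrative only)
returns = pd.DataFrame({"A": [0.10, -0.05], "B": [0.00, 0.10]})

# Hypothetical portfolio weights that sum to 1
weights = np.array([0.6, 0.4])

# Daily portfolio return: weighted sum across the stocks
portfolio_daily = returns.mul(weights, axis=1).sum(axis=1)

# Value of an initial 10,000 investment, compounded day by day
portfolio_value = 10000 * (1 + portfolio_daily).cumprod()
```

Here the portfolio gains 6% on the first day and 1% on the second, so the 10,000 grows to 10,600 and then to 10,706.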

SP500_portfolio = ((1 + stocks_return["^GSPC"]).cumprod() - 1) * 10000

portfolio.plot(figsize=(12,8), label="Combined Portfolio", title="Returns of the Portfolio", ylabel="Return (USD)", xlabel="Date")
SP500_portfolio.plot(label="SP500 Portfolio")
plt.legend()

The above chart compares the returns of the portfolio of stocks that we chose with the S&P 500 index. Over this period, investing in the combined stocks yields higher returns than the S&P 500, but it also entails higher risk and volatility.

6 Optimise the investing strategy

6.1 Parabolic SAR

The parabolic SAR, created by J. Welles Wilder Jr., helps traders find out the direction and possible changes of price trends. The indicator applies a technique known as “SAR,” or stop and reverse, which uses a trailing stop to spot the best points to enter and exit the market. The indicator is also called the parabolic stop and reverse, parabolic SAR, or PSAR by traders[14].

Uptrend and Downtrend SAR Equation:

$Uptrend\space Parabolic\space SAR = Prior\space SAR\space +\space Prior\space Acceleration\space Factor\space ∗\space (Prior\space Extreme\space Point − Prior\space SAR)$

$Downtrend\space Parabolic\space SAR = Prior\space SAR\space -\space Prior\space Acceleration\space Factor\space ∗\space (Prior\space SAR − Prior\space Extreme\space Point)$

Implementing the Parabolic SAR is not easy, so I chose to use the code from Raposa[15].
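Before looking at the full implementation, a single update step can be worked through by hand. The sketch below uses a hypothetical helper, `next_psar`, and made-up prices; it follows the standard form of the update, in which the SAR moves toward the extreme point in both trends, and deliberately leaves out the reversal and clamping rules that a complete implementation needs.

```python
def next_psar(prior_psar, prior_af, prior_ep, uptrend):
    """One PSAR update step, without reversal or clamping logic."""
    if uptrend:
        # The SAR rises toward the highest high of the trend
        return prior_psar + prior_af * (prior_ep - prior_psar)
    # The SAR falls toward the lowest low of the trend
    return prior_psar - prior_af * (prior_psar - prior_ep)

# Uptrend: SAR at 100, extreme point (highest high) at 110
up = next_psar(prior_psar=100.0, prior_af=0.02, prior_ep=110.0, uptrend=True)

# Downtrend: SAR at 110, extreme point (lowest low) at 100
down = next_psar(prior_psar=110.0, prior_af=0.02, prior_ep=100.0, uptrend=False)
```

With an acceleration factor of 0.02, the SAR closes 2% of the gap to the extreme point in each step: it rises from 100 to 100.2 in the uptrend and falls from 110 to 109.8 in the downtrend. As the acceleration factor grows, the SAR catches up with the price faster.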

# The code is copied from Raposa 
# https://raposa.trade/blog/the-complete-guide-to-calculating-the-parabolic-sar-in-python/

# Define a class for the parabolic SAR indicator
class PSAR:

  # Initialize the class with the parameters for the indicator
  def __init__(self, init_af=0.02, max_af=0.2, af_step=0.02):
    self.max_af = max_af # The maximum acceleration factor
    self.init_af = init_af # The initial acceleration factor
    self.af = init_af # The current acceleration factor
    self.af_step = af_step # The step size for increasing the acceleration factor
    self.extreme_point = None # The extreme point, the highest high or lowest low in the current trend
    self.high_price_trend = [] # A list to store the high prices in the uptrend
    self.low_price_trend = [] # A list to store the low prices in the downtrend
    self.high_price_window = deque(maxlen=2) # A deque to store the last two high prices
    self.low_price_window = deque(maxlen=2) # A deque to store the last two low prices

    # Lists to track results
    self.psar_list = [] # A list to store the PSAR values
    self.af_list = [] # A list to store the acceleration factor values
    self.ep_list = [] # A list to store the extreme point values
    self.high_list = [] # A list to store the high prices
    self.low_list = [] # A list to store the low prices
    self.trend_list = [] # A list to store the trend values (1 for uptrend, 0 for downtrend)
    self._num_days = 0 # A counter to track the number of days

  # Define a method to calculate the PSAR value for a given day
  def calcPSAR(self, high, low):
    # If the number of days is less than 3, initialize the PSAR values
    if self._num_days < 3:
      psar = self._initPSARVals(high, low)
    # Otherwise, calculate the PSAR value using the formula
    else:
      psar = self._calcPSAR()
    # Update the current values with the new PSAR value and the high and low prices
    psar = self._updateCurrentVals(psar, high, low)
    # Increment the number of days
    self._num_days += 1
    # Return the PSAR value
    return psar

  # Define a helper method to initialize the PSAR values
  def _initPSARVals(self, high, low):
    # If the low price window is not full, set the trend to None and the extreme point to the high price
    if len(self.low_price_window) <= 1:
      self.trend = None
      self.extreme_point = high
      return None
    # If the high price is increasing, set the trend to 1 (up), the PSAR to the minimum low price, and the extreme point to the maximum high price
    if self.high_price_window[0] < self.high_price_window[1]:
      self.trend = 1
      psar = min(self.low_price_window)
      self.extreme_point = max(self.high_price_window)
    # Otherwise, set the trend to 0 (down), the PSAR to the maximum high price, and the extreme point to the minimum low price
    else: 
      self.trend = 0
      psar = max(self.high_price_window)
      self.extreme_point = min(self.low_price_window)
    # Return the PSAR value
    return psar

  # Define a helper method to calculate the PSAR value using the formula
  def _calcPSAR(self):
    # Get the previous PSAR value
    prev_psar = self.psar_list[-1]
    # If the trend is 1 (up), calculate the PSAR as the previous PSAR plus the product of the acceleration factor and the difference between the extreme point and the previous PSAR
    # and set the PSAR to the minimum of the PSAR and the minimum low price in the window
    if self.trend == 1: # Up
      psar = prev_psar + self.af * (self.extreme_point - prev_psar)
      psar = min(psar, min(self.low_price_window))
    # If the trend is 0 (down), calculate the PSAR as the previous PSAR minus the product of the acceleration factor and the difference between the previous PSAR and the extreme point
    # and set the PSAR to the maximum of the PSAR and the maximum high price in the window
    else:
      psar = prev_psar - self.af * (prev_psar - self.extreme_point)
      psar = max(psar, max(self.high_price_window))
    # Return the PSAR value
    return psar

  # Define a helper method to update the current values with the new PSAR value and the high and low prices
  def _updateCurrentVals(self, psar, high, low):
    # If the trend is 1 (up), append the high price to the high price trend list
    if self.trend == 1:
      self.high_price_trend.append(high)
    # If the trend is 0 (down), append the low price to the low price trend list
    elif self.trend == 0:
      self.low_price_trend.append(low)
    # Check if there is a trend reversal and update the PSAR value accordingly
    psar = self._trendReversal(psar, high, low)
    # Append the PSAR value, the acceleration factor, the extreme point, the high price, the low price, and the trend to their respective lists
    self.psar_list.append(psar)
    self.af_list.append(self.af)
    self.ep_list.append(self.extreme_point)
    self.high_list.append(high)
    self.low_list.append(low)
    # Append the high price and the low price to their respective windows
    self.high_price_window.append(high)
    self.low_price_window.append(low)
    # Append the trend to the trend list
    self.trend_list.append(self.trend)
    # Return the PSAR value
    return psar

  # Define a helper method to check if there is a trend reversal and update the PSAR value accordingly
  def _trendReversal(self, psar, high, low):
    # Initialize a flag for reversal
    reversal = False
    # If the trend is 1 (up) and the PSAR is greater than the low price, set the trend to 0 (down), the PSAR to the maximum high price in the trend, the extreme point to the low price, and the reversal flag to True
    if self.trend == 1 and psar > low:
      self.trend = 0
      psar = max(self.high_price_trend)
      self.extreme_point = low
      reversal = True
    # If the trend is 0 (down) and the PSAR is less than the high price, set the trend to 1 (up), the PSAR to the minimum low price in the trend, the extreme point to the high price, and the reversal flag to True
    elif self.trend == 0 and psar < high:
      self.trend = 1
      psar = min(self.low_price_trend)
      self.extreme_point = high
      reversal = True
    # If there is a reversal, reset the acceleration factor to the initial value and clear the high price trend and low price trend lists
    if reversal:
      self.af = self.init_af
      self.high_price_trend.clear()
      self.low_price_trend.clear()
    # Otherwise, update the acceleration factor and the extreme point based on the trend and the high and low prices
    else:
      # If the high price is greater than the extreme point and the trend is 1 (up), increase the acceleration factor by the step size up to the maximum value and set the extreme point to the high price
      if high > self.extreme_point and self.trend == 1:
        self.af = min(self.af + self.af_step, self.max_af)
        self.extreme_point = high
      # If the low price is less than the extreme point and the trend is 0 (down), increase the acceleration factor by the step size up to the maximum value and set the extreme point to the low price
      elif low < self.extreme_point and self.trend == 0:
        self.af = min(self.af + self.af_step, self.max_af)
        self.extreme_point = low
    # Return the PSAR value
    return psar

# Apply PSAR to the stocks
for ticker in stocks:
    # Create a fresh PSAR instance so that state does not carry over between stocks
    indic = PSAR()
    # Calculate the PSAR value for each row of the stock data using the calcPSAR method
    stocks[ticker]['PSAR'] = stocks[ticker].apply(
        lambda row: indic.calcPSAR(row['High'], row['Low']), axis=1)
    # Add supporting data from the PSAR class attributes
    stocks[ticker]['EP'] = indic.ep_list # The extreme point values
    stocks[ticker]['Trend'] = indic.trend_list # The trend values (1 for uptrend, 0 for downtrend)
    stocks[ticker]['AF'] = indic.af_list # The acceleration factor values

for ticker in stocks:
    # Get the dataframe for the current ticker
    stock = stocks[ticker]
    # Create a column for the signal (1 for buy, -1 for short, 0 for hold)
    stock["Signal"] = 0
    # Trend is 1 (up) or 0 (down), so a diff of 1 means the trend flipped from down to up (buy signal)
    stock.loc[stock['Trend'].diff() == 1, 'Signal'] = 1
    # A diff of -1 means the trend flipped from up to down (short signal)
    stock.loc[stock['Trend'].diff() == -1, 'Signal'] = -1
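
To show how differencing the trend column produces signals, here is a small pure-Python sketch of the same idea. The helper name `signals_from_trend` and the sample trend values are hypothetical. `None` stands for the warm-up days before the indicator has a trend.

```python
def signals_from_trend(trend):
    """Return 1 where the trend flips 0 -> 1, -1 where it flips 1 -> 0, else 0."""
    signals = [0] * len(trend)
    for i in range(1, len(trend)):
        prev, curr = trend[i - 1], trend[i]
        if prev is None or curr is None:
            continue  # No trend yet during the warm-up days, so no signal
        signals[i] = curr - prev  # Same arithmetic as a first difference
    return signals


# Hypothetical trend series: two warm-up days, then up, up, down, down, up
trend = [None, None, 1, 1, 0, 0, 1]
print(signals_from_trend(trend))  # [0, 0, 0, 0, -1, 0, 1]
```

A signal appears only on the day the trend changes. Every other day stays at 0.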

I applied the PSAR (Parabolic Stop and Reverse) indicator to each stock using the above code. The following plot shows the stock closing price and the PSAR values, which indicate the direction and strength of the price trend. I chose the last 252 trading days (about one year) of AAPL (Apple) data to demonstrate how the PSAR works, because a shorter window makes the plot clearer and easier to read.

AAPL = stocks["AAPL"].copy()
# Keep the last 252 trading days; copy the slice so it can be modified safely
AAPL_annual = AAPL[-252:].copy()
AAPL_annual.set_index("Date", inplace=True)
AAPL_annual.tail()

6.1.2 Plot the PSAR

The PSAR plot is a technical indicator that shows the direction and strength of a price trend, as well as possible reversal points. It consists of a series of dots that are either above or below the price, depending on whether the trend is up or down. The dots move closer to the price as the trend accelerates, and further away as the trend decelerates. The PSAR plot can help traders identify entry and exit points for their trades, as well as stop-loss levels.

AAPL_annual["Adj Close"].plot(label="Apple Close Price", figsize=(12,6), xlabel="Date")
AAPL_annual["PSAR"].plot(label="PSAR", figsize=(12,6))
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("Apple PSAR")
plt.legend()

The plot helps us analyse the stock trends. The trend went down from December 2022 to January 2023, then went up from January 2023 to July 2023, then went down again until November 2023.

6.1.3 Trading with the Parabolic SAR

Now we can use the PSAR to build a trading strategy: buy when the PSAR shows an uptrend, and sell short when it shows a downtrend. I will again use the one-year AAPL (Apple) data for the plot.

# Get the default color cycle from the matplotlib parameters
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

# Get the adjusted close prices of Apple stock when the signal is 1 (buy) from the AAPL_annual dataframe
buy_sigs = AAPL_annual.loc[AAPL_annual['Signal']==1]['Adj Close']

# Get the adjusted close prices of Apple stock when the signal is -1 (short) from the AAPL_annual dataframe
short_sigs = AAPL_annual.loc[AAPL_annual['Signal']==-1]['Adj Close']

# Get the parabolic SAR values when the trend is 1 (up) from the AAPL_annual dataframe
psar_bull = AAPL_annual.loc[AAPL_annual['Trend']==1]['PSAR']

# Get the parabolic SAR values when the trend is 0 (down) from the AAPL_annual dataframe
psar_bear = AAPL_annual.loc[AAPL_annual['Trend']==0]['PSAR']

# Create a new figure with a size of 12 by 6 inches
plt.figure(figsize=(12, 6))

# Plot the adjusted close prices of Apple stock
plt.plot(AAPL_annual['Adj Close'], label='Close', linewidth=1 )

# Scatter plot the buy signals
plt.scatter(buy_sigs.index, buy_sigs, color=colors[2], 
            label='Buy', marker='^', s=100)

# Scatter plot the short signals
plt.scatter(short_sigs.index, short_sigs, color=colors[4], 
            label='Short', marker='v', s=100)

# Scatter plot the parabolic SAR values for the up trend 
plt.scatter(psar_bull.index, psar_bull, marker=".", color=colors[2], alpha=0.2, label='Up Trend')

# Scatter plot the parabolic SAR values for the down trend 
plt.scatter(psar_bear.index, psar_bear, marker=".", color=colors[4], alpha=0.2, label='Down Trend')

# Set the x-axis ticks to be every 30th index of the AAPL_annual dataframe and rotate them by 45 degrees
plt.xticks(range(0, len(AAPL_annual.index), 30), rotation=45)

plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.title('Apple Price and Parabolic SAR')
plt.legend()

By plotting the PSAR as shown above, we can easily identify downtrends and uptrends. The up and down arrows mark the points where we can enter long or short positions respectively. Except for the first arrow, each arrow also signifies an exit point and a position change from long to short or the other way around.
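
The stop-and-reverse idea can be sketched as carrying the latest signal forward until the next arrow appears. The helper name `positions_from_signals` and the sample signals are hypothetical, and this is only one way to express the idea.

```python
def positions_from_signals(signals):
    """Carry the latest non-zero signal forward: 1 = long, -1 = short, 0 = flat."""
    position = 0  # Flat until the first arrow appears
    positions = []
    for s in signals:
        if s != 0:
            position = s  # Stop and reverse: close the old position, open the opposite one
        positions.append(position)
    return positions


# Hypothetical signals: a short arrow on day 4 and a buy arrow on day 6
signals = [0, 0, 0, 0, -1, 0, 1, 0]
print(positions_from_signals(signals))  # [0, 0, 0, 0, -1, -1, 1, 1]
```

After the first arrow the position is never flat, which matches the description above: each later arrow is an exit and an entry at the same time.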

6.1.4 Apply the strategy to the stocks

# Get the column data for the signal column
stocks_signal = getColumnData(column_name="Signal")

# Calculate the returns for the chosen stocks based on the signal
PSAR_returns = stocks_return[choosed_stocks_ticker].multiply(stocks_signal[choosed_stocks_ticker])

# Calculate the portfolio weights for the chosen stocks
portfolio_weights = calculateWeight(choosed_stocks_ticker)

# Display the portfolio weights
portfolio_weights

# Calculate the weighted returns for the PSAR strategy
PSAR_weighted_Returns = PSAR_returns.mul(portfolio_weights, axis=1)

# Calculate the cumulative weighted returns for the PSAR strategy
PSAR_cumulate_weighted_return = ((1 + PSAR_weighted_Returns).cumprod() - 1)

# Calculate the portfolio value for the PSAR strategy
PSAR_portfolio = PSAR_cumulate_weighted_return.sum(axis=1) * 10000
PSAR_portfolio.tail()
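
As a point of comparison, here is a hedged sketch of a different variant: it holds the position between signals and applies yesterday's position to today's return, so that a signal is only traded after it has been observed. This is not the calculation used above. The helper name `strategy_value`, the returns and the positions are all hypothetical, and transaction costs are ignored.

```python
def strategy_value(returns, positions, start_value=10_000.0):
    """Compound daily returns, applying the previous day's position to each return."""
    value = start_value
    values = []
    prev_position = 0  # Flat before the first position is known
    for r, p in zip(returns, positions):
        value *= 1 + prev_position * r  # Long earns r, short earns -r, flat earns 0
        values.append(value)
        prev_position = p  # Today's position takes effect tomorrow
    return values


# Hypothetical daily returns and held positions (1 = long, -1 = short)
returns = [0.01, -0.02, 0.03, -0.01]
positions = [1, 1, -1, -1]
values = strategy_value(returns, positions)
print([round(v, 2) for v in values])  # [10000.0, 9800.0, 10094.0, 10194.94]
```

The one-day lag is the main design choice here: without it, the strategy would earn the return of the very bar that generated the signal.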

Plot the stocks performance

After applying the PSAR buy and short signals, the return on the portfolio increases from \$13,918.55 to \$29,347.72. To compare the performance of the common weighted strategy and the PSAR strategy, let’s plot the returns of the two portfolios directly.

# Plot the portfolio value and the PSAR portfolio value
plt.figure(figsize=(12,6))
portfolio.plot(label="Combining Portfolio")
PSAR_portfolio.plot(label="PSAR Portfolio")
plt.title("Stocks Performance")
plt.xlabel("Date")
plt.ylabel("Portfolio Value ($)")
plt.legend()

The plot above shows that the returns of the portfolio with the PSAR strategy have a smoother line, and perform better than the common weighted investment strategy.

7 Conclusion

7.1 Summary

This project introduces the basics of stock market investment analysis using Python, yfinance, and other libraries. It shows that market investing requires careful planning and research, as different stocks have different returns and risks. Among the selected stocks, Microsoft (MSFT), Google (GOOG), and Apple (AAPL) have had the highest returns in the last 5 years, with Apple leading the pack at 226%. However, they also have high volatility and risk.

On the other hand, BP p.l.c. (BP), Intel Corporation (INTC) and Pfizer Inc. (PFE) have had negative returns in the last 5 years, which means investing in them would result in losses.

Investing based on intuition can be dangerous and lead to high losses. Therefore, investors need various tools and statistical methods to evaluate the performance and risk of stocks, and to make informed decisions. In the last part, we use weighted portfolio investment and Parabolic SAR to optimize the investment strategy, and to show an ideal investing return.

7.2 Limitations

This project has many limitations, despite my efforts:

It only uses the yfinance API and web scraping to collect the stock data from Yahoo Finance, which may not be reliable or accurate. The data may have errors or discrepancies, such as missing values, outliers, or incorrect adjustments.
It only covers five years of historical data with 14 company stocks, which is not representative of the entire stock market.
It only uses some basic and common techniques and tools to analyze the stock price data, which may not capture the full complexity and dynamics of the stock market. Other factors and variables may affect the stock prices, such as market sentiment, macroeconomic conditions, industry trends, company news, etc.
It cannot predict the future market trends, and it does not provide investment advice.

7.3 Future planning of project improvement

NLP and machine learning are increasingly important for quantitative and financial analysis. NLP can be used to analyze company news, gauge market sentiment and support future price predictions. Machine learning can be used to predict future trends and build investing models from historical data. I did not include this part in the project, because I lack the knowledge of NLP and ML to do further analysis in a short time. In the future, I will do more research in this area and use it to enhance the analysis in the project, to make the results more accurate and valuable.

References

[1] Balasubramaniam, K. (2022, January 1). How to calculate a stock’s adjusted closing price. Investopedia. www.investopedia.com/ask/answers…

[2] Rate of Return Expert. (n.d.). Log return. Retrieved April 8, 2023, from www.rateofreturnexpert.com/log-return/

[3] All the Snippets. (2020, December 10). Calculating cumulative returns of a stock with Python and Pandas. www.allthesnippets.com/notes/finan…

[4] Kenton, W. (2021, June 29). Negative return. Investopedia. www.investopedia.com/terms/n/neg…

[5] Garita, M. (2021). Applied quantitative finance: Using Python for financial analysis. Springer. p. 109.

[6] Garita, M. (2021). Applied quantitative finance: Using Python for financial analysis. Springer. p. 111.

[7] Kenton, W. (2022, June 30). Beta: Definition, calculation, and explanation for investors. Investopedia. www.investopedia.com/terms/b/bet…

[8] Chen, J. (2023, September 26). Expected return: Formula, how it works, limitations, example. Investopedia. www.investopedia.com/terms/e/exp…

[9] YCharts. (n.d.). 10 Year Treasury Rate (I:10YTCMR). ycharts.com/indicators/…

[10] Chen, J. (2021, December 3). Excess returns meaning, risk, and formulas. Investopedia. www.investopedia.com/terms/e/exc…

[11] Fernando, J. (2023, October 24). Sharpe ratio: Definition, formula, and examples. Investopedia. www.investopedia.com/terms/s/sha…

[12] Garita, M. (2021). Applied quantitative finance: Using Python for financial analysis. Springer. p. 201.

[13] Milano, S. (2021, October 29). How to calculate portfolio value. Sapling. www.sapling.com/5872650/cal…

[14] Mitchell, C. (2022, March 17). Parabolic SAR indicator: Definition, formula, trading strategies. Investopedia. www.investopedia.com/terms/p/par…

[15] Raposa. (2022, January 24). The complete guide to calculating the Parabolic SAR in Python: Step-by-step examples and detailed explanations. raposa.trade/blog/the-co…