Generalized AutoRegressive Conditional Heteroscedasticity

05 July 2022

What is a Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) process?

That’s a long fancy name for something that’s actually relatively simple. The overall context is time series analysis. Let’s sort out the terminology…

Scedasticity

Scedasticity (or skedasticity) describes how the error terms of some random variable are distributed, in particular how widely they are spread. If the variance (and standard deviation) of the error term distribution is constant, we have homoscedasticity. If it varies, typically with some kind of pattern, we instead have heteroscedasticity. This is what GARCH models are used for: modelling time series (typically from financial markets) where turbulent periods have higher volatility than calm ones.

Conditional

In statistics, conditional means that you already know something. Conditional probability is the probability of an event occurring, given that another event has already occurred. It is the concept underlying Bayes’ theorem. Here, the variance is conditional on the errors already observed.

Stationary

Stationary simply means that it doesn’t move or change. More precisely, the statistical properties of a stationary process, such as its mean/expected value, stay the same over time. A series of stock prices or index values is not stationary, so to use a GARCH model it has to be transformed. Instead of studying the actual values, look at the increments, and make those relative to (independent of) the current value by taking the log differences. That’s done in essentially all financial time series analysis.

GARCH models are used on stationary processes. What changes is the magnitude of the temporary/local errors. It is because we have observed some previous values/errors (conditionality) that we can calculate the current expected magnitude of the errors.
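The transformation described above can be sketched in plain Java, without any library. This is only an illustration: the class name, method names and price values are made up for the example and are not part of ojAlgo.

```java
import java.util.Arrays;

class LogDifferences {

    /** Log differences: r[i] = ln(p[i+1]) - ln(p[i]). The result is one element shorter than the input. */
    static double[] logDifferences(final double[] prices) {
        double[] result = new double[prices.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = Math.log(prices[i + 1]) - Math.log(prices[i]);
        }
        return result;
    }

    /** Subtract the sample mean, leaving error terms that average to (numerically) zero. */
    static double[] errorTerms(final double[] values) {
        double mean = Arrays.stream(values).average().orElse(0.0);
        return Arrays.stream(values).map(v -> v - mean).toArray();
    }

    public static void main(final String[] args) {
        double[] prices = { 100.0, 110.0, 99.0, 108.9 }; // made-up index values
        double[] returns = logDifferences(prices);
        double[] errors = errorTerms(returns);
        System.out.println("Log differences: " + Arrays.toString(returns));
        System.out.println("Error terms    : " + Arrays.toString(errors));
    }
}
```

A 10% rise gives the same log difference whether the index stands at 100 or at 99, which is what makes the increments independent of the current level.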

AutoRegressive

An autoregressive (regressive on itself) model expresses a value from a time series as a function of previous values of the same series.
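As a concrete case, the textbook linear autoregressive model of order p, AR(p), is usually written as below. The symbols (a constant c, coefficients φ, and a noise term ε) follow the common convention and are not taken from this post.

```latex
x_t = c + \varphi_1 x_{t-1} + \dots + \varphi_p x_{t-p} + \varepsilon_t
    = c + \sum_{i=1}^{p} \varphi_i x_{t-i} + \varepsilon_t
```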

AutoRegressive Conditional Heteroscedasticity

An AutoRegressive Conditional Heteroscedasticity (ARCH) model is a statistical model for the variance of a time series. It describes the variance of the current/next error term as a function of the previously observed error terms.
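A minimal sketch of that idea, assuming the usual form of a constant plus a weighted sum of the most recent squared errors. The class, method and parameter names are hypothetical, and the numbers are made up.

```java
class ArchVariance {

    /**
     * Variance of the next error term: a constant plus weighted past squared errors.
     *
     * @param omega  constant term
     * @param alpha  weights, alpha[0] applies to the most recent error
     * @param errors past error terms, oldest first (needs at least alpha.length entries)
     */
    static double nextVariance(final double omega, final double[] alpha, final double[] errors) {
        double variance = omega;
        for (int i = 0; i < alpha.length; i++) {
            double error = errors[errors.length - 1 - i]; // i steps back from the latest
            variance += alpha[i] * error * error;
        }
        return variance;
    }

    public static void main(final String[] args) {
        double[] alpha = { 0.5, 0.25 }; // made-up weights, 2 lags
        double[] errors = { 0.01, -0.04, 0.02 }; // made-up error terms, oldest first
        System.out.println("Next variance: " + nextVariance(0.0001, alpha, errors));
    }
}
```

With these numbers only the two latest errors (0.02 and -0.04) contribute; the sign of an error does not matter since it is squared.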

GARCH – Generalized AutoRegressive Conditional Heteroscedasticity

The ARCH model refers to a specific formula, where $\sigma_t^2$ is the variance at time $t$ and $\epsilon_{t-i}$ are the previous error terms:

$\displaystyle \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \dots + \alpha_q \epsilon_{t-q}^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2$

The term “AutoRegressive Conditional Heteroscedasticity” could refer to something much more general than that, but ARCH or ARCH(q) refers to that formula and nothing else. Generalized ARCH, GARCH or GARCH(p, q) also refers to a specific formula:

$\displaystyle \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \dots + \alpha_q \epsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \dots + \beta_p \sigma_{t-p}^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2$

https://en.wikipedia.org/wiki/Autoregressive_conditional_heteroskedasticity

It’s not that much of a generalisation. Instead of just a weighted sum of past squared errors, there is also a weighted sum of past variances. Since variance is estimated as the average squared error, there is an obvious connection between the ARCH and GARCH formulas. In fact, a GARCH(1, 1) model can be rewritten as an ARCH(∞) model in which the weights on the past squared errors decay geometrically (provided the weight on the past variance is less than 1).
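The connection can be checked numerically. The sketch below, in plain Java with hypothetical names and made-up parameter values, computes a GARCH(1, 1) variance twice: once by stepping the recursion (variance = omega + alpha * error² + beta * previous variance), and once with the recursion substituted into itself, so that only squared errors remain, each weighted by alpha * beta^k. The starting variance is an assumption of the example; its influence shrinks by a factor beta with every step.

```java
class Garch11Unrolled {

    /** Step the GARCH(1, 1) recursion through all errors (oldest first). */
    static double recursive(final double omega, final double alpha, final double beta,
            final double initialVariance, final double[] errors) {
        double variance = initialVariance;
        for (double error : errors) {
            variance = omega + alpha * error * error + beta * variance;
        }
        return variance;
    }

    /** The same quantity written as a weighted sum of squared errors, ARCH style. */
    static double unrolled(final double omega, final double alpha, final double beta,
            final double initialVariance, final double[] errors) {
        int n = errors.length;
        double variance = 0.0;
        double weight = 1.0; // beta^k, where k counts steps back from the latest error
        for (int k = 0; k < n; k++) {
            double error = errors[n - 1 - k];
            variance += weight * (omega + alpha * error * error);
            weight *= beta;
        }
        return variance + weight * initialVariance; // weight is now beta^n
    }

    public static void main(final String[] args) {
        double[] errors = { 0.03, -0.01, 0.02, -0.05 }; // made-up error terms
        double a = recursive(0.00002, 0.1, 0.85, 0.0007, errors);
        double b = unrolled(0.00002, 0.1, 0.85, 0.0007, errors);
        System.out.println("Recursive: " + a);
        System.out.println("Unrolled : " + b);
    }
}
```

The two results agree up to floating point rounding. With an ever longer history the term carrying the starting variance vanishes, leaving a constant plus an infinite weighted sum of squared errors.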

GARCH(2,3) model of the N225 index (57 years weekly data)

Example Code

TimeSeriesGARCH.java

import java.io.File;
import java.io.IOException;
import java.time.LocalDate;

import org.ojalgo.OjAlgoUtils;
import org.ojalgo.data.domain.finance.series.DataSource;
import org.ojalgo.data.domain.finance.series.FinanceDataReader;
import org.ojalgo.data.domain.finance.series.YahooParser;
import org.ojalgo.data.domain.finance.series.YahooParser.Data;
import org.ojalgo.matrix.store.R032Store;
import org.ojalgo.netio.ASCII;
import org.ojalgo.netio.BasicLogger;
import org.ojalgo.netio.TextLineWriter;
import org.ojalgo.netio.TextLineWriter.CSVLineBuilder;
import org.ojalgo.random.SampleSet;
import org.ojalgo.random.process.RandomProcess.SimulationResults;
import org.ojalgo.random.process.StationaryNormalProcess;
import org.ojalgo.random.scedasticity.GARCH;
import org.ojalgo.series.BasicSeries;
import org.ojalgo.series.primitive.CoordinatedSet;
import org.ojalgo.series.primitive.PrimitiveSeries;
import org.ojalgo.type.CalendarDateUnit;
import org.ojalgo.type.PrimitiveNumber;
import org.ojalgo.type.StandardType;

/**
 * Example use of GARCH models, and in addition some general time series handling.
 *
 * @see https://www.ojalgo.org/2022/07/generalized-autoregressive-conditional-heteroscedasticity/
 */
public abstract class TimeSeriesGARCH {

    /**
     * A file containing historical data for the Nikkei 225 index, downloaded from Yahoo Finance.
     *
     * @see https://finance.yahoo.com/quote/^N225/history
     */
    static final File N225 = new File("/Users/apete/Developer/data/finance/^N225.csv");
    /**
     * Where to write output data for the chart.
     */
    static final File OUTPUT = new File("/Users/apete/Developer/data/finance/output.csv");
    /**
     * A file containing historical data for the Russell 2000 index, downloaded from Yahoo Finance.
     *
     * @see https://finance.yahoo.com/quote/^RUT/history
     */
    static final File RUT = new File("/Users/apete/Developer/data/finance/^RUT.csv");

    public static void main(final String... args) {

        BasicLogger.debug();
        BasicLogger.debug(TimeSeriesGARCH.class);
        BasicLogger.debug(OjAlgoUtils.getTitle());
        BasicLogger.debug(OjAlgoUtils.getDate());
        BasicLogger.debug();

        FinanceDataReader<Data> readerN225 = FinanceDataReader.of(N225, YahooParser.INSTANCE);
        BasicSeries<LocalDate, PrimitiveNumber> seriesN225 = readerN225.getPriceSeries();
        BasicLogger.debug("N225: {}", seriesN225.toString());

        FinanceDataReader<Data> readerRUT = FinanceDataReader.of(RUT, YahooParser.INSTANCE);
        BasicSeries<LocalDate, PrimitiveNumber> seriesRUT = readerRUT.getPriceSeries();
        BasicLogger.debug("RUT: {}", seriesRUT.toString());

        BasicLogger.debug();

        /*
         * If you want to calculate something like correlations between a set of series they need to be
         * coordinated – the same start, finish and sampling frequency, as well as no missing values.
         */

        CoordinatedSet<LocalDate> coordinatedDaily = DataSource.coordinated(CalendarDateUnit.DAY).add(readerN225).add(readerRUT).get();
        BasicLogger.debug("Coordinated daily   : {}", coordinatedDaily);

        /*
         * We have 1 value per day, but also know that we're missing data for weekends, holidays and such
         * (which need to be filled-in). Could be a good idea to instead coordinate weekly data.
         */

        CoordinatedSet<LocalDate> coordinatedWeekly = DataSource.coordinated(CalendarDateUnit.WEEK).add(readerN225).add(readerRUT).get();
        BasicLogger.debug("Coordinated weekly  : {}", coordinatedWeekly);

        /*
         * Note that we don't feed the coordinator series, but something that can provide series. In this case
         * it's file readers/parsers, but it could just as well be something that calls web services, queries
         * a database or whatever. A coordinated data source will populate itself (lazily) on request, and
         * then clean up / reset itself using a timer.
         */

        CoordinatedSet<LocalDate> coordinatedMonthly = DataSource.coordinated(CalendarDateUnit.MONTH).add(readerN225).add(readerRUT).get();
        BasicLogger.debug("Coordinated monthly : {}", coordinatedMonthly);

        CoordinatedSet<LocalDate> coordinatedAnnually = DataSource.coordinated(CalendarDateUnit.YEAR).add(readerN225).add(readerRUT).get();
        BasicLogger.debug("Coordinated annually: {}", coordinatedAnnually);

        BasicLogger.debug();

        /*
         * Once we have a CoordinatedSet we can easily calculate the correlation coefficients.
         */

        R032Store correlations = coordinatedWeekly.log().differences().getCorrelations(R032Store.FACTORY);
        BasicLogger.debugMatrix("Correlations", correlations);

        /*
         * That was a bit of a detour. Now let's focus on just one series, and the volatility of that index.
         */

        PrimitiveSeries logDifferences = seriesN225.resample(DataSource.FRIDAY_OF_WEEK).asPrimitive().log().differences();
        SampleSet logDifferenceStatistics = SampleSet.wrap(logDifferences);
        BasicLogger.debug("Log difference statistics: {}", logDifferenceStatistics.toString());

        PrimitiveSeries errorTerms = logDifferences.subtract(logDifferenceStatistics.getMean());
        SampleSet errorTermStatistics = SampleSet.wrap(errorTerms);
        BasicLogger.debug("Error term statistics: {}", errorTermStatistics.toString());

        /*
         * If we assume homoscedasticity then the (weekly) variance of the N225 time series is simply the
         * variance of the error term sample set. Note also that the variances of the log differences series
         * and of the error terms series are the same (subtracting the mean does not change the variance).
         */

        BasicLogger.debug();
        BasicLogger.debug("Variance");
        BasicLogger.debug(1, "of log differences: {}", logDifferenceStatistics.getVariance());
        BasicLogger.debug(1, "of error terms    : {}", errorTermStatistics.getVariance());

        /*
         * Any/all financial markets experience periods of turbulence with higher volatility than otherwise.
         * To model that we cannot simply have 1 fixed number to describe the volatility of a time series
         * spanning almost 60 years. We need something else.
         */

        GARCH model = GARCH.estimate(errorTerms, 2, 3);
        /*
         * GARCH model with 2 lagged variance values and 3 squared error terms
         */

        try (TextLineWriter writer = TextLineWriter.of(OUTPUT)) {

            // CSV line builder - to help write CSV files
            CSVLineBuilder lineBuilder = writer.newCSVLineBuilder(ASCII.HT);

            lineBuilder.append("Day").append("Error Term").append("Standard Deviation").write();

            /*
             * Looping through the error term series. For each item update the GARCH model and write the error
             * term and the estimated standard deviation to file. The contents of that file are then used to
             * generate the chart in the blog post.
             */

            for (int i = 0; i < errorTerms.size(); i++) {

                double standardDeviation = model.getStandardDeviation(); // estimated std dev
                double value = errorTerms.value(i); // actual value/error (the mean is 0.0)

                lineBuilder.append(i).append(value).append(standardDeviation).write();

                model.update(value); // update model with actual value
            }

        } catch (IOException cause) {
            throw new RuntimeException(cause);
        }

        /*
         * Presumably the main benefit of using GARCH models is to get a better estimate of current (near
         * future) volatility.
         */

        /*
         * We used weekly values. To annualize volatility it needs to be scaled.
         */
        double annualizer = Math.sqrt(52);

        BasicLogger.debug();
        BasicLogger.debug("Volatility (annualized)");
        BasicLogger.debug(1, "constant (homoscedasticity): {}", StandardType.PERCENT.format(annualizer * errorTermStatistics.getStandardDeviation()));
        BasicLogger.debug(1, "of GARCH model (current/latest value): {}", StandardType.PERCENT.format(annualizer * model.getStandardDeviation()));

        /*
         * We can also use the GARCH model to simulate future scenarios.
         */

        int numberOfScenarios = 100;
        int numberOfProcessSteps = 52;
        double processStepSize = 1.0;
        /*
         * This will simulate 100 scenarios, stepping/incrementing the process 52 times, 1 week at a time.
         */

        SimulationResults simulationResults = StationaryNormalProcess.of(model).simulate(numberOfScenarios, numberOfProcessSteps, processStepSize);

    }

}

Console Output

class TimeSeriesGARCH
ojAlgo
2022-07-05

N225: First:1965-01-05=1965-01-05: 1257.719971 Last:2022-07-05=2022-07-05: 26423.470703 Size:14141
RUT: First:1987-09-10=1987-09-10: 168.970001 Last:2022-07-01=2022-07-01: 1727.76001 Size:8773

Coordinated daily   : FirstKey=1987-09-10, LastKey=2022-07-01, NumberOfSeries=2, NumberOfSeriesEntries=9046
Coordinated weekly  : FirstKey=1987-09-11, LastKey=2022-07-01, NumberOfSeries=2, NumberOfSeriesEntries=1817
Coordinated monthly : FirstKey=1987-09-30, LastKey=2022-07-31, NumberOfSeries=2, NumberOfSeriesEntries=419
Coordinated annually: FirstKey=1987-12-31, LastKey=2022-12-31, NumberOfSeries=2, NumberOfSeriesEntries=36

Correlations
        1 0.506761
 0.506761        1
Log difference statistics: Sample set Size=2997, Mean=0.0010084690273360743, Var=7.006664819240809E-4, StdDev=0.026470105438476835, Min=-0.27884405377306365, Max=0.15817096723925594
Error term statistics: Sample set Size=2997, Mean=1.2822020166779524E-17, Var=7.006664819240812E-4, StdDev=0.026470105438476842, Min=-0.27985252280039974, Max=0.15716249821191988

Variance
	of log differences: 7.006664819240809E-4
	of error terms    : 7.006664819240812E-4

Volatility (annualized)
	constant (homoscedasticity): 19.09%
	of GARCH model (current/latest value): 21.94%
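
The annualized figures follow from scaling the weekly standard deviation by the square root of 52. That rests on the assumption that weekly increments are uncorrelated, so that variance grows linearly with time. A small check in plain Java, with a hypothetical class and method name, using the weekly StdDev printed in the error term statistics above:

```java
import java.util.Locale;

class AnnualizedVolatility {

    /** Scale a per-period standard deviation to a yearly one, assuming uncorrelated periods. */
    static double annualize(final double periodStdDev, final int periodsPerYear) {
        return Math.sqrt(periodsPerYear) * periodStdDev;
    }

    public static void main(final String[] args) {
        double weeklyStdDev = 0.026470105438476842; // from the console output above
        double annual = annualize(weeklyStdDev, 52);
        System.out.println(String.format(Locale.ROOT, "%.2f%%", 100.0 * annual)); // prints 19.09%
    }
}
```

The same factor applied to the GARCH model's latest weekly standard deviation gives the 21.94% reported above.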