Multi-Factor Alpha Competition

During the second semester of the 2019-2020 academic year, while some team members built the infrastructure needed for our first portfolio launch, the junior quantitative analysts participated in a factor-combining competition. Designed by the graduate research leads, the project aimed to show the analysts the power of combining factors and served as a playground for researching and testing their own multi-factor strategies. The analysts had several resources to aid strategy development: a Python file that plotted returns for nearly a hundred known factors, a second Python file that ran a Fama-MacBeth regression and reported statistics on the correlation between chosen factors, and an R file for backtesting a strategy on out-of-sample data. The goal of the competition was to choose any number of factors from the given pool and assign them a weighting scheme that achieved a high Sharpe ratio in both the first (2009-2013) and second (2014-2019) backtesting periods. The Sharpe ratio from period two counted for 70% of the overall score, and the Sharpe ratio from period one for only 30%. To discourage overfitting to the backtest, participants could see only the period-one results before submitting.
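The scoring rule can be illustrated with a short sketch. It assumes monthly excess returns annualized by the square root of 12, and it assumes the overall score is a simple 30/70 weighted sum of the two Sharpe ratios; the competition's exact formula is not stated here, so both function names and the sample numbers are hypothetical.

```python
import math
import statistics

def sharpe_ratio(returns, periods_per_year=12):
    """Annualized Sharpe ratio of a series of periodic excess returns."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    return mean / stdev * math.sqrt(periods_per_year)

def competition_score(sharpe_period_one, sharpe_period_two, w1=0.3, w2=0.7):
    """Assumed scoring rule: 30% period one, 70% period two."""
    return w1 * sharpe_period_one + w2 * sharpe_period_two

# Made-up monthly excess returns, purely for illustration.
example_returns = [0.02, -0.01, 0.03, 0.01, 0.00, 0.02]
example_sharpe = sharpe_ratio(example_returns)

# Hypothetical Sharpe ratios for the two backtest periods.
example_score = competition_score(1.0, 2.0)
```

Because period two carries more than twice the weight of period one, a strategy tuned heavily to the visible period gains little if it does not hold up out of sample.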

As one of the analysts, I developed a strategy using a data-first approach that achieved a Sharpe ratio of 2.8 in the first period and 1.14 in the second (1.75 overall). While I tried to use economic theory to refine my strategy, the initial stages relied primarily on statistical reasoning. To start, I chose twenty factors by examining the performance chart of each individual factor.

Example of graph generated for all factors.

For example, factors with charts like the one pictured above appeared promising because they had high t-statistics, relatively low p-values, and a strong upward or downward trend. I saw these characteristics as a sign of reliable returns over time, because a t-statistic that is large in absolute value indicates that the factor's average return is statistically distinguishable from zero. After picking what I believed to be the twenty most promising factors based on these in-sample returns plots, I checked for correlations and ran a Fama-MacBeth regression, which generated a table of Fama-MacBeth t-statistics. From this table I filtered out any factors with particularly low t-statistics (absolute value below 0.5), which left thirteen factors. The Fama-MacBeth procedure runs a series of cross-sectional regressions, one per period, and tests whether the average coefficient differs from zero, so a low t-statistic could mean that the factor's coefficient is unstable over time and therefore unreliable. I then ran the Fama-MacBeth regression again and, from the table pictured below, determined a preliminary weighting scheme.

Fama-MacBeth t-statistics of factors used.
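The Fama-MacBeth step and the t-statistic filter can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the script used in the competition: the data are simulated, the factor names are hypothetical, and the plain time-series standard error is used with no autocorrelation adjustment.

```python
import math
import random
import statistics

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def fama_macbeth(exposures, returns):
    """exposures[t][i] holds the factor values of stock i in period t;
    returns[t][i] is that stock's return for the period being explained.
    Returns one t-statistic per factor (intercept excluded)."""
    betas = []
    for X_t, y_t in zip(exposures, returns):
        X = [[1.0] + list(row) for row in X_t]  # add an intercept column
        betas.append(ols(X, y_t)[1:])           # one cross-sectional regression
    T = len(betas)
    tstats = []
    for series in zip(*betas):                  # coefficient series per factor
        se = statistics.stdev(series) / math.sqrt(T)
        tstats.append(statistics.fmean(series) / se)
    return tstats

# Simulated data: factor_a carries a premium, factor_b is pure noise.
random.seed(0)
names = ["factor_a", "factor_b"]  # hypothetical factor names
exposures, returns = [], []
for _ in range(60):               # 60 periods, 50 stocks each
    X_t = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]
    y_t = [0.05 * a + random.gauss(0, 0.1) for a, _b in X_t]
    exposures.append(X_t)
    returns.append(y_t)

tstats = dict(zip(names, fama_macbeth(exposures, returns)))
# Drop factors whose t-statistic is below 0.5 in absolute value.
kept = {name: t for name, t in tstats.items() if abs(t) >= 0.5}
```

The t-statistic here is the mean of the per-period coefficients divided by its standard error, so a factor whose coefficient swings in sign from period to period ends up near zero even if individual periods look strong.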

I thought the Fama-MacBeth t-statistics would make a good starting point for weights because, in general, a larger t-statistic suggests that the factor's coefficient was more consistent over time. Using these t-statistics as weights, I generated a Sharpe ratio of 1.19 for the first backtest period. In the Carhart four-factor regression table produced by the backtest, I noticed that both “high minus low” and “up minus down” had the highest significance code (***). This told me that a large share of my returns was being explained by two already well-known factors. I then tried to revise my weighting scheme so that the intercept would carry a higher significance code and the other factors none. Because the intercept captures the portion of returns left unexplained by the four factors, a highly significant intercept would mean that the strategy was earning returns beyond what those well-known factors account for.
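As a rough sketch of the weighting idea, the snippet below turns signed t-statistics into weights and uses them to score a single stock. The normalization (absolute weights summing to one) is an assumption added for readability and does not change how stocks rank; the factor names and numbers are hypothetical and are not the values from the table.

```python
def tstat_weights(tstats):
    """Scale signed t-statistics so that the absolute weights sum to one."""
    total = sum(abs(t) for t in tstats.values())
    return {name: t / total for name, t in tstats.items()}

def composite_score(stock_factors, weights):
    """Weighted sum of one stock's (standardized) factor values."""
    return sum(weights[name] * stock_factors[name] for name in weights)

# Hypothetical t-statistics for three made-up factors.
tstats = {"factor_a": 3.0, "factor_b": -1.5, "factor_c": 0.5}
weights = tstat_weights(tstats)

# Hypothetical standardized factor values for a single stock.
stock = {"factor_a": 1.0, "factor_b": 0.2, "factor_c": -0.4}
score = composite_score(stock, weights)
```

Keeping the sign of each t-statistic means a factor with a negative premium automatically receives a negative weight, so stocks that score high on it are penalized in the composite.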

Carhart 4-Factor regression summary for strategy

To lower the “up minus down” and “high minus low” coefficients, I made a few changes. One was flipping the sign of ‘ptb’, the price-to-book ratio. Since ‘hml’ favors value stocks over growth stocks, and a higher price-to-book ratio tends to indicate a growth stock, I thought that favoring stocks with higher price-to-book ratios could counteract this exposure. Removing the significance codes on some factors had unintended side effects, such as other factors becoming more significant. Following the same pattern as before, I used economic reasoning to identify factors whose weights I could increase or decrease to counteract whichever Carhart factor still carried a significance code.

Carhart 4-Factor regression summary after changes
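The regression check used in this tuning loop can be sketched as an ordinary least-squares fit of strategy returns on the four Carhart factors plus an intercept. The version below is a minimal illustration on simulated data, not the R backtest file itself: the column names and return series are hypothetical, and it reports t-statistics rather than significance codes.

```python
import math
import random

def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(M)
    A = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(M)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[k:] for row in A]

def factor_regression(strategy, factors):
    """OLS of strategy excess returns on factor returns plus an intercept.
    Returns (coefficients, t_stats) with the intercept (alpha) first."""
    X = [[1.0] + list(row) for row in factors]
    n, k = len(X), len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * y for r, y in zip(X, strategy)) for i in range(k)]
    inv = invert(XtX)
    beta = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(X, strategy)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    tstats = [b / math.sqrt(sigma2 * inv[i][i]) for i, b in enumerate(beta)]
    return beta, tstats

# Simulated monthly data: the strategy loads on the third factor (hml)
# and earns a small alpha. All numbers are made up.
random.seed(1)
names = ["alpha", "mkt_rf", "smb", "hml", "umd"]
factors = [[random.gauss(0, 0.04) for _ in range(4)] for _ in range(60)]
strategy = [0.01 + 0.5 * f[2] + random.gauss(0, 0.005) for f in factors]

beta, tstats = factor_regression(strategy, factors)
report = dict(zip(names, zip(beta, tstats)))
```

In this setup the goal described above corresponds to a large t-statistic on `alpha` and small ones on the four factor loadings. The significance codes printed by R's regression summary are shorthand for p-value thresholds derived from these same t-statistics.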

To avoid overfitting, I did not want to make too many adjustments, so I stopped once I reached a Sharpe ratio of 2.8 with no significance codes remaining on the four Carhart factors. Shown below are my final weighting dictionary with my selected factors, the returns plot and Carhart four-factor regression from the first testing period (2009-2013), and the returns plot for the entire ten-year backtesting window (2009-2019). In the future, I would not hesitate to add even more factors to this model, as the larger number of factors relative to other strategies seems to have helped smooth out the returns plot. I would also further develop my economic reasoning about which factors best complement each other and adjust the weightings accordingly. Finally, I would investigate whether this strategy is sustainable for future use, because some of the factors may have become outdated.

Full performance summary of my strategy, including in-sample and out-of-sample performance.

Alpha Research Workflow

Setting Up Our Database