Bootstrapping, originally proposed by Bradley Efron, is a statistical technique for approximating the sampling distribution of a statistic. The term comes from the phrase "to pull oneself up by one's own bootstraps", something seemingly impossible for a person, much like the bootstrap's trick of extracting more information from a single sample. The bootstrap rose to prominence as computing power became faster and cheaper. In certain applications the bootstrap outperforms analytical alternatives because it makes fewer assumptions about things such as the population distribution and its parameters. Furthermore, the bootstrap can approximate most measures, whereas deriving them analytically may be mathematically challenging and require more assumptions.
The bootstrap starts from an underlying sample of size $n$ drawn from the population. Assuming the observations are independent of each other (sampled randomly), $B$ bootstrap samples are taken, each of size $n$. Each bootstrap sample is built by randomly sampling the original sample with replacement. Sampling with replacement treats the observed sample as a stand-in for the population: every draw is independent and each observation can appear more than once, just as similar values can recur when sampling the population itself. The statistic is computed on each bootstrap sample, and plotting those $B$ values on a histogram gives an approximate picture of its sampling distribution.
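The procedure above can be sketched in a few lines of MATLAB. This is only a minimal illustration under my own assumptions: the data `x` is synthetic, the statistic (the median) is an arbitrary choice, and the names `theta_hat`, `theta_b` and `se_boot` are hypothetical.

```matlab
% Minimal bootstrap sketch on synthetic data (not real market data).
x = randn(200, 1);            % stand-in for the observed sample
n = numel(x);                 % sample size
B = 2000;                     % number of bootstrap samples (arbitrary choice)
theta_hat = median(x);        % statistic of interest on the original sample

theta_b = zeros(B, 1);        % one bootstrap replicate of the statistic per row
for b = 1:B
    idx = randi(n, n, 1);     % n positions drawn uniformly, with replacement
    theta_b(b) = median(x(idx));
end

se_boot = std(theta_b);       % spread of the replicates estimates the standard error
```

A histogram of `theta_b` would then approximate the sampling distribution of the median for this sample.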
To apply this concept, I want to first calculate my portfolio risk and then obtain a confidence interval for this risk, a measure which can also be called the volatility of risk. I need to make two strong assumptions:
1. Risk can be measured using Variance/Standard Deviation of a portfolio
2. The sample estimate is a good estimate of the population value
I take two stocks that we're all familiar with: Comcast and P&G. Using daily log returns over the two years up to today, I can first estimate my portfolio risk as the variance of a two-asset portfolio. Let $w_C$ equal the proportion of my portfolio allocated to Comcast ($C$) and $w_P$ equal the proportion allocated to P&G ($P$). I will use equal weighting for both stocks: $w_C = w_P = 0.5$.
The portfolio risk ($\sigma_p^2$) is then:
$$\sigma_p^2 = w_C^2 \sigma_C^2 + w_P^2 \sigma_P^2 + 2 w_C w_P \sigma_{C,P}$$
where $\sigma_C^2$ and $\sigma_P^2$ are the variances of the two return series and $\sigma_{C,P}$ is the covariance of Comcast and P&G. Using our daily log return data, we can calculate this using the matrix notation ($\sigma_p^2 = w^\top \Sigma w$), where $w = (w_C, w_P)^\top$ and $\Sigma$ is the covariance matrix of the two stocks:
    weight = [0.5; 0.5];                      % column vector of portfolio weights
    var_p = weight' * cov(cc, pg) * weight;   % w' * Sigma * w, with cc and pg the daily log returns
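As a quick sanity check, the expanded two-asset formula and the matrix form should give the same number. The sketch below uses synthetic price paths in place of the real Comcast and P&G data, so `price_cc`, `price_pg` and `n_days` are hypothetical. It also uses `cov([cc pg])`, which for column vectors should match `cov(cc, pg)`.

```matlab
% Synthetic price paths stand in for the real price data (assumption).
n_days = 500;
price_cc = 40 * exp(cumsum(0.01 * randn(n_days, 1)));
price_pg = 80 * exp(cumsum(0.01 * randn(n_days, 1)));

cc = diff(log(price_cc));     % daily log returns, (n_days-1) x 1
pg = diff(log(price_pg));

weight = [0.5; 0.5];          % equal weighting
Sigma = cov([cc pg]);         % 2 x 2 sample covariance matrix

% Matrix form: w' * Sigma * w
var_matrix = weight' * Sigma * weight;

% Expanded form: w1^2*s1^2 + w2^2*s2^2 + 2*w1*w2*s12
var_expanded = weight(1)^2 * Sigma(1,1) + weight(2)^2 * Sigma(2,2) ...
             + 2 * weight(1) * weight(2) * Sigma(1,2);
```

The two values agree up to floating-point rounding, which is why the one-line matrix version is enough in practice.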
The sample portfolio variance comes out to ~0.7155%, a standard deviation of about 0.84%, so the majority of daily returns fall within 0.84% of the mean. Using the bootstrap, we will draw 5000 bootstrap samples:
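    n = size(cc, 1);                  % number of daily observations
    B = 5000;                         % number of bootstrap samples
    cc_b = zeros(B, n);               % preallocate one row per bootstrap sample
    pg_b = zeros(B, n);
    for i = 1:B
        randpos = randi(n, n, 1);     % n positions drawn uniformly, with replacement
        cc_b(i,:) = cc(randpos)';
        pg_b(i,:) = pg(randpos)';     % same positions for both to retain the correct covariance
    end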
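One detail worth checking when drawing positions: a scale-and-round expression such as `round(rand*(n-1)+1)` gives the first and last positions only half the probability of the others, while `randi(n)` draws every position with equal probability. The sketch below is just an illustration with a deliberately tiny `n`; the names `k_round`, `k_randi`, `f_round` and `f_randi` are hypothetical.

```matlab
% Compare two ways of drawing positions 1..n with replacement.
n = 4;                 % tiny n so the effect is easy to see
N = 200000;            % number of draws (arbitrary)

k_round = round(rand(N, 1) * (n - 1) + 1);    % scale-and-round approach
k_randi = randi(n, N, 1);                     % built-in uniform integer draw

% Empirical frequency of each position 1..n
f_round = accumarray(k_round, 1, [n 1]) / N;  % roughly [1/6 1/3 1/3 1/6]
f_randi = accumarray(k_randi, 1, [n 1]) / N;  % roughly [1/4 1/4 1/4 1/4]
```

With around two years of daily observations the distortion only touches the first and last day, so the effect on the results is small, but `randi` avoids it entirely.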
Then we calculate the portfolio variance of each bootstrap sample (there might be a matrix notation for this but I don't know it):
    var_p_b = zeros(size(cc_b, 1), 1);   % preallocate a 5000 x 1 column vector
    for i = 1:size(cc_b, 1)              % both bootstrap matrices are the same size, so we just use one
        var_p_b(i) = weight' * cov(cc_b(i,:), pg_b(i,:)) * weight;
    end
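One possible way to avoid the loop, offered as a sketch rather than the definitive approach: with fixed weights, the portfolio variance equals the variance of the weighted return series, so the row-wise variance of `weight(1)*cc_b + weight(2)*pg_b` gives every bootstrap variance at once. The data below is synthetic, and `idx`, `port_b` and `check` are hypothetical names.

```matlab
% Synthetic stand-ins for the daily log returns (assumption).
cc = 0.01 * randn(500, 1);
pg = 0.01 * randn(500, 1);
weight = [0.5; 0.5];

n = size(cc, 1);
B = 1000;                        % fewer samples than in the text, to keep the sketch light
idx = randi(n, B, n);            % B x n positions, drawn with replacement
cc_b = cc(idx);                  % B x n, one bootstrap sample per row
pg_b = pg(idx);                  % same positions, so the pairing is preserved

port_b = weight(1) * cc_b + weight(2) * pg_b;   % bootstrapped portfolio returns
var_p_b = var(port_b, 0, 2);                    % B x 1 row-wise variances (n-1 normalization)

% Spot check one row against the covariance form w' * Sigma * w
check = weight' * cov([cc_b(1,:)' pg_b(1,:)']) * weight;
```

Both routes use the same n-1 normalization, so `var_p_b(1)` and `check` match up to rounding.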
var_p_b is the vector of bootstrapped portfolio variances, of size 5000 x 1. Below is a histogram of this sample:
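Having the sampling distribution, we can now estimate the standard error of our portfolio variance from the bootstrapped samples. This can be calculated by:
$$SE(\hat{\sigma}_p^2) = \sqrt{\frac{1}{B} \sum_{b=1}^{B} \left( \hat{\sigma}_{p,b}^{2*} - \hat{\sigma}_p^2 \right)^2}$$
where $\hat{\sigma}_{p,b}^{2*}$ is the portfolio variance of bootstrap sample $b$, $\hat{\sigma}_p^2$ is the portfolio variance of the original sample, and $B = 5000$. In MATLAB: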
    var_se_p = sqrt(1/5000 * sum((var_p_b - var_p).^2));
The standard error we get is a very small number. Applying it to a confidence interval of the form estimate ± 1.96 × standard error, under the assumption that the bootstrap distribution is approximately normal, we can estimate with 95% confidence that the population portfolio risk (variance) is between 0.6058% and 0.8252%.
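For comparison, here is a sketch of the normal-based interval next to a percentile interval, which reads the 2.5% and 97.5% points straight off the sorted bootstrap values and so does not rely on normality. The percentile interval is an alternative I am adding for illustration, not the method used above. The data is synthetic, and `ci_normal`, `ci_pct` and `sorted_b` are hypothetical names.

```matlab
% Synthetic stand-ins for the daily log returns (assumption).
cc = 0.01 * randn(500, 1);
pg = 0.01 * randn(500, 1);
weight = [0.5; 0.5];
var_p = weight' * cov([cc pg]) * weight;      % variance of the original sample

n = size(cc, 1);
B = 5000;
idx = randi(n, B, n);                         % B x n positions, drawn with replacement
var_p_b = var(weight(1) * cc(idx) + weight(2) * pg(idx), 0, 2);   % B x 1 bootstrap variances

% Normal-based 95% interval: estimate +/- 1.96 * standard error
var_se_p = sqrt(mean((var_p_b - var_p).^2));  % same formula as in the text
ci_normal = var_p + 1.96 * var_se_p * [-1 1];

% Percentile 95% interval: 2.5% and 97.5% points of the sorted bootstrap values
sorted_b = sort(var_p_b);
ci_pct = [sorted_b(round(0.025 * B)), sorted_b(round(0.975 * B))];
```

If the histogram of `var_p_b` is visibly skewed, the two intervals can differ noticeably, which is one way to judge how much the normality assumption matters.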
Bootstrapping can be a useful methodology for assessing the volatility of certain measures. While it only produces an estimate for the population, it can help build a general idea of an estimate's variability and can also be used in criterion selection. I will certainly be using the bootstrap more in the future to measure bias, standard error, quantiles, etc. It is a very robust, non-parametric method that can be applied to many statistics under the correct conditions.