A Quick Comparison Plot using LSE Share Prices Sourced from Quandl

If you have read my last blog post you will know that I have now ditched Yahoo Finance as my information source for UK shares because of the data quality problems. Instead I’m using Quandl. The Quandl website has a nice plotting facility but unfortunately only for a single share, making comparisons more difficult. This short R script allows you to obtain data from quandl (for as many securities as you want) and then do a quick comparison plot. You can change the frequency of plotting and the start date. Because it is specifically for comparison, the script discards rows of data where information is missing. Here is the script for ISF (ISH CORE FTSE100 ETF) and CUKX (ISH FTSE100 ACC ETF). If you ever needed an example of the power of reinvesting dividends, this has to be a good one!

library(Quandl)
library(PerformanceAnalytics)

tickers <- c("ISF", "CUKX")  ## put as many securities as you want here
mydata <- Quandl(paste0("LSE/", tickers, ".1"), type = "xts", api_key = "XXXXXX")
mydata <- na.omit(mydata)
colnames(mydata) <- tickers

startdate <- "2010-01-01/"  
mydataused <- mydata[startdate]
indx <- xts(apply(mydataused, 2, function(x) x/x[1]), order.by = index(mydataused) )

indx2plt <- indx[endpoints(indx, on="months"),]    # "days", “weeks”, “months”, “quarters” or “years”

chart.TimeSeries(indx2plt, main  =  "Cumulative Returns", legend.loc = "topleft",
                 minor.ticks = F, lwd = 2 )


 

Image

Time to ditch Yahoo Finance as the go-to source for stock price data?

I am a UK based, private investor. I like to do my own analysis of potential stock purchases and R is my tool of choice for much of this. One of the biggest problems as a private individual is obtaining high quality data – I just can’t justify the cost of a commercial data supplier. The normal approach is to use Yahoo Finance to download the data you need and this can easily be automated into R – either manually or using the quantmod package. However I’m increasingly frustrated by Yahoo finance. In fairness I’m only concerned with shares and ETFs that trade on the London Stock Exchange and it’s entirely possible that Yahoo doesn’t have these problems for US quoted shares. My main concerns are as follows.

1. Data Quality

I’ve written about this before. Here is another example, this time for iShares Core FTSE 100 Dist (ISF.L). I’m assuming the currency keeps changing between  GBX, GBP and either USD or EUR.
Image

2. Dividends

Dividends quoted on Yahoo are sometimes in the wrong currency. (e.g VUSA).

3. Adjusted Close Prices

This is a bit hit and miss. As you can see from the screen shot above the price has been adjusted. Frequently there is no adjustment and no dividends listed when they have been paid.

4. Questions about the long term future of the service

People are frequently predicting the imminent demise of the service. This may or may not be true, but doesn’t fill me with confidence given the other more serious problems listed.

The solution

Enter Quandl and specifically the data provided by the London Stock exchange. R has a specific package (Quandl) which makes downloading data incredibly easy. You can download data for a number of different securities in one go, select what data (columns and date ranges) you actually want and download directly as an xts object. You will need to register to obtain a free api key if you are going to make more than 50 calls per day on the api. (Replace your XXXXXX in my code with your key). The documentation is here and the cran page is here.
library(Quandl)
## Download all available data for ticker ISF
mydata <- Quandl("LSE/ISF", type = "xts", api_key= XXXXXXX)
tail(mydata)

                 Price   High    Low  Volume Last Close Change  Var%
2017-06-19 743.8 744.80 740.45 1788670      743.8    5.9  0.80
2017-06-20 738.4 761.45 738.20 2543313      738.4   -5.4 -0.73
2017-06-21 735.7 742.45 732.80 3275787      735.7   -2.7 -0.37
2017-06-22 734.9 735.90 730.95 2692453      734.9   -0.8 -0.11
2017-06-23 733.0 744.80 685.95 4659435      733.0   -1.9 -0.26
2017-06-26 735.2 738.85 735.10 2919664      735.2    2.2  0.30

## Download the first column (close) for ISF and VMID from 2016-12-31 until the end of the data
mydata2 <- Quandl(c("LSE/ISF.1", "LSE/VMID.1"), type = "xts", start_date="2016-12-31", 
                api_key= XXXXXXX)
head(mydata2)

                 LSE.ISF - Price LSE.VMID - Price
2017-01-03           708.9            29.17
2017-01-04           709.3            29.19
2017-01-05           710.0            29.46
2017-01-06           711.3            29.50
2017-01-09           714.0            29.61
2017-01-10           717.4            29.66

Obtaining the meta data which goes with each data set is a little more tricky as I can’t get the metaData function within the Quandl package to work for these data sets. The approach is therefore to download the xml file that contains the meta data, convert that into a data frame within R and then do some further processing to get the data I actually want.
library(XML)
library(RCurl)
xml.url <- "https://www.quandl.com/api/v3/datasets/LSE/ISF/metadata.xml?api_key=XXXXXXX"
xmldata <- getURL(xml.url)
xmldf <- xmlToDataFrame(xmldata, stringsAsFactors = F)
stockinfo <- xmldf[, c(2, 4, 8, 7)]
colnames(stockinfo) <- c("Code", "Name", "From", "To")
stockinfo$Currency <- substr(stockinfo$Name, nchar(stockinfo$Name)-2, nchar(stockinfo$Name))
stockinfo$Name <-  substr(stockinfo$Name,1,nchar(stockinfo$Name)-14)
stockinfo

Code  Name                               From          To              Currency
ISF   ISH COREFTSE100 ETF price (ISF)    2006-03-16    2017-06-26      GBX

Issues with Quandl Data

The solution is not perfect
  • You do not get full OHLC data if that is important to you.  However there is high, low, close and volume so you could use the previous close as open
  • There is no index data. Because data for total return indexes are really difficult to find I have always used accumulating ETFs if possible (e.g. CUKX for FTSE100). Personally if I’m using an index for comparison purposes then I would prefer to use something I could actually invest in ie an ETF, rather than a theoretical uninvestible index.
  • No dividends or adjusted close – more on this in a later blog post

Saving Reports Generated in R to Evernote

Evernote is my digital filing cabinet. I keep lots of information in it because it is backed up, easy to find and with me wherever I am on all my devices. It is not surprising therefore that I frequently want to store analysis that I carry out in R as an Evernote note.
To produce reports I prefer to use the spin function in Knitr. I love spin because you don’t need a separate markdown file. You can still run the entire script if you want to, because all the markdown formatting instructions just look like comments to regular R. More information on the advantages of spin vs regular markdown here.
Evernote (at least on Windows, I don’t know about other systems) has a wonderful device called an import folder. Any files added to this folder are automatically added to Evernote. In most cases the file is simply attached to a new note. However in the case of html, txt and rtf files, the contents of the file is added to the body of a new note and the file is not attached.  Initially I thought that saving my html document produced by spin to the Evernote import folder would be all I would need to do. Sadly it wasn’t straightforward because it seems that the import folder doesn’t play nicely with local html files –  the more advanced formatting produced by spin is destroyed and images are sometimes not included.
After some experimentation my preferred option (option 6 in the table below) is as follows:
1. Save the output of spin as an html file to Dropbox.
2. Go to Dropbox on the web and open the document (ctrl clicking opens it in a new window without Dropbox clutter). It looks like a regular html web page.
3. Use the Evernote webclipper to save the file to Evernote
Although this is not ideal because there are still some manual steps, it is the only way I could find of achieving goals 1 and 3 in the table below and being able to at least semi-automate the process.
The very short script required to produce the file and take you to Dropbox is as follows, where the R script file referred to (spinexample.R) reproduced below is a properly formatted spin script see here for details . You will need to wait for a few seconds for Dropbox to upload the file to the web before you can see it.
Script that produces report:
library(knitr)
library(rmarkdown)
opts_chunk$set(warning=FALSE,  results = "markup",
              out.width = "100%", echo = TRUE)
setwd("C:/Users/Mike/Documents/R/spin")
render("spinexample.R",
              output_dir = "C:/Users/Mike/Dropbox/R2EN")
browseURL("https://www.dropbox.com/home/R2EN")

The report script (spinexample.R)

#' ---
#' title: "Spin to Evernote Example"
#' author: "Mike"
#' ---
#'
set.seed(1678)
x <- rnorm(1000)
#' Here is the mean. **It's a very important number.**
mean(x)
#' Here is a pretty histogram
#+ echo=FALSE
hist(x)
#' This is the summary of the random numbers
#'
summary(x)
#' Here is a sum
43+27

For reference here is a brief summary of the different approaches I tried. Note that none of these have a full line of crosses. An Evernote note showing the results for options 3, 4 and 6 can be accessed here

Description
Goal 1 – Advanced formatting (highlighted code etc)
Goal 2 – Completely Automatic
Goal 3 – Content forms the body of the note
1
Manually copying and pasting output into Evernote
X
2
Use spin to save an html file to the Evernote import folder
X
X
html must not be self contained (otherwise it leaves out any images)
3
Use spin to save a Word document or pdf document to the Evernote import folder
X
X
4
Use spin to save a rtf document to the Evernote import folder
X
X
Because the the render to rtf doesn’t support advanced formatting anyway you actually see in EN what you get in the rtf document
5
Render as html. Copy and paste content into a gmail. Either webclip or send to Evernote
X
X
6
Preferred option
Use spin to save an html file to Dropbox and then use the webclipper
X
X

Updated Code for The Effect of a Cash Buffer on Returns

After some useful comments about the previous post (see comments section in that post), here is the updated R code.

mysummary <- function(x){
  s <- as.numeric(summary(x))
  s[7] <- sd(x)
  s[8] <- sum(x <= 0)
  names(s) <-  c("Min." ,   "1st Qu.", "Median" , "Mean"  ,  "3rd Qu.", "Max."  , 
                 "s.d." ,"No. of fails" )
  round(s, digits = 0)
}

years <- seq(1900, 2015) 
ret <- c(0.115,-0.035,0.035,0.018,0.123,0.038,0.186,-0.089,0.019,0.109,0.021,-0.015,0,-0.034,-0.005,
         -0.17,-0.173,-0.096,0.05,0.074,-0.34,0.34,0.393,0.028,0.176,0.173,0.055,0.148,0.173,-0.141,
         0.031,-0.162,0.383,0.248,0.13,0.116,0.159,-0.177,-0.079,-0.079,-0.153,0.192,0.184,0.121,0.113,
         0.048,0.173,-0.053,-0.083,-0.089,0.074,-0.031,-0.061,0.229,0.429,0.048,-0.117,-0.055,0.452,
         0.548,-0.001,-0.025,-0.022,0.177,-0.098,0.066,-0.074,0.311,0.398,-0.159,-0.105,0.344,0.081,
         -0.35,-0.581,0.996,-0.111,0.325,0.002,-0.049,0.171,0.013,0.219,0.223,0.258,0.137,0.227,0.048,
         0.044,0.258,-0.174,0.157,0.168,0.251,-0.086,0.192,0.131,0.193,0.106,0.217,-0.086,-0.138,-0.245,
         0.169,0.088,0.189,0.114,0.01,-0.304,0.259,0.089,-0.078,0.087, 0.174, -0.004, -0.001)

muret <- mean(ret)
sdret <- sd(ret)



## These are the variables to change

I0 <- 600 ## starting capital
nyrs <- 30 ## Number of years that money needs to last
spend <- 20 ## Annual expenditure
by <- c(0:5) ## buffer as years of expenditure

##########

CBfin <- NULL
Ifin <- NULL
smryI <- NULL
smryCB <- NULL

## ret <- 0.04 For testing only!!

for(k in 1:length(by)){
  CB0 <- by[k]*spend
  I0 <- I0 - CB0  ## Model now has constant total wealth  
  
  for(j in 1:10000){
    I <- I0
    CB <- CB0
    
    for(i in 1:nyrs){
      
      
      r <- sample(ret, 1)
      
      
      ## r <- rnorm(1,muret,sdret) ## Alternative way of estimating future return
      
      ## First section recalculates capital given earnings and spending
      
      e <- r*I  ## e = earnings
      
      if((I+e)>=(I0+spend)){
        I <- I+e-spend  ### If there is sufficient earnings to cover expenditure use that
      }else{
        
        if(CB>=spend){
          CB <- CB-spend  ### If there isn't use the cash buffer
          I <- I+e        ### This is what I forgot. Even though you haven't touched the capital it still needs to change
        }else{
          I <- I + e + CB - spend  ## if all else fails you eat into the capital and exhaust your cash buffer. I'd forgotten to include e in this.
          CB <- 0
        }
        
        
      }
      
      ## Second section recalculates cash buffer
      
      if((CB < CB0) & (I > I0)){    ### Not enough cash in the buffer but excess capital
        CB <- CB+I-I0               ### increase cash buffer by the growth in the capital
        I <- I0                     ### Set capital back to initial figure
        
        if(CB >= CB0){
          CB <- CB0                 ### If there is now too much cash in the buffer reinvest the excess
          I <- I + CB - CB0
        }
      }
      
      if(I<=0){
        I <- 0
        break 
      }
    }
    
    
    CBfin[j] <- CB
    Ifin[j] <- I
    
  }
  
  
  
  smryI[[k]] <- mysummary(Ifin)
  smryCB[[k]] <- mysummary(CBfin)
  
  
}

completesmryI <- do.call("rbind", smryI)
rownames(completesmryI) <- paste(by, "Year(s)")
completesmryI

completesmryCB <- do.call("rbind", smryCB)
rownames(completesmryCB) <- paste(by, "Year(s)")
completesmryCB

 

The Effect of a Cash Buffer on Returns

Simon Saves over at MyRichFuture.com wrote a fantastic article entitled “Cash Buffer Will Save You In Retirement“. He argues that holding cash (he plans to hold three years worth of expenditure) in addition to investments, means that if the stock market falls you can use the cash buffer rather than being forced to eat into your capital. He illustrates this with an example.  I wanted to look in more detail at what effect holding cash might have and how much it would reduce the chance of running out of money.

Assumptions

My first assumption is that the annual returns in the future will be similar to the past. I have used the UK real returns (i.e. including reinvesting dividends and accounting for inflation) from the Barclay’s Equity Gilt Study. This has 116 years of returns from 1900 to 2015. Rather than just assuming that the returns form a normal distribution, I have sampled from the 116 known returns. This should give a better estimate of the “fat tails” so prevalent in stock market returns. I have not chosen 30 years of sequential returns (as many people like to do) in order to do a larger number of simulations. There is therefore a slightly greater chance of having a fabulous 30 years or an absolutely catastrophic 30 years. However the evidence for auto-correlation in annual returns in the stock market is very weak (i.e. a bad year doesn’t necessarily lead to a good year in the near future) and if each year’s return really is independent of others then that catastrophic 30 year period could be just round the corner!
I’ve assumed that 100% of the investments is in shares and none is in bonds. This might be considered reckless!
I have assumed that the cash buffer has a real return of zero. i.e. it keeps pace with inflation but nothing more.
I’ve assumed transaction costs, taxes etc are zero and that your actual return is the UK real return.

Features of the model

30 year period starting with £500k of capital (stock market investments) and spending £20k pa (4% withdrawal rate).
An additional cash buffer ranging from zero to  5 years of expenditure i.e. from nothing to £100k

Process

Each year
  • Work out the return for that year by randomly picking  a number from the 116 possible returns in the Barclay study.
  • Ideally this growth (and any retained excess from previous years) would be sufficient for our expenditure. Any excess is added to the capital for next year.
  • If there is not enough money in the the capital pot (i.e. it would go below £500k) use the cash buffer to keep it at £500k
  • If the cash buffer runs out, use capital (this will take it below £500k)
  • Pay back money into the cash buffer if the capital exceeds £500k up to the target amount in the buffer
  • Stop iteration if capital falls to zero (This is a failure!)
  • Do the whole process again for each of the 30 years using the new capital  and cash buffer amounts from the end of the previous year.
I then repeated the whole process 10,000 (yes ten thousand!) times to simulate 10,000 possible futures each based on sampling 30 times from the distribution. I also repeated the process for the different cash buffer amounts.

The results

Holding no cash buffer means that there is roughly a 10% chance of failure i.e. running out of money at some point in the  30 year period. Holding a three year cash buffer (£60k in our example) reduces this chance to approximately 1%. You could, of course, argue that I’m not comparing like with like, as in one case I’m starting with £500k and in the other case I effectively have £560k. Rerunning the simulation with a starting capital pot of £560k and no cash buffer gives a failure rate of about 8%. The full results are listed below. Note that the sums of money you could be left with at the end of 30 years are eye-watering. The problem is, that after 30 years the width of the distribution of possible final returns is so wide that the upside is huge.  This seems like a good argument for reducing both  returns and risk by introducing bonds or other less volatile assets.

 Capital remaining after 30 years. All values in £k

Cash Buffer Min. 1st Qu. Median Mean 3rd Qu. Max. s.d. No. of fails
0 Year(s) 0 758 1482 2608 3080 104300 3860 1034
1 Year(s) 0 948 1758 2912 3413 69700 3809 395
2 Year(s) 0 1002 1810 2975 3472 72290 3775 200
3 Year(s) 0 1026 1822 3051 3497 303800 4965 67
4 Year(s) 0 1046 1842 3007 3466 210600 4201 32
5 Year(s) 0 1039 1830 3081 3521 175200 4461 16

 

Finally this is the R code. Regular readers will know that I’m really only learning R and it’s quite possible the code could be made faster or that I’ve made some huge error. Comments welcome as normal.

mysummary <- function(x){
        s <- as.numeric(summary(x))
        s[7] <- sd(x)
        s[8] <- sum(x <= 0)
        names(s) <-  c("Min." ,   "1st Qu.", "Median" , "Mean"  ,  "3rd Qu.", "Max."  , 
                                          "s.d." ,"No. of fails" )
        round(s, digits = 0)
}

years <- seq(1900, 2015) 
ret <- c(0.115,-0.035,0.035,0.018,0.123,0.038,0.186,-0.089,0.019,0.109,0.021,-0.015,0,-0.034,-0.005,
         -0.17,-0.173,-0.096,0.05,0.074,-0.34,0.34,0.393,0.028,0.176,0.173,0.055,0.148,0.173,-0.141,
         0.031,-0.162,0.383,0.248,0.13,0.116,0.159,-0.177,-0.079,-0.079,-0.153,0.192,0.184,0.121,0.113,
         0.048,0.173,-0.053,-0.083,-0.089,0.074,-0.031,-0.061,0.229,0.429,0.048,-0.117,-0.055,0.452,
         0.548,-0.001,-0.025,-0.022,0.177,-0.098,0.066,-0.074,0.311,0.398,-0.159,-0.105,0.344,0.081,
         -0.35,-0.581,0.996,-0.111,0.325,0.002,-0.049,0.171,0.013,0.219,0.223,0.258,0.137,0.227,0.048,
         0.044,0.258,-0.174,0.157,0.168,0.251,-0.086,0.192,0.131,0.193,0.106,0.217,-0.086,-0.138,-0.245,
         0.169,0.088,0.189,0.114,0.01,-0.304,0.259,0.089,-0.078,0.087, 0.174, -0.004, -0.001)

muret <- mean(ret)
sdret <- sd(ret)



## These are the variables to change

I0 <- 500 ## starting capital
nyrs <- 30 ## Number of years that money needs to last
spend <- 20 ## Annual expenditure
by <- c(0:5) ## buffer as years of expenditure

##########

CBfin <- NULL
Ifin <- NULL
smryI <- NULL
smryCB <- NULL

## ret <- 0.04 For testing only!!

for(k in 1:length(by)){
CB0 <- by[k]*spend
                
for(j in 1:10000){
        I <- I0
        CB <- CB0
        
for(i in 1:nyrs){

       
r <- sample(ret, 1)


## r <- rnorm(1,muret,sdret) ## Alternative way of estimating future return


e <- r*I  ## e = earnings

if((I+e)>=(I0+spend)){
        I <- I+e-spend
        }else{
                
                if(CB>=spend){
                        CB <- CB-spend
                }else{
                        I <- I + CB - spend
                        CB <- 0
                }
        
        
}


if((CB < CB0) & (I > I0)){
        CB <- CB+I-I0
        if(CB >= CB0){
                CB <- CB0
                I <- I + CB - CB0
        }
}

if(I<=0){
        I <- 0
        break 
        }
}
     

CBfin[j] <- CB
Ifin[j] <- I

}



smryI[[k]] <- mysummary(Ifin)
smryCB[[k]] <- mysummary(CBfin)


}

completesmryI <- do.call("rbind", smryI)
rownames(completesmryI) <- paste(by, "Year(s)")
completesmryI

completesmryCB <- do.call("rbind", smryCB)
rownames(completesmryCB) <- paste(by, "Year(s)")
completesmryCB

 

R for Excel Users – Part 5 – How to Save the Results of Your Analysis

One of the things you need to adapt to, if you are used to Excel and start using R, is producing nice output. It’s fairly straightforward in Excel to produce pretty tables and graphs and place them in a coherent order with accompanying text or alternatively to dynamically link those tables and graphs to a word document. R, in this respect at least, is much more like a pocket calculator than a spreadsheet – it produces great results but you then have to do something with those results to present or record them.
The answer, of course, is the R package knitr. This is built in to RStudio and produces fantastic, publication quality, reports containing text, tables, figures and graphs pretty much out of the box. It is very easy to produce a markdown document that contains all these elements and there are lots of written and video tutorials available to tell you how to do it. If, like me, you often just want ti print the results and not the code, this is easy to configure as well.
The problem is that isn’t how I work! I, like most other people I expect, develop my ideas in a script. I add things, then tinker and tweak the script until I’m happy with it. Only then do I think about how I want to save or report the results of my analysis. If I used knitr in “beginner mode” I would then need to copy the script into a markdown document and work from there. But if I changed the script I would need to copy and paste the chunks again. The other problem I have is that pressing the knitr button doesn’t allow you to control the output file name or destination. I routinely run the same script with different data. Unless I remembered to manually rename my last analysis, knitr would simply overwrite it with no warnings.
It is possible to run the code in a markdown document without rendering the whole document, but it can get messy. For example If you use xtable to produce html tables you will get the html code for the table in the console not the actual table.

Solution 1

A quick and dirty way of producing an output document directly from your script in Rstudio is file -> knit (ctrl-shift-k). This produces an html, pdf or word document including all the code and the results of running the code with the graphs inline in the correct place. However, it doesn’t solve the overwrite problem. To get around this problem I have a separate script file containing the following code. It assumes that the script you want to render (“script.R”), is in the current working directory. It produces an html document with a name based on the current date and time and then displays it in your browser. This is quite sufficient for me in almost all cases.

library(rmarkdown )
filename <- paste("Test_" ,format (Sys.time(), format = "%Y%m%d_%H%M" ),".html" , sep="" )
render("script.R" ,"html_document" , output_file = filename )
browseURL(paste ('file:///', getwd(), "/",filename, sep=''))

 

Solution 2

If you want to include some nice markdown text in your document but don’t want to copy the R code from elsewhere, for the reasons described above, you can simply run sections of code contained in the script file from within the markdown document. All you need to do is firstly source the R script file and then put the section of the script you want to run after the in “`r in a markdown document and before any options you choose.

library(knitr )
 
```{r cache =FALSE, echo =F}
read_chunk("script.R" )
```
 
```{r Section12, echo =F}
```

The section is indicated by putting # —- Section12 —- in the script document.

I tend to put all the major data analysis in my main script document. At the bottom I put some code to then render the R markdown document. The R markdown document contains text, instructions on printing the tables and links to code in the script document to print graphs etc. Because I force knitr to work in the global environment (which it doesn’t by default), I don’t need to recalculate things that have already been calculated in the main script.
Here’s an example to see how it works. Both files should be in the same directory. Note that the style tags just add some css  information to improve the formatting of the table which is rendered using xtable.  Running the script file is all you need to do to produce the html output.

R script file (correlation2.R)

# ---- Section1 ----

library(rmarkdown)
# Calculate x

x <- rnorm(500, 0.001, 0.02)
xbar <- mean(x)
xsd <- sd(x)

# Calculate y which is inversely proportional to x

y <- 2*xbar-x + rnorm(500, sd = 0.02)
ybar <- mean(y)
ysd <- sd(y)

# ---- Section2 ----

xysummary <- matrix(data = c(xbar, xsd, ybar, ysd), 2, 2)
colnames(xysummary) <- c("x","y")
rownames(xysummary) <- c("Mean","sd")

# ---- Section3 ----

plot(x,y, main = "Correlation between y and x")

# ---- Section4 ----

cc <- cor(x,y)

# ---- Section5 ----

render("correlation2.Rmd","html_document", output_file  = "CorExpt.html", envir = .GlobalEnv)
browseURL(paste('file:///',getwd(),"/","CorExpt.html", sep='')) 

R markdown file (correlation2.Rmd)

---
output: html_document
---
<style type="text/css">

th{  

border:1px solid black;
padding:5px;
text-align: center;
}

td{

border:1px solid black;
padding:5px;
text-align: center;
}

table{
border-collapse:collapse;

}
 
</style>

```{r cache=FALSE, echo=F}
library(knitr)
read_chunk('correlation2.R')
```

# Correlation Experiment

First we are going to generate some normally distributed weekly returns (x) and then we're then going to calculate a second negatively correlated stock (y) from this data.

<br>

```{r echo=F, results='asis'}
library(xtable)
print(xtable(xysummary, digits = 4), type="html")
```
<br>
<br>

Let's now check the correlation by producing a scatter plot and calculating the correlation coefficient

```{r Section3, echo=F}
```

Correlation coefficient = `r cc` 

Final output (Screen print so that the styling is retained)

ScreenShot313

Getting round some Yahoo Finance problems using Google Finance and R

Yahoo Finance is probably the most widely used free source of historic share price data. Normally it works really well and the integration into R packages like quantmod makes it an easy to use and powerful solution. However, it is not without problems. Here are a couple of them and some solutions using Google Finance. I’m only going to talk about London traded shares and UK based mutual funds. I will be using the UK version of Google finance https://www.google.co.uk/finance throughout.

Obviously spurious data

This normally only happens for shares with small trading volumes or with ETFs and not for FTSE350 companies. Here are charts for the same ETF (Vanguard S&P 500, VUSA.L) from Yahoo Finance and Google Finance. Using quantmod you can easily change the source to use Google instead of Yahoo. The moral of the story is to always check the basic chart before attempting any more sophisticated analysis.

VUSA YahooVUSA Google

Missing data

It is often impossible to find mutual fund data or more exotic indicies on Yahoo Finance. Sometimes this data is available on Google and bringing up the page shows historic data but frequently with no download option. The reason, I understand, is to do with the licence that Google has with the data provider. In this case, calls to the API using quantmod or a manual solution seem to fail as well.
You could of course manually cut and paste the data into a spreadsheet. However as the data appears on several pages this becomes tedious if you want larger amounts of data. It is worth noting that although the drop down at the bottom of the page gives a maximum of 30 lines per page, manually typing in 200 in the appropriate place in the url gives you 200 lines.
It is very easy to set up R to do the data scraping for you. The script below works for me.
You will need to use the Google finance web page to find the stock code you are interested in. I’m going to use the FTSE100 minimum variance index (Google code is INDEXFTSE:UKXMV) in this example – an index I couldn’t find on Yahoo and where there is no option to download the data on Google. You will need to manually add symb, fname, startdate, and enddate to the script.  If there is less data available than you have asked for you may get an error when it tries to bind a non existent file to your data frame.
library(XML)
library(RCurl)
library(lubridate)
library(xts)


### Complete data required below

symb <- "INDEXFTSE:UKXMV"
fname <- "UKXMV"
startdate <- ymd("2014-01-31")
enddate <- ymd("2015-11-16")

## Construct basic URL 

sd <- day(startdate)
sm <- month(startdate, label = T)
sy <- year(startdate)

ed <- day(enddate)
em <- month(enddate, label = T)
ey <- year(enddate)

basicURL <- paste("https://www.google.co.uk/finance/historical?q=",symb,
                  "&startdate=",sm,"+",sd,"+",sy,"&enddate=",em,"+",ed,"+",ey,"&num=200&start=", sep ="")

nrows <- as.numeric(difftime( enddate, startdate, units = "days")/7*5)

### Download data and merge into a data frame called DF


start <- seq( 0, nrows, by = 200) 

DF <- NULL
for(i in 1:length(start)){

theurl <- getURL(paste(basicURL,start[i],sep = ""))  
temp <- readHTMLTable(theurl, which= 4, header = TRUE, stringsAsFactors = FALSE ) 
DF <- rbind(DF, temp)
}

## Tidy up DF
colnames(DF) <- gsub("\\s|\\\n",'',colnames(DF))

DF[DF == "-"] <- NA

head(DF)
tail(DF)

DF[,1] <- mdy(DF[,1])
row.names(DF) <- DF[,1]
DF[,1] <- NULL

id <- c(1:ncol(DF)) 
DF[,id] <- as.numeric(as.character(unlist(DF[,id])))

head(DF)

## Convert the data frame DF into an XTS object

DF.xts <- as.xts(DF)
assign(fname, DF.xts)


head(DF.xts)
plot(DF.xts)