Showing posts with label web scrape. Show all posts

Wednesday, 21 January 2015

FOMC Dates - Full History Web Scrape

As I delve into the existing academic research regarding price patterns around US Federal Open Market Committee (FOMC) meetings, it’s clear that I will need more data than I collected in the previous post FOMC Dates - Scraping Data From Web Pages.

This reminds me of a quote by Google’s Research Director Peter Norvig:

We don’t have better algorithms. We just have more data.

In particular, I’ll need FOMC dates from at least February 1994, when the Federal Reserve began issuing statements describing monetary policy decisions following an FOMC meeting. I’ll also need to know whether each meeting was scheduled or not: there are 8 scheduled meetings per year, but the Committee also holds inter-meeting conference calls as and when needed. For interesting background info on the Fed’s move to greater transparency over time, see the article posted on their web site.

With such data it should be relatively easy to reproduce some of the results from the academic research.

All the data is available on the Fed’s web site, but unfortunately it has to be scraped from many separate web pages. To do this I decided to use XPath (XML Path Language) and regular expressions in R. I created 4 R functions and saved them in a separate file “FOMC Dates Functions.R”, which you will need to download from GitHub and save in your working directory in order to run the R code below.

## install.packages(c("httr", "XML"), repos = "http://cran.us.r-project.org")
library(httr)
library(XML)

# load fomc date functions
source("FOMC Dates Functions.R")

# extract data from web pages and parse dates
fomcdatespre2009 <- get.fomc.dates.pre.2009(1936, 2008)
fomcdatesfrom2009 <- get.fomc.dates.from.2009()

# combine datasets and order chronologically
fomcdatesall <- do.call(rbind, list(fomcdatespre2009, fomcdatesfrom2009))
fomcdatesall <- fomcdatesall[order(fomcdatesall$begdate), ]

# save as RData format
save(fomcdatesall, file = "fomcdatesall.RData")
# save as csv file
write.csv(fomcdatesall, "fomcdatesall.csv", row.names = FALSE)

# check results
head(fomcdatesall)

This will scrape the full history of FOMC meetings from 1936 to the present. The data is stored in a dataframe with a row for each meeting/conference call and 6 columns for beginning and end dates, whether a press conference was held, whether it was a regularly scheduled meeting, type of document published to record meeting details, and the url of that document. For example:

##      begdate    enddate pressconf scheduled
## 1 1936-03-18 1936-03-18         0         1
## 2 1936-03-19 1936-03-19         0         1
## 3 1936-05-25 1936-05-25         0         1
## 4 1936-11-19 1936-11-19         0         1
## 5 1936-11-20 1936-11-20         0         1
## 6 1937-01-26 1937-01-26         0         1
##                                document
## 1       Historical Minutes (400 KB PDF)
## 2 Record of Policy Actions (271 KB PDF)
## 3 Record of Policy Actions (143 KB PDF)
## 4       Historical Minutes (176 KB PDF)
## 5 Record of Policy Actions (198 KB PDF)
## 6 Record of Policy Actions (272 KB PDF)
##                                             url
## 1 /monetarypolicy/files/FOMChistmin19360318.pdf
## 2    /monetarypolicy/files/fomcropa19360319.pdf
## 3    /monetarypolicy/files/fomcropa19360525.pdf
## 4 /monetarypolicy/files/FOMChistmin19361119.pdf
## 5    /monetarypolicy/files/fomcropa19361120.pdf
## 6    /monetarypolicy/files/fomcropa19370126.pdf

The data is also saved to disk as an RData file for use in my future posts and as a csv file (if you need the dates for your own research).
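If you start from the csv file, the date columns come back as plain text and need converting before you can filter on them. The sketch below shows one way to do that and then keep only scheduled meetings from a cutoff date onward. It uses the six rows from the sample output above (written to a temporary file to stand in for "fomcdatesall.csv"), assumes the pressconf and scheduled columns are stored as 0/1, and the helper filter.fomc.dates is a hypothetical name that is not part of “FOMC Dates Functions.R”.

```r
# Six rows copied from the sample output above (document and url columns omitted)
fomc.sample <- data.frame(
    begdate = c("1936-03-18", "1936-03-19", "1936-05-25",
                "1936-11-19", "1936-11-20", "1937-01-26"),
    enddate = c("1936-03-18", "1936-03-19", "1936-05-25",
                "1936-11-19", "1936-11-20", "1937-01-26"),
    pressconf = 0,
    scheduled = 1,
    stringsAsFactors = FALSE
)

# Write the sample to a temporary csv to stand in for "fomcdatesall.csv"
csvfile <- tempfile(fileext = ".csv")
write.csv(fomc.sample, csvfile, row.names = FALSE)

# Read it back; dates arrive as character strings and must be converted
fomcdates <- read.csv(csvfile, stringsAsFactors = FALSE)
fomcdates$begdate <- as.Date(fomcdates$begdate)
fomcdates$enddate <- as.Date(fomcdates$enddate)

# Hypothetical helper: keep meetings starting on or after `from`,
# optionally restricted to regularly scheduled meetings (scheduled == 1)
filter.fomc.dates <- function(fomcdates, from, scheduled.only = TRUE) {
    keep <- fomcdates$begdate >= as.Date(from)
    if (scheduled.only) {
        keep <- keep & fomcdates$scheduled == 1
    }
    fomcdates[keep, , drop = FALSE]
}

# For the research described above the cutoff would be "1994-02-01";
# an earlier date is used here so that the six sample rows return something
recent <- filter.fomc.dates(fomcdates, from = "1936-11-01")
print(recent$begdate)
```

With the full csv file, the same call with from = "1994-02-01" should give the scheduled meetings for the period covered by the academic research, provided the column names match those shown above.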

Click here for the above R code on GitHub.

Click here for the “FOMC Dates Functions.R” code file on GitHub.

UPDATE 2019: Please note that the Fed has once again changed the HTML format of their webpage, so my old code no longer works. I no longer use the code or the dates either, so I won't be updating the code. However, you can download the csv file below for dates up until mid-2018.

Click here for the csv file of FOMC dates from 1936-2018.

Sunday, 30 November 2014

FOMC Dates - Scraping Data From Web Pages

Before we can do some quant analysis, we need to get some relevant data, and the web is a good place to start. Sometimes the data can be downloaded in a standard format such as .csv files, or is available via an API (e.g. http://www.quandl.com), but often you’ll need to scrape data directly from web pages.

In this post I’ll show how to obtain the US Federal Reserve FOMC Announcement dates (i.e. those when a statement is published after the meeting) from their web page http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm. At the time of writing, this web page had dates from 2009 onward.

First, install and load the httr and XML R packages.

install.packages(c("httr", "XML"), repos = "http://cran.us.r-project.org")
library(httr)
library(XML)

Next, run the following R code.

# get and parse web page content
webpage <- content(GET(
    "http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"), 
    as = "text")
xhtmldoc <- htmlParse(webpage)
# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr,
    "href")
statements <- sort(statements)
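# get dates from statement urls: characters 28 to 35 hold the yyyymmdd date,
# e.g. "20090128" in "/newsevents/press/monetary/20090128a.htm"
fomcdates <- sapply(statements, function(x) substr(x, 28, 35))
fomcdates <- as.Date(fomcdates, format = "%Y%m%d")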
# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")

Finally, check the results by looking at their structures and first few values.

# check data
str(statements)
head(statements)
str(fomcdates)
head(fomcdates)

You should see output similar to the following.

##  chr [1:49] "/newsevents/press/monetary/20090128a.htm" ...
## [1] "/newsevents/press/monetary/20090128a.htm"
## [2] "/newsevents/press/monetary/20090318a.htm"
## [3] "/newsevents/press/monetary/20090429a.htm"
## [4] "/newsevents/press/monetary/20090624a.htm"
## [5] "/newsevents/press/monetary/20090812a.htm"
## [6] "/newsevents/press/monetary/20090923a.htm"
##  Date[1:49], format: "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" ...
## [1] "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" "2009-08-12"
## [6] "2009-09-23"
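The date extraction above relies on fixed character positions, so it only works while every statement url has the same prefix length. As a hedged alternative, here is a minimal sketch that uses a regular expression to pull the first run of 8 digits out of each url instead. It uses only base R and the first three urls from the output above; it is a variation on the code in this post, not a replacement that I have run against the live page.

```r
# First three statement urls from the output above
statements <- c("/newsevents/press/monetary/20090128a.htm",
                "/newsevents/press/monetary/20090318a.htm",
                "/newsevents/press/monetary/20090429a.htm")

# Find the first run of 8 consecutive digits in each url.
# Note: regmatches() drops any url that has no match, so the result
# can be shorter than the input if a url does not contain a date.
datestrings <- regmatches(statements, regexpr("[0-9]{8}", statements))

# Convert the yyyymmdd strings to Date objects
fomcdates <- as.Date(datestrings, format = "%Y%m%d")
print(fomcdates)
# [1] "2009-01-28" "2009-03-18" "2009-04-29"
```

The trade-off is that a pattern match tolerates a change in the url path, but it would also pick up any other 8-digit number that appeared earlier in the url.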

So what can we do with this data? Here are a few ideas:

  • Go deeper: download the actual statements and use machine learning, specifically Natural Language Processing (NLP), to analyze them, e.g. for positive or negative sentiment. This is actually quite a complex task, but it is on my list of research topics for 2015…
  • Collect price data e.g. Treasury yields or S&P500 and do some visual / initial exploratory analysis around the FOMC announcement dates
  • Conduct an event study like the academics do to identify whether or not there are any statistically significant patterns around these dates
  • Incorporate the dates into a trading or investment program and backtest to see whether there are economically significant patterns i.e. tradeable alpha opportunities
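To make the exploratory and event study ideas a little more concrete, here is a rough sketch of how returns in a window around each announcement date could be computed. The price series is a simulated random walk, NOT real market data, and the helper event.window.returns is a hypothetical name introduced purely for illustration; a proper event study would also need a benchmark for "normal" returns and a significance test, which this sketch leaves out.

```r
set.seed(1)

# Simulated daily prices (random walk), NOT real market data.
# Real data would contain trading days only; the helper below works
# on row offsets, so it would handle that case in the same way.
days <- seq(as.Date("2009-01-01"), as.Date("2009-06-30"), by = "day")
prices <- data.frame(
    date = days,
    close = 100 * exp(cumsum(rnorm(length(days), mean = 0, sd = 0.01)))
)

# First four announcement dates from the output above
events <- as.Date(c("2009-01-28", "2009-03-18", "2009-04-29", "2009-06-24"))

# Hypothetical helper: log return from `before` rows before each event
# to `after` rows after it; NA if the event or window is not in the data
event.window.returns <- function(prices, events, before = 1, after = 1) {
    idx <- match(events, prices$date)
    start <- idx - before
    end <- idx + after
    ok <- !is.na(idx) & start >= 1 & end <= nrow(prices)
    out <- rep(NA_real_, length(events))
    out[ok] <- log(prices$close[end[ok]] / prices$close[start[ok]])
    data.frame(event = events, window.return = out)
}

result <- event.window.returns(prices, events, before = 1, after = 1)
print(result)
```

Comparing these window returns with returns over randomly chosen non-event windows would be one simple first check on whether anything unusual happens around the announcements.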

Click here for the R code on GitHub.

UPDATE 2019: Please note that the Fed has once again changed the HTML format of their webpage, so my old code no longer works. I no longer use the code or the dates either, so I won't be updating the code. However, you can download the csv file below for dates up until mid-2018.

Click here for the csv file of FOMC dates from 1936-2018.