Sunday, 30 November 2014

FOMC Dates - Scraping Data From Web Pages

Before we can do any quant analysis, we need relevant data, and the web is a good place to start. Sometimes the data can be downloaded in a standard format such as .csv, or is available via an API, but often you’ll need to scrape it directly from web pages.

In this post I’ll show how to obtain the US Federal Reserve FOMC announcement dates (i.e. the dates on which a statement is published after a meeting) from the Fed’s web page. At the time of writing, this web page had dates from 2009 onward.

First, install and load the httr and XML R packages.

install.packages(c("httr", "XML"))
library(httr)
library(XML)

Next, run the following R code.

# get and parse web page content
# fomc_url is a placeholder: set it to the address of the Fed's FOMC web page
webpage <- content(GET(fomc_url), as = "text")
xhtmldoc <- htmlParse(webpage)
# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr, "href")
statements <- sort(statements)
# get dates from statement urls
fomcdates <- sapply(statements, function(x) substr(x, 28, 35))
fomcdates <- as.Date(fomcdates, format = "%Y%m%d")
# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")
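
To see what the XPath expression selects without touching the network, here is a minimal sketch that runs the same query on a small, made-up HTML fragment. The markup is an assumption: only the `statement2` cell class and the statement URL pattern come from this post, while the `minutes` cell and its link are hypothetical and are there just to show what gets filtered out. The real page is larger and its layout may have changed.

```r
library(XML)

# Hypothetical HTML fragment; only the 'statement2' class and the
# statement URL pattern are taken from the post
html <- '
<table>
  <tr>
    <td class="statement2"><a href="/newsevents/press/monetary/20090128a.htm">Statement</a></td>
    <td class="minutes"><a href="/hypothetical/minutes20090128.htm">Minutes</a></td>
  </tr>
  <tr>
    <td class="statement2"><a href="/newsevents/press/monetary/20090318a.htm">Statement</a></td>
  </tr>
</table>'

# asText = TRUE tells the parser the input is markup, not a file name
doc <- htmlParse(html, asText = TRUE)

# Select <a> elements that are direct children of <td class="statement2">
# and read the href attribute of each one
hrefs <- xpathSApply(doc, "//td[@class='statement2']/a", xmlGetAttr, "href")
print(hrefs)
```

Only the two statement links should come back; the link in the `minutes` cell is skipped because its parent cell has a different class.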

Finally, check the results by looking at their structures and first few values.

# check data
str(statements)
head(statements)
str(fomcdates)
head(fomcdates)

You should see output similar to the following.

##  chr [1:49] "/newsevents/press/monetary/20090128a.htm" ...
## [1] "/newsevents/press/monetary/20090128a.htm"
## [2] "/newsevents/press/monetary/20090318a.htm"
## [3] "/newsevents/press/monetary/20090429a.htm"
## [4] "/newsevents/press/monetary/20090624a.htm"
## [5] "/newsevents/press/monetary/20090812a.htm"
## [6] "/newsevents/press/monetary/20090923a.htm"
##  Date[1:49], format: "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" ...
## [1] "2009-01-28" "2009-03-18" "2009-04-29" "2009-06-24" "2009-08-12"
## [6] "2009-09-23"
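
The `substr()` call relies on the date always sitting at characters 28 to 35 of the URL, which holds for the URLs shown above but would break silently if the path ever changed. As a hedged alternative (not the method used in this post), the sketch below pulls out the first run of eight digits with a regular expression and adds a few basic checks. It uses only base R and three of the URLs from the output above.

```r
# First few statement URLs, copied from the output above
statements <- c(
  "/newsevents/press/monetary/20090128a.htm",
  "/newsevents/press/monetary/20090318a.htm",
  "/newsevents/press/monetary/20090429a.htm"
)

# Extract the first run of eight digits instead of relying on fixed positions.
# regmatches() drops URLs without a match, so the length check below catches them.
date_strings <- regmatches(statements, regexpr("[0-9]{8}", statements))
fomcdates <- as.Date(date_strings, format = "%Y%m%d")

# Basic sanity checks: one valid date per URL, in strictly increasing order
stopifnot(length(fomcdates) == length(statements))
stopifnot(!anyNA(fomcdates))
stopifnot(all(diff(fomcdates) > 0))

print(fomcdates)
```

The checks fail loudly if a URL has no date, if a date does not parse, or if the sorted URLs are not in chronological order.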

So what can we do with this data? Here are a few ideas:

  • Go deeper: download the actual statements and apply Natural Language Processing (NLP) techniques to analyze them, e.g. to classify positive or negative sentiment. This is quite a complex task, but it is on my list of research topics for 2015…
  • Collect price data, e.g. Treasury yields or the S&P 500, and do some visual / initial exploratory analysis around the FOMC announcement dates
  • Conduct an event study, as academics do, to identify whether there are any statistically significant patterns around these dates
  • Incorporate the dates into a trading or investment program and backtest to see whether there are economically significant patterns, i.e. tradeable alpha opportunities
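
As a rough illustration of the exploratory and event-study ideas above, the sketch below computes the return over a small window of trading days around each announcement date. Everything about the price series is an assumption: it is a simulated random walk on weekdays, not real market data, and the function name `event_returns` and the one-day-before, one-day-after window are hypothetical choices for illustration only. With real data you would swap in actual prices and handle market holidays.

```r
set.seed(42)

# Synthetic daily prices on weekdays only (NOT real market data)
days <- seq(as.Date("2009-01-01"), as.Date("2009-06-30"), by = "day")
days <- days[!as.POSIXlt(days)$wday %in% c(0, 6)]  # drop Sundays (0) and Saturdays (6)
prices <- 100 * cumprod(1 + rnorm(length(days), mean = 0, sd = 0.01))

# First few FOMC announcement dates from the output above
fomcdates <- as.Date(c("2009-01-28", "2009-03-18", "2009-04-29"))

# Simple return from the close `before` trading days ahead of each event
# to the close `after` trading days following it; NA if the window
# does not fit inside the price series
event_returns <- function(events, days, prices, before = 1, after = 1) {
  i <- match(events, days)  # position of each event in the price series
  ok <- !is.na(i) & (i - before) >= 1 & (i + after) <= length(days)
  out <- rep(NA_real_, length(events))
  out[ok] <- prices[i[ok] + after] / prices[i[ok] - before] - 1
  out
}

result <- data.frame(
  date = fomcdates,
  window_return = event_returns(fomcdates, days, prices)
)
print(result)
```

A proper event study would go further and compare these window returns against a benchmark or a sample of non-event days before drawing any conclusions.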

The R code for this post is available on GitHub.