UBC-MDS/DSCI_522_Group_302

Hong Kong Horse Race Predictor

  • Authors: Derek Kruszewski, Yi Liu, Rob Blumberg, Carlina Kim

Data analysis project by Group 302 for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.

About

This project attempts to build a regression model to answer the research question:

Given a set of features related to racing horses, can we predict the outcome of a race?

The resulting model predicts finish times with an R^2 (coefficient of determination) of 0.909.
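The R^2 figure above is a coefficient of determination, not a correlation: it measures the share of the variance in observed finish times that the model's predictions explain. A minimal sketch of the standard formula, 1 - SS_res / SS_tot, follows; the function name `r_squared` and the sample finish times are hypothetical and are not taken from the project's scripts or results.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical finish times in seconds, for illustration only.
observed = [70.0, 72.0, 74.0]
predicted = [70.5, 72.0, 73.5]
print(r_squared(observed, predicted))  # 0.9375
```

An R^2 of 1 means every prediction matches exactly; an R^2 of 0 means the model does no better than predicting the mean finish time.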

The dataset used to answer this question is the Hong Kong Horse Racing Dataset for Experts, publicly available through Kaggle (HorseBaby 2018). The data has been rehosted on GitHub for use with this project's scripts:

https://raw.githubusercontent.com/v5y8/horse_race_data/master

Please ensure that the Makefile downloads the data from the GitHub repository above.
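For readers who want to fetch a file from the rehosted repository outside the Makefile, a minimal standard-library sketch is shown below. The helper names `data_url` and `download` are hypothetical, and the file name `races.csv` is an assumption based on the Kaggle dataset; check the repository for the files the project's scripts actually use.

```python
import urllib.request
from urllib.parse import urljoin

# Base URL of the rehosted dataset (from this README); the trailing slash
# makes urljoin append file names instead of replacing the last path segment.
BASE_URL = "https://raw.githubusercontent.com/v5y8/horse_race_data/master/"


def data_url(filename):
    """Build the raw-file URL for a file in the rehosted repository (hypothetical helper)."""
    return urljoin(BASE_URL, filename)


def download(filename, dest):
    """Download one file from the rehosted repository to a local path (hypothetical helper)."""
    urllib.request.urlretrieve(data_url(filename), dest)
```

For example, `download("races.csv", "data/races.csv")` would save the (assumed) races file into a local `data/` directory, which must already exist.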

Final Report

The final report can be found here, and can be viewed here.

Usage

There are two ways to replicate the analysis on your local machine. Either method will take 15-20 minutes to fully execute.

Method 1: Using Docker

Note: the instructions below assume a Unix shell (e.g., Terminal or Git Bash). If you are using the Windows Command Prompt, replace /$(pwd) with the absolute path to this repository on your computer (PATH_ON_YOUR_COMPUTER).

  1. Install and run Docker

  2. Clone this GitHub repository and run the following command at the command line/terminal from the root directory of this project:

docker run --rm -v /$(pwd):/home/DSCI_522_Group_302 v5y8/group_302_environment make -C /home/DSCI_522_Group_302 all

  3. To reset the repository to a clean state, run the following command at the command line/terminal from the root directory of this project:

docker run --rm -v /$(pwd):/home/DSCI_522_Group_302 v5y8/group_302_environment make -C /home/DSCI_522_Group_302 clean
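Both Docker commands share the same structure: mount the repository at /home/DSCI_522_Group_302 inside the v5y8/group_302_environment image, then run one make target there. The sketch below, a hypothetical Python wrapper not included in the project, builds that argument list so it can be run with `subprocess` on any platform. It assumes the host path is absolute, and it uses `os.getcwd()` in place of `/$(pwd)`.

```python
import os
import subprocess
from typing import List

IMAGE = "v5y8/group_302_environment"  # image name from the commands above
MOUNT_POINT = "/home/DSCI_522_Group_302"  # container path the repo is mounted at


def docker_make_command(repo_path: str, target: str = "all") -> List[str]:
    """Return the docker argv for running `make <target>` in the container (hypothetical helper)."""
    if target not in ("all", "clean"):
        raise ValueError("expected 'all' or 'clean'")
    return [
        "docker", "run", "--rm",
        # Bind-mount the host repository into the container.
        "-v", f"{repo_path}:{MOUNT_POINT}",
        IMAGE,
        # Run make from the mounted project root.
        "make", "-C", MOUNT_POINT, target,
    ]


def run_in_docker(target: str = "all") -> None:
    """Run the chosen target from the current directory (assumed to be the repo root)."""
    subprocess.run(docker_make_command(os.getcwd(), target), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, which is one reason to prefer it over building a single command string when paths may contain spaces.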

Method 2: Using Make

This method requires all dependencies listed below to be installed before running the analysis. Run the following command in the terminal at the root directory of this project to replicate the analysis:

make all

To reset this repository to a clean state, run the following command in the terminal at the root directory of this project:

make clean

Makefile dependency diagram

The relationships between scripts, data files, images, and final outputs are summarised in the dependency diagram below:

Makefile_diagram
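The diagram encodes the same idea Make relies on: each output is built only after the files it depends on. A minimal sketch of deriving such a build order with a depth-first topological sort follows. The target names are hypothetical placeholders rather than the project's real file names, and unlike GNU Make (which warns and drops a circular dependency), this sketch raises an error on a cycle.

```python
def build_order(deps):
    """Order targets so every prerequisite comes before the targets that need it.

    deps maps each target to its prerequisites, like the rules in a Makefile.
    """
    order = []
    state = {}  # 1 = currently being visited, 2 = finished

    def visit(node):
        if state.get(node) == 2:
            return
        if state.get(node) == 1:
            raise ValueError(f"circular dependency involving {node}")
        state[node] = 1
        for prereq in deps.get(node, []):
            visit(prereq)
        state[node] = 2
        order.append(node)  # appended only after all of its prerequisites

    for target in deps:
        visit(target)
    return order


# Hypothetical pipeline, for illustration only.
deps = {
    "report.html": ["results/model_scores.csv", "results/eda.png"],
    "results/model_scores.csv": ["data/clean.csv"],
    "results/eda.png": ["data/clean.csv"],
    "data/clean.csv": ["data/raw.csv"],
}
print(build_order(deps))
# ['data/raw.csv', 'data/clean.csv', 'results/model_scores.csv', 'results/eda.png', 'report.html']
```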

Dependencies

Python 3.7.5 and Python packages:

ChromeDriver (for use with selenium package):

R version 3.6.1 and R packages:

Contributions

We welcome all contributions to this project! If you notice a bug, or have a feature request, please open up an issue here. If you'd like to contribute a feature or bug fix, you can fork our repo and submit a pull request. We will review pull requests within 7 days. All contributors must abide by our code of conduct.

References

HorseBaby. 2018. “Horse Racing Dataset for Experts (Hong Kong).” https://www.kaggle.com/hrosebaby/horse-racing-dataset-for-experts-hong-kong.