Final Project Template

This workbook provides the template for the final project.

Instructions

  • Work individually or in pairs
  • Each team is to complete 1 copy of this template.
    • Complete all sections.
    • Feel free to include supporting material / slides / documents as needed.
  • At the end of the project, you will get 15 minutes to present this workbook to the class.

Submission Instructions

  • Submit the .ipynb with the Output cells showing the results
    • Naming convention:
        <name1>-<name2>-<project_short_name>.ipynb
  • If you provide your own datasets, include the data with your .ipynb

Section 0: Team Members

  • Member 1
  • Member 2

Section 1: Project Title

  • The title should be 1 sentence that describes the goal of this project.
  • Example: Clustering analysis of COE premiums and quotas

Section 2: Project Definition

Goals

Describe the goal of this project.

Example: The goal of this project is to determine if the bid quotas and premiums can be used to predict the vehicle category.

Important:

  • If this is your first project, keep the project definition as simple as possible.
  • As a rule of thumb, pick something that can be completed in 2-3 days. There is always more you can add to it if you finish early.
  • If you are not sure, use the workshop problems as a reference.

Dataset

Briefly describe the source(s) of data you are using.

Example:

We will use the dataset from: https://data.gov.sg/dataset/coe-bidding-results

Format: CSV

Columns:

Name           Type               Unit of Measure  Description
-------------  -----------------  ---------------  ---------------------------------------
month          Datetime, YYYY-MM  none             date range: Jan 1, 2010 to Mar 31, 2018
bidding_no     Numeric            No. of Bids      Number of Bids
vehicle_class  Text               none             Vehicle category: A to E
quota          Numeric            No. of Bids      Number of Quota
bids_success   Numeric            No. of Bids      Number of Successful Bids
bids_received  Numeric            No. of Bids      Number of Bids Received
premium        Numeric            S$               COE premium

Tasks

List the tasks you will perform.

Example:

  1. Process the dataset to convert strings into labels.
  2. Shuffle and split the data into train and test sets.
  3. Train a clustering algorithm: a Gaussian Mixture Model with 5 components, where each component corresponds to a vehicle category.
  4. Compute the metrics for the algorithm.
  5. Perform analysis for possible improvements.

Section 3: Prepare Dataset

Write your code below to prepare the dataset using pandas
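
Example: a minimal sketch of this step, assuming a CSV with the columns from the example dataset in Section 2. The rows in SAMPLE_CSV are made up for illustration and the column name vehicle_label is hypothetical; replace the in-memory text with your own file.

```python
import io

import pandas as pd

# Made-up rows for illustration only (not real COE data).
# In a real project, use pd.read_csv("<your file>.csv") instead.
SAMPLE_CSV = """month,bidding_no,vehicle_class,quota,bids_success,bids_received,premium
2010-01,1,A,1000,990,1200,20000
2010-01,1,B,500,495,700,25000
2010-01,2,A,1000,980,1300,21000
2010-01,2,B,500,490,800,26000
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Parse the YYYY-MM strings into datetime values.
df["month"] = pd.to_datetime(df["month"], format="%Y-%m")

# Drop rows with missing values (none in this sample, but common in real data).
df = df.dropna()

# Convert the text category into integer labels (A -> 0, B -> 1, ...).
# "vehicle_label" is a hypothetical column name chosen for this sketch.
df["vehicle_label"] = df["vehicle_class"].astype("category").cat.codes

print(df.dtypes)
print(df.head())
```

Keeping the original vehicle_class column next to the integer labels makes it easier to translate results back into category names later.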

Section 4: Select Features

Write your code below to create X_train, X_test, y_train, y_test
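
Example: a minimal sketch of this step using train_test_split from scikit-learn. The DataFrame below is a made-up stand-in for the prepared dataset, and the choice of quota and premium as features is an assumption based on the example goal in Section 2.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for the DataFrame prepared in the previous section.
df = pd.DataFrame({
    "quota": [1000, 500, 1000, 500, 1100, 550, 900, 450, 1050, 520],
    "premium": [20000, 25000, 21000, 26000, 19000, 27000, 22000, 24000, 20500, 25500],
    "vehicle_label": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# Features (X) and target labels (y); the column choice is an assumption.
feature_columns = ["quota", "premium"]
X = df[feature_columns]
y = df["vehicle_label"]

# Shuffle and split: 80% train, 20% test.
# stratify=y keeps the class proportions the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, which helps when comparing results between runs.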

Section 5: Train the algorithm(s)

Write your code below to initialize and train the algorithm(s)
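
Example: a minimal sketch of this step, following the example task of a Gaussian Mixture Model with 5 components. The training data here is synthetic (five made-up 2-D clusters generated with NumPy) so that the block runs on its own; in your project, X_train comes from the previous section.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic training data: 5 made-up clusters of 40 points each.
rng = np.random.default_rng(0)
n_categories = 5
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 20]], dtype=float)
X_train = np.vstack([c + rng.normal(scale=1.0, size=(40, 2)) for c in centres])

# Initialize the model: one mixture component per expected category.
# n_init=3 repeats the fit from different starting points and keeps the best.
gmm = GaussianMixture(n_components=n_categories, n_init=3, random_state=0)

# Train (clustering is unsupervised, so no labels are passed to fit).
gmm.fit(X_train)

# Assign each training sample to its most likely component.
train_clusters = gmm.predict(X_train)

print("Converged:", gmm.converged_)
print("Component means:")
print(gmm.means_)
```

The component numbers are arbitrary: component 0 does not necessarily correspond to category A, so a mapping step is needed before comparing clusters with known labels.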

Section 6: Evaluate metrics

Write your code below to evaluate metrics for the trained algorithm(s).

Feel free to plot the results to visualize the algorithm, as appropriate.
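
Example: a minimal sketch of this step for a clustering model whose true categories are known. It assumes the adjusted Rand index as the metric, which is one reasonable choice because it ignores how the cluster numbers are assigned. The label arrays are made up for illustration.

```python
from sklearn.metrics import adjusted_rand_score, confusion_matrix

# Made-up labels for illustration: true categories and predicted cluster ids.
# Cluster ids are permuted relative to the categories, with one sample misassigned.
y_test = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [1, 1, 1, 0, 0, 2, 2, 2, 2]

# Adjusted Rand index: 1.0 means identical groupings, values near 0 mean
# agreement no better than chance. It does not depend on cluster numbering.
ari = adjusted_rand_score(y_test, y_pred)
print("Adjusted Rand index:", round(ari, 3))

# Rows are true categories, columns are predicted clusters.
# A good clustering has one dominant cell per row, not necessarily on the diagonal.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

For supervised models, metrics such as accuracy_score or mean_squared_error from sklearn.metrics would take the place of the adjusted Rand index.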

Section 7: Observations and analysis

Answer the following questions:

  1. How did you measure the algorithm? Specify the metrics you used.
  2. What is the outcome of the measurement? Explain the interpretation of the metrics.
    • Is there overfitting or underfitting?
    • Is there low accuracy or high error? If so, why do you think this is the case?
  3. What improvements do you propose?
  4. What is the most challenging part of this project?