(image: xkcd)
Discussions, questions: mldds03.slack.com
Structure: 5 modules
Assignments: two graded projects and presentations
Python coding is required
Lots of experimentation
Practice
Patience
High-level strategy
Solving math equations
Implementing algorithms from scratch
Exhaustive coverage of all algorithms
Label things [e.g. Classification]
Predict trends [e.g. Linear prediction]
Find groups of things [i.e. Clustering]
Find outliers [i.e. Anomaly detection]
... maybe more
Informal survey: https://github.com/szilard/kaggle-scripts-R-pydata
Programming language: Python 3
Environment: Jupyter and Anaconda
Libraries: IPython, numpy, pandas, matplotlib, scikit-learn, keras, nltk, DeepSpeech, ...
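Once these are installed, a quick check can show which of the listed libraries your current environment can import. This is a minimal, standard-library-only sketch. It assumes the usual top-level import names; for example, scikit-learn is imported as sklearn. DeepSpeech is left out, so add its import name yourself if you install it.

```python
import importlib.util
import sys

# Assumed import names for the libraries listed above
# (scikit-learn's import name is "sklearn").
LIBRARIES = ("IPython", "numpy", "pandas", "matplotlib", "sklearn", "keras", "nltk")


def check_environment(libraries=LIBRARIES):
    """Map each module name to True if it can be imported here, else False."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in libraries}


if __name__ == "__main__":
    if sys.version_info.major != 3:
        print("Warning: this course expects Python 3, found", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name:12s} {'OK' if found else 'missing'}")
```

Missing entries can be installed into the active environment with conda install (or pip for packages conda does not provide).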
Experiment! Don't limit yourself only to the tools we cover
Data Collection: not covered, because it depends on the data source (HTML, XML, JSON, databases, images, audio, video, ...)
Data Visualization: numpy, pandas, matplotlib
Data Transformation: numpy, pandas
Model: scikit-learn, keras, etc.
Validation: scikit-learn, keras, etc.
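The stages above can be sketched end to end on toy data. This is only an illustration. It uses the Python standard library in place of numpy, pandas and scikit-learn, it generates hypothetical data instead of collecting it, and it skips visualization. The model is a straight line fitted by ordinary least squares, which is a simple case of "predict trends".

```python
import random
from statistics import mean


def collect(n=50, seed=0):
    """Stand-in for Data Collection: hypothetical points near y = 2x + 1."""
    rng = random.Random(seed)
    return [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(n)]


def split(rows, train_fraction=0.8, seed=0):
    """Data Transformation: shuffle a copy of the rows, then split train/validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def validate(model, rows):
    """Validation: mean absolute error of the fitted line on held-out rows."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in rows)


train, valid = split(collect())
model = fit_line(train)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} "
      f"validation MAE={validate(model, valid):.3f}")
```

In the workshops, each of these hand-written steps is replaced by the corresponding library tools listed above.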
Download and install Anaconda for Python 3.6
Open the cheatsheet: https://conda.io/docs/_downloads/conda-cheatsheet.pdf
Get a command prompt with conda in your path:
Windows: Start Button -> "Anaconda Prompt"
Ubuntu / MacOS: conda should be in your path
In the cheatsheet, locate the command for "create a new environment ..."
It starts with conda create ...
Create an environment called mldds01 for Python 3:
conda create -n mldds01 python=3
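Activate the environment by following the instructions printed by conda create (on recent conda versions this is conda activate mldds01; older versions use source activate mldds01, or activate mldds01 on Windows).
You can also find this command in the cheatsheet.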
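To confirm which environment a Python session is running in, a small check like the following can help. It is a sketch that relies on the CONDA_DEFAULT_ENV variable. conda activate sets this variable to the name of the active environment. The expected name mldds01 comes from the step above.

```python
import os
import sys


def active_conda_env():
    """Return the name of the active conda environment, or None if none is active."""
    # conda activate sets CONDA_DEFAULT_ENV to the environment's name.
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("Active conda env: ", active_conda_env())
    if active_conda_env() != "mldds01":
        print("Tip: activate the mldds01 environment before continuing.")
```

If the reported executable is not inside the mldds01 environment folder, Jupyter and the libraries will be installed in the wrong place.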
We'll be using Jupyter notebooks as our interactive development environment for machine learning experiments.
In your brand new conda environment, install Jupyter:
conda install jupyter
In your command line, navigate to the location of this notebook (you can use cd), then start Jupyter:
cd path/to/mldds-courseware
jupyter notebook
In the browser window, navigate to 01_GettingStarted, then open the first workshop: numpy.ipynb.