Module 2: Machine Learning Basics
AI Glossary and Key Concepts
This is machine learning at its most practical: not artificial intelligence in any grand sense, but pattern-matching software that makes predictions about what you'll do based on what millions of similar people have done.
Machine Learning in One Sentence
Machine learning is software that improves at a task by learning from data, rather than being explicitly programmed.
Traditional programming: if genre == "thriller" AND rating > 4.0, recommend()
Machine learning: show the system 100 million viewing histories and let it figure out the patterns itself.
The machine learning approach wins when the rules are too complex for a human to write. Nobody can write explicit rules for "what movies will this specific person enjoy": there are too many variables. But patterns exist in the data, and ML finds them.
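To make the contrast concrete, here is a minimal Python sketch. The viewing history, the function names, and the idea of learning a single rating cutoff are illustrative assumptions, not how any real recommender works.

```python
# Hand-written rule: a human chooses the genre and the rating cutoff.
def rule_based_recommend(genre, rating):
    return genre == "thriller" and rating > 4.0

# Made-up history of (title rating, did the viewer watch it?).
history = [(3.1, False), (3.6, False), (3.9, True), (4.2, True), (4.7, True)]

# Learned rule: try each rating seen in the data as a cutoff and keep
# the one that misclassifies the fewest past examples.
def learn_cutoff(examples):
    candidates = sorted(rating for rating, _ in examples)

    def errors(cutoff):
        return sum((rating >= cutoff) != watched for rating, watched in examples)

    return min(candidates, key=errors)

learned_cutoff = learn_cutoff(history)
print(learned_cutoff)  # 3.9 for this toy history
```

Nobody typed the number 3.9 into the program. It came from the examples, and it would change if the examples changed.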
The Three Types (and When Each Matters)
Supervised learning: You give the AI labelled examples. "Here are 10,000 emails. These are spam, these aren't. Learn the difference." The system learns to classify new emails. This is the most common type and powers: email filtering, medical diagnosis, credit scoring, image classification.
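A toy version of the spam example, assuming a four-email dataset and a deliberately crude word-counting model (real spam filters are far more sophisticated):

```python
from collections import Counter

# Tiny made-up labelled dataset: (email text, is_spam).
labelled_emails = [
    ("win a free prize now", True),
    ("free money click now", True),
    ("meeting moved to monday", False),
    ("lunch on monday", False),
]

def train(examples):
    """Count how often each word appears in spam and in non-spam."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in examples:
        (spam_words if is_spam else ham_words).update(text.split())
    return spam_words, ham_words

def predict(model, text):
    """Call an email spam if its words were seen more often in spam."""
    spam_words, ham_words = model
    words = text.split()
    spam_score = sum(spam_words[w] for w in words)
    ham_score = sum(ham_words[w] for w in words)
    return spam_score > ham_score

model = train(labelled_emails)
print(predict(model, "free prize inside"))      # True
print(predict(model, "monday meeting agenda"))  # False
```

The labels do the teaching here: without the True/False column, the model would have nothing to learn from.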
Unsupervised learning: You give the AI unlabelled data. "Here are 100,000 customer records. Find patterns." The system discovers groupings you didn't know existed (maybe customers who buy organic food also tend to purchase pet insurance). Used for: customer segmentation, anomaly detection, data exploration.
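As a rough sketch of finding groups without labels, the snippet below splits made-up monthly spend figures into two clusters. It is a stripped-down, one-dimensional take on k-means clustering; the data and the function name `two_means` are assumptions for illustration.

```python
def two_means(values, iterations=10):
    """Split numbers into two groups around two moving centres."""
    # Start with the smallest and largest value as the two centres.
    centres = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the average of its group (keep it if empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

monthly_spend = [12, 15, 14, 90, 95, 88, 13, 92]  # made-up customer data
centres, groups = two_means(monthly_spend)
print(centres)  # [13.5, 91.25]
print(groups)   # [[12, 15, 14, 13], [90, 95, 88, 92]]
```

Nobody told the program that there are "low spenders" and "high spenders". The two groups emerged from the numbers alone.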
Reinforcement learning: The AI learns by trial and error, receiving rewards for good outcomes. This is how DeepMind's AlphaGo learned to beat world champion Lee Sedol at Go in 2016: after initial training on human expert games, it played millions of games against itself, reinforcing strategies that won. Used for: game playing, robotics, autonomous vehicles, resource optimisation.
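Trial and error with rewards can be sketched in a far simpler setting than Go: choosing between three slot machines with unknown payout rates. This is an epsilon-greedy "bandit", a standard teaching example and not AlphaGo's method. The payout probabilities and parameter values are made up.

```python
import random

def run_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which machine pays best purely from the rewards received."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = [0.0] * len(payout_probs)  # current guess of each payout rate
    pulls = [0] * len(payout_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(payout_probs))  # explore: try a random machine
        else:
            arm = estimates.index(max(estimates))   # exploit: use the best guess
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        pulls[arm] += 1
        # Nudge the running average for this machine towards the new reward.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls

estimates, pulls = run_bandit([0.2, 0.5, 0.8])
best_arm = estimates.index(max(estimates))
print(best_arm, [round(e, 2) for e in estimates], pulls)
```

The program is never told the payout rates. It only sees rewards, yet its estimates should settle close to the true values and it should end up favouring the third machine.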
The Concepts That Actually Matter
Features: The input variables the model uses to make predictions. For a spam classifier: email length, sender reputation, keyword frequency, link count. Choosing the right features is half the battle: garbage features in, garbage predictions out.
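A small sketch of turning a raw email into features. The function name, the word list, and the sample email are hypothetical. Sender reputation is left out because it would need outside data.

```python
def extract_features(email_text, suspicious_words=("free", "winner", "urgent")):
    """Turn raw email text into a few numbers a model could use."""
    words = email_text.lower().split()
    return {
        "length": len(email_text),
        "link_count": sum(w.startswith(("http://", "https://")) for w in words),
        # Strip trailing punctuation so "winner." still counts as "winner".
        "keyword_hits": sum(w.strip(".,!?") in suspicious_words for w in words),
    }

sample = "URGENT! You are a winner. Claim at https://example.com"
features = extract_features(sample)
print(features["link_count"], features["keyword_hits"])  # 1 2
```

The model never sees the email itself, only these numbers. If the chosen features miss what actually distinguishes spam, no algorithm can recover it.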
Training data: The examples the model learns from. The single most important factor in ML quality. Insufficient data, biased data, or low-quality data produces bad models regardless of the algorithm. Much of Google's AI advantage is often attributed to data rather than to better algorithms: it has more of it than almost anyone else.
Overfitting: When a model learns the training data too well, including its noise and quirks, and performs badly on new data. Like a student who memorises exam answers but can't solve novel problems. This is the most common failure mode in ML.
Underfitting: The opposite: the model is too simple to capture the actual patterns. Like explaining Netflix preferences with only "genre" when mood, time of day, and social context all matter.
The bias-variance tradeoff: Simple models are reliable but miss nuance (high bias). Complex models capture nuance but may be unreliable (high variance). The art of ML is finding the sweet spot.
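The three ideas above can be seen side by side in a small sketch. The data is made up (the true pattern is roughly y = 2x, with noise in the training points), and the three "models" are deliberately simple stand-ins: one too simple, one that memorises, and one in between.

```python
# Made-up data: training points are noisy, test points follow y = 2x.
train = [(1, 2.5), (2, 3.4), (3, 6.8), (4, 7.3), (5, 10.6)]
test = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8), (4.6, 9.2)]

def mse(model, data):
    """Mean squared error: average of (prediction - truth) squared."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfit (high bias): ignores x and always predicts the average y.
mean_y = sum(y for _, y in train) / len(train)
def mean_model(x):
    return mean_y

# Overfit (high variance): memorises the training points, noise included,
# and copies the answer of the nearest one.
def memoriser(x):
    return min(train, key=lambda point: abs(point[0] - x))[1]

# In between: a straight line fitted by least squares.
mean_x = sum(x for x, _ in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x
def line(x):
    return slope * x + intercept

for name, model in [("mean", mean_model), ("memoriser", memoriser), ("line", line)]:
    print(name, "train:", round(mse(model, train), 3), "test:", round(mse(model, test), 3))
```

On this toy data the memoriser scores a perfect zero error on the training points but does clearly worse than the line on the unseen test points, while the always-average model does badly on both. A model has to be judged on data it has not seen.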
Why This Matters for You
You don't need to build ML models. But you need to understand how they work because:
- When your loan is denied by an algorithm, understanding ML tells you it's based on statistical patterns from historical data, which may encode historical discrimination
- When an AI tool at work makes a recommendation, you need judgment about when to trust it and when to override it
- When companies claim "AI-powered," understanding ML lets you evaluate whether that's meaningful or marketing
The most dangerous position is believing AI is magic. The second most dangerous is believing it's useless. The truth is in between: it's powerful pattern-matching with specific, predictable limitations.
---
ML Concepts Made Concrete
Explain these machine learning concepts using a real-world analogy of a restaurant predicting how much food to prepare each day: 1. Training data 2. Features 3. Overfitting 4. Underfitting 5. The bias-variance tradeoff For each, explain what going wrong looks like in the restaurant scenario and what the real-world consequences would be.
Evaluate an ML Claim
A company is selling me an "AI-powered" HR tool that predicts which employees are likely to leave the company within 6 months. They claim 85% accuracy. Help me ask the right questions: 1. What does "85% accuracy" actually mean? What are the possible misleading interpretations? 2. What training data would they need, and what biases might it contain? 3. What features (inputs) would this model likely use, and which ones could be problematic? 4. What are the ethical risks of using this in my organisation? 5. What should I ask the vendor that they probably don't want to answer?
ML Type Identifier
For each of these real applications, identify whether it's supervised, unsupervised, or reinforcement learning, and explain why: 1. Spotify's Discover Weekly playlist 2. A bank's fraud detection system 3. A self-driving car learning to navigate 4. Amazon's "customers who bought this also bought..." 5. ChatGPT generating text For each, also explain what the "training data" is and what could go wrong with it.
1. Pick a decision you make regularly (what to eat, which route to take, what to watch)
2. Write down the "features" you consider (time of day, mood, weather, who you're with)
3. Write down the "training data": your past experiences with this decision
4. Try to write explicit rules for your decision ("if tired AND cold, order pizza")
5. Notice how quickly the rules become too complex. This is why ML exists
You've just experienced the fundamental insight behind machine learning: some decisions are too complex for rules but follow patterns that can be learned from examples.
---
1. Machine learning is software that improves at tasks by learning from data, not explicit programming
2. Three types: supervised (labelled examples), unsupervised (find hidden patterns), reinforcement (trial and error)
3. Training data quality is the single most important factor, more important than the algorithm
4. Overfitting (memorising noise) is the most common failure; underfitting (too simple) is the second
5. You don't need to build ML, but understanding it helps you evaluate AI claims, challenge AI decisions, and use AI tools wisely