Python Data Types & Structures
Start Here: The Mental Model
Think of Python's data structures like containers at a hardware store.
- A list is like a shopping cart — ordered, you can add or remove items, and things can repeat.
- A tuple is like a locked display case — ordered, but nobody's changing what's inside.
- A set is like a bag where duplicates automatically fall out — unordered, unique items only.
- A dictionary is like a filing cabinet — every item has a label (key) so you can find it instantly.
That mental model alone will carry you pretty far. Now let's go deeper.
Lists — Your Everyday Workhorse
A list is an ordered, mutable sequence. "Mutable" means you can change it after creation.
| 1 | fruits = [class=class="syn-str">"syn-str">class="syn-str">"apple", class=class="syn-str">"syn-str">class="syn-str">"banana", class=class="syn-str">"syn-str">class="syn-str">"cherry"] |
| 2 | fruits.append(class=class="syn-str">"syn-str">class="syn-str">"mango") class="syn-comment"># adds to end |
| 3 | fruits[0] = class=class="syn-str">"syn-str">class="syn-str">"avocado" class="syn-comment"># changes first item |
When to use a list:
- You care about order.
- You need to add, remove, or modify items.
- Duplicates are fine or even expected.
Common pitfall: Lists are slow for membership checks. If you're writing if item in my_list in a tight loop with thousands of items, switch to a set. Checking membership in a list is O(n) — it scans every element. A set does it in O(1).
Tuples — Lightweight and Immutable
A tuple looks like a list but uses parentheses and cannot be changed after creation.
| 1 | coordinates = (40.7128, -74.0060) class="syn-comment"># New York City lat/long |
| 2 | rgb = (255, 128, 0) |
When to use a tuple:
- The data shouldn't change — like coordinates, RGB values, or database records.
- You need to use it as a dictionary key (lists can't be keys because they're mutable).
- You want a small performance edge — tuples use slightly less memory than lists.
Common confusion: Some devs avoid tuples entirely and just use lists everywhere. That works, but it misses a signal. When you use a tuple, you're saying "this data is fixed." It communicates intent, and Python also enforces it.
Sets — When Uniqueness Is the Point
A set stores only unique values and doesn't preserve insertion order.
| 1 | tags = {class=class="syn-str">"syn-str">class="syn-str">"python", class=class="syn-str">"syn-str">class="syn-str">"backend", class=class="syn-str">"syn-str">class="syn-str">"api"} |
| 2 | tags.add(class=class="syn-str">"syn-str">class="syn-str">"python") class="syn-comment"># duplicate — silently ignored |
| 3 | print(tags) class="syn-comment"># {class=class="syn-str">"syn-str">class="syn-str">"python", class=class="syn-str">"syn-str">class="syn-str">"backend", class=class="syn-str">"syn-str">class="syn-str">"api"} |
When to use a set:
- You need to eliminate duplicates.
- You want fast membership testing (
inchecks). - You're doing set operations: union, intersection, difference.
| 1 | a = {1, 2, 3, 4} |
| 2 | b = {3, 4, 5, 6} |
| 3 | |
| 4 | print(a & b) class="syn-comment"># intersection: {3, 4} |
| 5 | print(a | b) class="syn-comment"># union: {1, 2, 3, 4, 5, 6} |
| 6 | print(a - b) class="syn-comment"># difference: {1, 2} |
Common pitfall: You can't index a set (my_set[0] raises an error). If you need both uniqueness and order, consider a list that you deduplicate, or look into dict.fromkeys() which preserves insertion order.
Dictionaries — Key-Value Lookup, Fast
A dictionary maps keys to values. Lookups by key are O(1) — essentially instant regardless of size.
| 1 | user = { |
| 2 | class=class="syn-str">"syn-str">class="syn-str">"name": class=class="syn-str">"syn-str">class="syn-str">"Alice", |
| 3 | class=class="syn-str">"syn-str">class="syn-str">"age": 30, |
| 4 | class=class="syn-str">"syn-str">class="syn-str">"active": True |
| 5 | } |
| 6 | |
| 7 | print(user[class=class="syn-str">"syn-str">class="syn-str">"name"]) class="syn-comment"># class=class="syn-str">"syn-str">class="syn-str">"Alice" |
| 8 | user[class=class="syn-str">"syn-str">class="syn-str">"email"] = class=class="syn-str">"syn-str">class="syn-str">"alice@example.com" class="syn-comment"># add a new key |
When to use a dictionary:
- You need to look something up by name or ID.
- You're counting things (
word_count["hello"] += 1). - You're grouping data by a category.
- You want structured data without defining a class.
Common pitfall: Accessing a key that doesn't exist raises a KeyError. Use .get() instead for safe access:
| 1 | class="syn-comment"># This raises an error if class=class="syn-str">"syn-str">class="syn-str">"email" doesn't exist |
| 2 | user[class=class="syn-str">"syn-str">class="syn-str">"email"] |
| 3 | |
| 4 | class="syn-comment"># This returns None (or a default) if class=class="syn-str">"syn-str">class="syn-str">"email" doesn't exist |
| 5 | user.get(class=class="syn-str">"syn-str">class="syn-str">"email", class=class="syn-str">"syn-str">class="syn-str">"no email") |
List Comprehensions and Dict Comprehensions
These are one-liners for building collections. They're faster than loops and, once you're used to them, more readable.
List comprehension:
| 1 | class="syn-comment"># The long way |
| 2 | squares = [] |
| 3 | for n in range(10): |
| 4 | squares.append(n ** 2) |
| 5 | |
| 6 | class="syn-comment"># The comprehension way |
| 7 | squares = [n ** 2 for n in range(10)] |
| 8 | |
| 9 | class="syn-comment"># With a filter |
| 10 | even_squares = [n ** 2 for n in range(10) if n % 2 == 0] |
Dict comprehension:
| 1 | words = [class=class="syn-str">"syn-str">class="syn-str">"apple", class=class="syn-str">"syn-str">class="syn-str">"banana", class=class="syn-str">"syn-str">class="syn-str">"cherry"] |
| 2 | word_lengths = {word: len(word) for word in words} |
| 3 | class="syn-comment"># {class=class="syn-str">"syn-str">class="syn-str">"apple": 5, class=class="syn-str">"syn-str">class="syn-str">"banana": 6, class=class="syn-str">"syn-str">class="syn-str">"cherry": 6} |
When to use them: When you're transforming or filtering a collection in one step. If the logic gets complex (nested ifs, nested loops), a regular for loop is clearer. Comprehensions aren't always better — just often more concise.
Mutable vs Immutable — Why It Matters
Mutable objects can be changed after creation: lists, dicts, sets.
Immutable objects cannot: integers, floats, strings, tuples.
This distinction has practical consequences.
| 1 | class="syn-comment"># Strings are immutable — this creates a NEW string, doesn't change the original |
| 2 | name = class=class="syn-str">"syn-str">class="syn-str">"alice" |
| 3 | name.upper() class="syn-comment"># returns class=class="syn-str">"syn-str">class="syn-str">"ALICE", but name is still class=class="syn-str">"syn-str">class="syn-str">"alice" |
| 4 | name = name.upper() class="syn-comment"># now name is class=class="syn-str">"syn-str">class="syn-str">"ALICE" |
The bigger trap is with mutable default arguments in functions:
| 1 | class="syn-comment"># DON'T do this |
| 2 | def add_item(item, my_list=[]): |
| 3 | my_list.append(item) |
| 4 | return my_list |
| 5 | |
| 6 | add_item(class=class="syn-str">"syn-str">class="syn-str">"a") class="syn-comment"># [class=class="syn-str">"syn-str">class="syn-str">"a"] |
| 7 | add_item(class=class="syn-str">"syn-str">class="syn-str">"b") class="syn-comment"># [class=class="syn-str">"syn-str">class="syn-str">"a", class=class="syn-str">"syn-str">class="syn-str">"b"] — the list persists between calls! |
| 8 | |
| 9 | class="syn-comment"># DO this instead |
| 10 | def add_item(item, my_list=None): |
| 11 | if my_list is None: |
| 12 | my_list = [] |
| 13 | my_list.append(item) |
| 14 | return my_list |
This trips up almost every Python developer at least once. The default list is created once when the function is defined, not each time it's called.
Time and Space Complexity — The Practical Summary
You don't need to memorize every operation, but these are the ones that matter most in practice.
| Operation | List | Set | Dict |
|---|---|---|---|
| Access by index | O(1) | — | — |
| Access by key | — | — | O(1) |
Membership test (in) | O(n) | O(1) | O(1) |
| Append / Add | O(1) | O(1) | O(1) |
| Insert at position | O(n) | — | — |
| Delete by value | O(n) | O(1) | O(1) |
The one to internalize first: membership testing in a list is slow. If you're checking if x in collection frequently, use a set or dict instead.
Space-wise, sets and dicts consume more memory than lists because they maintain hash tables internally. For small datasets this doesn't matter. For millions of records, it might.
Takeaway
- List when order matters and data changes.
- Tuple when data is fixed and order matters.
- Set when you need uniqueness and fast lookups.
- Dict when you need to find things by a key.
Use comprehensions to build collections cleanly, but keep them readable. Understand mutability so you don't get surprised by shared state. And know that in checks on a list are linear — if you're doing many of them, reach for a set.
These aren't just syntax rules. Each choice communicates intent to the next person reading your code — including future you.