Python Data Types & Structures

Start Here: The Mental Model

Think of Python's data structures like containers at a hardware store.

A list is like a shopping cart — ordered, you can add or remove items, and things can repeat.
A tuple is like a locked display case — ordered, but nobody's changing what's inside.
A set is like a bag where duplicates automatically fall out — unordered, unique items only.
A dictionary is like a filing cabinet — every item has a label (key) so you can find it instantly.

That mental model alone will carry you pretty far. Now let's go deeper.

Lists — Your Everyday Workhorse

A list is an ordered, mutable sequence. "Mutable" means you can change it after creation.

python

1	fruits = [class=class="syn-str">"syn-str">class="syn-str">"apple", class=class="syn-str">"syn-str">class="syn-str">"banana", class=class="syn-str">"syn-str">class="syn-str">"cherry"]
2	fruits.append(class=class="syn-str">"syn-str">class="syn-str">"mango") class="syn-comment"># adds to end
3	fruits[0] = class=class="syn-str">"syn-str">class="syn-str">"avocado" class="syn-comment"># changes first item

When to use a list:

You care about order.
You need to add, remove, or modify items.
Duplicates are fine or even expected.

Common pitfall: Lists are slow for membership checks. If you're writing if item in my_list in a tight loop with thousands of items, switch to a set. Checking membership in a list is O(n) — it scans every element. A set does it in O(1).

Tuples — Lightweight and Immutable

A tuple looks like a list but uses parentheses and cannot be changed after creation.

python

1	coordinates = (40.7128, -74.0060) class="syn-comment"># New York City lat/long
2	rgb = (255, 128, 0)

When to use a tuple:

The data shouldn't change — like coordinates, RGB values, or database records.
You need to use it as a dictionary key (lists can't be keys because they're mutable).
You want a small performance edge — tuples use slightly less memory than lists.

Common confusion: Some devs avoid tuples entirely and just use lists everywhere. That works, but it misses a signal. When you use a tuple, you're saying "this data is fixed." It communicates intent, and Python also enforces it.

Sets — When Uniqueness Is the Point

A set stores only unique values and doesn't preserve insertion order.

python

1	tags = {class=class="syn-str">"syn-str">class="syn-str">"python", class=class="syn-str">"syn-str">class="syn-str">"backend", class=class="syn-str">"syn-str">class="syn-str">"api"}
2	tags.add(class=class="syn-str">"syn-str">class="syn-str">"python") class="syn-comment"># duplicate — silently ignored
3	print(tags) class="syn-comment"># {class=class="syn-str">"syn-str">class="syn-str">"python", class=class="syn-str">"syn-str">class="syn-str">"backend", class=class="syn-str">"syn-str">class="syn-str">"api"}

When to use a set:

You need to eliminate duplicates.
You want fast membership testing (in checks).
You're doing set operations: union, intersection, difference.

python

1	a = {1, 2, 3, 4}
2	b = {3, 4, 5, 6}
3
4	print(a & b) class="syn-comment"># intersection: {3, 4}
5	print(a \| b) class="syn-comment"># union: {1, 2, 3, 4, 5, 6}
6	print(a - b) class="syn-comment"># difference: {1, 2}

Common pitfall: You can't index a set (my_set[0] raises an error). If you need both uniqueness and order, consider a list that you deduplicate, or look into dict.fromkeys() which preserves insertion order.

Dictionaries — Key-Value Lookup, Fast

A dictionary maps keys to values. Lookups by key are O(1) — essentially instant regardless of size.

python

1	user = {
2	class=class="syn-str">"syn-str">class="syn-str">"name": class=class="syn-str">"syn-str">class="syn-str">"Alice",
3	class=class="syn-str">"syn-str">class="syn-str">"age": 30,
4	class=class="syn-str">"syn-str">class="syn-str">"active": True
5	}
6
7	print(user[class=class="syn-str">"syn-str">class="syn-str">"name"]) class="syn-comment"># class=class="syn-str">"syn-str">class="syn-str">"Alice"
8	user[class=class="syn-str">"syn-str">class="syn-str">"email"] = class=class="syn-str">"syn-str">class="syn-str">"alice@example.com" class="syn-comment"># add a new key

When to use a dictionary:

You need to look something up by name or ID.
You're counting things (word_count["hello"] += 1).
You're grouping data by a category.
You want structured data without defining a class.

Common pitfall: Accessing a key that doesn't exist raises a KeyError. Use .get() instead for safe access:

python

1	class="syn-comment"># This raises an error if class=class="syn-str">"syn-str">class="syn-str">"email" doesn't exist
2	user[class=class="syn-str">"syn-str">class="syn-str">"email"]
3
4	class="syn-comment"># This returns None (or a default) if class=class="syn-str">"syn-str">class="syn-str">"email" doesn't exist
5	user.get(class=class="syn-str">"syn-str">class="syn-str">"email", class=class="syn-str">"syn-str">class="syn-str">"no email")

List Comprehensions and Dict Comprehensions

These are one-liners for building collections. They're faster than loops and, once you're used to them, more readable.

List comprehension:

python

1	class="syn-comment"># The long way
2	squares = []
3	for n in range(10):
4	squares.append(n ** 2)
5
6	class="syn-comment"># The comprehension way
7	squares = [n ** 2 for n in range(10)]
8
9	class="syn-comment"># With a filter
10	even_squares = [n ** 2 for n in range(10) if n % 2 == 0]

Dict comprehension:

python

1	words = [class=class="syn-str">"syn-str">class="syn-str">"apple", class=class="syn-str">"syn-str">class="syn-str">"banana", class=class="syn-str">"syn-str">class="syn-str">"cherry"]
2	word_lengths = {word: len(word) for word in words}
3	class="syn-comment"># {class=class="syn-str">"syn-str">class="syn-str">"apple": 5, class=class="syn-str">"syn-str">class="syn-str">"banana": 6, class=class="syn-str">"syn-str">class="syn-str">"cherry": 6}

When to use them: When you're transforming or filtering a collection in one step. If the logic gets complex (nested ifs, nested loops), a regular for loop is clearer. Comprehensions aren't always better — just often more concise.

Mutable vs Immutable — Why It Matters

Mutable objects can be changed after creation: lists, dicts, sets.

Immutable objects cannot: integers, floats, strings, tuples.

This distinction has practical consequences.

python

1	class="syn-comment"># Strings are immutable — this creates a NEW string, doesn't change the original
2	name = class=class="syn-str">"syn-str">class="syn-str">"alice"
3	name.upper() class="syn-comment"># returns class=class="syn-str">"syn-str">class="syn-str">"ALICE", but name is still class=class="syn-str">"syn-str">class="syn-str">"alice"
4	name = name.upper() class="syn-comment"># now name is class=class="syn-str">"syn-str">class="syn-str">"ALICE"

The bigger trap is with mutable default arguments in functions:

python

1	class="syn-comment"># DON'T do this
2	def add_item(item, my_list=[]):
3	my_list.append(item)
4	return my_list
5
6	add_item(class=class="syn-str">"syn-str">class="syn-str">"a") class="syn-comment"># [class=class="syn-str">"syn-str">class="syn-str">"a"]
7	add_item(class=class="syn-str">"syn-str">class="syn-str">"b") class="syn-comment"># [class=class="syn-str">"syn-str">class="syn-str">"a", class=class="syn-str">"syn-str">class="syn-str">"b"] — the list persists between calls!
8
9	class="syn-comment"># DO this instead
10	def add_item(item, my_list=None):
11	if my_list is None:
12	my_list = []
13	my_list.append(item)
14	return my_list

This trips up almost every Python developer at least once. The default list is created once when the function is defined, not each time it's called.

Time and Space Complexity — The Practical Summary

You don't need to memorize every operation, but these are the ones that matter most in practice.

Operation	List	Set	Dict
Access by index	O(1)	—	—
Access by key	—	—	O(1)
Membership test (`in`)	O(n)	O(1)	O(1)
Append / Add	O(1)	O(1)	O(1)
Insert at position	O(n)	—	—
Delete by value	O(n)	O(1)	O(1)

The one to internalize first: membership testing in a list is slow. If you're checking if x in collection frequently, use a set or dict instead.

Space-wise, sets and dicts consume more memory than lists because they maintain hash tables internally. For small datasets this doesn't matter. For millions of records, it might.

Takeaway

List when order matters and data changes.
Tuple when data is fixed and order matters.
Set when you need uniqueness and fast lookups.
Dict when you need to find things by a key.

Use comprehensions to build collections cleanly, but keep them readable. Understand mutability so you don't get surprised by shared state. And know that in checks on a list are linear — if you're doing many of them, reach for a set.

These aren't just syntax rules. Each choice communicates intent to the next person reading your code — including future you.

Designing RAG Pipelines for Production

Python Functions: Arguments, Scope, Lambdas, and First-Class Behavior