Let's be honest. You've written tests like this:
```python
def test_user_exists():
    user = User(name="Test")
    assert user.name == "Test"
```

That test will never catch a real bug. It tests that Python assignment works. Your coverage number goes up. Your confidence stays at zero.
Good tests answer a different question: If this test passes, can I ship with confidence?
What to always test
- Business rules — The logic that earns money or prevents lawsuits. If a project can only have one active task per sprint, test that constraint.
- Authorization — Can user A really not access user B's data? Test the boundary, not the happy path.
- Data isolation — Does multi-tenancy actually isolate? Prove it with a test that tries to cross the boundary.
- Edge cases — Empty inputs, duplicate entries, concurrent operations. These are where production bugs hide.
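To make the first bullet concrete, here is a sketch of a business-rule test. The `Sprint` model and exception are invented for illustration (the article doesn't define them); the point is that the constraint itself, not the framework, is what gets asserted:

```python
class SprintConflictError(Exception):
    """Raised when a sprint already has an active task."""

class Sprint:
    # Hypothetical model, invented for this sketch: at most one active task
    def __init__(self):
        self.active_task = None

    def activate(self, task: str) -> None:
        if self.active_task is not None:
            raise SprintConflictError(f"{self.active_task!r} is already active")
        self.active_task = task

def test_activate_second_task_in_same_sprint_raises_conflict():
    sprint = Sprint()
    sprint.activate("write docs")
    try:
        sprint.activate("fix login bug")
    except SprintConflictError:
        pass  # the business rule held
    else:
        raise AssertionError("expected SprintConflictError")
```

If this test fails, revenue-relevant logic broke; that is exactly the kind of failure worth being woken up for.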
What to skip
- Framework behavior — Your ORM already works. You don't need to test that `User.query.get(id)` returns a user.
- Trivial getters and setters — If it's a property access, it doesn't need a test.
- UI layout tests — They break on every redesign and rarely catch functional bugs.
Name tests so they document behavior
A failing test name should tell you exactly what broke without opening the file.
```python
# Bad — tells you nothing when it fails
def test_task_1(): ...
def test_thing_works(): ...

# Good — describes behavior, scenario, and expected result
def test_create_task_with_past_due_date_raises_validation_error(): ...
def test_deactivate_user_unassigns_all_open_tasks(): ...
def test_admin_can_view_all_workspace_sessions(): ...
```

Pattern: `test_{action}_{scenario}_{expected_result}`
Your test names become living documentation. When someone asks "what happens if a user tries to assign two tasks to the same sprint slot?", the answer is in the test name.
Two types of tests, two purposes
Unit tests: fast, focused, no database
```python
async def test_calculate_story_points_rounds_to_fibonacci():
    result = estimate_points(raw_score=37)
    assert result == 34  # 34 is the nearest Fibonacci number to 37
```

Tests service logic. Runs in milliseconds. Mocks database interactions. Catches logical errors instantly.
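The article never shows `estimate_points` itself. As a rough sketch of what a Fibonacci-rounding helper could look like (the name, signature, and behavior are all assumptions here):

```python
def estimate_points(raw_score: int) -> int:
    """Round a raw score to the nearest Fibonacci number (sketch, assumed behavior)."""
    fib = [1, 2]
    while fib[-1] < raw_score:
        fib.append(fib[-1] + fib[-2])
    # min() picks the closest value; ties resolve to the smaller Fibonacci number
    return min(fib, key=lambda f: abs(f - raw_score))
```

A unit test like the one above exercises exactly this kind of pure function, with no database or I/O in sight.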
Integration tests: realistic, with database and RLS
```python
async def test_member_cannot_see_other_workspace_projects(db_session):
    # Set context to Workspace A
    await set_workspace_context(db_session, workspace_a_id)

    # Try to query Workspace B's projects
    projects = await project_service.list_projects(db_session)
    assert len(projects) == 0  # RLS blocks cross-workspace access
```

Tests the full stack. Slower, but proves your security actually works. These are the tests that let you sleep at night.
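The real version of this test needs Postgres with RLS policies enabled, plus fixtures for `db_session` and the workspace helpers. To show just the assertion pattern in isolation, here is an in-memory stand-in; every name and behavior below is a sketch invented for illustration, not the actual implementation:

```python
class FakeSession:
    """In-memory stand-in for a database session with an RLS-like filter."""
    def __init__(self, rows):
        self.rows = rows          # (workspace_id, project_name) pairs
        self.workspace_id = None  # set by set_workspace_context

def set_workspace_context(session, workspace_id):
    # In the real test this would set a Postgres session variable that RLS reads
    session.workspace_id = workspace_id

def list_projects(session):
    # Mimics the RLS policy: only rows in the current workspace are visible
    return [name for wid, name in session.rows if wid == session.workspace_id]

def test_member_cannot_see_other_workspace_projects():
    session = FakeSession(rows=[("workspace_b", "secret roadmap")])
    set_workspace_context(session, "workspace_a")
    assert list_projects(session) == []  # the isolation boundary holds
```

Note the limitation: this fake proves nothing about your database. Only the integration test running against real RLS verifies the security boundary; the stand-in just makes the test's shape easy to see.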
The coverage floor
Set `--cov-fail-under=80` in your CI pipeline. Not because 80% is a magic number, but because a coverage floor prevents erosion. Without it, coverage drifts down a point at a time until testing becomes optional in practice.
80% is the floor, not the ceiling. Some domains (authentication, billing, data isolation) should be at 95%+. Others (simple CRUD) can sit at 70%. The aggregate floor keeps the average honest.