Part 3: The Craft 7 min read

Lesson 10

Testing for Confidence

Because 100% Coverage With Bad Tests Is Worse Than 80% With Good Ones

Let's be honest. You've written tests like this:

def test_user_exists():
    user = User(name="Test")
    assert user.name == "Test"

That test will never catch a real bug. It tests that Python assignment works. Your coverage number goes up. Your confidence stays at zero.

Good tests answer a different question: If this test passes, can I ship with confidence?

What to always test

  • Business rules — The logic that earns money or prevents lawsuits. If a project can only have one active task per sprint, test that constraint.
  • Authorization — Can user A really not access user B's data? Test the boundary, not the happy path.
  • Data isolation — Does multi-tenancy actually isolate? Prove it with a test that tries to cross the boundary.
  • Edge cases — Empty inputs, duplicate entries, concurrent operations. These are where production bugs hide.
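As a concrete example of testing a business rule, here is a minimal sketch of the one-active-task-per-sprint constraint from the first bullet. The `Sprint` class and `SprintSlotError` are hypothetical names invented for illustration, not part of any real codebase:

```python
class SprintSlotError(Exception):
    """Raised when a sprint's single active-task slot is already filled."""


class Sprint:
    def __init__(self):
        self._active_task = None

    def activate(self, task_id: str) -> None:
        # Business rule: a sprint can have at most one active task.
        if self._active_task is not None:
            raise SprintSlotError("sprint already has an active task")
        self._active_task = task_id


def test_activate_second_task_in_sprint_raises_slot_error():
    sprint = Sprint()
    sprint.activate("TASK-1")
    try:
        sprint.activate("TASK-2")
    except SprintSlotError:
        pass  # the constraint held
    else:
        raise AssertionError("expected SprintSlotError")
```

If this test fails, the rule that earns money (or prevents double-booking a sprint) is broken — exactly the kind of failure you want a test to surface.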

What to skip

  • Framework behavior — Your ORM already works. You don't need to test that User.query.get(id) returns a user.
  • Trivial getters and setters — If it's a property access, it doesn't need a test.
  • UI layout tests — They break on every redesign and rarely catch functional bugs.

Name tests so they document behavior

A failing test name should tell you exactly what broke without opening the file.

# Bad — tells you nothing when it fails
def test_task_1():
def test_thing_works():

# Good — describes behavior, scenario, and expected result
def test_create_task_with_past_due_date_raises_validation_error():
def test_deactivate_user_unassigns_all_open_tasks():
def test_admin_can_view_all_workspace_sessions():

Pattern: test_{action}_{scenario}_{expected_result}

Your test names become living documentation. When someone asks "what happens if a user tries to assign two tasks to the same sprint slot?", the answer is in the test name.

Two types of tests, two purposes

Unit tests: fast, focused, no database

def test_calculate_story_points_rounds_to_fibonacci():
    result = estimate_points(raw_score=37)
    assert result == 34  # 34 is the nearest Fibonacci number to 37

Tests service logic. Runs in milliseconds. Mocks database interactions. Catches logical errors instantly.
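For context, here is one plausible sketch of the `estimate_points` function the unit test above exercises, assuming "rounds to Fibonacci" means snapping to the nearest value in the sequence (so 37 becomes 34). The name and behavior are illustrative:

```python
def estimate_points(raw_score: int) -> int:
    """Snap a raw estimate to the nearest Fibonacci story-point value."""
    fibs = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
    # min() with an absolute-distance key picks the closest sequence value;
    # ties resolve to the smaller number because the list is ascending.
    return min(fibs, key=lambda f: abs(f - raw_score))
```

Because the function is pure, the unit test needs no mocks or database at all — it runs in microseconds and pinpoints logic errors immediately.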

Integration tests: realistic, with database and RLS

async def test_member_cannot_see_other_workspace_projects(db_session):
    # Set context to Workspace A
    await set_workspace_context(db_session, workspace_a_id)

    # Try to query Workspace B's projects
    projects = await project_service.list_projects(db_session)
    assert len(projects) == 0  # RLS blocks cross-workspace access

Tests the full stack. Slower, but proves your security actually works. These are the tests that let you sleep at night.
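The shape of that isolation test can be shown without a database. The sketch below simulates RLS with an in-memory filter so the pattern is visible end to end; in a real suite, `set_workspace_context` would set a Postgres session variable and RLS policies would do the filtering. `FakeSession` and its methods are invented for illustration:

```python
import asyncio


class FakeSession:
    """Stands in for a DB session; filters rows like an RLS policy would."""

    def __init__(self, rows):
        self._rows = rows          # (workspace_id, project_name) pairs
        self.workspace_id = None   # set by set_workspace_context

    def query_projects(self):
        # RLS analogue: only rows matching the session's workspace are visible.
        return [name for ws, name in self._rows if ws == self.workspace_id]


async def set_workspace_context(session, workspace_id):
    session.workspace_id = workspace_id


async def test_member_cannot_see_other_workspace_projects():
    session = FakeSession([("ws-b", "Secret Project")])
    await set_workspace_context(session, "ws-a")
    assert session.query_projects() == []  # cross-workspace rows stay invisible


if __name__ == "__main__":
    asyncio.run(test_member_cannot_see_other_workspace_projects())
```

The essential move is the same in both versions: set the context to one tenant, then deliberately query for another tenant's data and assert you get nothing back.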

The coverage floor

Set --cov-fail-under=80 in your CI pipeline. Not because 80% is a magic number, but because a coverage floor prevents erosion. Without it, coverage drifts down 1% per month until testing becomes optional in practice.

80% is the floor, not the ceiling. Some domains (authentication, billing, data isolation) should be at 95%+. Others (simple CRUD) can sit at 70%. The aggregate floor keeps the average honest.
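Wiring the floor into CI is one line, assuming the `pytest-cov` plugin is installed; the `app` path is illustrative:

```shell
# Fails the build if aggregate coverage drops below 80%
pytest --cov=app --cov-fail-under=80
```

Because the command exits non-zero on a miss, any CI system treats a coverage drop as a failed build — no policy document required.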