This article is part number 2 of the Readability series.


In a dynamically-typed language, it is common for the scoping semantics of a variable to be wider than a single code block. For example: in at least Python and the shell, it is the case that a variable defined anywhere within a function —even inside conditionals or loops— is reachable anywhere in the function from there on.

To illustrate what this means, consider this snippet in which we define a function to compute the CPU requirements needed in a database system to support a set of tables:

def cpu_requirements(database):
    disk_bytes, tables_count = calculate_summaries(database)
    if disk_bytes > 0:
        cpu = cpu_favoring_disk_bytes(disk_bytes)
    elif tables_count > 0:
        cpu = cpu_favoring_tables_count(tables_count)
    else:
        cpu = 0

    ... various tens of lines of code ...

    return cpu + SAFETY_MARGIN

The thing I want you to note in this code snippet is that we are defining the cpu variable in all code paths of the conditional and later using the computed value outside the conditional, possibly after tens of lines of code. Practically, there is nothing wrong with this as long as the code works as intended, but I personally find this style to be confusing: the code does not show the intent of the programmer regarding where a variable is going to be used.

As a guideline, define a variable in the outermost block where it is used. Visually, this means defining the variable at the leftmost indentation level in which it is going to be referenced later.

The code above would be rewritten as follows:

def cpu_requirements(tables_info):
    disk_bytes, tables_count = calculate_summaries(database)
    cpu = None
    if disk_bytes > 0:
        cpu = cpu_favoring_disk_bytes(disk_bytes)
    elif tables_count > 0:
        cpu = cpu_favoring_tables_count(tables_count)
    else:
        cpu = 0
    assert cpu is not None, 'cpu not defined in a code path'

    ... various tens of lines of code ...

    return cpu + SAFETY_MARGIN

There are two key ideas behind these tiny adjustments:

  • First, the scope of the cpu variable is explicitly declared to be outside of the conditional. This happens right before entering the alternative code paths, so there is a clear expectation that this variable will be used later on outside of the conditional statement.
  • And second, the expectation that the cpu variable has been assigned a value in all possible code paths is explicitly coded by an assertion. This is important. Consider that the code in each branch of the conditional could be multiple lines long, and the assignment of a value to our variable might be buried among those lines. With the assertion, we want to ensure that whatever happens to the contents of the conditional branches in future revisions does not mean that the initialization of the variable is lost (by mistake).

Everything mentioned here applies to loops and other higher-level constructs as well, particularly try/catch blocks. Consider this code:

def get_current_user_data():
    try:
        user_data = helper_module.get_user_data(os.getuid())
    except helper_module.UserQueryError, e:
        raise BackendError(e)

    try:
        group_data = helper_module.get_group_data(os.getgid())
    except helper_module.GroupQueryError, e:
        raise BackendError(e)

    return user_data, group_data

The code in this example performs two separate queries via a helper module, and these queries raise exceptions defined in the helper module. To be self-contained, our get_current_user_data() rewrites these exceptions as a generic exception defined in the current module. The issue, however, is that the variables defined within the try block are accessed later separately. I would instead do:

def get_current_user_data():
    user_data = None
    try:
        user_data = helper_module.get_user_data(os.getuid())
    except helper_module.UserQueryError, e:
        raise BackendError(e)
    assert user_data is not None

    group_data = None
    try:
        group_data = helper_module.get_group_data(os.getgid())
    except helper_module.GroupQueryError, e:
        raise BackendError(e)
    assert group_data is not None

    return user_data, group_data

Which is very similar to what we did above for conditionals.

To wrap everything up, keep these guidelines in mind:

  • Define variables in the outermost block where they are referenced.
  • If you don’t have a good default value to which to initialize the variable to—because, for example, its value is computed in a conditional path—set it to None and later assert that it has been set to something different.

Comments from the original Blogger-hosted post: