In a dynamically-typed language, it is common for the scoping semantics of a variable to be wider than a single code block. For example: in at least Python and the shell, it is the case that a variable defined anywhere within a function —even inside conditionals or loops— is reachable anywhere in the function from there on.
To illustrate what this means, consider this snippet in which we define a function to compute the CPU requirements needed in a database system to support a set of tables:
def cpu_requirements(database):
disk_bytes, tables_count = calculate_summaries(database)
if disk_bytes > 0:
cpu = cpu_favoring_disk_bytes(disk_bytes)
elif tables_count > 0:
cpu = cpu_favoring_tables_count(tables_count)
else:
cpu = 0
... various tens of lines of code ...
return cpu + SAFETY_MARGIN
The thing I want you to note in this code snippet is that we are defining the cpu
variable in all code paths of the conditional and later using the computed value outside the conditional, possibly after tens of lines of code. Practically, there is nothing wrong with this as long as the code works as intended, but I personally find this style to be confusing: the code does not show the intent of the programmer regarding where a variable is going to be used.
As a guideline, define a variable in the outermost block where it is used. Visually, this means defining the variable at the leftmost indentation level in which it is going to be referenced later.
The code above would be rewritten as follows:
def cpu_requirements(tables_info):
disk_bytes, tables_count = calculate_summaries(database)
cpu = None
if disk_bytes > 0:
cpu = cpu_favoring_disk_bytes(disk_bytes)
elif tables_count > 0:
cpu = cpu_favoring_tables_count(tables_count)
else:
cpu = 0
assert cpu is not None, 'cpu not defined in a code path'
... various tens of lines of code ...
return cpu + SAFETY_MARGIN
There are two key ideas behind these tiny adjustments:
- First, the scope of the
cpu
variable is explicitly declared to be outside of the conditional. This happens right before entering the alternative code paths, so there is a clear expectation that this variable will be used later on outside of the conditional statement. - And second, the expectation that the
cpu
variable has been assigned a value in all possible code paths is explicitly coded by an assertion. This is important. Consider that the code in each branch of the conditional could be multiple lines long, and the assignment of a value to our variable might be buried among those lines. With the assertion, we want to ensure that whatever happens to the contents of the conditional branches in future revisions does not mean that the initialization of the variable is lost (by mistake).
Everything mentioned here applies to loops and other higher-level constructs as well, particularly try
/catch
blocks. Consider this code:
def get_current_user_data():
try:
user_data = helper_module.get_user_data(os.getuid())
except helper_module.UserQueryError, e:
raise BackendError(e)
try:
group_data = helper_module.get_group_data(os.getgid())
except helper_module.GroupQueryError, e:
raise BackendError(e)
return user_data, group_data
The code in this example performs two separate queries via a helper module, and these queries raise exceptions defined in the helper module. To be self-contained, our get_current_user_data()
rewrites these exceptions as a generic exception defined in the current module. The issue, however, is that the variables defined within the try
block are accessed later separately. I would instead do:
def get_current_user_data():
user_data = None
try:
user_data = helper_module.get_user_data(os.getuid())
except helper_module.UserQueryError, e:
raise BackendError(e)
assert user_data is not None
group_data = None
try:
group_data = helper_module.get_group_data(os.getgid())
except helper_module.GroupQueryError, e:
raise BackendError(e)
assert group_data is not None
return user_data, group_data
Which is very similar to what we did above for conditionals.
To wrap everything up, keep these guidelines in mind:
- Define variables in the outermost block where they are referenced.
- If you don’t have a good default value to which to initialize the variable to—because, for example, its value is computed in a conditional path—set it to
None
and later assert that it has been set to something different.