The simple thoughts of Jonathan J Hunt

Atoms/Symbols in Python

I’ve been learning Erlang, and one thing I like is the idea of atoms (aka symbols in Ruby, I think)[1]. Atoms in Erlang are written as bare words starting with a lowercase letter. They are pointers into a global string table. This means comparison operations are very fast (you are just comparing two pointers), but for debugging and pattern matching the atom still has a meaning (the string).

Atoms are particularly useful in pattern-matching languages like Erlang, but they’re a useful idea even without pattern matching. They’re a very lightweight way of communicating something with a clear semantic meaning (compare returning the atom ok with the integer 0) with very little runtime cost.

There are other solutions, such as enums and constant definitions, but these suffer from two problems. One, you have to go and declare them, so for a small function it’s tempting to just hardcode a value. Two, in most languages enums and constants map from a semantic meaning to an arbitrary value (for example, an enum in C is an int). When I return the atom ok I mean ok. Testing ok == 0 or computing ok + 1 is meaningless. Atoms are restricted: the only thing you can do is test for equality. And if my debugger shows the value 0 instead of the meaning, that’s not very helpful.
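To make the second problem concrete, here is a small Python sketch of a constant that is really just an integer. The names STATUS_OK, STATUS_ERROR and check are made up for illustration.

```python
# Hypothetical status constants, declared the way a C-style enum would be.
STATUS_OK = 0
STATUS_ERROR = 1


def check(value):
    """Illustrative function that reports success for non-negative input."""
    return STATUS_OK if value >= 0 else STATUS_ERROR


result = check(5)

# All of these run without complaint, although none of them means anything:
print(result == 0)   # the "meaning" cannot be told apart from the integer 0
print(result + 1)    # arithmetic on a status code
print(repr(result))  # what a debugger would show: the bare value, not "ok"
```

Nothing in the language stops the caller from treating the status as a number, which is exactly what an atom rules out.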

I was discussing atoms in Python with a colleague when some googling revealed that Python almost has atoms. Python has a built-in function intern (sys.intern in Python 3) which inserts a string into a global table, so that equality comparisons between interned strings can be settled by comparing two pointers. CPython also automatically interns string literals that look like identifiers, although this is an implementation detail. Thus, Python string literals can be used as atoms. The debugger will, of course, show the content of the string.
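A quick sketch of what interning buys you, assuming Python 3 (where the function is sys.intern). The strings are built at runtime on purpose, so the example does not rely on whatever the interpreter happens to do with literals.

```python
import sys

# Build two equal strings at runtime rather than writing the literal twice.
parts = ["o", "k"]
a = "".join(parts)
b = "".join(parts)

print(a == b)  # equal content; whether `a is b` holds is up to the interpreter

# After interning, equal strings are guaranteed to be the very same object,
# so an identity check (a pointer comparison) is enough.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

The identity check on the interned strings is the Python analogue of comparing two atom pointers in Erlang.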

There are some disadvantages to using a string as an atom. For one thing, it’s not common practice in Python, so people might get confused. Additionally, strings support operations which don’t make sense for atoms (for example, 'ok' + 'error' will happily give you 'okerror'). One solution would be to make a thin wrapper which doesn’t expose these operations.

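import sys

class atom(object):
    """Thin wrapper around an interned string: only equality and hashing."""
    def __init__(self, a):
        # sys.intern is the Python 3 name; Python 2 has the builtin intern.
        self._a = sys.intern(a)
    def __eq__(self, another_atom):
        # Anything that is not an atom is simply unequal (no AttributeError).
        return isinstance(another_atom, atom) and another_atom._a == self._a
    def __ne__(self, another_atom):
        return not self.__eq__(another_atom)
    def __repr__(self):
        return 'atom: ' + self._a
    def __hash__(self):
        # Equal atoms hash equally, so atoms work as dict keys and set members.
        return hash(self._a)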

Python is going to get enums. Overall, I think true enums provide many of the advantages of atoms (such as retaining their semantic meaning)[2]. The main difference is the need to declare them in advance. This is good for big projects, but it leaves one tempted to do something simpler for a small function.
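For comparison, here is a minimal sketch using the enum module that ships with the standard library from Python 3.4 onwards. The Status class and the check function are hypothetical names chosen for the example.

```python
from enum import Enum


class Status(Enum):  # hypothetical enum, for illustration only
    OK = 1
    ERROR = 2


def check(value):
    """Illustrative function that reports success for non-negative input."""
    return Status.OK if value >= 0 else Status.ERROR


result = check(5)
print(repr(result))         # the repr includes the member name, not just a number
print(result is Status.OK)  # equality and identity tests are the intended use
print(result == 1)          # a plain Enum member is not equal to its raw value

# Arithmetic on a plain Enum member is rejected with a TypeError.
try:
    result + 1
except TypeError:
    arithmetic_allowed = False
else:
    arithmetic_allowed = True
print(arithmetic_allowed)
```

The member keeps its name in the debugger and refuses arithmetic, much like an atom, but the Status class still has to be declared up front.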

  1. Erlang did not originate the idea of atoms; it’s just where I’m most familiar with them.

  2. For a more negative take on enums in Python, see this post