I’ve been learning Erlang, and one thing I like is the idea of atoms (aka symbols in Ruby, I think) 1. Atoms in Erlang are written as bare names starting with a lowercase letter. They are effectively pointers into a global string table. This means comparisons are very fast (you are just comparing two pointers), but for debugging and pattern matching the atom still carries a meaning (the string).
Atoms are particularly useful in pattern-matching languages like Erlang, but they’re a
useful idea even without pattern matching. They’re a very lightweight way of communicating
something with a clear semantic meaning (compare returning the atom ok with the integer 0)
with very little runtime cost.
There are other solutions, such as enums and constant definitions, but these suffer from
two problems. One, you have to go and declare them, so for a small function it’s tempting
to just hardcode a value. Two, in most languages enums and constants map from a semantic
meaning to an arbitrary value (for example, an enum in C is an int). When I return the atom
ok, I mean ok. Testing ok == 0 or computing ok + 1 is meaningless. Atoms are
restricted: the only thing you can do is test for equality. If my
debugger shows the value 0 instead of the meaning, that’s not very helpful.
I was discussing atoms in Python with a colleague when some googling revealed that Python
almost has atoms. Python has a built-in function intern (moved to sys.intern in Python 3)
which will insert a string into a global table, ensuring that future equality comparisons
with other interned strings can be a constant-time identity check. CPython also
automatically interns many string literals. Thus, Python string literals can be used as
atoms. The debugger will, of course, show the content of the string.
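As a rough illustration of the interning behaviour described above, the sketch below builds two equal strings at runtime and interns them. It assumes Python 3, where the function lives at sys.intern rather than being a built-in, and build_status is a made-up helper used only for this example.

```python
import sys


def build_status(parts):
    """Hypothetical helper: assembles a status string at runtime."""
    return "".join(parts)


# Two equal strings built at runtime are not necessarily the same object.
raw_a = build_status(["o", "k"])
raw_b = build_status(["o", "k"])

# Interning maps equal strings to one shared object in the global table.
atom_a = sys.intern(raw_a)
atom_b = sys.intern(raw_b)

print(raw_a == raw_b)    # equal by content
print(atom_a is atom_b)  # same object after interning
```

Once both sides are interned, the comparison can be done with is, which only looks at object identity and never walks the characters.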
There are some disadvantages to using a string as an atom. For one thing, it’s not common
practice in Python, so people might get confused. Additionally, strings support
operations which don’t make sense for atoms (for example, 'ok' + 'error' will happily
give you 'okerror'). One solution would be to make a thin wrapper which doesn’t
expose these operations.
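from sys import intern  # Python 3; in Python 2, intern is a built-in

class atom(object):
    def __init__(self, a):
        # Store the interned string so equal names share one object.
        self._a = intern(a)

    def __eq__(self, another_atom):
        # Only atoms compare equal to atoms; anything else is left to Python.
        if not isinstance(another_atom, atom):
            return NotImplemented
        return another_atom._a == self._a

    def __ne__(self, another_atom):
        return not self == another_atom

    def __repr__(self):
        return 'atom: ' + self._a

    def __hash__(self):
        return hash(self._a)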
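To see what such a wrapper buys you, here is a small usage sketch. It assumes Python 3, and it repeats the wrapper in condensed form under the name Atom so that the snippet runs on its own. parse_port is a hypothetical function invented for the example.

```python
from sys import intern


class Atom:
    """Condensed stand-in for the wrapper above (Python 3 assumed)."""

    def __init__(self, name):
        self._name = intern(name)

    def __eq__(self, other):
        if not isinstance(other, Atom):
            return NotImplemented
        return self._name == other._name

    def __hash__(self):
        return hash(self._name)

    def __repr__(self):
        return "atom: " + self._name


def parse_port(text):
    """Hypothetical function returning an Erlang-style tagged result."""
    if text.isdigit():
        return (Atom("ok"), int(text))
    return (Atom("error"), text)


status, value = parse_port("8080")
print(status == Atom("ok"))            # equality is supported
print({Atom("ok"): "fine"}[status])    # hashable, so usable as a dict key

try:
    Atom("ok") + Atom("error")         # string operations are not exposed
except TypeError as exc:
    print("rejected:", exc)
```

The wrapper gives up the identity-comparison shortcut of raw interned strings, since each call creates a new object, but in exchange concatenation and similar string operations fail loudly.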
Python is going to get enums. Overall, I think true enums provide many of the advantages of atoms (such as retaining their semantic meaning) 2. The main difference is the need to declare them in advance. This is good for big projects, but it leaves one tempted to do something simpler for a small function.
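For comparison, here is roughly what the declare-in-advance approach looks like with the enum module that later landed in the standard library (Python 3.4 and newer). This is only a sketch: Status and check are names invented for the example.

```python
from enum import Enum


class Status(Enum):
    OK = 1
    ERROR = 2


def check(value):
    """Hypothetical function that reports success or failure."""
    return Status.OK if value >= 0 else Status.ERROR


result = check(5)
print(result)               # shows the member's name rather than a bare number
print(result == 1)          # a plain Enum member is not equal to its integer value
print(result is Status.OK)  # members are singletons, so identity comparison works
```

Like an atom, a plain Enum member keeps its meaning when printed and rejects arithmetic, but the class has to be declared before any function can return one.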