bytes != str ... a few notes

Michael Watkins

2008-12-15 18:01:50 UTC

Post by John Machin
=== Comparing bytes objects with str objects ===
A tentative solution when maintaining one codebase which runs as is on
return x.encode(encoding)
def STR2BYTES(x): return x

Perhaps your STR2BYTES function should test to see if "x" is already a
byte string, to avoid recasting errors. As it stands should "x" be recast
later down the road by some other chunk of code which is oblivious to the

>>> STR2BYTES(something_already_a_byte_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in STR2BYTES
AttributeError: 'bytes' object has no attribute 'encode'

An approach similar to yours is what the authors of Durus, a ZODB-like
Python Object Database, have done. They add an isinstance(s, byte_string)
test to avoid any attempt at re-encoding a byte string (which would lead
to an AttributeError, since a byte string in 3.x never has an "encode"
method).

Sadly the same is not true in 2.x and below, where a byte string (str)
does have an "encode" method.

Browse the relevant module:

http://www.mems-exchange.org/software/durus/Durus-3.8.tar.gz/Durus-3.8/utils.py

Or peek at this snippet from within::

    if sys.version < "3":
        from __builtin__ import xrange
        from __builtin__ import str as byte_string
        def iteritems(x):
            return x.iteritems()
        def next(x):
            return x.next()
        from cStringIO import StringIO as BytesIO
        from cPickle import dumps, loads, Unpickler, Pickler
    else:
        xrange = range
        from builtins import next, bytearray, bytes
        byte_string = (bytearray, bytes)
        def iteritems(x):
            return x.items()
        from io import BytesIO
        from pickle import dumps, loads, Unpickler, Pickler

    def as_bytes(s):
        """Return a byte_string produced from the string s."""
        if isinstance(s, byte_string):
            return s
        else:
            return s.encode('latin1')

    empty_byte_string = as_bytes("")
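
To see what the isinstance guard buys you, here is a minimal Python 3-only
sketch. It restates just the 3.x branch of the snippet above so that it runs
on its own; the names "once" and "twice" are mine, not from Durus.

```python
# Python 3-only restatement of the relevant lines from the Durus snippet.
byte_string = (bytearray, bytes)

def as_bytes(s):
    """Return a byte_string produced from the string s."""
    if isinstance(s, byte_string):
        return s
    return s.encode('latin1')

once = as_bytes("1:plus:two")   # a str gets encoded
twice = as_bytes(once)          # bytes pass straight through, no AttributeError

assert once == b"1:plus:two"
assert twice is once            # the very same object comes back
```

One caveat with this sketch: 'latin1' only covers code points up to U+00FF,
so a str containing anything beyond that raises UnicodeEncodeError rather
than being converted.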

I wish it were as easy as searching for '\x'-y looking literals to find
areas that will work in 2 but fail in 3. That's a start but there are
little surprises to find elsewhere. Consider the following which of course

>>> x = ":".join(('1','plus','two'))
>>> x
'1:plus:two'
>>> hashlib.md5(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object supporting the buffer API required
>>> hashlib.md5(as_bytes(x))
>>> hashlib.md5(as_bytes(x)).hexdigest()
'4e3a3a8075a6982177c24af5179ec82c'

Failing code and failing unit tests ought to pick up most of these sorts
of issues.
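
As a rough illustration of the kind of unit test that would flag the hashlib
case, here is a sketch for Python 3. The class and method names are made up
for the example, and as_bytes is restated in its 3.x form so the block is
self-contained.

```python
import hashlib
import unittest

def as_bytes(s):
    """3.x form of the helper: pass byte strings through, encode str."""
    if isinstance(s, (bytearray, bytes)):
        return s
    return s.encode('latin1')

class DigestTest(unittest.TestCase):  # hypothetical test case name
    def test_str_is_rejected(self):
        x = ":".join(('1', 'plus', 'two'))
        # Under 3.x, hashing a str directly raises TypeError.
        with self.assertRaises(TypeError):
            hashlib.md5(x)

    def test_bytes_are_accepted(self):
        x = ":".join(('1', 'plus', 'two'))
        digest = hashlib.md5(as_bytes(x)).hexdigest()
        # An MD5 hex digest is always 32 hex characters long.
        self.assertEqual(len(digest), 32)

# Run the two tests explicitly instead of calling unittest.main(),
# so the sketch does not parse sys.argv or exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DigestTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Code that still hands a str to hashlib would fail the first kind of check as
soon as the suite is run under 3.x.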