Working with bytes in Python 3

Background

Sometimes an application needs to work at the byte level. I feel that there are not enough Python examples of how to do this, and it is easy to over-complicate the solution.

This Example

This example covers several aspects of working with bytes: the struct module, the bytearray built-in, and the ctypes module.

The Code

from ctypes import c_int, Structure
from struct import pack_into, unpack_from
# create 2 buffers, one smaller than the other for
# demonstration purposes.
buff1 = bytearray(64)
buff2 = bytearray(32)
# first we'll use the struct module to initialize the arrays.
# initialize buff1 with 16 unsigned integers (16 * 4 bytes = 64 bytes).
pack_into('I' * 16, buff1, 0, *range(0, 16))
print('Buffer 1:', unpack_from('I' * 16, buff1, 0))
# for the sake of demonstration, we'll work with buff2.
# copy the first 32 bytes of buff1 into buff2. Note that buff1[:32]
# builds a temporary bytearray first, so this is a memcpy-like copy
# of the data rather than a shared view.
buff2[:] = buff1[:32]
# test it out, did we copy 32 bytes from buff1 into buff2?
print('Buffer 2:', unpack_from('I' * 8, buff2, 0), end='\n\n')
# We can also use the ctypes module to access the buffers.
# We can access them piecemeal like this. Note that
# from_buffer_copy copies the bytes; if we didn't want a copy,
# from_buffer is the method we would use.
x = c_int.from_buffer_copy(buff2, 8)
y = c_int.from_buffer_copy(buff2, 12)
print(f'x, y as 2 standalone c_ints: {x.value}, {y.value}', end='\n\n')
# You can also create C Structures to access the data
# Define a simple ctypes structure.
class Point(Structure):
    _fields_ = [
        ('x', c_int),
        ('y', c_int)
    ]
p1 = Point.from_buffer_copy(buff2, 8)
print(f'x, y as elements of a ctype structure (copied): {p1.x}, {p1.y}')
# note that since this is a copy, any manipulation doesn't
# affect the buffer.
p1.x, p1.y = 50, 51
print(f'p1.x, p1.y set to: {p1.x}, {p1.y}')
print('Show buff2 is unchanged:', unpack_from('I' * 8, buff2, 0), end='\n\n')
# so, if we wanted to directly manipulate the buffer using the structure,
# we would use from_buffer, which shares memory with buff2.
p2 = Point.from_buffer(buff2, 8)
print(f'x, y as elements of a ctype structure (not copied): {p2.x}, {p2.y}')
p2.x, p2.y = 100, 101
print(f'p2.x, p2.y set to: {p2.x}, {p2.y}')
# see that the 3rd and 4th elements have now been changed.
print('Show buff2 is changed:', unpack_from('I' * 8, buff2, 0), end='\n\n')
# finally note that the original buffer is unchanged.
print('Show buff1 unchanged:', unpack_from('I' * 16, buff1, 0))

Output

Buffer 1: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
Buffer 2: (0, 1, 2, 3, 4, 5, 6, 7)
x, y as 2 standalone c_ints: 2, 3
x, y as elements of a ctype structure (copied): 2, 3
p1.x, p1.y set to: 50, 51
Show buff2 is unchanged: (0, 1, 2, 3, 4, 5, 6, 7)
x, y as elements of a ctype structure (not copied): 2, 3
p2.x, p2.y set to: 100, 101
Show buff2 is changed: (0, 1, 100, 101, 4, 5, 6, 7)
Show buff1 unchanged: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
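
The format strings in the code above use the machine's native byte order and sizes. When the bytes come from a file or another machine, it can help to spell the layout out. Below is a minimal sketch, assuming a little-endian layout (that choice is only an assumption for illustration). The '<' prefix and the '16I' repeat count are standard struct format syntax.

```python
from struct import calcsize, pack_into, unpack_from

# '<' selects little-endian byte order with standard sizes (no padding);
# '16I' is shorthand for 'I' repeated 16 times.
fmt = '<16I'
size = calcsize(fmt)  # 16 unsigned ints * 4 bytes each

buff = bytearray(size)
pack_into(fmt, buff, 0, *range(16))
values = unpack_from(fmt, buff, 0)

# the integer 1 as stored in the buffer, least significant byte first
first_int_bytes = bytes(buff[4:8])
print(size, values[:4], first_int_bytes)
```

Using calcsize to size the bytearray keeps the buffer length and the format string from drifting apart.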

A statement on copying

Be careful when slicing a Python bytearray, bytes or array.array. Each slice creates a copy, which can hurt performance when the buffers are large or sliced often. There is a better way: memoryview. A memoryview works with any object that implements the Python buffer protocol, and slicing one produces another memoryview over the same memory rather than a copy of the bytes.
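
The difference can be sketched as follows. This is a small illustration rather than a benchmark, and the buffer size and values are arbitrary choices made for the example.

```python
from struct import pack_into, unpack_from

buff = bytearray(16)
pack_into('4I', buff, 0, 0, 1, 2, 3)

copied = buff[8:16]            # slicing a bytearray copies 8 bytes
view = memoryview(buff)[8:16]  # slicing a memoryview shares buff's memory

# write through the view; this changes buff but not the earlier copy
pack_into('2I', view, 0, 100, 101)

buff_values = unpack_from('4I', buff, 0)
copied_values = unpack_from('2I', copied, 0)
print('buff:  ', buff_values)
print('copied:', copied_values)

# release the view when done; a bytearray cannot be resized
# while a memoryview over it is still alive
view.release()
```

The write made through the view shows up in buff, while the slice taken directly from the bytearray still holds the old values.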
