[pycrypto] AES, python 2.7 vs 3

Paul_Koning at Dell.com
Mon Feb 10 08:40:53 PST 2014


On Feb 8, 2014, at 3:25 AM, Dave Pawson <dave.pawson at gmail.com> wrote:

> On 7 February 2014 17:01,  <Paul_Koning at dell.com> wrote:
>> That’s what pycrypto needs to do, yes.  From what Dwayne says, it sounds like that’s currently not finished yet.
>> 
>> The easiest way to look at this is as a data type matching exercise.  Cryptographic operations are functions that operate on sequences of bytes.  Unicode strings are NOT sequences of bytes — they are an entirely different data type.
> 
> How (if at all) does that statement change if I am using unicode,
> utf-8 encoding please Paul?
> As I understand it, utf-8 constitutes octets? Or am I wrong?

Yes, UTF-8 is one of several encodings you can use for Unicode, and the result of encoding is a sequence of octets.  It’s probably the most popular one for a variety of reasons.  So unless you have a reason to do otherwise, UTF-8 is a good default choice for encoding Unicode strings.
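
To make the "one of several encodings" point concrete, here is a minimal sketch (variable names are only for illustration, and the other two encodings are picked arbitrarily for comparison) showing that the same Unicode string turns into different octet sequences depending on which encoding you choose:

```python
# The same text encoded three ways; all names here are illustrative.
text = "a\u00e9\u00f6"  # the string "aéö"

utf8_bytes = text.encode("utf-8")       # 1 byte for 'a', 2 bytes each for 'é' and 'ö'
latin1_bytes = text.encode("latin-1")   # 1 byte per character
utf16_bytes = text.encode("utf-16-le")  # 2 bytes per character, no byte-order mark

print(utf8_bytes)    # b'a\xc3\xa9\xc3\xb6'
print(latin1_bytes)  # b'a\xe9\xf6'
print(utf16_bytes)   # b'a\x00\xe9\x00\xf6\x00'
```

A cipher or hash fed these three values would see three different inputs, which is why the choice of encoding has to be made deliberately.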

> 
> 
>> 
>> It is valid to speak of specific encodings of Unicode strings as sequences of bytes, but the key point is that you have to do the encoding — which means, first of all, choosing WHICH encoding — in order to have that sequence of bytes.
> 
> And how does that match with Python 3, which (appears |  is) based on
> Unicode strings?

The “str” type is Unicode.  To turn it into “bytes” (for I/O, for crypto, or for other purposes that need octet strings), you have to encode the Unicode.  As I mentioned, UTF-8 is a typical choice, but if you had a reason for using something else, you would specify that encoding instead.

For example:
$ python3
>>> s="foo"
>>> type(s)
<class 'str'>
>>> b=s.encode("utf-8")
>>> type(b)
<class 'bytes'>
>>> b
b'foo'
>>> s="aéö"
>>> b=s.encode("utf-8")
>>> b
b'a\xc3\xa9\xc3\xb6'
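
Going the other way (for instance after decrypting) is the matching decode step. The following is a small sketch, with illustrative variable names, of a round trip. It also shows what can happen if the decode side assumes a different encoding than the encode side used:

```python
original = "a\u00e9\u00f6"              # the string "aéö"
encoded = original.encode("utf-8")

round_trip = encoded.decode("utf-8")    # same encoding: the original text comes back
mismatched = encoded.decode("latin-1")  # wrong encoding: no error here, but different text

print(round_trip == original)        # True
print(mismatched == original)        # False
print(len(original), len(encoded))   # 3 5
```

Note that the string has 3 characters but its UTF-8 form has 5 bytes, so anything that cares about length (block padding, for example) has to work on the bytes, not on the str.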

> 
> 
>> 
>> Since you have to make those choices, it’s not safe for APIs like crypto to accept strings and effectively do some encoding as a side effect.  Better to require bytes in the interface, and let you handle the encode/decode steps explicitly, in the way you want them to be done.
> 
> 
> Thanks for that Paul... you seem to be pointing at Pycrypto as the
> source of my problem. An approach paper / web page would be very
> helpful to me (and others facing the same issues) in managing this
> slippery (to me) aspect of crypto.
> 
> Again, thanks for the comments.

The Python Unicode HOWTO at http://docs.python.org/3/howto/unicode.html might be helpful; it gives a much more detailed explanation of what I’ve been talking about.
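
As a rough illustration of the "require bytes in the interface" idea discussed earlier in this thread, here is a sketch that uses the standard library's hashlib as a stand-in for a crypto API (this is not pycrypto code, and digest_hex is a hypothetical helper invented for the example). The function refuses str, so the caller has to pick the encoding explicitly:

```python
import hashlib


def digest_hex(data):
    """Hypothetical helper: hash a byte string, refusing str input."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected bytes; encode the str explicitly first")
    return hashlib.sha256(data).hexdigest()


message = "a\u00e9\u00f6"  # the string "aéö"

# The caller chooses the encoding, so there is no hidden guess inside the API.
utf8_digest = digest_hex(message.encode("utf-8"))
latin1_digest = digest_hex(message.encode("latin-1"))
print(utf8_digest != latin1_digest)  # True: different bytes, different digests

try:
    digest_hex(message)  # passing the str directly is rejected
except TypeError as exc:
    print("rejected:", exc)
```

If the function quietly encoded the str itself, two programs that assumed different encodings would compute different results from what looks like the same input.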

	paul

