[pycrypto] Crypto.Random crashes due to unaligned access
andyhhp at gmail.com
Sun Oct 27 15:50:41 PDT 2013
On 27/10/2013 22:07, Sebastian Ramacher wrote:
> On 2013-10-24 09:59:12, Dwayne Litzenberger wrote:
>> Hi Greg!
>> What version/build of GCC is this? Does "setup.py test" crash for you as well?
>> I'd rather figure out how to fix the problem than to start making copies of the key.
>> Greg Price <gnprice at gmail.com> wrote:
>>> I get the following crash in a PyCrypto built from the current master,
>>> af058ee (aka v2.6.1-136-gaf058ee):
>>>>>> import Crypto.Random
>>> Segmentation fault (core dumped)
>>> This is on i686. I compiled with GCC 4.6.3 (or "Ubuntu/Linaro
>>> GDB shows the crash is here:
>>> Program received signal SIGSEGV, Segmentation fault.
>>> aes_key_setup_enc (keylen=32, cipherKey=
>>> 0x84900a8) at src/AESNI.c:122
>>> 122 rk = _mm_loadu_si128((const __m128i*) cipherKey);
>>> at which the instruction is
>>> (gdb) x/i $pc
>>> => 0xb78f2600 <ALGnew+2160>: movdqa %xmm0,0x40(%esi)
>>> This is an aligned store. The documentation of MOVDQA says it should
>>> be 16-byte aligned. The value of rk (aka %esi + 0x40) is only 8-byte
>>> (gdb) p rk
>>> $5 = (__m128i *) 0x84900a8
>>> (gdb) p/x $esi
>>> $9 = 0x8490068
>>> It's not clear to me why GCC generated an aligned instruction here --
>>> in fact, the definition of _mm_loadu_si128 in my emmintrin.h appears
>>> to be
>>> extern __inline __m128i __attribute__((__gnu_inline__,
>>> __always_inline__, __artificial__))
>>> _mm_loadu_si128 (__m128i const *__P)
>>> return (__m128i) __builtin_ia32_loaddqu ((char const *)__P);
>>> and the name of that builtin sure sounds more like MOVDQU than MOVDQA.
>>> Perhaps GCC somehow decides that it can prove the pointer is aligned
>>> I don't know why GCC makes this mistake, or (since it's never the
>>> compiler's fault) which code is lying to it about something being
>>> aligned. Anyone know how to investigate this kind of question?
>>> A workaround would be to make sure that the cipherKey argument to
>>> aes_key_setup_enc() in src/AESNI.c is always 16-byte aligned. At
>>> present, that argument comes straight from the first Python-level
>>> argument to _AESNI.new(); see the PyArg_ParseTupleAndKeywords() call
>>> in src/block_template.c. I guess to implement this workaround we'd
>>> copy the key to a new, aligned buffer if it's not aligned.
>>> I can send a patch for that workaround if it seems like the best
>>> approach. Happy to hear alternatives, and of course it'd be most
>>> satisfying if we can understand why the compiler is emitting this
>>> output in the first place.
> I debugged this for a while and the problem is not _mm_loadu_si128.
> That's fine. It generates the correct movdqu instruction for that. The
> problem is the rk = ... part. On amd64, ek and dk from block_state
> get aligned at 16 byte boundaries and everything works out properly.
> However, on i386 this does not appear to be true.
> I have some ideas how to fix it and will hopefully come up with a patch
> next week.
The alignment might quite easily be chance, or because of the Red Zone
mandated for 64bit, which causes stack frame alignment on 128byte
I accept your argument about the assignment being the problem, but
still, it is invalid for gcc to be emitting a movdqa instruction in this
case; the target address is demonstrably not 16 byte aligned.
More information about the pycrypto