I'm a PhD student at FCTUC, currently working on the implementation of secret-key and public-key cryptographic algorithms for modern parallel CPUs and other devices, such as graphics processing units (GPUs).
You can reach me at sneves at dei dot uc dot pt. My PGP public key is 0x3C2C6C6F.
I have done research on efficient cryptographic primitives on NVIDIA GPUs. Some symmetric primitives, such as AES and Salsa20, map quite well onto the GPU computation model, particularly in parallel modes of operation.
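To illustrate why parallel modes suit GPUs: in Salsa20, keystream block i depends only on the key, the nonce, and the block counter i, so every block can be computed by a separate thread with no communication between threads. The following is a minimal pure-Python sketch of Salsa20/20 with a 256-bit key, written from the published Salsa20 design. The function names are hypothetical, the code is unoptimized and not constant-time, and it is not the GPU code from this research.

```python
import struct

MASK32 = 0xFFFFFFFF
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"


def _rotl(v, c):
    """Rotate a 32-bit word left by c bits."""
    return ((v << c) & MASK32) | (v >> (32 - c))


def _qr(x, a, b, c, d):
    """Salsa20 quarter-round on state words a, b, c, d (in place)."""
    x[b] ^= _rotl((x[a] + x[d]) & MASK32, 7)
    x[c] ^= _rotl((x[b] + x[a]) & MASK32, 9)
    x[d] ^= _rotl((x[c] + x[b]) & MASK32, 13)
    x[a] ^= _rotl((x[d] + x[c]) & MASK32, 18)


def salsa20_block(key, nonce, counter):
    """One 64-byte keystream block for a 32-byte key and an 8-byte nonce."""
    k = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    state = [SIGMA[0], *k[:4], SIGMA[1], *n,
             counter & MASK32, (counter >> 32) & MASK32,
             SIGMA[2], *k[4:], SIGMA[3]]
    x = state[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        # Column round.
        _qr(x, 0, 4, 8, 12)
        _qr(x, 5, 9, 13, 1)
        _qr(x, 10, 14, 2, 6)
        _qr(x, 15, 3, 7, 11)
        # Row round.
        _qr(x, 0, 1, 2, 3)
        _qr(x, 5, 6, 7, 4)
        _qr(x, 10, 11, 8, 9)
        _qr(x, 15, 12, 13, 14)
    # Feed-forward: add the input state, serialize as little-endian words.
    return struct.pack("<16I", *((xi + si) & MASK32 for xi, si in zip(x, state)))


def salsa20_xor(key, nonce, data):
    """Encrypt or decrypt data; block i needs only (key, nonce, i)."""
    out = bytearray(len(data))
    for i in range(0, len(data), 64):
        # No iteration depends on another, so on a GPU each block
        # index could be handled by its own thread.
        ks = salsa20_block(key, nonce, i // 64)
        chunk = data[i:i + 64]
        out[i:i + len(chunk)] = bytes(p ^ q for p, q in zip(chunk, ks))
    return bytes(out)
```

Because the blocks are independent, computing them in any order (or concurrently) and concatenating them yields the same keystream; that property is what lets a GPU assign one block per thread.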
On the other hand, public-key cryptography is trickier to implement efficiently on GPUs, especially on the G80 and GT200 architectures, where the hardware integer multipliers are limited to 24-bit operands. After exploring various integer representations and modular multiplication approaches, we obtained positive results. These are all available in my master's thesis and the subsequent article.
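One common way to cope with a narrow multiplier is to split big integers into limbs small enough for it and to use a word-level modular multiplication such as Montgomery's. The Python sketch below assumes radix-2^24 limbs and the CIOS (coarsely integrated operand scanning) variant of Montgomery multiplication; it is a generic textbook technique chosen for illustration, not necessarily the representation or algorithm the thesis settles on. `mul24` is a hypothetical stand-in for the hardware multiplier, and it treats the full 48-bit product as a single step, which is a simplification.

```python
W_BITS = 24                      # assumed limb width = multiplier operand width
MASK = (1 << W_BITS) - 1


def mul24(x, y):
    """Hypothetical stand-in for a 24x24 -> 48-bit hardware multiply."""
    assert 0 <= x <= MASK and 0 <= y <= MASK, "operand exceeds 24 bits"
    return x * y


def to_limbs(x, s):
    """Little-endian list of s 24-bit limbs."""
    return [(x >> (W_BITS * i)) & MASK for i in range(s)]


def from_limbs(limbs):
    return sum(limb << (W_BITS * i) for i, limb in enumerate(limbs))


def neg_inv_mod_word(n):
    """n' = -n^{-1} mod 2^24 (n must be odd)."""
    return (-pow(n, -1, 1 << W_BITS)) % (1 << W_BITS)


def mont_mul(a, b, n, n_prime, s):
    """CIOS Montgomery product a*b*R^{-1} mod n, with R = 2^(24*s) and a, b < n."""
    t = [0] * (s + 2)
    for i in range(s):
        # Accumulate a * b[i] into t.
        carry = 0
        for j in range(s):
            acc = t[j] + mul24(a[j], b[i]) + carry
            t[j], carry = acc & MASK, acc >> W_BITS
        acc = t[s] + carry
        t[s], t[s + 1] = acc & MASK, acc >> W_BITS
        # Add m*n so the lowest limb becomes zero, then shift down one limb.
        m = mul24(t[0], n_prime) & MASK
        carry = (t[0] + mul24(m, n[0])) >> W_BITS
        for j in range(1, s):
            acc = t[j] + mul24(m, n[j]) + carry
            t[j - 1], carry = acc & MASK, acc >> W_BITS
        acc = t[s] + carry
        t[s - 1] = acc & MASK
        t[s] = t[s + 1] + (acc >> W_BITS)
    # The result is below 2n; one conditional subtraction brings it below n.
    result, n_int = from_limbs(t[:s + 1]), from_limbs(n)
    return result - n_int if result >= n_int else result


def mod_mul(x, y, n):
    """x*y mod n via Montgomery form (illustration; conversions use Python ints)."""
    s = -(-n.bit_length() // W_BITS)            # number of 24-bit limbs
    R = 1 << (W_BITS * s)
    n_limbs, n_prime = to_limbs(n, s), neg_inv_mod_word(n)
    xr, yr = to_limbs(x * R % n, s), to_limbs(y * R % n, s)
    xyr = mont_mul(xr, yr, n_limbs, n_prime, s)     # = x*y*R mod n
    return mont_mul(to_limbs(xyr, s), to_limbs(1, s), n_limbs, n_prime, s)
```

Every multiplication passes through `mul24`, so its assertion confirms that no operand ever exceeds 24 bits; the carries, however, are kept in wider accumulators, which real code must also provision for.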
Because symmetric cryptographic functions are ubiquitous, they must be as fast as possible without sacrificing security. I am interested in design and implementation choices that make cryptography faster by taking advantage of modern hardware features, such as superscalar execution, vector units, and multiple cores/threads.
The BLAKE2 hash function is an optimized variant of BLAKE that offers similar security guarantees but prioritizes speed. It is faster than MD5 and SHA-1 on most current desktop processors. It also supports keyed hashing and tree-hashing parameters out of the box.
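Python's standard `hashlib.blake2b` exposes these parameters directly. The sketch below (Python 3.8+ assumed) shows keyed hashing used as a MAC, without an HMAC wrapper, and a minimal two-leaf tree hash. The helper names `keyed_tag`, `verify_tag`, and `tree_hash` and the tree parameter values (fanout 2, depth 2, 4096-byte leaves) are illustrative choices, not anything prescribed by BLAKE2.

```python
import hashlib
import hmac


def keyed_tag(key: bytes, message: bytes) -> bytes:
    """MAC via BLAKE2b's built-in key parameter (key up to 64 bytes)."""
    return hashlib.blake2b(message, key=key, digest_size=32).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison of the expected and received tags.
    return hmac.compare_digest(keyed_tag(key, message), tag)


# Illustrative tree parameters: binary tree of depth 2 (two leaves + root).
FANOUT, DEPTH, LEAF_SIZE, INNER_SIZE = 2, 2, 4096, 64


def tree_hash(data: bytes) -> bytes:
    """Hash up to 2*LEAF_SIZE bytes as two leaf nodes plus a root node."""
    if len(data) > 2 * LEAF_SIZE:
        raise ValueError("this sketch only handles two leaves")
    params = dict(fanout=FANOUT, depth=DEPTH,
                  leaf_size=LEAF_SIZE, inner_size=INNER_SIZE)
    # The leaves are independent and could be hashed on separate cores.
    left = hashlib.blake2b(data[:LEAF_SIZE], node_offset=0, node_depth=0,
                           last_node=False, **params)
    right = hashlib.blake2b(data[LEAF_SIZE:], node_offset=1, node_depth=0,
                            last_node=True, **params)
    # The root absorbs the two 64-byte leaf digests (inner_size = 64).
    root = hashlib.blake2b(digest_size=32, node_offset=0, node_depth=1,
                           last_node=True, **params)
    root.update(left.digest())
    root.update(right.digest())
    return root.digest()
```

Because the tree parameters are mixed into each node's parameter block, the tree hash of a message differs from its ordinary sequential BLAKE2b hash, so the two modes cannot be confused.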
More recently, I have also co-authored NORX, a fast parallel authenticated encryption scheme submitted to the CAESAR competition.
Why SHA-1 should not be used for any secure application.
A faster ChaCha and BLAKE implementation for Intel processors.
Cryptology ePrint Archive - Source of the newest developments in cryptographic research.
Agner Fog's Software optimization resources - One of the best resources on x86 assembly optimization.
Helger Lipmaa's crypto pointers.