Intel's AES-NI Benchmarks

  • 15-03-2013 11:28pm
    #1
    Registered Users, Registered Users 2 Posts: 3,888 ✭✭✭


    Hi,

    Looking for help on a project I'm working on.

    I'm testing brute-forcing some encryption for a report I'm doing, and would like to know what effect a decent CPU would have.

    My 2.5GHz i5 laptop (2 cores, 4 threads), using Intel's very impressive new on-chip AES instructions, can make 10 million password attempts per second.
    That's on a single thread. Surprisingly, running more in parallel doesn't make it much faster (about 20% faster on two threads), which seems to suggest there is only one shared AES unit on the die - something I haven't seen mentioned anywhere. It might be my low-cost laptop, but I'd like to see if it improves with an i7 chip.


    So would anyone out there have a decent i7 (it must be made after 2009, when they started adding the AES instructions)? I'd really like to see if a better chip can handle multiple threads and double the performance or more.


    Some test code on the Intel site will benchmark encryption. It creates a CSV report (and takes ages to run, 30 minutes or so).

    Important: you need to run it like this to capture the log:

    aessampletiming86.exe > log.txt


    Download:
    http://software.intel.com/en-us/articles/download-the-intel-aesni-sample-library

    (You need to compile it using VS2008. The free edition should do, or PM me if you cannot compile it.)

    No logs are anywhere on the internet yet for this benchmark.

    If anyone can attach the zipped log, I'll summarise the findings...

    “Roll it back”



Comments

  • Closed Accounts Posts: 128 ✭✭morlock_


    Even though you've requested i7 benchmarks, I've got an i5 2500k with 4 physical cores which are similar to the i7 models.

    Graphics processor and memory controller are separate but I'd say the AES-NI are built in to each core.

    What model of CPU do you have?

    results


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    Off-topic question: How does that compare to using GPUs? Obviously backend hardware doesn't usually come with a fancy GPU, but recently deployed machines will all have that AES-NI enhancement.


  • Registered Users, Registered Users 2 Posts: 3,888 ✭✭✭ozmo


    morlock_ wrote: »
    Even though you've requested i7 benchmarks, I've got an i5 2500k with 4 physical cores which are similar to the i7 models.

    Graphics processor and memory controller are separate but I'd say the AES-NI are built in to each core.

    What model of CPU do you have?

    results


    Thanks so much for that. I was considering getting a better CPU, but those results are pretty much what I am seeing on my i5 (mine is a lesser i5 2520M).

    The documentation says you should be able to run AES-NI on multiple threads, so these encryptions could be performed in parallel, but neither my custom software nor Intel's own software (the one I linked to above) shows this happening on the i5 processors...
    AES instructions (CPU=I5 2500K) (decoding AES 128)
    #threads	#bytes		min		max		avg
    1		16		18.221669	18.221669	18.221669
    2		16		27.67624	28.40234	28.03929
    4		16		53.817085	54.249585	54.033335
    

    This is an extract from the test you ran. You can see that decoding a 16-byte string takes 18 time cycles. (To test whether a password is valid, when you know what to expect there, you really only need to decode the first 16 bytes of a file.) Doing the same using two threads in parallel should take the same time, one running on each core (i.e. about 18), but in fact it takes 28, not much better than single-threaded.

    Doing four in parallel is the same (taking 54 time cycles).

    But when you try longer sequences, as you likely would when you know the password, the story is better: you can in fact decode 4 streams at the same time (0.932305, 0.9540885 and maybe 1.0273765 are about the same time when you decode 32K bytes).
    AES instructions (CPU=I5 2500K) (decoding AES 128)
    #threads	#bytes		min		max		avg
    1		32656		0.932305	0.932305	0.932305
    2		32656		0.953702	0.954475	0.9540885
    4		32656		1.027032	1.027721	1.0273765
    

    So it would appear that with this CPU (while 10 times better than software alone), multi-threading just does not help when brute-force guessing lots of passwords.

    But if you know the password, then it would be very useful for e.g. streaming in parallel from a RAID array, or unzipping lots of files 4 at a time.
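
As a rough sketch of how to read the two extracts above (Python, with the averages copied from the tables): this assumes each figure is the cost one thread pays per unit of work, so n threads running side by side deliver n units in that time. The name `aggregate_speedup` is made up for illustration and is not part of the Intel sample.

```python
# Averages copied from the i5 2500K extracts above (decoding AES 128).
# Assumption: each figure is the per-thread cost of one unit of work,
# so n threads complete n units in that time.
SHORT_BLOCKS = {1: 18.221669, 2: 28.03929, 4: 54.033335}  # 16 bytes
LONG_BLOCKS = {1: 0.932305, 2: 0.9540885, 4: 1.0273765}   # 32656 bytes


def aggregate_speedup(results):
    """Return {threads: total throughput relative to the 1-thread run}."""
    baseline = results[1]
    return {n: n * baseline / cost for n, cost in sorted(results.items())}


if __name__ == "__main__":
    for label, results in (("16 bytes", SHORT_BLOCKS),
                           ("32656 bytes", LONG_BLOCKS)):
        for threads, speedup in aggregate_speedup(results).items():
            print(f"{label:>12}  {threads} thread(s): {speedup:.2f}x")
```

Under that assumption, a value close to 1x means the extra threads add almost nothing, while a value close to the thread count means the work is scaling across the cores.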


    If anyone has an i7, it would be good to compare. There must be some difference for the premium people pay over an i5? Thanks :)

    “Roll it back”



  • Registered Users, Registered Users 2 Posts: 3,888 ✭✭✭ozmo


    Khannie wrote: »
    Off-topic question: How does that compare to using GPUs? Obviously backend hardware doesn't usually come with a fancy GPU, but recently deployed machines will all have that AES-NI enhancement.


    I didn't want to shell out for new hardware for the report, so I'm just using what I had to hand.
    I was testing breaking a string I encoded with .NET encryption (AES).


    Using the i5 2520M (a laptop)

    * Software only, 2 threads (as it's a dual-core chip):
    .NET C# software - I can get 1.5 million password guesses a second.
    C++ software - I can get 3 million password guesses a second.

    * Using 2 threads with AES-NI instructions - I can get 10 to 15 million password guesses a second.

    Using CUDA
    * Using an old low-spec laptop NVIDIA card, I was able to get 10 million password guesses a second.

    * Using an Amazon server - they have an option to use dual GPUs (NVIDIA Kepler cores) - I was able to get 45 million password guesses a second.
    Really impressive stuff, but that machine was costing $1 an hour, so I only did a couple of hours of testing and deleted it.

    “Roll it back”



  • Closed Accounts Posts: 128 ✭✭morlock_


    ozmo wrote: »
    Thanks so much for that. I was considering getting a better CPU, but those results are pretty much what I am seeing on my i5 (mine is a lesser i5 2520M).

    The documentation says you should be able to run AES-NI on multiple threads, so these encryptions could be performed in parallel, but neither my custom software nor Intel's own software (the one I linked to above) shows this happening on the i5 processors...

    Intel used an i7-980X when developing the code and recording their times.
    If anyone has an i7, it would be good to compare. There must be some difference for the premium people pay over an i5?

    I initially wanted the 2600k model, but the 2500k was roughly €100 cheaper at the time, iirc.
    The only difference I could see between them was 2MB of cache, but I've seen others mention 2 threads per physical core, which I don't normally pay attention to.
    The 2500k doesn't have hyper-threading but it seems the i7 does, which might explain the 2x speed-up.

    I removed the SetThreadAffinity() call and got the following results
    AES instructions detected (Dec-CBC-128)
    #threads  #bytes  min        max        avg
    1         16      24.738363  24.738363  24.738363
    2         16      28.103142  28.391567  28.247354
    4         16      51.942719  53.603494  52.92789

    AES instructions detected (Enc-CBC-128)
    #threads  #bytes  min        max        avg
    1         16      21.380408  21.380408  21.380408
    2         16      16.911054  18.346071  17.628563
    4         16      32.524856  33.495954  33.020692
    


  • Registered Users, Registered Users 2 Posts: 3,888 ✭✭✭ozmo


    morlock_ wrote: »
    I removed the SetThreadAffinity() call and got the following results
    AES instructions detected (Dec-CBC-128)
    #threads  #bytes  min        max        avg
    1         16      24.738363  24.738363  24.738363
    2         16      28.103142  28.391567  28.247354
    4         16      51.942719  53.603494  52.92789
    

    Really odd first result there. It appears to be running a bit too slow (when compared to the results below). Then, as before, each extra thread does not speed up the process (i.e. 4 threads take twice as long as 2)?
    It just seems like we are getting the functionality of just one AES unit per chip rather than one per core?


    On my CPU it's like this:
    With the affinity code in place, each added thread adds to the time:
    #threads	#bytes	min	max	avg
    1	16	10.760881	10.760881	10.760881
    2	16	25.575455	26.222481	25.898968
    3	16	27.245365	37.747897	34.186728
    4	16	49.461505	51.560212	50.673783
    

    And without affinity there is no difference: no increased throughput from parallelisation.
    #threads	#bytes	min	max	avg
    1	16	10.441436	10.441436	10.441436
    2	16	21.58211	21.637328	21.609719
    3	16	26.139087	37.025686	33.396091
    4	16	47.12059	49.179552	48.153305
    




    Just for fun, what this means in real life is that...

    Take a 7-character password using, say,
    all lower-case letters + numerals + 10 other special characters.


    So the time to brute-force it:

    using AES-NI at 2.5GHz with 1 thread...
    10              clock cycles per byte (from the test results above)
    160             clock cycles per attempt (16 bytes need to be decrypted)
    2,500,000,000   clock cycles in 1 second (2.5GHz)
    15,625,000      attempts per second
    7               password length
    46              characters (26+10+10)
    4.35818E+11     permutations (characters ^ password length)
    27,892          seconds (permutations / attempts per second)
    
    

    So that's 7.7 hours max to find a 7-character password of that spec (using brute force; more selective lists would vastly reduce this time),
    or about 2.5 hours to find a shorter 6-character password if you include upper case also.
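
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch under the same assumptions as the table (10 cycles per byte, 16 bytes decrypted per guess, a 2.5GHz clock, one thread, and nothing else done per guess); `brute_force_seconds` is a hypothetical helper name, not something from the Intel sample.

```python
def brute_force_seconds(alphabet_size, length, cycles_per_byte=10,
                        bytes_per_guess=16, clock_hz=2_500_000_000):
    """Worst-case seconds to try every password of exactly `length` characters.

    The defaults are the figures from the table above; they are assumptions
    taken from one benchmark run, not general constants.
    """
    guesses_per_second = clock_hz / (cycles_per_byte * bytes_per_guess)
    keyspace = alphabet_size ** length  # characters ^ password length
    return keyspace / guesses_per_second


if __name__ == "__main__":
    # 7 characters: lower case + numerals + 10 special characters
    seven = brute_force_seconds(26 + 10 + 10, 7)
    # 6 characters: as above, plus upper case
    six = brute_force_seconds(26 + 26 + 10 + 10, 6)
    print(f"7 chars, 46 symbols: {seven:,.0f} s ({seven / 3600:.1f} h)")
    print(f"6 chars, 72 symbols: {six:,.0f} s ({six / 3600:.1f} h)")
```

Changing `length` or `alphabet_size` shows how quickly the worst case grows with each extra character.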

    Using that NVIDIA dual GPU, or getting all 4 cores working on the Intel CPU, could be about 4 times faster...


    Long passwords really are a must. Short passwords (7 characters), even spelt oddly or with leet characters, can seemingly be broken quite quickly...

    “Roll it back”


