Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie
Hi there,
There is an issue with role permissions that is being worked on at the moment.
If you are having trouble with access or permissions on regional forums please post here to get access: https://www.boards.ie/discussion/2058365403/you-do-not-have-permission-for-that#latest

password algorithms

  • 10-03-2007 6:16pm
    #1
    Closed Accounts Posts: 1,567 ✭✭✭


    ;
    ; ok, this probably won't interest many, but i believe its interesting, in terms of password algorithm design.
    ; (being the nerd i am with too much time on my hands) what the heck..
    ; md5crypt shows subtle weaknesses and shortcuts which shouldn't be available to potential attackers
    ; in possession of password databases.
    ;
    ; the incomplete assembly code and macros here, assume to process less than or equal to 8 bytes of salt,
    ; less than 16 bytes of a password, (which with the 1000 iteration loop, in C below, should not
    ; exceed 56 bytes of a buffer size) to process in md5_block_x86
    ;
    ; it is recommended that if using md5crypt() algorithm, a 16 byte salt be used instead
    ; of "normal" 7-8 bytes, and atleast an 8 byte (>16 is better) password, to ensure greater security of the hash.
    ;
    ; a 16 byte salt is recommended, because many people may not easily remember a password greater than 8 bytes and less
    ; than 16 in length, or more of course..and mainly because crackers only target passwords less than 16 bytes.
    ; (i'm not talking about "passphrases" here, just incase the zealots want to interrupt.They are definitely better,i agree)
    ;
    ; some services which use md5crypt() may restrict a password to a predefined length,perhaps generated randomly
    ; for initial access to a apache website.
    ;
    ; for a more secure password algorithm, use bcrypt() from openbsd.. but, using blowfish CBC mode instead of ECB :)
    ; bcrypt() was definitely well thought of IMHO, but could certainly be improved exponentially simply by using CBC mode.
    ; there is no performance hit, but for the attacker,yes ;)
    ;
    ; benchmarks
    ;
    ; this MMX : 6250
    ; JTR 1.7 : 5991
    : MDCrack-sse 1.82 : 5062
    ;
    ; intel plan on releasing an 80 core chip in five years, perhaps password algorithms should be designed to withstand
    ; parallel cracking, in order to preserve security of peoples passwords, should the database become compromised.
    ;
    ; some information was gathered from work by polska, http://www.insidepro.com/

    ; this loop is the main processing of md5crypt()/Apache MD5 passwords.

    [PHP]comment ~ history

    /*
    * And now, just to make sure things don't run too fast. On a 60 MHz
    * Pentium this takes 34 msec, so you would need 30 seconds to build
    * a 1000 entry dictionary...
    */
    for (i = 0; i < 1000; i++) {
    MD5Init(&ctx1);

    if ((i & 1) != 0)
    MD5Update(&ctx1, (const unsigned char *)pw, pwl);
    else
    MD5Update(&ctx1, final, 16);

    if ((i % 3) != 0)
    MD5Update(&ctx1, (const unsigned char *)sp, sl);

    if ((i % 7) != 0)
    MD5Update(&ctx1, (const unsigned char *)pw, pwl);

    if ((i & 1) != 0)
    MD5Update(&ctx1, final, 16);
    else
    MD5Update(&ctx1, (const unsigned char *)pw, pwl);

    MD5Final(final, &ctx1);
    }

    md5_block_x86 proto


    .data
    _new db 64 dup (?)

    .code

    main_md5_loop macro
    local i_index

    i_index = 0

    while i_index lt 1000

    lea edi,[_new] ; zero initialize buffer
    push edi
    mov ecx,64/4
    xor eax,eax
    rep stosd
    pop edi

    if i_index and 1
    mov ecx,pwl
    mov ebx,pwl
    mov esi,[pw]
    rep movsb
    else
    mov ecx,16/4
    mov ebx,16
    lea esi,[final]
    rep movsd
    endif

    if i_index mod 3
    mov ecx,sl
    add ebx,sl
    mov esi,[_sp]
    rep movsb
    endif

    if i_index mod 7
    mov ecx,pwl
    add ebx,pwl
    mov esi,[pw]
    rep movsb
    endif

    if i_index and 1
    mov ecx,16/4
    add ebx,16
    lea esi,[final]
    rep movsd
    else
    mov ecx,pwl
    add ebx,pwl
    mov esi,[pw]
    rep movsb
    endif

    lea esi,[_new]
    lea eax,[ebx*8] ; calculate number of bits
    mov byte ptr [esi+ebx],80h ; add the end bit
    mov dword ptr [esi+56],eax ; add number of bits

    lea edi,[final] ; where to save context
    call md5_block_x86

    i_index = i_index + 1
    endm
    endm

    here is using MMX registers.

    main_md5_loop macro
    local i_index

    i_index = 0

    pxor mm0,mm0 ; for zero initializing buffer
    lea eax,[pw]
    lea ebx,[_sp]

    movq mm1,[eax+0*8] ; 1st 8 bytes of password string
    movq mm2,[eax+1*8] ; 2nd 8 bytes

    movq mm5,[ebx+0*8] ; salt

    lea esi,[_new] ; setup buffer
    lea edi,[final]

    while i_index lt 1000

    mov eax,[pwl] ; current password length
    mov ebx,[sl] ; current salt length
    movq mm3,[edi+0*8] ; 1st 8 bytes of previous hash
    movq mm4,[edi+1*8] ; 2nd 8 bytes of previous hash

    xor ecx,ecx ; number of bytes in buffer

    movq [esi+0*8],mm0
    movq [esi+1*8],mm0
    movq [esi+2*8],mm0
    movq [esi+3*8],mm0
    movq [esi+4*8],mm0
    movq [esi+5*8],mm0
    movq [esi+6*8],mm0
    movq [esi+7*8],mm0
    movq [esi+8*8],mm0

    if i_index and 1 ; store the password string
    movq [esi+ecx+0*8],mm1
    movq [esi+ecx+1*8],mm2
    mov ecx,eax
    else ; store the previous hash
    movq [esi+ecx+0*8],mm3
    movq [esi+ecx+1*8],mm4
    mov ecx,16
    endif

    if i_index mod 3 ; store the salt value
    movq [esi+ecx+0*8],mm5
    add ecx,ebx
    endif

    if i_index mod 7 ; store the password string
    movq [esi+ecx+0*8],mm1
    movq [esi+ecx+1*8],mm2
    add ecx,eax
    endif

    if i_index and 1 ; store the previous hash
    movq [esi+ecx+0*8],mm3
    movq [esi+ecx+1*8],mm4
    add ecx,16
    else ; store the password string
    movq [esi+ecx+0*8],mm1
    movq [esi+ecx+1*8],mm2
    add ecx,eax
    endif

    lea edx,[ecx*8] ; calculate number of bits
    mov byte ptr [esi+ecx],80h ; store the end bit
    mov dword ptr [esi+14*4],edx ; store number of bits

    call md5_block_x86

    i_index = i_index + 1
    endm
    endm

    note: LEA is not used (because of P4), which *may* be faster on AMD/pentiums, but not tested fully.

    the only difference between Apache MD5crypt() and FreeBSD md5crypt() is the
    "magic" string, preceeding salt and password.

    "password" "12345678"

    Apache MD5 : $apr1$12345678$9pHAGSBYtlmFtid2xxNog0

    "password" "1234567"
    MD5Crypt : $1$1234567$0y0fKjfowEmIuFCOlHsoJ1

    tests for md5crypt mmx/x86 computed 6250 hashes per second on a 2.0ghz amd64 3200+
    ~
    .386
    .model flat,stdcall

    w_buffer textequ <esi>

    ; ++++++++++++++++++++++++++++++++++++++++

    public md5_block_x86
    public _md5_block_x86

    A_CONSTANT equ 067452301h
    B_CONSTANT equ 0efcdab89h
    C_CONSTANT equ 098badcfeh
    D_CONSTANT equ 010325476h

    ; ++++++++++++++++++++++++++++++++++++++++

    ff macro dwa,dwb,dwc,dwd,x,s,t

    mov edi,dwc
    mov ebp,[w_buffer+4*x]

    xor edi,dwd
    add dwa,ebp

    and edi,dwb
    add dwa,t

    xor edi,dwd
    add dwa,edi

    rol dwa,s
    add dwa,dwb
    endm

    ; ++++++++++++++++++++++++++++++++++++++++

    gg macro dwa,dwb,dwc,dwd,x,s,t

    mov edi,dwc
    mov ebp,[w_buffer+4*x]

    xor edi,dwb
    add dwa,ebp

    and edi,dwd
    add dwa,t

    xor edi,dwc
    add dwa,edi

    rol dwa,s
    add dwa,dwb
    endm

    ; ++++++++++++++++++++++++++++++++++++++++

    hh macro dwa,dwb,dwc,dwd,x,s,t

    mov edi,dwc
    mov ebp,[w_buffer+4*x]

    xor edi,dwd
    add dwa,ebp

    xor edi,dwb
    add dwa,t

    add dwa,edi
    rol dwa,s

    add dwa,dwb
    endm

    ; ++++++++++++++++++++++++++++++++++++++++

    ii macro dwa,dwb,dwc,dwd,x,s,t

    mov edi,-1
    mov ebp,[w_buffer+4*x]

    xor edi,dwd
    add dwa,t

    or edi,dwb
    add dwa,ebp

    xor edi,dwc
    add dwa,edi

    rol dwa,s
    add dwa,dwb
    endm

    .code

    _ebp textequ <esp-4>
    _esi textequ <esp-8>
    _edi textequ <esp-12>
    _ebx textequ <esp-16>

    ; on call:
    ; esi = input of data to use in updating chaining variables, instead of calling MD5Init()
    ; edi = 16 byte buffer for result

    md5_block_x86:
    _md5_block_x86:

    mov [_ebp],ebp
    mov [_esi],esi
    mov [_edi],edi
    mov [_ebx],ebx

    mov eax,[esi][00*04] ; md5 input buffer
    mov edx,[esi][01*04]
    mov ecx,[esi][02*04]
    mov ebx,[esi][03*04]
    ;==================== round 1
    add eax,(((((C_CONSTANT xor D_CONSTANT) and B_CONSTANT) xor D_CONSTANT) + A_CONSTANT)+0d76aa478h)
    rol eax,7
    add eax,B_CONSTANT
    ;==================== round 2
    mov edi,(B_CONSTANT xor C_CONSTANT)
    and edi,eax
    xor edi,C_CONSTANT
    lea edx,[edx+edi+D_CONSTANT+0e8c7b756h]
    rol edx,12
    add edx,eax
    ;==================== round 3
    mov edi,eax
    xor edi,B_CONSTANT
    and edi,edx
    xor edi,B_CONSTANT
    lea ecx,[ecx+edi+C_CONSTANT+0242070dbh]
    rol ecx,17
    add ecx,edx
    ;==================== round 4
    mov edi,edx
    xor edi,eax
    and edi,ecx
    xor edi,eax
    lea ebx,[ebx+edi+B_CONSTANT+0c1bdceeeh]
    rol ebx,22
    add ebx,ecx
    ;####################

    ; =======================================

    ;ff eax, ebx, ecx, edx, 00, 07, 0d76aa478h
    ;ff edx, eax, ebx, ecx, 01, 12, 0e8c7b756h
    ;ff ecx, edx, eax, ebx, 02, 17, 0242070dbh
    ;ff ebx, ecx, edx, eax, 03, 22, 0c1bdceeeh

    ff eax, ebx, ecx, edx, 04, 07, 0f57c0fafh
    ff edx, eax, ebx, ecx, 05, 12, 04787c62ah
    ff ecx, edx, eax, ebx, 06, 17, 0a8304613h
    ff ebx, ecx, edx, eax, 07, 22, 0fd469501h

    ff eax, ebx, ecx, edx, 08, 07, 0698098d8h
    ff edx, eax, ebx, ecx, 09, 12, 08b44f7afh
    ff ecx, edx, eax, ebx, 10, 17, 0ffff5bb1h
    ff ebx, ecx, edx, eax, 11, 22, 0895cd7beh

    ff eax, ebx, ecx, edx, 12, 07, 06b901122h
    ff edx, eax, ebx, ecx, 13, 12, 0fd987193h
    ff ecx, edx, eax, ebx, 14, 17, 0a679438eh
    ff ebx, ecx, edx, eax, 15, 22, 049b40821h

    ; =======================================

    gg eax, ebx, ecx, edx, 01, 05, 0f61e2562h
    gg edx, eax, ebx, ecx, 06, 09, 0c040b340h
    gg ecx, edx, eax, ebx, 11, 14, 0265e5a51h
    gg ebx, ecx, edx, eax, 00, 20, 0e9b6c7aah

    gg eax, ebx, ecx, edx, 05, 05, 0d62f105dh
    gg edx, eax, ebx, ecx, 10, 09, 002441453h
    gg ecx, edx, eax, ebx, 15, 14, 0d8a1e681h
    gg ebx, ecx, edx, eax, 04, 20, 0e7d3fbc8h

    gg eax, ebx, ecx, edx, 09, 05, 021e1cde6h
    gg edx, eax, ebx, ecx, 14, 09, 0c33707d6h
    gg ecx, edx, eax, ebx, 03, 14, 0f4d50d87h
    gg ebx, ecx, edx, eax, 08, 20, 0455a14edh

    gg eax, ebx, ecx, edx, 13, 05, 0a9e3e905h
    gg edx, eax, ebx, ecx, 02, 09, 0fcefa3f8h
    gg ecx, edx, eax, ebx, 07, 14, 0676f02d9h
    gg ebx, ecx, edx, eax, 12, 20, 08d2a4c8ah

    ; =======================================

    hh eax, ebx, ecx, edx, 05, 04, 0fffa3942h
    hh edx, eax, ebx, ecx, 08, 11, 08771f681h
    hh ecx, edx, eax, ebx, 11, 16, 06d9d6122h
    hh ebx, ecx, edx, eax, 14, 23, 0fde5380ch

    hh eax, ebx, ecx, edx, 01, 04, 0a4beea44h
    hh edx, eax, ebx, ecx, 04, 11, 04bdecfa9h
    hh ecx, edx, eax, ebx, 07, 16, 0f6bb4b60h
    hh ebx, ecx, edx, eax, 10, 23, 0bebfbc70h

    hh eax, ebx, ecx, edx, 13, 04, 0289b7ec6h
    hh edx, eax, ebx, ecx, 00, 11, 0eaa127fah
    hh ecx, edx, eax, ebx, 03, 16, 0d4ef3085h
    hh ebx, ecx, edx, eax, 06, 23, 004881d05h

    hh eax, ebx, ecx, edx, 09, 04, 0d9d4d039h
    hh edx, eax, ebx, ecx, 12, 11, 0e6db99e5h
    hh ecx, edx, eax, ebx, 15, 16, 01fa27cf8h
    hh ebx, ecx, edx, eax, 02, 23, 0c4ac5665h

    ; =======================================

    ii eax, ebx, ecx, edx, 00, 06, 0f4292244h
    ii edx, eax, ebx, ecx, 07, 10, 0432aff97h
    ii ecx, edx, eax, ebx, 14, 15, 0ab9423a7h
    ii ebx, ecx, edx, eax, 05, 21, 0fc93a039h

    ii eax, ebx, ecx, edx, 12, 06, 0655b59c3h
    ii edx, eax, ebx, ecx, 03, 10, 08f0ccc92h
    ii ecx, edx, eax, ebx, 10, 15, 0ffeff47dh
    ii ebx, ecx, edx, eax, 01, 21, 085845dd1h

    ii eax, ebx, ecx, edx, 08, 06, 06fa87e4fh
    ii edx, eax, ebx, ecx, 15, 10, 0fe2ce6e0h
    ii ecx, edx, eax, ebx, 06, 15, 0a3014314h
    ii ebx, ecx, edx, eax, 13, 21, 04e0811a1h

    ii eax, ebx, ecx, edx, 04, 06, 0f7537e82h
    ii edx, eax, ebx, ecx, 11, 10, 0bd3af235h
    ii ecx, edx, eax, ebx, 02, 15, 02ad7d2bbh
    ii ebx, ecx, edx, eax, 09, 21, 0eb86d391h

    add eax,A_CONSTANT
    add ebx,B_CONSTANT
    add ecx,C_CONSTANT
    add edx,D_CONSTANT

    mov ebp,[_ebp]
    mov esi,[_esi]
    mov edi,[_edi]

    mov [edi][0*4],eax ; save result
    mov [edi][1*4],ebx
    mov [edi][2*4],ecx
    mov [edi][3*4],edx

    mov ebx,[_ebx]
    ret

    end[/PHP]


Comments

  • Moderators, Recreation & Hobbies Moderators, Science, Health & Environment Moderators, Technology & Internet Moderators Posts: 93,552 Mod ✭✭✭✭Capt'n Midnight


    Why wait 5 years ? http://opensparc-t1.sunsource.net/index.html
    Its 32 simultaneous processing threads, drawing about as much power as a light bulb, give you the best performance per watt of any processor available.
    Or use multiple GPU's

    or just use a botnet

    maximum password age tbh.


  • Closed Accounts Posts: 1,567 ✭✭✭Martyr


    Why wait 5 years ? http://opensparc-t1.sunsource.net/index.html
    Quote:
    Its 32 simultaneous processing threads, drawing about as much power as a light bulb, give you the best performance per watt of any processor available.

    hey,thats pretty neat..but i don't think i could build it.
    i don't mean wait five years, because we already have FPGAs and GPUs available, but it is difficult to see how MD5 would benefit on SIMD architecture like GPU/FPGA and SSE2 (x86), because MD5 was designed for 32-bit processors.
    Because of this, it does not allow easy implementation of parallel hash operations..in fact, its not possible at all.(for processing data sequentially, it isn't)

    i meant that with multi-cores being more commonplace, that algorithms should be designed to resist parallel attacks to some degree.
    there shouldn't be any advantage to using hardware for example.
    the hash algorithms like md4/md5/sha1 used in quite a number of password algorithms, because of the design, does not allow parallel hash operations.

    so, if you were to use a 16 byte salt, and atleast 16 byte password, there would be twice as many md5 operations as there would be with less than 16 bytes.
    also, the attacker would be faced with writing an efficient algorithm to interleave blocks of data after each md5 process before processing in hardware or SIMD architecture...in the end, the attacker would see no benefit from implementing on 32-bit processor..

    that was my point i suppose, to make it more difficult to programatically implement password algorithms in either hardware or software for parallel operations.

    botnets i agree would be big help to crack process.


  • Moderators, Recreation & Hobbies Moderators, Science, Health & Environment Moderators, Technology & Internet Moderators Posts: 93,552 Mod ✭✭✭✭Capt'n Midnight


    The algorithms you listed have speedups in the order of 50%, handy but not earth shattering, new processors / instruction sets may also do the same.

    how secure do you want the stuff ?

    at the lower end you have live feeds from wall street that you pay for, 15 minutes later it's free

    the deeds to you house or the indentity of your secret love child is something you want to keep secure for the rest of your life, problem is that botnets and Mores law mean that a password that would take 50,000 years to break now could probably be done in a few hours in 20 years time even if there are no improvements in algorithms

    you are going to need a lot more one way functions as you can't rely on any particular one to be unbreakable as AFAIK non have been mathically proven to be one way

    as bandwidth increases it might be worth looking at mixing the data/key with random junk - in order to break it ,you first have to find it


  • Closed Accounts Posts: 1,567 ✭✭✭Martyr


    The algorithms you listed have speedups in the order of 50%, handy but not earth shattering, new processors / instruction sets may also do the same.

    how secure do you want the stuff ?

    at the lower end you have live feeds from wall street that you pay for, 15 minutes later it's free

    the deeds to you house or the indentity of your secret love child is something you want to keep secure for the rest of your life, problem is that botnets and Mores law mean that a password that would take 50,000 years to break now could probably be done in a few hours in 20 years time even if there are no improvements in algorithms

    you are going to need a lot more one way functions as you can't rely on any particular one to be unbreakable as AFAIK non have been mathically proven to be one way

    as bandwidth increases it might be worth looking at mixing the data/key with random junk - in order to break it ,you first have to find it

    ; i meant to add this in previous post, but didn't get time. just about why CBC mode is better than ECB in any
    ; password algorithm where a block cipher is used.
    ;
    ; if you look at code from bcrypt.c
    ;
    ; #define BCRYPT_BLOCKS 6
    ;
    ; /* Now do the encryption */
    ; for (k = 0; k < 64; k++)
    ; blf_enc(&state, cdata, BCRYPT_BLOCKS / 2);
    ;
    ; the 3 8-byte (64-bit) block values in cdata are "OrpheanB" "eholderS" "cryDoubt"
    ; we see 64 blf_enc() calls, which internally call blowfish_ecb 3 times. A total of 192
    ;
    ; if brute-force is used to attack these hashes. As a shortcut, one only has to compute 1 8-byte block (64 times)
    ; because the key state is not being updated, it is highly unlikely that we will find collisions to the wrong
    ; password.
    ; one can always encrypt last 16 bytes at end, just to be sure.
    ;
    ; if CBC mode were used, this attack would not work, computing the full 192 rounds would be required...probably
    ;
    ; considering a cpu with lots of registers, where an attacker can decrypt/encrypt the hash/plaintext at same
    ; time (because blowfish state isn't being updated) it might prove faster.
    ;
    ; that would be 32 rounds each with ECB mode, or 96 with recommended CBC mode.
    ; the increase in speed is on the assumption that there will be better pipelining or less dependencies.
    ; this is almost always true on processors with many registers.
    ;
    ; to counter this, consider first hashing the (password+salt) with 160 bit HMAC-SHA-1, use this as the initial blowfish key.
    ; at end of 192 blowfish cbc mode rounds, we have 24 bytes of ciphertext.XOR the last 8 bytes with the 1st 8 bytes.
    ; finally, XOR the first 160 bits blowfish output with the 160-bit HMAC-SHA-1 hash.(discard sha-1 160 hash as its not needed anymore)
    ; the result is the 192-bit, 24 byte bcrypt password hash.
    ;
    ; this way should then force any attacker to use full 192 blowfish encryption rounds, taking longer obviously.
    ;
    ; does that make any sense?
    ;
    ; there is no real performance hit for described, using current bcrypt version, just some minor modifications.
    ;
    ; i'm no expert, but if i were an administrator of a large user/password database, i'd probably use something
    ; thats as secure as possible, without affecting performance of the system too much..bcrypt is good choice.
    ;
    ; i read that bcrypt is resistant to parallel attacks already! so hardware has no advantage over 32-bit processors, nice!


  • Closed Accounts Posts: 1,567 ✭✭✭Martyr


    you are going to need a lot more one way functions as you can't rely on any particular one to be unbreakable as AFAIK non have been mathically proven to be one way

    as bandwidth increases it might be worth looking at mixing the data/key with random junk - in order to break it ,you first have to find it

    there is a novel way to do it, but only for cracking passwords.

    the naive assumption by some is that there is some benefit to using SSE2/GPU or FPGAs for hash operations.there isn't for most, because of their inherent design.
    remember, you require the previous hash in order to process the next 64 byte block, so how can it ever be done in parallel operations, it can't.
    but if its just 1 operation, then there is possibility to optimise cracking certain password algorithms like md5crypt(), thats what i was getting at.

    if 16 byte salts and 16 byte passwords were used, then parallel operations would be more difficult, because of having to arrange blocks of data, it would take longer than just using normal 32-bit instructions.

    a GPU is for floating point operations, most block ciphers/hash algorithms are integer based, with byte manipulation (8 bits)

    perhaps GPU would be good for Elliptic Curve Digital Signature Algorithms, or ciphers of a hardware design.
    blowfish was designed for 32-bit processors, so there is no advantage using hardware(FPGA/GPU), unless you can give a link to some credible evidence

    i should mention that the current release of mdcrack uses information by polska, from insidepro.com but solar designer and gregory forgot to mention that in credits.
    what they don't know is that it is possible to improve THE particular algorithm by 150%, i won't be submitting any code to them in future.


  • Advertisement
  • Closed Accounts Posts: 1,567 ✭✭✭Martyr


    XOR the last 8 bytes with the 1st 8 bytes.
    ; finally, XOR the first 160 bits blowfish output with the 160-bit HMAC-SHA-1 hash.

    this doesn't work either, guess nobody noticed :p


Advertisement