Writing a Lexer, regular expressions

  • 04-02-2011 08:01PM
    #1
    Registered Users, Registered Users 2 Posts: 324 ✭✭


    Hi there,

    I am tasked with writing a lexer. It's a college assignment, so I won't ask you to write it for me, don't worry :P But I could really use some help clearing up the concept! The notes are terrible and the lecturer loves his jargon :o

    As I understand it, a lexer reads a body of text, numbers, etc., tokenises it and stores the tokens in a symbol table. Is this correct?

    In terms of parsing, I should be following a finite state automaton model, perhaps with a method for each state that generates a token and then calls the next method depending on the value of the state? Does this make sense? Or should I use a looping switch statement, with methods for nums, chars, etc. called from each case?

    Finally, how are regular expressions involved in the parsing process? If I am coding in a procedural manner, like below, where do they come in, and should they? Are they part of the grammar, and is that reflected in my code?
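
    For illustration, one common way regular expressions show up directly in code is a table of patterns, one per token class, tried at the current position. The sketch below is only an assumption about what such a lexer might look like: the token names (NUMBER, IDENT, OP, SKIP) and the `tokenize` function are made up for this example and do not come from the notes.

```python
import re

# Hypothetical token classes for illustration; an assignment would define its own.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
# One combined pattern with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 42 + y1"))
```

    A hand-written state machine and a pattern table like this describe the same thing: each regular expression corresponds to a small finite automaton, so a procedural lexer can be seen as those automata written out by hand.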

    Here's an excerpt from my notes, entitled "Constructing a lexer". I am slightly confused by it, as there is no context for the 'number' variable and I'm not sure how it's related to a token. The two assignments to number (number:=ch-'0' and number:=number*10+ch-'0') are also a bit strange to me.
    state:=0;
    ch:=readchar(inputstream);
    loop forever
        {switch(state):
            case 0: if ch='0' or ch='1' or … ch='9'
                    then {number:=ch-'0'; state:=1;
                          ch:=readchar(inputstream); break}
                    else {unreadchar(inputstream);
                          return false}
            case 1: if ch='0' or ch='1' or … ch='9'
                    then {number:=number*10+ch-'0'; state:=1;
                          ch:=readchar(inputstream); break}
                    else {unreadchar(inputstream);
                          return true}
        }
    }
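
    As a rough, runnable reading of the excerpt above, here is the same two-state loop in Python. It is a sketch under assumptions, not the lecturer's code: `read_number`, its string-plus-index input and its return tuple are invented for the example. Subtracting the character code of '0' turns a digit character into its numeric value, and multiplying the running total by 10 before adding the next digit shifts the earlier digits one decimal place to the left, so `number` ends up holding the value of the integer token.

```python
def read_number(text, pos=0):
    """Mimic the two-state loop: return (ok, number, next_pos)."""
    state = 0
    number = 0
    while True:
        ch = text[pos] if pos < len(text) else ""  # "" stands for end of input
        is_digit = ch != "" and "0" <= ch <= "9"
        if state == 0:
            if not is_digit:
                return False, None, pos      # no leading digit: not a number
            number = ord(ch) - ord("0")      # character code -> digit value
            state = 1
        else:  # state 1: at least one digit has been read
            if not is_digit:
                return True, number, pos     # leave the non-digit unread
            number = number * 10 + (ord(ch) - ord("0"))
        pos += 1


print(read_number("407+x"))
```

    For the input "407+x" the running value goes 4, then 40, then 407, and the loop stops at the '+' without consuming it, which plays the role of unreadchar in the notes.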
    

    Sorry if I'm way off the mark and am asking too many questions :P But if someone could answer any of them, or even point me to some useful resources regarding lexers (the book is also ****e :/), that would be great! Cheers for reading.


Comments

  • Registered Users, Registered Users 2 Posts: 397 ✭✭Design_Dude


    In the same boat as ya!
    Start off by drawing out your FA, and minimizing it.
    Then put it into code, using a new method for each of your states.
    That's all I'm doing for now anyways...
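
    The "one method per state" idea could be sketched roughly like this. Everything here is an assumption made for illustration (the `Lexer` class, the state names, and the NUMBER and IDENT token kinds); a real FA for the assignment would have its own states.

```python
class Lexer:
    """Hypothetical lexer: one method per FA state, each returning the next state."""

    def __init__(self, text):
        self.text = text
        self.pos = 0        # index of the next unread character
        self.start = 0      # index where the current lexeme began
        self.tokens = []

    def peek(self):
        # "" stands for end of input
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def emit(self, kind):
        self.tokens.append((kind, self.text[self.start:self.pos]))

    def state_start(self):
        ch = self.peek()
        if ch == "":
            return None                 # end of input: stop the machine
        self.start = self.pos
        self.pos += 1
        if ch.isspace():
            return self.state_start     # skip whitespace
        if ch.isdigit():
            return self.state_number
        if ch.isalpha():
            return self.state_ident
        raise ValueError(f"unexpected character {ch!r}")

    def state_number(self):
        if self.peek().isdigit():
            self.pos += 1
            return self.state_number
        self.emit("NUMBER")             # accepting state reached
        return self.state_start

    def state_ident(self):
        if self.peek().isalnum():
            self.pos += 1
            return self.state_ident
        self.emit("IDENT")
        return self.state_start

    def run(self):
        state = self.state_start
        while state is not None:
            state = state()             # each state returns the next one
        return self.tokens


print(Lexer("abc 123 x9").run())
```

    Returning the next state method from each state keeps the driver loop tiny and avoids deep recursion on long inputs; a looping switch on a state number would do the same job.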


  • Registered Users, Registered Users 2 Posts: 324 ✭✭greyed


    Ah, nice one, cheers! It's a tricky one alright.

