Basic Parsing Concepts


First - the tools. These are the pieces that need to be created in order to make this work. I will outline the concepts here. These shouldn't be hard to write. Once I define some concepts and routines, we'll put it together and give you the big picture.

  • NextWord - NextWord is the main text parsing routine. It takes the next logical word from the input stream for processing. If the input stream is X=Y+(Z*5)/10 the NextWord tokens will break down as:
X = Y + ( Z * 5 ) / 10

How do you code NextWord? Probably some giant regular expression. It all depends on how complex you want to make the interpreter. But no matter what you decide to use for a syntax, the idea is that NextWord tokenizes the input stream, separating out the individual tokens.
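
Here's a rough sketch of NextWord in Python (GULP isn't written in Python - this is just to show the idea, and the regular expression is an assumption you'd adjust to fit your own syntax):

 import re
 
 # One regex alternates over the token shapes we care about here:
 # numbers, identifiers, and single-character operators. The exact
 # pattern is an assumption - extend it as your syntax grows.
 TOKEN_PATTERN = re.compile(r'\d+|[A-Za-z_]\w*|[=+\-*/()]')
 
 def next_word(stream):
     """Yield the next logical word from the input stream, one at a time."""
     for match in TOKEN_PATTERN.finditer(stream):
         yield match.group()
 
 print(list(next_word("X=Y+(Z*5)/10")))
 # ['X', '=', 'Y', '+', '(', 'Z', '*', '5', ')', '/', '10']

Every call pulls the next token off the stream, so the rest of the interpreter never has to look at the raw text again.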

  • FindToken(Token) - After you parse the next word in the input stream you then need to look it up to see what it means. Different tokens mean different things. Is it a constant, a variable, or a command? First you test it to see if it's a number. If so - then treat it as such. Is it a string constant? You tell that by looking for "" or '' or whatever else you might want to use to say "this is a string". Otherwise - it's a command.
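
A quick sketch of those tests in Python (the classify name and the quoting convention are made up for illustration - they aren't part of GULP):

 def classify(token):
     """Decide what a token is: number, string constant, or command."""
     try:
         return ('number', float(token))   # it's a numeric constant
     except ValueError:
         pass
     # A quoted token is a string constant; strip the quote characters.
     if len(token) >= 2 and token[0] == token[-1] and token[0] in '"\'':
         return ('string', token[1:-1])
     return ('command', token)             # otherwise, look it up as a command
 
 print(classify('5'))        # ('number', 5.0)
 print(classify('"hello"'))  # ('string', 'hello')
 print(classify('print'))    # ('command', 'print')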

Obviously you are going to have to have a table of commands and a way to search for them, so when you have a token you can look it up to see what it is supposed to do. I've had good results using hash tables to find things fast. The hash table stores a pointer that points to the runtime code if you are using an interpreter model, or a token that is compiled to represent that command, so that at run time the token is an offset into an array of pointers that point to the runtime code. So FindToken takes a command - looks it up - and returns something that represents what the command is supposed to do.
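
Here's a sketch of both models in Python (the command names and routines are hypothetical, and a Python dict stands in for the hash table):

 # Hypothetical runtime routines for illustration only.
 def do_print(args): print(*args)
 def do_add(args): return sum(args)
 
 # Interpreter model: the hash table maps a command name straight to
 # a pointer to its runtime code.
 commands = {'print': do_print, 'add': do_add}
 
 def find_token(name):
     """Look the command up; return its runtime code, or None if unknown."""
     return commands.get(name)
 
 # Compiled model: the table instead yields a small integer token, and
 # at run time that token is an offset into an array of code pointers.
 token_of = {'print': 0, 'add': 1}
 runtime = [do_print, do_add]
 
 tok = token_of['add']           # compile time: the token replaces the name
 print(runtime[tok]([2, 3]))     # run time: index the array - prints 5

The compiled model pays for the hash lookup once, at compile time; after that, running a command is just indexing an array, which is why it's faster at run time.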
