Objective-PHP is a port of the Objective-C (or Objective-J) runtime to PHP. This adds the language features of Objective-C nestled nicely inside the syntax of PHP. Moka is a port of the Apple Cocoa Frameworks (or Cappuccino).

The Tokenizer

What is a Tokenizer?

The Tokenizer converts the input characters that make up the source code into more easily handled symbols. For example, it may find the characters that make up a specific keyword and then output a single ID number that represents that keyword. This turns the input source code into a sequence of so-called tokens.

For example, the code $a = @protocol(ProtocolName); becomes the sequence

  1. T_VARIABLE ($a)
  2. =
  3. T_OBJPHP_PROTOCOL
  4. (
  5. T_STRING (ProtocolName)
  6. )
  7. ;

where T_VARIABLE, T_OBJPHP_PROTOCOL, T_STRING are all constants.

The subsequent parse stage will now be much simpler to implement.

Implementation Details

Since Objective-PHP builds on PHP, we can use the PHP tokenizer (which is exposed to us via the built-in PHP function token_get_all) to initially tokenize the input Objective-PHP.

However, the PHP tokenizer has no knowledge of the ObjPHP keywords, so the Tokenizer object takes the token stream from token_get_all and looks for the @ symbol followed by a T_STRING token (or other PHP token) which, when combined with the @ symbol, makes an ObjPHP token.
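As a rough illustration of this first pass, the sketch below runs token_get_all over the earlier example line and prints the token names. It is not taken from the Objective-PHP source. The opening <?php tag is prepended because token_get_all only tokenizes code that follows one, and whitespace tokens are skipped for readability.

```php
<?php
// Tokenize one line of Objective-PHP with PHP's own tokenizer.
// The opening tag is required, otherwise everything is treated as inline HTML.
$source = '<?php $a = @protocol(ProtocolName);';

$names = [];
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Skip the opening tag and whitespace; they are noise for this listing.
        if ($token[0] === T_OPEN_TAG || $token[0] === T_WHITESPACE) {
            continue;
        }
        $names[] = token_name($token[0]);
    } else {
        // Single-character tokens come back as plain strings.
        $names[] = $token;
    }
}

echo implode(' ', $names), PHP_EOL;
// prints: T_VARIABLE = @ T_STRING ( T_STRING ) ;
```

Note that PHP sees @protocol as two tokens, the @ character and a T_STRING, which is why a second pass is needed to produce T_OBJPHP_PROTOCOL.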

For example the tokens @ and T_PUBLIC will be consumed by the Tokenizer and a T_OBJPHP_PUBLIC will be produced.
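The merging step could be sketched as follows. This is a minimal illustration under assumptions, not the actual Tokenizer class: the function name mergeObjPhpTokens, the keyword table, and the numeric values given to T_OBJPHP_PUBLIC and T_OBJPHP_PROTOCOL are all hypothetical. The keyword is matched on its text, so it does not matter whether PHP classified it as T_STRING or as a keyword token such as T_PUBLIC.

```php
<?php
// Hypothetical IDs for illustration; the real Objective-PHP constants may differ.
const T_OBJPHP_PUBLIC = 10001;
const T_OBJPHP_PROTOCOL = 10002;

/**
 * Merge "@" followed by a known keyword into a single ObjPHP token.
 * Hypothetical helper, not part of Objective-PHP itself.
 */
function mergeObjPhpTokens(array $tokens): array
{
    // Map lower-cased keyword text to an ObjPHP token ID (assumed table).
    $keywords = [
        'public'   => T_OBJPHP_PUBLIC,
        'protocol' => T_OBJPHP_PROTOCOL,
    ];

    $out = [];
    $count = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        $next = $tokens[$i + 1] ?? null;

        if ($token === '@' && is_array($next)) {
            $text = strtolower($next[1]);
            if (isset($keywords[$text])) {
                // Consume both tokens and emit one token in PHP's
                // [id, text, line] shape.
                $out[] = [$keywords[$text], '@' . $next[1], $next[2]];
                $i++; // skip the keyword token that was just consumed
                continue;
            }
        }

        // Anything else, including a plain PHP "@" operator, passes through.
        $out[] = $token;
    }

    return $out;
}

$merged = mergeObjPhpTokens(token_get_all('<?php @public $name;'));
```

Here $merged[1] is the combined token, with the hypothetical T_OBJPHP_PUBLIC ID, the text @public, and line number 1. An @ that is not followed by a known keyword is left alone, since @ is also PHP's error-suppression operator.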

PHP produces tokens either as strings or as arrays. If the token is simply one character, e.g. {, then it is returned as the string '{'. However, if the token is longer, e.g. the keyword protected, then it is returned as a three-element indexed array containing the ID of the token (the value of the constant T_PROTECTED), the matched text (protected), and the line number the token is on in the input source. The PHP name of the token, T_PROTECTED, can be looked up from the ID with token_name.
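Because the two shapes have to be handled separately, it can help to normalize them before further processing. The sketch below shows one way to do that. describeToken is a hypothetical helper, not part of PHP or Objective-PHP. It relies on token_get_all returning array tokens as [ID, text, line] and on token_name to recover the name from the ID.

```php
<?php
// Tokens produced: T_OPEN_TAG, T_PROTECTED, T_WHITESPACE, then the string "{".
$tokens = token_get_all('<?php protected {');

/**
 * Normalize a token from token_get_all() into one uniform shape.
 * Hypothetical helper for illustration only.
 */
function describeToken($token): array
{
    if (is_string($token)) {
        // Single-character tokens carry no ID, name, or line number.
        return ['id' => null, 'name' => null, 'text' => $token, 'line' => null];
    }

    [$id, $text, $line] = $token;

    return ['id' => $id, 'name' => token_name($id), 'text' => $text, 'line' => $line];
}

$keyword = describeToken($tokens[1]); // the "protected" keyword
$brace = describeToken($tokens[3]);   // the "{" character

echo $keyword['name'], ' on line ', $keyword['line'], PHP_EOL;
// prints: T_PROTECTED on line 1
```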

Important Notes

Relevant Source Code

Other Information


Document status: COMPLETE for current version.