# Building an Interpreter, part 1

January 04, 2020

I'm working my way through Ruslan Spivak's tutorial *Let’s Build A Simple Interpreter*
https://ruslanspivak.com/lsbasi-part1/

Here are my notes and reflections from part 1

# Part 1

Compiler takes source code and preprocesses it into a machine language.

An interpreter interprets source code without turning it into machine language first.

Lexical analysis: read input of characters and convert into a stream of tokens. Token: an object represetning type and value. Example: the token "3" is an integer with value of 3

## exercises

### multiple digits

The assumptions are changing where instead of having digit->plus->digit we have an arbitrary number of digits followed by a plus followed by an arbitrary number of digits. Let’s write a function that eats tokens keeping track of their values until we hit a non integer. Then return the resulting values parsed as an integer.

```
def eat_integers(self):
"""
eats integers tokens until a non integer is found.
Returns teh resulting integers
"""
result = ''
while True:
# eat tokens until you hit a non integer. Assume its a plus!
curr_token = self.current_token
try:
self.eat(INTEGER)
result += str(curr_token.value)
except InterpreterParseError as e:
return int(''.join(result))
```

### spaces

Rather than have the interpreter process spaces, I am going to punt and remove them prior to processing. We can't allow spaces just anywhere however ('1 0 + 3' for example). For a space character to be legal both of its neighboors must be either a space or different token from one another. Here's how I'm doing it:

```
@staticmethod
def strip_text(text):
"""
strips valid empty space from expression
text: expression string
returns: expression with valid spaces removed
"""
split_expr = text.split('+') # split expression on plus, getting list of two expressions.
stripped = [ char.strip() for char in split_expr] # strip leading and trailing white space
return '+'.join(stripped) # join expressions together with a plus in the middle
```

### Support addition or subtraction

I may have gone overboard on this one :) I wanted to generalize the concept of an operator which takes two numbers and, using a common interface returns the result. Further, I wanted to later be able to construct an AST of operator nodes and integer leaf nodes. All of these nodes respond to `value`

which allows for computing an expression by calling `value`

on the root and recursivly evaluating each node. Here's what I did.

First I extracted the `Token`

class, to keep as my base class

```
class Token(object):
"""
an abstract class representing all tokens
Inheriting classes must:
- define a self.type in their __init__
- implement a `value` property
"""
def __str__(self):
"""String representation of the class instance.
Examples:
Token(INTEGER, 3)
Token(PLUS '+')
"""
return 'Token({type}, {value})'.format(
type=self.type,
value=repr(self.value)
)
def __repr__(self):
return self.__str__()
```

Next I defined an abstract `OperatorToken`

which represents any binary operator.

```
class OperatorToken(Token):
def __init__(self) -> None:
__slots__ = 'left_value', 'right_value'
self.left_value = None
self.right_value = None
def __str__(self):
"""String representation of the class instance.
Examples:
Token(INTEGER, 3)
Token(PLUS '+')
"""
return 'Token({type}, {left}, {right})'.format(
type=self.type,
left=self.left_value,
right=self.right_value
)
```

A general binary operator has a `left_value`

and a `right_value`

. The child class will implement `value`

which determines the result.

Next, we've got two operators, addition and subtraction. Really their only differences are 1) they need to know their `type`

and 2) they both need to implement a property `value`

which returns either the sum or difference of their two values.

```
class AddToken(OperatorToken):
def __init__(self) -> None:
self.type = 'PLUS'
super().__init__()
@property
def value(self):
"""
returns the sum of left_value and right_value
"""
return self.left_value.value + self.right_value.value
class SubtractToken(OperatorToken):
def __init__(self) -> None:
self.type = 'MINUS'
super().__init__()
@property
def value(self):
return self.left_value.value - self.right_value.value
```

Next, how to represent numbers? Really there's two different concepts to deal with. The first is an integer token, which is a single digit.

```
class IntToken(Token):
"""
represents a single numeric character. Example '3'
"""
def __init__(self, value: str) -> None:
self.value = value
self.type = INTEGER
```

Next is the concept of an entire number, which is composed of one or many digits. Let's think of that as an object which wraps a list of `IntTokens`

```
class IntWrapper(Token):
"""
a list of IntTokens that represent an integer.
"""
def __init__(self, tokens: List[IntToken] ) -> None:
self.tokens = tokens
self.type = INT_WRAPPER
@property
def value(self) -> int:
number_strings = [int_token.value for int_token in self.tokens]
return int(''.join(number_strings))
```

And finally, we have the `EOF`

token. Not much to say there.

```
class EOFToken(Token):
def __init__(self):
self.type = EOF
self.value = None
```

The update to `expr`

in Interpreter looks like this

```
def expr(self):
# set current token to the first token taken from the input
self.current_token = self.get_next_token()
left = self.eat_integers()
# we expect the current token to be a '+' or '-' token
operator = self.current_token
self.eat(token.PLUS, token.MINUS)
# we expect the current token to be a single-digit integer
right = self.eat_integers()
operator.left_value = left
operator.right_value = right
# after the above call the self.current_token is set to
# EOF token
return operator.value
```

Finally, I need to update my `strip_text`

method to allow subtraction

```
def strip_text(self, text):
"""
strips valid empty space from expression
text: expression string
returns: expression with valid spaces removed
"""
try:
operator = re.search(r'[\+-]', text).group(0) # find the operator, either a + or -
except(AttributeError):
raise self.error('expression does not contain an operator') # this error means neither operator was found. Throw our custom exception
split_expr = text.split(operator)
stripped = [ char.strip() for char in split_expr]
return operator.join(stripped)
```

This architecture opens the door for constructing a more complex abstract syntax tree, with any number of binary operators.