Code Goes Here: March 2009

Thursday, March 26, 2009

The C# coalesce operator short circuits

Though it isn't mentioned in the help, the C# coalesce operator ( ?? ) does short circuit. If the left hand side isn't null, then the right hand side isn't evaluated.

Wednesday, March 25, 2009

F# Steals Show Smart Tag Keyboard Shortcut

Over the last few days I've noticed that pressing Ctrl-. no longer expanded the Smart Tags in the text editor (used to Refactor Rename or implement interfaces). It turns out FSI steals this combination for Canceling Evaluation.

To rebind it, find View.ShowSmartTag in the keyboard options and set it back to Ctrl-.

Edit: It turns out this is mentioned in the release notes that I neglected to read. Oops.

Monday, March 23, 2009

Writing a Compiler with F# Part 3

Now that we can read user input and pipe it into our evaluator, we need to start implementing the main parts of the compiler. First it's helpful to understand what compilers actually do. In a rough sense, they take the text files containing the code and output an exe or a dll or something, i.e., they transform human readable text into machine code. This is a simplified definition, but it will suffice for our purposes.

It may be tempting to go directly from text into code, but for anything beyond the simplest of languages this ends up being very difficult both to write and maintain. Most compilers divide up the work into several stages that form a pipeline of sorts. Some common stages are:

Lexing: The raw sequence of characters is transformed into a sequence of Tokens. For example, the Lexer would consume 'w' 'h' 'i' 'l' 'e' from the input file and return the WHILE token
Parsing: The tokens from the Lexer are consumed to produce an Abstract Syntax Tree. An AST is a convenient in-memory representation of the program. More on these in a moment. For example, the Parser might consume VAR IDENTIFIER('i') COLON TYPENAME('int') and produce an instance of our VariableDeclaration class with the Name property set to 'i' and the type set to 'int'
Type Checking: Once we have the AST, we can make several passes over it to check the semantics of the language, including that programs are well typed. For example, when looking at an IfStatement we'd want to make sure that the Expression for the if clause was of type bool.
Other backend phases: We'll talk more about backend phases like instruction selection, register allocation, and optimizations if we ever get to building the backend. Ultimately these phases involve walking over the AST and finally spitting out machine code (assembly language, IL for .NET, etc.)

What you should see here is that the actual textual representation of the language plays a very small part in this whole process. A large portion of the compiler depends solely on the AST. Only the Lexer and Parser really care whether you end statements with semicolons or have syntactically significant whitespace, for example. Because of this, it's useful to start thinking about the AST first, and then worry about Lexing and Parsing.

In F#, the most common way to model ASTs is through the use of discriminated unions. These are roughly the functional programming equivalent of a polymorphic class hierarchy, although much simplified. Let's look at an AST for a very simple language that lets you print arithmetic expressions (provided you only want to use parentheses and addition/subtraction).

type binop =
    | Plus
    | Minus

type expr =
    | Int   of int
    | BinOp of binop * expr * expr
    
type stmt =
    | Print of expr
    | Seq of stmt list

The keyword 'type' introduces the declaration of a discriminated union. Each option is followed by a |. binop, for example, resembles an Enum that can either be Plus or Minus. The main difference between these unions and enums becomes apparent when you look at 'expr': each of the options can carry data. We can have an expr Int that contains an integer value, or a BinOp that contains a binop (remember Plus or Minus) and two other expressions. We would represent "print 5 + 3" in our AST as Print(BinOp(Plus, Int(5), Int(3)))

Discriminated unions really become powerful when combined with pattern matching. Given the AST above we can easily build a simple evaluator. Here's all we need:

let rec evalE (env,fenv) = function
    | Int i -> i
    | BinOp(Plus, l, r) -> (evalE (env,fenv) l) + (evalE (env,fenv) r)
    | BinOp(Minus, l, r) -> (evalE (env,fenv) l) - (evalE (env,fenv) r)
    
and eval (env,fenv) = function   
    | Seq stmts -> List.fold_left eval (env,fenv) stmts
    | Print e -> printfn "%d" (evalE (env,fenv) e); (env,fenv)

Ok what's going on here? We've defined two mutually recursive functions, 'eval' for evaluating statements, and 'evalE' for evaluating expressions. We can safely ignore (env,fenv) for now; those will be useful when we add variables and functions. The body of 'eval' says: If I'm handed a Seq of statements, fold the eval function over each statement. Since we don't have variables yet this is could easily just be 'for each statement, evaluate it'. It also says: If I'm handed a Print of some expression, evaluate the expression and then print it out. Evaluating expressions works similarly. If I'm handed a BinOp(Plus) of two expressions, evaluate each one and then add them together.

In these relatively few lines of code, we've built a simple arithmetic evaluator. If you type this stuff into fsi, you can call the evaluator by doing something like:

eval (0,0) (Print(BinOp(Plus,Int(5),Int(6))))

Which will evaluate 5 + 6 and print out 11. Next time we see how to build a simple Lexer and Parser for the language so we don't have to type the AST out manually any more.

Friday, March 20, 2009

Using a custom base class for a generated ObjectContext

We're adapting our framework to support the Entity Framework in addition to its current support for Linq to SQL and running into issues. The Linq to SQL approach involved, among other things, a Base Data Context class that we could easily instruct the built in code generator to use. The LINQ to SQL designer has first class support for specifying that we want the generated data context to inherit from our BaseDataContext instead of the built-in DataContext directly.

Not only does the Entity Framework designer not have first class support for this, it appears that the Entity Framework tool chain itself has no support for this at all. I hope I'm wrong about this, but thus far I've been completely unable to find a way to specify that I want my generated ObjectContext to inherit from my BaseObjectContext instead of just ObjectContext.

Not impressed.

Sunday, March 15, 2009

Writing a Compiler with F# Part 2

Since the ideas I want to mess with are more related to type systems than, say, instruction selection or optimizing passes, I'm going to completely neglect the backend of the compiler for awhile. Initially I'm not going to compile the language at all. Instead I'll just build a simple interpreter to evaluate the type-checked parse tree.

The easiest way to play with this is to make a basic REPL (Read Evaluate Print Loop). Basically a prompt that you can type language snippets into and have them run and the results printed. For F#, the REPL is called fsi.exe. Other languages have this feature as well (in Ruby it's irb.exe).

To get started, I'm going to forget about all the lexing and parsing stuff and get my feet wet with F# by building the framework for a simple REPL.

First, I'll need to print a prompt and get some input. Here's an F# function to handle that:

let getInput (prompt:string) =
 (Console.Write(prompt); Console.ReadLine())

Pretty basic stuff. I had to specify the type for "prompt" since Console.Write is overloaded.

Next, I want a basic loop to get input and evaluate it. Functional programming and F# in particular work naturally with sequences as a first class element. To that end, I'm going to model the user input as a sequence of lines. The F# Seq type has a generate method for this purpose. It takes 3 functions: an initializer that returns any data you might need, a generator that given the initial data produces either another element or None to signify the sequence has ended, and a cleanup function. Initially, I don't need any supporting data so both the initialize and cleanup functions can do nothing. Here's what it looks like:

let input = Seq.generate
 (fun () -> ()) // init
 (fun () ->
   match getInput(">") with
   | "#quit" -> None
   | inp -> Some(inp))
 (fun () -> ()) // cleanup

The middle function uses getInput that was defined above and asks the user for input. If they type "#quit" then the sequence ends, otherwise the input is returned.

Now we'll take this sequence and do something with it. Ultimately we'll want to evaluate the input and then keep prompting. Since the statement run might result in new variables being declared or existing ones modified, we'll need some way to remember what happened. The state of our world is referred to as an environment, so we'll write a dummy eval function that takes an environment, evaluates input, and returns the new environment to pass along to the next statement. For now let's use this:

let eval (env:int) (x:string) =
 (Console.WriteLine("Evaluating: {0}", x);env)

We ignore the environment, just print out the input, and return the environment unchanged.

The last step is to apply this eval function to our sequence of user input, passing the modified environment forward to the next element. This is a classic case of a fold. We start with an initial environment, call eval on the first item in the sequence, and then pass the resulting environment in when calling eval on the second item. Then the environment returned by eval'ing the second item gets passed into the third, etc. Here's what this looks like:

Seq.fold eval 0 input |> ignore

That's it. We don't care about the final value of the environment so we just pipe it to ignore. With only a few lines of F# we have a basic REPL going and we're ready to start with the real language work next time.

Saturday, March 14, 2009

Writing a Compiler with F#

I've been dabbling with F# off and on since it was first publicly available, but I've never put enough time into it for it to feel natural to me. The language itself has also evolved considerably since the early days.

To take my knowledge of the F# from its current state (simple snippets typed into FSI used to befuddle and amaze people who have never seen a functional language with strong type inference before) into a solid working understanding I'm going to write a compiler for a new programming language. Languages like F# excel at solving problems like this, so this should give me a good exposure to what it's really like.

I also have some ideas regarding programming languages that I want to try out, but I'll talk more about those as they arise. I'm going to start pretty simply for the target language and as time goes by I'll hopefully transition from worrying about the low level stuff like F# syntax into the really interesting stuff like the type system for my language and how it interoperates with the CLR.

My intent is to make a series of posts highlighting the process I'm going through including interesting things about F#, compilers and compiling, and type systems. I'm not getting any Big Ideas about the resulting language - my formal PL background is limited to a solid CS education, reading Pierce's Types and Programming Languages, and lurking Lambda The Ultimate. That is to say, I'm expecting to do things...less than optimally.

Now since I've posted this introduction I actually have to work on this thing. Next time I'll talk about some basics of F# and also compilers.

Code Goes Here