Welcome to the Gobba Programming Language Handbook

gobba


gobba is a dynamically typed and purely functional interpreted programming language, heavily inspired from the OCaml, Haskell and Scheme languages. It is based on Professors Gianluigi Ferrari and Francesca Levi's minicaml interpreter example. The goal for gobba is to be a practical language with built in support for scientific computing, solving some of the problems that exist in other dynamically typed interpreted languages like python and Javascript. A primary goal is also to offer a compromise between solidity, ease of learning and the ability to express ideas quickly in the language.

Features

  • C and Haskell-like syntax with lexical scoping
  • Only immutable variables
  • Dynamically typed
  • Eager (default) and lazy evaluation
  • Simple but effective module system
  • Interactive REPL with readline-like features such as completion, search and hints
  • The REPL has didactical debugging option to print expression ASTs and every reduction step.
  • Static inference to separate pure and impure computations
  • A lot more coming in the next releases...

Thanks to

  • Prof. Gian-Luigi Ferrari and Francesca Levi for teaching us how to project and develop interpreters in OCaml
  • Kevin Fontanari for the pixel art gobba mascotte.
  • Antonio DeLucreziis for helping me implement lazy evaluation.
  • Prof. Alessandro Berarducci for helping me study lambda calculus in deep.
  • Giorgio Mossa for helping me polish the lambda-closure mechanism.

Installation

To install, you need to have opam (OCaml's package manager with a version greater than 2.0) and a recent OCaml distribution installed on your system.

gobba has 3 main development dependencies you have to install on your system:

  • cblas
  • openblas
  • lapacke

These packages may or may not be present in the package repositories of your operating system. If they are, please be sure to install the development versions of these packages.

You can then install gobba by running

opam install gobba

Manual installation

# clone the repository
git clone https://github.com/0x0f0f0f/gobba
# cd into it
cd gobba
# install dependencies
opam install dune menhir ANSITerminal cmdliner alcotest bisect_ppx ocamline
# compile
make
# test
make test
# run
make run
# rlwrap is suggested
rlwrap make run
# you can install gobba with
make install
# run again
rlwrap gobba

Usage

The executable name is gobba. If a file is specified as the first command line argument, then it will be run as a program. If you are running a program you may want to use the flag -p to print the results of the expressions that are evaluated. Otherwise, if a program is not specified a REPL session will be opened.

Keep in mind that gobba is purely functional and values are immutable by default!

Command Line Options

  • --help[=FMT] (default=auto): Show this help in format FMT. The value FMT must be one of auto, pager, groff or plain. With auto', the format is pageror plain' whenever the TERM env var is `dumb' or undefined.

  • --internals: To print or not the language's internal stack traces

  • -m MAXSTACKDEPTH, --maxstackdepth=MAXSTACKDEPTH (absent=10): The maximum level of nested expressions to print in a stack trace.

  • -p, --printexprs: If set, print the result of expressions when evaluating a program from file

  • -v VERBOSITY, --verbose=VERBOSITY (absent=0): If 1, Print AST to stderr after expressions are entered in the REPL. If 2, also print reduction steps

  • --version Show version information.

The Gobba Programming Language Basics

Comments

Comments are treated by gobba as whitespace and consequently ignored. Comments can be nested and are multi-line by default, and can be written as following:

(* This is a comment *)

Numbers, Arithmetics and the Numerical Tower

Gobba supports 3 kinds of numbers: integers, floats and complex numbers. Floating point numbers' decimal part can be omitted if it is 0. Floats also support the scientific notation in literal values. Complex numbers literals are "allocated" from other two numbers with the :+ operator. The number on the left will be the real part and the one on the right will be allocated as the imaginary part.

(* Integer literals *)
1 ;
0 ;
10350156 ;
(* Floating Point Literals *)
1.2e-3 ;
0.0 ;
0.123 ;
34. ;
4.3e9 ;
(* Complex Number Literals *)
12.1 :+ 1.23;
0 :+ 1.12;

Arithmetical operations are straightforward. If an arithmetical operation is called on different types of numbers, the result is "raised" to the "most inclusive" type of numbers. For example, Integer division returns an integer if the modulo is 0, and returns a float otherwise. Floating point numbers can use the power syntax using e.

(* Addition, multiplication and subtraction *)
1 + 2 + 3 * (4 - 1) ;
(* *)
1. / 2.315 ;

Boolean literals and arithmetic

The boolean literals in gobba are true and false. There are also operators for the logical AND and OR: && and ||. The comparison operators return boolean values and are:

  • a = b: True if and only if the two objects are the same
  • a > b and a >= b: Greater and greater or equal
  • a < b and a <= b: Less and less or equal

Here's an example:

true && false || (1 < 2) && (1 = 1) ;

Character literals.

The same as all the other languages: Single characters enclosed in ' are character literals, such as 'a' or '\n'. UTF-8 support is planned for a future release.

Strings

Strings are values enclosed in double quotation marks. Here is how to concatenate strings

"hello " ++ "world"
(* It is the same as *)
String:concat "hello " "world"

To convert any value to a string you can use the show primitive.

Lists

Lists are enclosed in square brackets and values are separated by commas. Lists are heterogeneous, so any value in a list can have a different type from the other elements of the list.

[1, 2, "ciao", 4, 5] ;

:: is the classic cons operator. It inserts the element on the left at the beginning of the list. This is done in O(1) time. The ++ operator is used for list and string concatenation.

1 :: [2] ++ [3]

To access the n-th value of a list, use the @ operator, called "at". List indexes begin from 0.

[1, 2, 3, 4] @ 0 (* => 1 *)
[1, 2, 3, 4] @ 2 (* => 3 *)

In gobba, the classic functional programming functions and morphisms on lists are defined in the List module:

Get the length of a list (done in O(n) time).

List:length [1, 2, 3, 4] ;

Get the element at the beginning of a list

List:head [1, 2, 3, 4] ;

Get another list with the first element removed

List:tail [1, 2, 3, 4] ;

Iterate a function over every list value and return a new list

List:map (fun x -> x + 1) [1, 2, 3, 4] ;

List folding: see the Wikipedia Page

List:foldl (fun x y -> x - y) 10 [1, 2, 3, 4] ;
List:foldr (fun x y -> x - y) 10 [1, 2, 3, 4] ;

Declarations

Local declaration statements are purely functional and straightforward:

let x = 4 and y = 1 in x + y

Global declaration statements create new, purely functional environments in both programs and the REPL. Omitting in is syntax-sugar, subsequent blocks will be evaluated in the resulting new environment.

let a = 2 ;
a + 3 ;

You can declare lazily evaluated variables by prefixing the name of the variables with the lazy keyword.

let x = 2 and lazy y = (3 + 4) ;

Toplevel Directives

Toplevel directives can be used in both files and the REPL. Like in OCaml, they start with a # symbol. Note that toplevel directives are not expressions and they can only be used in a file (or REPL) top level, and cannot be used inside expressions.

#include loads a file at a position relative to the current directory (if in the REPL) or the directory containing the current running file (in file mode). The declarations in the included file will be included in the current toplevel environment:

#include "examples/fibonacci.mini" ;

#module loads a file like #include but the declarations in the included file will be wrapped in a dictionary, that acts as a module:

#module "examples/fibonacci.mini" ;
(* Declarations will be available in module *) Fibonacci

#open takes a module name and imports everything that is contained in that module into the toplevel environment:

#open Math;
  • #verbosity n sets verbosity level to n. There are "unit" directives:
  • #dumpenv () and #dumppurityenv () dump the current environments.
  • #pure (), #impure () and #uncertain () set the globally allowed purity level.

Functions and recursion

For parsing simplicity, only the OCaml anonymous function style of declaring functions is supported. The keyword fun is interchangeable with lambda.

(fun x -> x + 1) 1;
let fib = fun n -> if n < 2 then n else (fib (n - 1)) + fib (n - 2)

Functions are abstracted into a single parameter chain of functions, and they can be partially applied:

(fun x y z -> x + y + z) = (fun x -> fun y -> fun z -> x + y + z) ;
(* result: true - bool - This is true!! *)

let f = (fun x y z -> x + y + z) in f 1 2 3 ;
(* result: 6 - int - Function application *)

let f = (fun x y z -> x + y + z) in f 1 2 ;
(* result: (fun z -> ... ) - fun - Partial application *)

Dictionaries and modules.

Dictionary (object) values are similar to Javascript objects. The difference from javascript is that the keys of an existing dictionary are treated as symbols, and values can be lazy.

You may have noticed that dictionary fields are syntactically similar to the assignments in let statements. This is because there is a strict approach towards simplicity in the parsing logic and language syntax. A difference from let statements, is that values in dictionaries can only access the lexical scope outside of the dictionary.

let n = {hola = 1, lazy mondo = 2, somefunc = fun x -> x + 1 } ;
let m = Dict:insert "newkey" 123 n ;
m = {newkey = 123, hola = 1, mondo = 2, somefunc = fun x -> x + 1 } (* => true *)
Dict:haskey "newkey" m (* => true *)
(* => {newkey = 124, hola = 2, mondo = 3} *)

An element of a dictionary can be accessed using the : infix operator.

m:hola (* returns 1 *)

Some morphisms are defined in the Dict module.

Dict:map (fun x -> x + 1) m;
Dict:foldl (fun x y -> x + y) 0 m;
Dict:foldr (fun x y -> x - y) 10 m;

Haskell-like dollar syntax

Too many parens?

f (g (h (i 1 2 3)))

Is equivalent to

f $ g $ h $ i 1 2 3

Purity of gobba code

From Wikipedia:

In computer programming, a pure function is a function that has the following properties: Its return value is the same for the same arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams from I/O devices). Its evaluation has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or I/O streams).

The gobba interpreter statically infers whether the expressions you run in your programs are numerical, pure or impure. An impure expression is a computation that calls primitives that have side effects, such as direct memory access or I/O access. Pure expressions are those expressions that do not imply side effects. Numerical expressions are the purest computations that operate only on numerical values. By default, the interpreter is in a uncertain state, it means that it will allow evaluation of pure expressions, and will allow impure code to be executed only if inside an impure statement.

To be impure, an expression must contain an explicit impure statement or a call to an impure language primitive such as IO:print. You can enforce purity explicitely, inside an expression by wrapping it into the pure and impure statements, followed by an expression.

It is good practice to reduce the use of the pure/impure keywords as much as possible, and to avoid using it inside of function bodies. This means keeping your code as purely functional as you can.

Here is an example:

let bad_function = fun x ->
    impure (let mystring =
        "I am a bad impure function! Also: " ++ x in
        IO:print_endline mystring );

let good_function = fun x ->
    IO:print_endline ("I am a good function! Also: " ++ x) ;

The following function call is causing side effects and will error

bad_function "hello!" ;

The above will error, because it is trying to execute an impure computation in a pure environment

good_function "hello! I should error" ;

Here's a good way of calling it

impure $ good_function "hello!" ;

You can specify that you DO NOT want to compute impure expressions by using the pure statement We want the following to error because it contains an impure computation.

pure $ good_function "henlo world! I should error" ;

The above will error because a pure contest does not allow nesting an impure contest inside

pure $ bad_function "ciao mondo! I should error" ;

A good way of structuring your code is keeping pure/impure statements as external from expressions as you can (towards the top level).

Sequencing (>>) operator

Keep in mind that every expression is either numerical, pure or impure. You may like, as in imperative languages, to do operations sequentially one after another. Newcomers may be confused by the ; symbol at the end of expressions. This special symbol signals gobba that the current command is over, and everything after ; is another command. A command is either a global declaration, a directive call or an expression to be evaluated. You may want to sequence operations inside of expressions. To do so, you can use the >> operator. The >> operator is called the sequencing operator or the then operator and what it does is straightforward:

Evaluate the expression on the left, discard the result and then evaluate the expression on the right.

Here's a simple example: since 12.4 is greater than 3, this snippet of code will print "Good maths!" on the standard output, and then (>>) the variable c will be assigned to a * c, precisely 37.2.

let a = 12.4 and b = 3 ;
let c =
    if a < b then
        impure $ IO:print_endline "Bad maths!" >>
        a + b
    else
        impure $ IO:print_endline "Good maths!" >>
        a * b ;

Function pipes (reverse composition) and composition

You can redirect the result of a function to the first argument of another function using the >=> operator. The <=< operator does the same thing but in reverse. The following example will output 6, because 2 + 3 is piped into z + 1

let sum_and_add_one = (fun x y -> x + y) >=> (fun z -> z + 1) ;
sum_and_add_one 2 3

The composition operators yield the same result as normal function composition:

let my_sum = (fun x y -> x + y) ;
let add_one = (fun z -> z + 1) ;
(add_one <=< my_sum) 2 3 = add_one (my_sum 2 3) ;

The operator <=< means compose, the following example evaluates to true:

(add_one <=< my_sum) = (my_sum >=> add_one) ;

The Gobba Standard Library

This chapter is a work in progress.

Most standard library functions are organized in modules. The main modules are

  • IO: All the impure Input/Output functionality is contained in this module.
  • Char: Common operations on characters.
  • List: Simple and higher-order functions to manipulate lists.
  • Dict: Simple and higher-order operations on dictionaries.
  • String: Common string operations.
  • Math: Common mathematical functionality

Some other basic primitives are defined on the toplevel, those are:

  • show val: Gives you a string representation of any value.
  • typeof val: Gives you the string representation of the type of a value.
  • failwith message: Fail the current computation with an error message

Primitives and printing

The impure primitives IO:print and IO:print_endline automatically call show on a value. The difference between them is that IO:print_endline automatically adds a newline at the end of the line.

Calling gobba from your OCaml programs

You may want to use gobba as an embedded language in your OCaml projects. Just add gobba as a dependency and you can call gobba code from OCaml fairly easily. Keep in mind that gobba is dynamically typed and if you want to do something with the results of gobba evaluation you will have to extract the values with the unpacking functions.

(* This not code written in gobba, this is written in OCaml *)
(* Run a gobba string and retrieve the value and resulting state *)
let x, state = Gobba.Repl.run_string "3.14 * (4.0001)" () in
(* The expression is not altering the state, so let's throw it away *)
let _ = state and extracted_value = Gobba.Typecheck.unpack_float x in
print_float extracted_value

Internals of the Gobba programming language.

Interactive Command Line Interface

If no filename is passed to gobba executable, the interactive shell interface is launched. The REPL shell is built upon a simple library called ocamline, which itself relies on a library called ocaml-linenoise that provides readline navigation and completion functionality without additional system dependencies.

Syntax and Parser

Lexing is achieved with ocamllex, the default tool for generating scanners in OCaml. The parser is realized with the Menhir parser generator library, and is documented using Obelisk, which generates a clean text file containing the language grammar, available in Appendix [grammar].

Purity Inference

An important feature of the gobba language is the purity inference algorithm, which is performed statically on expressions before evaluation. It is an interpretation of expressions over the domain of purity, meant to prevent side effects by signal an error if they are contained inside the programs written in the language. Expressions are tagged by the algorithm with the Pure, Impure and Numerical labels. An Impure expression is an expression that contains calls to primitives that perform I/O operations, mutable variables and/or imperative style assignments. A Numerical expression is an expression where only numerical operations are performed; Pure expressions are those which do not fall into the previous two categories.

To achieve the execution of impure side effects, the programmer has two constructs available called purity blocks. By default, the evaluator is in an Uncertain context, which means that it will not allow side effects to be carried on by evaluation, but will allow evaluating purity blocks that change the currently allowed purity context. The impure statement takes an expression (the block) and evaluates it in a context where the allowed purity is Impure, so that side effects may be performed. The other construct available, the pure statement, takes an expression and enforces a Pure context, meaning that side effects and nested impure blocks will not be allowed inside of the expression.

AST Optimization

After purity inference is performed, and before evaluation, AST expressions are analyzed and optimized by an optimizer function that is recursively called over the tree that is representing the expression. The optimizer simplifies expressions which result is known and therefore does not need to be evaluated. For example, it is known that 5 + 3 equiv 8 and true && (true || (false && false)) equiv true. When a programmer writes a program, she or he may not want to do all the simple calculations before writing the program in which they appear in, we rely on machines to simplify those processes. Reducing constants before evaluation may seem unnecessary when writing a small program, but they do take away computation time, and if they appear inside of loops, it is a wise choice to simplify those constant expressions whose result is already known before it is calculated in all the loop iterations. It is also necessary in optimizing programs before compilation. The optimizer, by now, reduces operations between constants and if statements whose guard is always true (or false). To achieve minimization to an unreducible form, optimizer calls are repeated until it produces an output equal to its input; this way, we get a tree representing an expression that cannot be optimized again. This process is fairly easy:

let rec iterate_optimizer e =
  let oe = optimize e in
  if oe = e then e (* Bottoms out *)
  else iterate_optimizer oe

Boolean operations are reduced using laws from the propositional calculus, such as DeMorgan’s law, complement, absorption and other trivial ones.

Types

TODO

Evaluator

gobba’s evaluator is heavily inspired by the Metacircular Evaluator defined in the highly acclaimed textbook Structure and Interpretation of Computer Programs.

Primitives

The language primitives that are implemented in OCaml are organized in modules separated by functionality. Each primitive is a function that accepts a list of evaluated values and returns a single reduced value; therefore they have a type of evt list -> evt. OCaml primitives have to perform internal typechecking and unpacking of the arguments they receive from the gobba calls.

From the evaluator’s perspective, primitives are organized in a table such that when a symbol gets evaluated, it is looked up in the primitives table, if there is a match then the found primitive’s name is wrapped in an ApplyPrimitive expression nestedinside of a lazy lambda expression that permits partial application. When the evaluator finally encounters an ApplyPrimitive expression, the primitive OCaml function is extracted, applied to the arguments and the resulting value is returned by the current evaluator call. If a primitive is not found when looking up for a symbol, then a symbol lookup is performed in the current environment.

Some primitives, such as catamorphic procedures, are not native OCaml functions but small expressions written directly in gobba; those primitives are kept as lazy expressions into the same table as native OCaml primitives. The key difference between the two resides in the fact that those textual gobba primitives are not transformed into a function which body contains only an ApplyPrimitive call, but are instead parsed and analyzed at run time. The resulting additional startup time caused by parsing and analysis is proportional to the number of textual form primitives in the table and therefore quite irrelevant on non-embedded computer systems. The fold left and fold right catamorphic primitives are written directly in the gobba language and are hereby provided as examples.

This is the fold left procedure defined in the gobba standard library, which is tail recursive:

fun f z l ->
if typeof l = "list" then
  let aux = fun f z l ->
    if l = [] then z else
      aux f (f z (head l)) (tail l)
    in aux f z l
else if typeof l = "dict" then
    let aux = fun f z kl vl ->
        if kl = [] && vl = [] then z else
        aux f (f z (head vl)) (tail kl) (tail vl)
    in aux f z (getkeys l) (getvalues l)
else failwith "value is not iterable"

This is the fold right procedure defined in gobba standard library, which is not tail recursive:

fun f z l ->
if typeof l = "list" then
   let aux = fun f z l ->
      if l = [] then z else
      f (head l) (aux f z (tail l))
   in aux f z l
else if typeof l = "dict" then
   let aux = fun f z kl vl ->
      if kl = [] && vl = [] then z else
      f (head vl) (aux f z (tail kl) (tail vl))
   in aux f z (getkeys l) (getvalues l)
else failwith "value is not iterable"

Tests

Unit testing is extensively performed using the alcotest testing framework. Code coverage is provided by the bisect_ppx library which yields an HTML page containing the coverage percentage when unit tests are run by the dune build system. After each commit is pushed to the remote version control repository on Github, the package is built and tests are run thanks to the Travis Continuos Integration system.

Grammar

This is the full parsing grammar for the gobba language:

<optterm_list(separator, X)> ::= [separator]
                               | <optterm_nonempty_list(separator, X)>

<optterm_nonempty_list(separator, X)> ::= X [separator]
                                        | X separator
                                          <optterm_nonempty_list(separator,
                                          X)>

<toplevel> ::= <optterm_nonempty_list(SEMI, <statement>)> EOF

<statement> ::= <ast_expr>
              | <def>
              | <directive>

<assignment> ::= SYMBOL EQUAL <ast_expr>
               | LAZY SYMBOL EQUAL <ast_expr>

<def> ::= LET [<assignment> (AND <assignment>)*]

<directive> ::= DIRECTIVE STRING
              | DIRECTIVE INTEGER
              | DIRECTIVE UNIT
              | DIRECTIVE SYMBOL

<ast_expr> ::= <ast_app_expr>
             | <ast_expr> CONS <ast_expr>
             | NOT <ast_expr>
             | <ast_expr> BIND <ast_expr>
             | <ast_expr> ATSIGN <ast_expr>
             | <ast_expr> CONCAT <ast_expr>
             | <ast_expr> LAND <ast_expr>
             | <ast_expr> OR <ast_expr>
             | <ast_expr> EQUAL <ast_expr>
             | <ast_expr> DIFFER <ast_expr>
             | <ast_expr> GREATER <ast_expr>
             | <ast_expr> LESS <ast_expr>
             | <ast_expr> GREATEREQUAL <ast_expr>
             | <ast_expr> LESSEQUAL <ast_expr>
             | <ast_expr> PLUS <ast_expr>
             | <ast_expr> MINUS <ast_expr>
             | <ast_expr> COMPLEX <ast_expr>
             | <ast_expr> TIMES <ast_expr>
             | <ast_expr> DIV <ast_expr>
             | <ast_expr> TOPOWER <ast_expr>
             | IF <ast_expr> THEN <ast_expr> ELSE <ast_expr>
             | <def> IN <ast_expr>
             | LAMBDA SYMBOL+ LARROW <ast_expr>
             | <ast_expr> COMPOSE <ast_expr>
             | <ast_expr> PIPE <ast_expr>

<ast_app_expr> ::= <ast_simple_expr>+

<ast_simple_expr> ::= SYMBOL
                    | UNIT
                    | DOLLAR <ast_expr>
                    | LPAREN <ast_expr> RPAREN
                    | <ast_simple_expr> COLON SYMBOL
                    | PURE <ast_expr>
                    | IMPURE <ast_expr>
                    | LSQUARE <optterm_list(COMMA, <ast_expr>)> RSQUARE
                    | LVECT <optterm_nonempty_list(COMMA, <ast_expr>)> RVECT
                    | LVECT RVECT
                    | LBRACKET <optterm_list(COMMA, <assignment>)> RBRACKET
                    | BOOLEAN
                    | CHAR
                    | STRING
                    | INTEGER
                    | FLOAT