In a typical interpreter Prolog is most usually compiled into instructions of an abstract model called the Warren Abstract Machine. Then the instructions are executed in a WAM implementation inside the interpreter. WAM is imperative ; it contains mutable structures and commands to be executed. A detailed explanation can be found here.
OCaml is multi-paradigm, but we choose to implement an interpreter in a functional manner. As we can't directly work with the WAM, we will work directly on Prolog terms instead.
Our interpreter should be able to understand Prolog terms by reading queries from a prompt and by reading Prolog program files, parsing them and asserting them in its database. The database will simply be a hash table of predicates containing the loaded programs, converting, for example, the following program:
mother_child(trude, sally). father_child(tom, sally). father_child(tom, erica). father_child(mike, tom). sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y). parent_child(X, Y) :- father_child(X, Y). parent_child(X, Y) :- mother_child(X, Y).
to
"mother_child/2" -> mother_child(trude, sally).
"father_child/2" -> father_child(tom, sally).
father_child(tom, erica).
[..]
"sibling/2" -> sibling(X, Y) :- parent_child(Z,X), parent_child(Z,Y).
[..]
We begin by writing Prolog's simple grammar rules in the form of regular expression annotations that will be read by ocamllex to generate a lexical analyzer of Prolog. Essentially, the analyzer will recognize Prolog structures like clauses, variables, constants, atoms, and then we will use them to construct OCaml types with which we will work.
let whitespace = [ ' ' '\t']+ let newline = '\r' | '\n' | "\r\n" let constant = ['a' - 'z' '_'] ['a'-'z' 'A'-'Z' '0'-'9' '_']* let variable = ['A'-'Z'] ['a'-'z' 'A'-'Z' '0'-'9' '_']* rule token = parse | whitespace { token lexbuf } | newline { next_line lexbuf ; token lexbuf } | variable as v { VAR v } | constant as v { CONST v } | '(' { LPAREN } | ')' { RPAREN } | ':' { COLON } | '-' { DASH } | '.' { PERIOD } | ',' { COMMA } | '[' { LBRACKET } | ']' { RBRACKET } | '\'' { APOSTROPHE } | eof { EOF } | _ { raise (SyntaxError ("Unexpected char: " ^ Lexing.lexeme lexbuf)) }
These tokens must be fed to a parser, which recognizes token structures as Prolog terms and returns the appropriate OCaml types. In order to write the parsing rules, we will take a little detour to write our types.
type varN = string type opN = string type term = | Var of varN | Const of varN | Op of opN * (term list) type sub = (varN * term) list type clause = { head : term ; body : term list }
In the simple program mentioned before, mother_child(trude, sally). will be a variant term of tag Op in the form ("mother_child",[Const "trude"; Const "sally"]). It will be stored in the database as a clause record in the form { head = Op ("mother_child", [Const "trude"; Const "sally"]) ; body = [] }. This is called a 'fact', a clause which we know to be true without needing to prove a body of other terms. The following clause,
parent_child(X, Y) :- father_child(X, Y).
is a rule, because it is only true if father_child(X, Y). is true.
Now we can write our parser rules. Reading a Prolog program will yield a list of clauses. We assume that what we read through our prompt is a query, whereas what we read from files is to be loaded into our database.
main: | EOF { [] } | clause { $1 } clause: | rule main { $1 :: $2 } rule: | LBRACKET v = VAR RBRACKET PERIOD | LBRACKET v = CONST RBRACKET PERIOD { { head = (Op ("assert",[(Var v)])); body=[] } } | CONST PERIOD { { head=(Op ($1,[])); body=[] } } | CONST LPAREN arguments RPAREN PERIOD { { head = (Op ($1,$3)); body = [] } } | CONST LPAREN arguments RPAREN COLON DASH goals PERIOD { { head=(Op ($1,$3)) ; body= $7 } } goals: | goal { [$1] } | goal COMMA goals { $1 :: $3 } goal: | CONST LPAREN arguments RPAREN { Op ($1,$3) } arguments: | term { [$1] } | term COMMA arguments { $1 :: $3 } term: | APOSTROPHE CONST APOSTROPHE { Const $2 } | CONST { Const $1 } | VAR { Var $1 } | goal { $1 }
We use the unification algorithm from this page. We will use this code to find the most general unifier in the following solving process.
function resolution(clause, goals, query):
let sub = the MGU of head(clause) and head(goals)
return (sub(tail(clause) concatenated with tail(goals)), sub(query))
function solve(goals, query):
if goals is empty then succeed(query)
else for each clause c in the program, in order,
if head(c) does not unify with head(goals) then do nothing
else solve(resolution(c, goals, query))
In OCaml this will be:
(* get predicate notation f and return list of database entries *) let predicate f = try List.rev (Hashtbl.find_all prog f) (* Hashtbl.find_all returns most recent entries first *) with Not_found -> raise No_pred (* lapply applies substitution to lists of terms, apply to terms *) let resolution clause (b::gs) q = let u = unify [(clause.head,b)] in (lapply u (clause.body @ gs), apply u q) let rec solve (g : term list) (query : term) = match g with | [] -> query | x::xs as g -> match (predicate (notate x)) with | [] -> raise (No_solution "no clauses left") | ((a::cs) :clause list) -> try (match resolution a g query with | ([],newq) -> solve [] newq | (newg,newq) -> solve newg newq with No_unify _ -> solve g query
We can then call solve and if it returns a result, unify it with the original query and show the result. To support backtracking and also check other clauses in case of failure, we have to keep the clauses we have not looked at everytime we pick a path in the clause database. In addition, to avoid trouble with recursion and variable name conflict in general, we have to rename our variables to keep them different in each call of solve. This can be done by assigning solve's recursion level as a suffix
(* renames variables to avoid confusion in case of identical variable names *) let rename_depth n c = let rec aux = function Var x -> Var (x ^ "__" ^ (string_of_int n)) | Const x -> Const x | Op (x,y) -> Op (x, (List.map aux y)) in { head = (aux c.head) ; body = (List.map aux c.body) } let resolution clause (b::gs) q = let u = unify [(clause.head,b)] in (lapply u (clause.body @ gs), apply u q) (* (g = goals list, q = query, clausesleft, n = depth level, ch = choices list *) let rec solve (g : term list) (query : term) (cls : clause list) (n : int) (ch : (term list * term * clause list * int) list) = match g with | [] -> (query,ch) | x::xs as g -> match cls with | [] -> next_choice ch (*raise (No_solution "no clauses left") *) | ((a::cs) :clause list) -> try (match resolution (rename_depth n a) g query with | ([],newq) -> solve [] newq cs (n+1) ((g,query,cs,n)::ch) | (newg,newq) -> solve newg newq (predicate (notate (List.hd newg))) (n+1) ((g,query,cs,n)::ch)) with No_unify _ -> solve g query cs n ch and next_choice = function (* gets next path in predicate database - backtracking *) [] -> raise (No_solution "No choices left") | (g, q, cls, n)::tl -> solve g q cls n tl (* unify result with query, if substitution = [] then print yes, otherwise print the substitution. *) let rec success (result,choices) query = let rec print_bindings = function [] -> () | (x,y)::tl -> print_string (" " ^ x ^ " = ") ; print_term [y] ; print_bindings tl in match unify_one result query with | [] -> print_endline "yes." | list -> print_bindings list ; try match read_line () with (* search for next choice if user requests with ';\n' *) | ";" -> success (next_choice choices) query ; () | _ -> print_string "" with (No_solution _) -> print_string "" let solve_prompt q = try success (solve [q] q (predicate (notate q)) 0 []) q with (No_solution _) -> print_endline "no."
This implementation is fairly plain with no support for integers, lists, comparison operators and built-in predicates.