Compressed ASTs

Research Team:
Team leader (HDR):
St├ęphane Ducasse
Project leader:
Marcus Denker

Project Context

Pharo ( is a new open-source Smalltalk-inspired programming language and environment [2]. It provides a platform for innovative development both in industry and research.


In Pharo, code is, like in most programming languages, stored as text. The text is compiled to a tree representation (AST) for executable code generation and transformations (e.g refactorings). The AST is never stored as is it not a compact representation.

The goal of the project is to implement a compression scheme for ASTs to make it practical to keep all ASTs in the system at runtime.

Work plan

To solve this problem, the student will have to:

  • Evaluate encoding and compression schemes for ASTs.
  • Define criteria related to space, computation time and similar that a solution needs to fulfil.
  • Implement one or more of the solutions.
  • Experiment with ideas of how the AST can be used, for example in tools.

Benefits for the student

  • Learn about Abstract Syntax Trees, Serialisation and compression.
  • Learn more about advanced concepts of language implementation.
  • Integration into a prolific research group, fond of software development and programming languages;
  • Understanding of how programming languages and IDEs are implemented;
  • Potential integration as a master and/or PhD student either within the group or within one of its numerous partners around the world (Switzerland, Chile, Belgium, Argentina, Italy).


[1] Franz, Michael and Kistler, Thomas: Slim binaries. CACM Vol 40 No 12, Dec 1997.

[2] Andrew Black, St├ęphane Ducasse, Oscar Nierstrasz, Damien Pollet, Damien Cassou and Marcus Denker, Pharo by Example, Square Bracket Associates, 2009, ISBN 978-3-9523341-4-0