Full-Text Search

Research Team:
RMod
Team leader (HDR):
Stéphane Ducasse
Project leaders:
Marcus Denker, Camillo Bruni

Project Context

Pharo (www.pharo-project.org) is a new open-source Smalltalk-inspired programming language and environment [1]. It provides a platform for innovative development both in industry and research.

Pharo contains a complete IDE (Integrated Development Environment).

Problem

Currently the sources in Pharo are stored outside the system, which poses several problems. One is that full-text search can only be performed with a huge performance overhead.

The goal of this project is to speed up full-text searches in code by introducing a search index.
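
To illustrate the general idea of a search index, here is a minimal sketch of an inverted index in Python. It is not the existing Pharo prototype and does not use any Pharo API: the class SearchIndex, the tokenizer, and the method names and sources in the usage lines are all hypothetical, chosen only for illustration. The sketch maps each identifier-like token to the set of methods that contain it, so a query intersects a few small sets instead of scanning every source string.

```python
import re
from collections import defaultdict

# Assumption: tokens are identifier-like words; a real indexer for
# Smalltalk code would need a tokenizer that understands its syntax.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text):
    """Split text into lowercase identifier-like tokens."""
    return [t.lower() for t in TOKEN.findall(text)]


class SearchIndex:
    """Hypothetical inverted index: token -> names of methods containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, name, source):
        """Index one method's source under the given method name."""
        for token in tokenize(source):
            self.postings[token].add(name)

    def search(self, query):
        """Return the sorted names of methods containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        # Intersect posting sets, starting from the smallest one.
        sets = sorted((self.postings.get(t, set()) for t in tokens), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return sorted(result)


# Usage with made-up method names and sources.
index = SearchIndex()
index.add("Collection>>isEmpty", "isEmpty ^ self size = 0")
index.add("Collection>>notEmpty", "notEmpty ^ self isEmpty not")
index.add("Point>>x", "x ^ x")
print(index.search("isEmpty"))
print(index.search("self size"))
```

The sketch only answers whole-token queries. Substring or phrase search, incremental updates when a method is recompiled, and memory footprint are the kind of requirements the work plan below asks the student to define.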

Work plan

To solve this problem, the student will have to:

  • Evaluate the existing prototype of full-text search. Define a catalog of requirements that a real implementation must fulfill to be good enough for daily use.
  • Implement a full-text indexing subsystem according to these requirements.
  • Possible extension of the topic: explore the relationship of text indexing and efficient storage of source code.

Benefits for the Pharo community

  • Fast full-text search over all code in the IDE.
  • The index can be reused for other purposes, for example code duplication detection.

Benefits for the student

  • Learn about both modern development environments and the basics of text indexing and search.
  • Integration into a prolific research group, fond of software development and programming languages.
  • Understanding of how programming languages and IDEs are implemented.
  • Potential integration as a master's and/or PhD student either within the group or within one of its numerous partners around the world (Switzerland, Chile, Belgium, Argentina, Italy).

Bibliography

[1] Andrew Black, Stéphane Ducasse, Oscar Nierstrasz, Damien Pollet, Damien Cassou and Marcus Denker, Pharo by Example, Square Bracket Associates, 2009, ISBN 978-3-9523341-4-0.