[PhD] Multi-Facet Actionable Analytics for Information System Rejuvenation

The RMod research team (France) is opening a PhD position:

Context and Challenges

Information Systems are one of the key software backbones of our society and economy. They manage key data about our lives and activities: insurance, payroll, CRM, and human resource management systems. They are often the cornerstone of organizations and key enablers of revenue.

Reverse engineering, maintenance, and evolution of software assets such as Information Systems have been identified by Deloitte as one of the 10 future breakthroughs in IT.

Organizations managing Information Systems face the following hard problems on a daily basis:

  • Old Languages. Very often, the lifetime of an Information System spans decades. Such systems survive technology hypes. The counterpart is that they are developed in programming languages that seem old and out of fashion compared to modern technology. For example, half of the business of a large insurance group is programmed in a language that, according to Google, does not exist.
  • Aging Software. Since Information Systems grow over a long period of time, the underlying software ages. It frequently contains dead or duplicated code and obsolete documentation, and it lacks tests. Since the original developers are often no longer part of the project, the overall knowledge of the application is scattered and incomplete.
  • Lack of Tools. Old languages often lack modern tooling for metrics, refactoring, and test coverage. It is therefore difficult to extract information from an Information System and to control its evolution. For example, performance analysis is often difficult because there are no off-the-shelf tools for old languages.
  • Lack of Knowledge. It is often difficult to understand the flow of information and the processes embedded in the software. Over the years, the systems had to interact with different technologies (REST, web services, …) that may not even exist anymore, yet these interactions shaped the architecture of the system. Past architectural decisions are regularly lost, and new changes unknowingly break basic assumptions or important invariants.
  • High-Risk Changes. The lack of knowledge, coupled with the fact that few or no tests are available, turns any change into a very risky task. Developers are therefore reluctant to do more than fix bugs or address immediate client requirements.


The goal of the PhD is to support the “Rejuvenation of Information Systems”. The experiments and the validation of the results will take place in the context of the PowerBuilder Information System of the CIM company. To support the PhD, CIM is paying an expert engineer to build infrastructure (parser, meta-model, etc.) dedicated to PowerBuilder for the Moose open-source platform. The student will use and extend this infrastructure (software maps, quality assistant) to build new-generation tools.

The student will work on the following challenges:

  • Reverse engineering. Reverse engineering is not new. However, extracting key views that support decision making is complex, since it depends on the local context (business, process, framework constraints). Such a contextual approach has no formal framework, but it is advocated by “Actionable Analytics”. The student will work on how to support the reverse engineering of Information Systems while taking their local context into account. This reverse engineering will integrate information from various sources, such as structural information, data flow between identified components, authors, bug reports, etc. This is a complex task because of the intrinsic complexity of the legacy system and of the local context.
  • Actionable quality assessment. The student will develop domain- and language-specific quality assessment maps. The quality assessment will provide reports and maps about dead code, code duplication, and metrics adapted to the language and the domain (forms, specific database calls, specific procedures).
  • Run-time analysis and program load. CIM is planning to expand into a new market of large insurance companies. It is worried that its products may not scale up to the amount of data this will imply. There is a need to identify and understand optimization opportunities. How can we support understanding the run-time performance of an Information System? This is a complex task because performance gains and losses are spread among different software layers (graphical interface, telecommunications, core application, database), and it is not clear where one should focus. In addition, instrumenting the legacy code and the other systems interacting with it (e.g., the database back-ends) is not straightforward.
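
As one illustration of the dead-code reports mentioned in the challenges above, the following minimal Python sketch flags routines that cannot be reached from known entry points in a call graph. The call graph, the routine names, and the `find_dead_routines` helper are all hypothetical: they are not part of Moose or of the CIM infrastructure, and a real analysis would also have to account for dynamic calls and event handlers.

```python
from collections import deque


def find_dead_routines(call_graph, entry_points):
    """Return the sorted list of routines unreachable from any entry point.

    call_graph maps a routine name to the list of routines it calls.
    entry_points lists the routines known to be invoked from outside
    (for example, window-open events). Both are assumptions of this sketch.
    """
    entry_points = list(entry_points)
    reachable = set(entry_points)
    queue = deque(entry_points)

    # Breadth-first traversal of the call graph from the entry points.
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, ()):
            if callee not in reachable:
                reachable.add(callee)
                queue.append(callee)

    # Every routine that appears as a caller or as a callee.
    all_routines = set(call_graph)
    for callees in call_graph.values():
        all_routines.update(callees)

    return sorted(all_routines - reachable)


# Hypothetical call graph; the names are illustrative only.
call_graph = {
    "w_main.open": ["load_contracts", "show_menu"],
    "load_contracts": ["db_select_contracts"],
    "show_menu": [],
    "db_select_contracts": [],
    "legacy_export": ["format_csv"],
    "format_csv": [],
}

dead = find_dead_routines(call_graph, ["w_main.open"])
print(dead)
```

In this toy graph, `legacy_export` and `format_csv` are reported because no entry point leads to them. Such a list is only a starting point for a report or map: each candidate would need to be confirmed with the development team before removal.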

Identified Tasks

The student will work incrementally in three-month “sprints” on the following tasks:

  • Learn PowerBuilder, Moose, and meta-modeling; review the literature.
  • Identify contextual information (local/team patterns, framework constraints, processes).
  • Define actionable metrics or queries. 
  • Build a first actionable analysis (local anti-pattern identification) and related maps.
  • Validate with the development team.
  • Build a first run-time actionable analysis (database anti-pattern identification) and related maps.
  • Identify concrete run-time bottlenecks. 

RMOD Supervisors: Stéphane Ducasse (program understanding, analyses, tooling), Nicolas Anquetil (code analysis, quality metrics, program transformation), and Anne Etien (tests, database, information systems).

Advanced engineer: Guillaume Larcheveque

Posted by admin on 20 June 2018, 1:20 pm