case sensitive

Sort by: Display: Hide controls:

  1. Nicolas Anquetil and Timothy C. Lethbridge. Recovering Software Architecture from the Names of Source Files. In Journal of Software Maintenance: Research and Practice 11 p. 201—21, 1999. DOI 

    We discuss how to extract a useful set of subsystems from a set of existing source-code file names. This problem is challenging because many legacy systems use thousands of files names, including some that are very short and cryptic. At the same time the problem is important because software maintainers often find it difficult to understand such systems. We propose a general algorithm to cluster files based on their names, and a set of alternative methods for implementing the algorithm. One of the key tasks is picking candidate words to try to identify in file names. We do this by (a) iteratively decomposing file names, (b) finding common substrings, and (c) choosing words in routine names, in an English dictionary or in source-code comments. In addition, we investigate generating abbreviations from the candidate words in order to find matches in file names, as well as how to split file names into components given no word markers. To compare and evaluate our five approaches, we present two experiments. The first compares the "concepts" found in each file name by each method with the results of manually decomposing file names. The second experiment compares automatically generated subsystems with subsystem examples proposed by experts. We conclude that two methods are most effective: extracting concepts using common substrings and extracting those concepts that relate to the names of routines in the files.