This website includes all the information needed to reproduce the experiments presented in the paper "Type Inference Using Concrete Syntax Properties in Flexible Model-Driven Engineering". Step-by-step instructions for running the experiments are in the Instructions section. All the source code can be downloaded from the Downloads section, and all the raw data from the Data section.
Abstract: In traditional Model-Driven Engineering (MDE), models are instantiated from metamodels. In contrast, in Flexible MDE, language engineers initially create example models of the envisioned metamodel. Because no metamodel exists at the beginning, the example models may include errors such as missing types, typos or the use of different types to express the same domain concepts. In previous work, an approach was proposed that uses semantic properties of the example models to infer the types of the elements that are left untyped. In this paper, we build on that approach by investigating how concrete syntax properties (like the shape or the color of the elements) of the example models can help with type inference. We evaluate the approach on an example model. The initial results suggest that on average 64% of the nodes are correctly identified.
The following image presents an overview of the experimentation approach discussed in the paper. Detailed instructions are provided for each step of the process. Readers can start from step 1 to generate their own models, muddles, feature signature lists and results, or start from any later step by downloading our files for the preceding steps, which contain the artefacts generated in the experiment presented in the paper.
- Step 0: Prerequisites
- For steps 1 & 2 the full Epsilon suite is required. An Eclipse version that contains the Epsilon package can be downloaded from here.
- For steps 1, 2 & 3 a Java Runtime Environment (JRE) should be installed. JRE can be downloaded from here.
- For steps 4-6 the rpart R library should be installed and loaded (run the command "library(rpart)" before executing our functions). Rpart can be downloaded from here.
- Step 1: Features Signature Generation
- Download the "Type Inference" source code from here.
- Import the 2 projects into Eclipse. (The org.eclipse.epsilon.emc.graphml and org.eclipse.epsilon.emc.muddle projects override the currently published EMC Muddle driver to provide the functionality needed for this work.)
- Open the class /src/CARTTypeInferenceGraphical.java in the org.eclipse.epsilon.emc.graphml project.
- Change lines 18-20 (Muddle graph = importer.importGraph(new File("..."));) to point to the muddle for which you want to generate the signatures. If you are working with the example muddle discussed in the paper, point to that .graphml file.
- Change the value of NOISE_CHANCE_PERCENTAGE on line 29 to the desired level of added noise.
- If you're working with a muddle other than the one provided, change the value on line 35 (which is 104 in the version you have) to the number of nodes that your diagram has. ([HINT:] Run the script once and it will report how many nodes your diagram has. Subtract 1 from this value and replace 104 with the result.)
- [OPTIONAL] If you are adding noise, you may need to extend the two lists created in lines 48-62 with more shapes and colors to be injected into the diagram.
- Run the script as a Java Application. The generated signature file is stored in the same folder under the name outputForMuddlesXX.txt, where XX is the noise level you used.
- The feature signature files for all 6 added-noise levels of the experiment presented in the paper can be downloaded from the Downloads section here.
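To make the role of NOISE_CHANCE_PERCENTAGE and of the two shape and color lists more concrete, here is a minimal Java sketch of how percentage-based noise could be injected. It is not the code of CARTTypeInferenceGraphical.java: the class name, the maybeReplace helper, the list contents and the choice to perturb shape and color independently are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class NoiseInjectionSketch {

    // Hypothetical stand-in for the constant on line 29 of the project class.
    static final int NOISE_CHANCE_PERCENTAGE = 30;

    // Hypothetical replacement pools; the real lists are defined in the project source.
    static final List<String> SHAPES = Arrays.asList("rectangle", "ellipse", "diamond", "hexagon");
    static final List<String> COLORS = Arrays.asList("#FF0000", "#00FF00", "#0000FF", "#FFCC00");

    /** Returns the original value, or a random value from the pool with the given percentage chance. */
    static String maybeReplace(String original, List<String> pool, int chancePercentage, Random random) {
        // nextInt(100) is uniform over 0..99, so this branch is taken chancePercentage% of the time.
        if (random.nextInt(100) < chancePercentage) {
            return pool.get(random.nextInt(pool.size()));
        }
        return original;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so repeated runs inject the same noise
        String[][] nodes = { {"ellipse", "#FF0000"}, {"rectangle", "#0000FF"}, {"ellipse", "#FF0000"} };
        for (String[] node : nodes) {
            // Assumption: shape and color are perturbed independently of each other.
            String shape = maybeReplace(node[0], SHAPES, NOISE_CHANCE_PERCENTAGE, random);
            String color = maybeReplace(node[1], COLORS, NOISE_CHANCE_PERCENTAGE, random);
            System.out.println(node[0] + "/" + node[1] + " -> " + shape + "/" + color);
        }
    }
}
```

With a chance of 0 every node keeps its original shape and color, and with 100 every value is drawn from the pools. A drawn value can coincide with the original one, so the effective noise in such a scheme may be slightly lower than the configured percentage.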
- Steps 2-3: Sampling, Classification & Score Calculation
- Download the R scripts from here.
- Source the R script (using an IDE like RStudio is suggested).
- Run the function "cartSamplingGraphical(filename, sampling)". The filename parameter points to one of the feature signature lists generated in the previous step (e.g. "Users/....../outputForMuddles0.txt"). The sampling parameter is the desired sampling percentage expressed as a decimal (e.g. 0.30 for 30%).
- The script will run 10 times (10-fold sampling) and will print the success score of each run, followed by the average success score over the 10 runs (the last double value printed).
- The results presented in the paper can be downloaded from the Downloads section here.
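The sampling and scoring loop of steps 2-3 can be sketched as follows. This is an illustration in Java, not a port of the R script: it assumes that the sampling parameter is the share of nodes whose types are kept for training and that each run draws a fresh random sample, and it replaces the rpart CART classifier with a plain signature lookup. The class, the Node type, the "shape|color" signature encoding and the sample type names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SamplingScoreSketch {

    /** One node of the example model: its concrete-syntax signature and its known type. */
    static final class Node {
        final String signature; // hypothetical encoding, e.g. "ellipse|#FF0000"
        final String type;
        Node(String signature, String type) { this.signature = signature; this.type = type; }
    }

    /** Trains on a random sample of the nodes and returns the share of remaining nodes typed correctly. */
    static double runOnce(List<Node> nodes, double sampling, Random random) {
        List<Node> shuffled = new ArrayList<>(nodes);
        Collections.shuffle(shuffled, random);
        int trainSize = (int) Math.round(sampling * shuffled.size());
        List<Node> train = shuffled.subList(0, trainSize);
        List<Node> test = shuffled.subList(trainSize, shuffled.size());

        // Stand-in classifier (NOT CART): remember the last type seen for each signature.
        Map<String, String> lookup = new HashMap<>();
        for (Node n : train) {
            lookup.put(n.signature, n.type);
        }

        int correct = 0;
        for (Node n : test) {
            if (n.type.equals(lookup.get(n.signature))) {
                correct++;
            }
        }
        return test.isEmpty() ? 0.0 : (double) correct / test.size();
    }

    /** Repeats the random split the given number of times and returns the mean success score. */
    static double averageScore(List<Node> nodes, double sampling, int runs, long seed) {
        Random random = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < runs; i++) {
            double score = runOnce(nodes, sampling, random);
            System.out.println("run " + (i + 1) + ": " + score);
            sum += score;
        }
        return sum / runs;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            nodes.add(new Node("ellipse|#FF0000", "Actor"));
            nodes.add(new Node("rectangle|#0000FF", "Task"));
        }
        // 30% sampling, 10 runs; the last value printed is the average score.
        System.out.println("average: " + averageScore(nodes, 0.30, 10, 42L));
    }
}
```

The lookup classifier only recognises signatures it has already seen, so its scores are not comparable to those of the CART model used in the paper. The sketch is meant only to show how the sampling fraction splits the nodes and how the per-run scores are averaged.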