How to infer a program transformation from one example?

This blog is a brief introduction of the ASE’19 paper:

Inferring Program Transformations From Singular Examples via Big Code. [Jiajun Jiang, Luyao Ren, Yingfei Xiong, Lingming Zhang]

As we all know that repetitive program developing is a common practice in real world, where similar or even the same code snippets are created via independent coding or copy-past-like operations. However, repetitive code snippets are considered harmful as they potentially increase the burdern of developers during both developement and maintenance procedures. For example, when a faulty code snippet exists multiple copies in projects, a lot of manual efforts will be required to fix all of them one by one [1–3]. In addition, these code changes are mechanically repetitive and thus tedious and error-prone for developers. Therefore, automating such kind of tasks are vital to reduce human efforts. Similar tasks exist in many other related application scenarios, such as systematic editing [4], refactoring[5], porting commits among forked projects[6], etc.

A recommended method proposed by researchers to automate such repetitive code changes is to learn reusable program transformation patterns from code change examples and then apply the learned patterns to similar code snippets in other places. However, learning a general and reusable program transformation pattern from the given examples are not easy. For example, with the given example shown below,

*f(a, b) => f(g(a), b)                                        ---(1)*

can we perform the desired program transformation for code “f(a+b, c)” as the following?

*f(a+b, c) => f(g(a+b), c)                                    ---(2)*

From the example, we can notice that the code snippets (f(a, b) and f(a+b, c)) share both commonalities (same method call f) and particularities (the first arguments are a and a+b, respectively). Therefore, according to the commonalities, the transformation learned from the first example conform to the second one. However, when considering the particularities, the transformation is not applicable. Therefore, properly deciding how is the learned transformation pattern looks like is critical and hard, i.e., which part of the code change should be considered in the pattern while which part should not.

Depending on Multiple Examples

To tackle this problem most of existing approaches [7–8] depend on the existance of repetitive (multiple) examples, where the common parts among them are regarded as important and should be kept in the learned transformation patterns (need to be matched before applying it), while the other parts should be abstracted away and ignored while matching. For example, if there is another similar code change like the following example.

f(c, d)=>f(g(c), d) *                                         ---(3)*

Both (1) and (3) perform the same code change — wrapping the first argument of method f with g, the learned pattern will keep the method f while ignoring the variable of its arguments. This is decided by identifying the commonatilies among different examples (e.g., both (1) and (3) has the same method f but different arguments). As a matter of fact, repetitive code change examples do not alway exist in real world, where techniques that depend on multiple examples to learn transformation patterns will not work any more.

Depending on One Single Example

Therefore, researchers proposed to infer transformation patterns from one singular example, such as Meng et al. [9] proposed Sydit, which depends on a set of manually defined rules to decide what should be considered in a transformation pattern. For example, all method names should be kept while all variables should be ignored. This approach is effective when transforming a code snippet which is very similar to the given example for pattern learning. For complicated cases, it does not work well. For example, Fig. 1 shows a given code change example, from which we need to learn a transformation pattern.

Figure 1

Then, we need to match the learned pattern with the code snippet shown in Fig. 2 and perform a simialr code change — removing the second argument ***null ***of class Description.

Figure 2

In this case, Sydit will fail to match the learned pattern from Fig.1 to the code snippet in Fig. 2 due to different arguments — method call testClass.getName() in Fig.1 while variable ***name ***in Fig.2. The reason is the predefined rules constrain that all method calls should be matched, e.g., getName(). However, since programs vary greatly even with the same functionality, predefined rules hardly handle all of them.

Is there a general method to decide which parts should be kept in a tramsformation pattern when facing diverse code changes?

The anwser is YES. *GenPat *is such a technique that can infer a reusable program transformation pattern from only one code change example and it does not depend on predefined rules. The big idea behind GenPat is that massive open-source projects can provide valuable guidance to decide how a pattern should be like. In other words, commonly used program elements (e.g., variables or method calls) may be general and should be kept in the inferred pattern. The major difference from prior approaches is that GenPat does not constrain that the common elements have to exist in similar code changes and thus can be infer transformation patterns from one example. The decision logic is relatively simple and flexible — commonly used program elements among projects should be kept in a transformation pattern. For example, the method testClass.getName() is not commonly used among projects, and thus it does not need to be matched before applying the transformation.

In order to enable the flexible transformation pattern inferring process, we first transform programs into graphs (add data dependency edges on original abstract syntax tree) and perform pattern inferring on the graph. Similarly, when applying the learned transformation, a graph matching process will be performed. The overview of the complete process is shown in Fig. 3.

Figure 3

From the figure, we can find that GenPat mainly contains two stages — Inferring and Applying Stages.

Inferring Stage

  1. Extracting Hypergraphs : parses source code before and after changing into hypergraph representations with a set of attributes and relations in the graph.
  2. Extracting Modifications : performs hypergraph differencing and extracts modifications related to corresponding nodes in the graph. Specifically, tree-based differencing, e.g., GumTree, can be used to perfrom the differencing as this hypergraph is a super graph with the original AST as backbone.
  3. Inferring Transformations : based on the previous process, we know which nodes are modified. This step first selects a set of node according to the expansion starting from the changed nodes, which will constitute the context of the transformation. Next, a big code corpus will be utilized to select the attributes of those context nodes, i.e., frequently appeared attributes across projects will be kept in the transformation pattern, while others discarded.

As a result, the final transformation pattern contains a subset of the nodes in the hypergraph and their selected attributes.

Applying Stage

  1. Matching Hypergraph : when given a transformation pattern and an examplar code snippet, GenPat first extracts a hypergraph from the code and tries to match it with the given pattern, where all nodes and attributes in the pattern should be matched with some one in the hypergraph of the examplar code.
  2. Migrating Modifications : after matching the hypergraph with the pattern, a sequnence of mappings can be obtained to record the matching result of the hypergraphs. According to the mappings, modifications related to corresponding nodes can be transferred to the target hypergraph with replacing mapped variables and expressions.

To better understand the working process of GenPat, next let’s see how it works on the example shown in Fig.1 and Fig.2. As introduced, the target is to infer a transformation pattern from Fig.1 and apply it to code snippet in Fig. 2. Therefore, according to the code change in Fig. 1, we can obtain the following graph representation of the code change (ref. Fig. 4), where the second argument null is deleted. Particularly, we consider three kinds of attributes in the pattern — node type (e.g., return), value type (e.g., String), and node relations (e.g., data dependency).

Therefore, in the pattern inferring process, what we do is to decide which node type/value type/node relation should be kept. This process depends on a big code base. For example, the return statement is commonly used among projects, it will be kept. However, the method call getName() is rarely used and will be abstracted away. Finally, all attributes marked with red color are kept in the inferred transformation pattern in Fig. 4, such as the return node and the ***String ***type.

Figure 4

In the applying stage, the inferred transformation pattern (a graph) will be matched to the graph of candidate code snippet, which is shown in Fig. 5.

Figure 5

According to Fig. 5, all attributes in pattern shown in Fig.4 can be matched and the matched attributes are marked as bold black in Fig. 5. The matching result is:

*p1→n1, p2→n2, p3→n3, p4→n4, p5→n5*

Then, based on the modification corresponding to the inferred transformation pattern — “*deleting the node **p4 **from *p2”, and the matching relation, the same code change will be performed on the candidate code — “deleting **n4 **from **n2”. Up to now, a transformation pattern is successfully inferred from the given example (Fig. 1) and applied to the target code (Fig. 2).

In the end, GenPat was evaluated in two distinct application scenarios — systematic editing and automatic program repair. The results are:

Systematic Editing—significantly outperforms the state-of-the-art Sydit with up to 5.5x correctly transformed cases.

Automatic Program Repair —successfully fixed 19 bugs in the Defects4J benchmark, 4 of which have never been repaired by any existing technique.

For more detailed information, please read the corresponding paper at https://xgdsmileboy.github.io/publication/ase19a.

Reference

[1] Kim, Sunghun, Kai Pan, and E. E. Whitehead Jr. “Memories of bug fixes.” Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering. ACM, 2006.

[2] Gao, Qing, et al. “Fixing recurring crash bugs via analyzing q&a sites (T).” 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2015.

[3] Nguyen, Tung Thanh, et al. “Recurring bug fixes in object-oriented programs.” Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1. ACM, 2010.

[4] Kim, Miryung, and David Notkin. “Discovering and representing systematic code changes.” 2009 IEEE 31st International Conference on Software Engineering. IEEE, 2009.

[5] Opdyke, William F. “Refactoring: An aid in designing application frameworks and evolving object-oriented systems.” Proc. SOOPPA’90: Symposium on Object-Oriented Programming Emphasizing Practical Applications. 1990.

[6] Ray, Baishakhi, and Miryung Kim. “A case study of cross-system porting in forked projects.” Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. ACM, 2012.

[7] Meng, Na, Miryung Kim, and Kathryn S. McKinley. “LASE: locating and applying systematic edits by learning from examples.” Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013.

[8] Rolim, Reudismam, et al. “Learning syntactic program transformations from examples.” Proceedings of the 39th International Conference on Software Engineering. IEEE Press, 2017.

[9] Meng, Na, Miryung Kim, and Kathryn S. McKinley. “Systematic editing: generating program transformations from an example.” ACM SIGPLAN Notices. Vol. 46. №6. ACM, 2011.

发表评论

电子邮件地址不会被公开。 必填项已用*标注