AI Tools and Software Migration

SOFTWARE MIGRATION AND AI

The British computer scientist Alan Turing is regarded as one of the most influential pioneers in the field of artificial intelligence (AI). In 1950, he formulated an intelligence test for computers: a computer is considered intelligent if a person conversing with it cannot tell whether they are talking to a computer or a human.

Today, Turing’s visions have become reality. By recognising statistical correlations between words, AI language models can determine the most likely next word (semantic probability) and enable human-like communication.

The current areas of application for AI go far beyond communication. This raises the question: what are the possibilities and limits of AI in software migration?

In the following, various AI tools are analysed with regard to their use in software migration.

Definition of the subject matter

In the following, “software migration” refers to tool-supported, compiler-based software migration; specifically, the process of converting programmes from a legacy language to a modern programming language using conversion tools. Alternative approaches such as manual migration are not considered. Tools that merely support migration projects, such as analysis tools or test-support tools, are likewise outside the scope of this article.

The investigation covers the conversion of a typical COBOL programme to Java and the execution of the result: on the one hand with various AI tools available online, on the other hand with the conversion tool CoJaC (COBOL to Java Converter) developed by pro et con. The results are compared and evaluated.

The evaluation is intended to show to what extent the results of the AI tools are semantically equivalent to the correct reference results, for example those delivered by the programme compiled with a COBOL compiler. Semantic equivalence is an essential criterion in a migration project, if not the most important one.
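One way to make this criterion operational is to compare the runtime output of a converted programme with the reference output line by line. The following Java sketch illustrates the idea under simple assumptions: both outputs are available as lists of lines, the class and method names are hypothetical, and the sample values are invented for illustration rather than taken from the tests described here.

```java
import java.util.List;

// Hypothetical helper: compares the runtime output of the original COBOL
// programme with that of a converted Java programme, line by line.
final class OutputEquivalenceCheck {

    /** Returns the 1-based number of the first differing line, or 0 if both outputs match. */
    static int firstDifference(List<String> expected, List<String> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i + 1;
            }
        }
        // Same prefix, but one output may be longer than the other.
        return expected.size() == actual.size() ? 0 : common + 1;
    }

    public static void main(String[] args) {
        // Illustrative values only, not results from the tests in this article.
        List<String> referenceOutput = List.of("2", "AB  ");
        List<String> convertedOutput = List.of("42", "AB  ");
        System.out.println(firstDifference(referenceOutput, convertedOutput));
    }
}
```

A check of this kind only compares observable output; it cannot prove equivalence for inputs or code paths that the test run does not exercise.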

Test basis

The test is based on a simple COBOL programme. The programme implements no application logic, but it contains typical COBOL instructions that are used in many production environments. Experienced COBOL programmers will understand it immediately. Different data items are defined and initialised at the same time. The data is then manipulated, e.g. through value assignments or overlays, and output with DISPLAY after each manipulation.
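To illustrate what an overlay means for a conversion, the following Java sketch emulates two views of one storage area, similar in spirit to a COBOL overlay such as one defined with REDEFINES. It is a simplified assumption for illustration, not the test programme and not the CoJaC library; the class `OverlayDemo`, its methods, the field sizes and the ASCII encoding are hypothetical choices.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a COBOL-style overlay: two field views share one
// storage area, so writing through one view changes what the other view reads.
final class OverlayDemo {

    // Shared storage for a 4-character area, initialised with spaces.
    private final byte[] storage = "    ".getBytes(StandardCharsets.US_ASCII);

    /** Alphanumeric view of the whole area: left-justified, space-padded, truncated on the right. */
    void setText(String value) {
        for (int i = 0; i < storage.length; i++) {
            storage[i] = (byte) (i < value.length() ? value.charAt(i) : ' ');
        }
    }

    String getText() {
        return new String(storage, StandardCharsets.US_ASCII);
    }

    /** Numeric view of the first two characters, stored as unsigned decimal digits. */
    void setFirstTwoDigits(int value) {
        int v = Math.abs(value % 100); // keep the two low-order digits, drop the sign
        storage[0] = (byte) ('0' + v / 10);
        storage[1] = (byte) ('0' + v % 10);
    }

    public static void main(String[] args) {
        OverlayDemo area = new OverlayDemo();
        area.setText("ABCD");
        area.setFirstTwoDigits(7);
        System.out.println(area.getText()); // prints "07CD"
    }
}
```

Two independent Java variables, a `String` and an `int`, would not show this coupling, which is why overlays are hard to reproduce with plain Java data types.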

MicroFocus COBOL Compiler:

The programme was compiled with the MicroFocus COBOL compiler and then executed. It delivered the following results.

AI TOOLS

The experiments were conducted with the following AI tools: ChatGPT (GPT-4o and GPT-3.5), Copilot, CodeConvert, CodeGPT, Cursor and Blackbox AI.
Each of these AI tools generated a Java programme from the COBOL programme that was supposed to provide identical functionality. It should be noted that the AI tools were not specifically trained for the task at hand.

The generated Java programmes, including their results at runtime, are documented below. The focus is on the two AI tools ChatGPT-4o and Copilot. The tests with the other AI tools led to the same findings:

ChatGPT-4o

When the converted programme is executed, it returns incorrect results.
A second conversion produces different source code.
The (incorrect) results of the two conversions differ.

Copilot (v1.94)

When the converted programme is executed, it returns incorrect results.
A second conversion produces different source code.
The (incorrect) results of the two conversions differ.

COJAC (COBOL TO JAVA CONVERTER)

Alongside other tools, CoJaC is used in pro et con migration projects to convert COBOL programmes to Java.
It is therefore appropriate to compare the results of CoJaC with those of the AI tools:

The COBOL programme was converted to Java using the CoJaC tool from pro et con.
The Java programme was compiled and executed. At runtime, it delivered the correct results, identical to those of the COBOL programme.

As expected, a repeated conversion delivered identical Java code and identical, correct results at runtime.
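Reproducibility of this kind can be checked mechanically, for instance by fingerprinting the generated source of two conversion runs and comparing the fingerprints. The following is a minimal sketch, assuming the generated source is available as a string; `ReproducibilityCheck` and its methods are hypothetical names, and the sample sources are placeholders, not actual converter output.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: checks whether two conversion runs produced
// byte-for-byte identical source code by comparing SHA-256 fingerprints.
final class ReproducibilityCheck {

    /** Returns the SHA-256 fingerprint of the given source text as lowercase hex. */
    static String sha256(String source) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(source.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** True if both runs produced identical source code. */
    static boolean isReproducible(String firstRun, String secondRun) {
        return sha256(firstRun).equals(sha256(secondRun));
    }

    public static void main(String[] args) {
        // Placeholder sources standing in for the output of repeated conversions.
        String runOne = "class A { int x = 1; }";
        String runTwo = "class A { int x = 1; }";
        String runThree = "class A { int value = 1; }";
        System.out.println(isReproducible(runOne, runTwo));   // prints "true"
        System.out.println(isReproducible(runOne, runThree)); // prints "false"
    }
}
```

Storing fingerprints rather than full sources keeps such a check cheap even when many programmes are converted repeatedly over several iterations.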

DISCUSSION OF THE FINDINGS

  1. All AI tools tested deliver incorrect results. The reason is that COBOL-specific language elements are mapped onto Java's built-in language features. This works only to a limited extent because, for example, Java has no semantically equivalent data types or variables for the complex COBOL data descriptions and their properties. AI models make predictions based on probability, which can only approximate 100% semantic equivalence. Although the underlying machine learning or deep learning models can be optimised through ongoing training, it remains to be seen whether this 100% semantic equivalence can be achieved at all and, if so, with what training effort. In contrast, CoJaC has a library in which all COBOL data descriptions are available in the form of Java classes and packages. These emulate the COBOL data descriptions in the Java code, which ensures semantic equivalence between the COBOL and the Java programme.
  2. When the Java generation was repeated, all AI tools tested delivered modified source code. A conversion by an AI tool is therefore not reproducible 1:1. In a migration project, which usually comprises several million lines of code (LOC), a certain number of programmes must be converted repeatedly in several iterations (e.g. because of further development). Source code that changes with every conversion creates discrepancies in related components (references, classes, interfaces), as each programme communicates via interfaces with other programmes in the programme system. This is unacceptable in a migration project. With a compiler-based migration tool such as CoJaC, a programme conversion can be repeated as often as required and always delivers identical source code.
  3. Qualified COBOL programmers will have recognised that the COBOL programme contains programming errors in several places (e.g. a two-digit number is assigned to a one-digit variable). This is obviously questionable style, but it does happen in practice, especially in long-running, complex projects, which legacy software usually is. Nobody can guarantee that everything is programmed correctly at the time of a migration. For critical software, a migration must do justice to this fact by mapping every special case 1:1 in a semantically equivalent way. This is exactly what AI tools cannot do, whereas the compiler-based approach can.
  4. There is no such thing as ‘the’ COBOL. The language depends on the hardware platform, compiler dialects and project-specific compiler settings, which change the behaviour of the system and are not represented in the COBOL code itself. A wide variety of precompilers is in common use as well. This means that every legacy system is individual and requires equally individualised processing. This is another area where AI tools reach their limits, whereas the compiler-based approach supports this customised processing.
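Points 1 and 3 can be made concrete with the example of a two-digit number assigned to a one-digit variable. The Java sketch below contrasts a plain `int` with a small class that emulates an unsigned COBOL numeric field. It assumes the common behaviour in which such an assignment truncates the high-order digits; as point 4 notes, the actual behaviour depends on dialect and compiler settings. The class `UnsignedNumericField` is hypothetical and is not the CoJaC library.

```java
// Hypothetical sketch: emulates an unsigned COBOL numeric field with a fixed
// number of decimal digits (comparable to PIC 9(n)). Assumption: assigning a
// value that is too large truncates the high-order digits.
final class UnsignedNumericField {

    private final long modulus; // 10^digits
    private long value;

    UnsignedNumericField(int digits) {
        if (digits < 1 || digits > 18) {
            throw new IllegalArgumentException("digits must be between 1 and 18");
        }
        long m = 1;
        for (int i = 0; i < digits; i++) {
            m *= 10;
        }
        this.modulus = m;
    }

    /** Emulates an assignment to the field: sign dropped, high-order digits truncated. */
    void move(long newValue) {
        this.value = Math.abs(newValue % modulus);
    }

    long get() {
        return value;
    }

    public static void main(String[] args) {
        int naive = 42; // a plain Java int keeps both digits

        UnsignedNumericField oneDigit = new UnsignedNumericField(1);
        oneDigit.move(42); // the emulated one-digit field keeps only the last digit

        System.out.println(naive + " vs " + oneDigit.get()); // prints "42 vs 2"
    }
}
```

A translation that maps the one-digit field to an `int` compiles and runs without complaint, yet prints a different value from the emulated field. Deviations of this kind only become visible when runtime results are compared.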

CONCLUSION

  • To date, AI tools can be used in migration projects, but only as assistants. In the case of IBM watsonx Code Assistant for Z, this role is already reflected in the name. It can be used, for example, to generate unit tests, which are time-consuming to create manually. AI tools can also support code commenting, refactoring and programme documentation. There are certainly other possible areas of application.
  • Currently, AI tools can support the actual conversion of programmes only to a limited extent. Without training or manual adaptation, the converted programmes deliver results at runtime that are not semantically equivalent to those of the original programme. Semantic equivalence, however, is an indispensable prerequisite for a successful project.
  • Both the training of the AI tools and the manual adaptation demand time and resources, so the effort approaches that of a new development.
  • If a customer needs to change the programming language and/or platform within a reasonable project time and budget, a tool-supported, compiler-based software migration is currently the method of choice. AI tools can provide valuable support in an assisting role.