SOFTWARE MIGRATION AND AI
The British computer scientist Alan Turing is regarded as one of the most influential pioneers in the field of artificial intelligence (AI). In 1950, he formulated an intelligence test for computers: a computer is intelligent if, in a conversation with it, a person cannot recognise whether it is a computer or a human.
Today, Turing’s vision has largely become reality. By learning statistical correlations between words, AI language models predict the most likely next word (semantic probability) and thereby enable human-like communication.
The current areas of application for AI go far beyond communication. This raises the question: what are the possibilities and limits of AI in software migration?
In the following, various AI tools are analysed with regard to their use in software migration.
Definition of the subject matter
In the following, “software migration” refers to tool-supported, compiler-based software migration: the process of converting programmes from a legacy language to a modern programming language using conversion tools. Alternative migration approaches such as manual migration are not considered. Supporting tools in migration projects, for example analysis tools or test-support tools, are likewise out of scope.
The investigation covers the conversion of a typical COBOL programme to Java and the execution of the result: on the one hand with various AI tools available on the net, on the other hand with CoJaC (COBOL to Java Converter), the conversion tool developed by pro et con. The results are compared and evaluated.
The evaluation is intended to show to what extent the programmes generated by the AI tools are semantically equivalent to the original, that is, whether they deliver the same results at runtime as the programme built with the COBOL compiler. Semantic equivalence is an essential, if not the most important, criterion in a migration project.
Test basis
The test is based on a simple COBOL programme. The programme contains no business logic, but it uses typical COBOL instructions that are found in many production environments. Experienced COBOL programmers will understand it immediately. Various data items are defined and initialised in the same step. The data is then manipulated, e.g. through value assignments or overlays, and output with DISPLAY after each manipulation.
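To make the term "overlay" concrete for readers without a COBOL background, the following is a minimal Java sketch of how two field definitions can share the same storage, as a COBOL REDEFINES clause does. The COBOL fragment in the comment, the class `OverlaySketch` and the helper `FieldView` are hypothetical illustrations; they are not taken from the test programme and do not show how CoJaC works internally.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical COBOL fragment this sketch imitates (not the test programme):
//   01 WS-DATE          PIC X(8).
//   01 WS-DATE-PARTS REDEFINES WS-DATE.
//      05 WS-YEAR       PIC X(4).
//      05 WS-MONTH      PIC X(2).
//      05 WS-DAY        PIC X(2).
final class OverlaySketch {

    /** A view onto a slice of shared storage, similar to a field inside a COBOL record. */
    static final class FieldView {
        private final byte[] storage;
        private final int offset;
        private final int length;

        FieldView(byte[] storage, int offset, int length) {
            this.storage = storage;
            this.offset = offset;
            this.length = length;
        }

        String get() {
            return new String(storage, offset, length, StandardCharsets.ISO_8859_1);
        }

        /** Alphanumeric assignment: left-justified, space-padded or cut off on the right. */
        void set(String value) {
            byte[] src = value.getBytes(StandardCharsets.ISO_8859_1);
            for (int i = 0; i < length; i++) {
                storage[offset + i] = i < src.length ? src[i] : (byte) ' ';
            }
        }
    }

    public static void main(String[] args) {
        byte[] record = new byte[8];                      // one storage area
        FieldView wsDate = new FieldView(record, 0, 8);   // whole record
        FieldView wsYear = new FieldView(record, 0, 4);   // overlaying views
        FieldView wsMonth = new FieldView(record, 4, 2);

        wsDate.set("20240131");
        System.out.println(wsYear.get());   // 2024

        wsMonth.set("12");                  // writing through one view ...
        System.out.println(wsDate.get());   // ... changes the other: 20241231
    }
}
```

Independent Java variables (for example three separate `String` fields) would not show this behaviour, because a write to one of them never changes the others.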
Micro Focus COBOL compiler:
The programme was compiled with the Micro Focus COBOL compiler and then executed. It delivered the following results.
AI TOOLS
The experiments were conducted with the following AI tools: ChatGPT (GPT-4o and GPT‑3.5), Copilot, CodeConvert, CodeGPT, Cursor and Blackbox AI.
Each of the AI tools listed above was used to generate a Java programme from the COBOL programme, with the aim of identical functionality. It should be noted that none of the AI tools was specifically trained for the task at hand.
The generated Java programmes, including their results at runtime, are documented below. The focus is on the two AI tools ChatGPT-4o and Copilot. The tests of the other AI tools led to the same findings:
ChatGPT-4o
When the converted programme is executed, it returns incorrect results.
A second conversion produces a different source code.
The (incorrect) results of both conversions differ.
Copilot (v1.94)
When the converted programme is executed, it returns incorrect results.
A second conversion produces a different source code.
The (incorrect) results of both conversions differ.
COJAC (COBOL TO JAVA CONVERTER)
Alongside other tools, CoJaC is used in pro et con migration projects to convert COBOL programmes to Java.
It is therefore appropriate to compare the results of CoJaC with those of the AI tools:
The COBOL programme was converted to Java using the CoJaC tool from pro et con.
The Java programme was compiled and executed. At runtime, the Java programme delivered the correct results, identical to the COBOL programme.
As expected of a compiler-based tool, a repeated conversion delivered identical Java code and identical, correct results at runtime.
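Whether a repeated conversion yields identical source code can be checked mechanically. The following is a minimal sketch that compares the SHA-256 digests of two conversion outputs. The class `ReproducibilityCheck` and the sample strings are made-up stand-ins for illustration; they are not output of CoJaC or of any AI tool.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ReproducibilityCheck {

    /** Digest of the generated source text; suitable for recording per conversion run. */
    static byte[] sha256(String source) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(source.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** True if both conversion runs produced byte-for-byte identical source text. */
    static boolean sameOutput(String firstRun, String secondRun) {
        return MessageDigest.isEqual(sha256(firstRun), sha256(secondRun));
    }

    public static void main(String[] args) {
        // Made-up stand-ins for the text of conversion runs; not real tool output.
        String firstRun = "class Demo { int a; }";
        String secondRun = "class Demo { int a; }";
        String driftedRun = "class Demo { long a; }";

        System.out.println(sameOutput(firstRun, secondRun));   // true
        System.out.println(sameOutput(firstRun, driftedRun));  // false
    }
}
```

Comparing the strings directly would suffice for a single pair. Digests are convenient when the result of each iteration is to be stored and compared later.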
DISCUSSION OF THE FINDINGS
- All AI tools tested deliver incorrect results. The reason is that COBOL-specific language elements are mapped onto Java's built-in types and language features. This works only to a limited extent because, for example, Java has no data types or variables that are semantically equivalent to the complex COBOL data descriptions and their properties. AI models make predictions based on probability and can therefore only approximate 100% semantic equivalence. Although the underlying machine learning or deep learning models can be improved through ongoing training, it remains to be seen whether, and if so with what training effort, 100% semantic equivalence can be achieved. In contrast, CoJaC has a library in which all COBOL data descriptions are available as Java classes and packages. These emulate the COBOL data descriptions in the Java code, which ensures semantic equivalence between the COBOL and Java programmes.
- On repeated Java generation, all AI tools tested delivered different source code. A conversion by an AI tool is therefore not reproducible 1:1. Migration projects usually comprise several million lines of code (LOC), and a certain number of programmes must be converted repeatedly over several iterations (e.g. because of further development). Source code that changes with every conversion creates discrepancies in related components (references, classes, interfaces), because each programme communicates via interfaces with other programmes in the programme system. This is unacceptable in a migration project. With a compiler-based migration tool such as CoJaC, a programme conversion can be repeated as often as required and always delivers identical source code.
- Qualified COBOL programmers will have recognised that the COBOL programme contains programming errors in several places (e.g. a two-digit number is assigned to a one-digit variable). This is obviously questionable style, but it does occur in practice, especially in long-running, complex projects, which legacy software usually is. Nobody can guarantee that everything is programmed correctly at the time of a migration. For critical software, a migration must do justice to this fact by mapping every special case 1:1 in a semantically equivalent way. This is precisely what the AI tools tested cannot do, whereas the compiler-based approach can.
- There is no such thing as ‘the’ COBOL. The language depends on the hardware platform, compiler dialects and project-specific compiler settings, which change the behaviour of the system and are not visible in the COBOL code itself. A wide variety of precompilers is also in common use. Every legacy system is therefore individual and requires equally individual processing. This is another area where AI tools reach their limits, whereas the compiler-based approach supports such customised processing.
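The points above about missing Java equivalents and about the two-digit number assigned to a one-digit variable can be illustrated with a minimal sketch. It assumes the common COBOL behaviour that a numeric MOVE without ON SIZE ERROR drops high-order digits that do not fit the receiving field; compiler dialects and settings may change this. The class `UnsignedDisplayField` is a hypothetical stand-in for a field such as PIC 9(n) and is not CoJaC's actual library API.

```java
final class PicNumericSketch {

    /** Hypothetical stand-in for an unsigned COBOL field such as PIC 9(n). */
    static final class UnsignedDisplayField {
        private final int digits;
        private final long modulus;   // 10^digits
        private long value;

        UnsignedDisplayField(int digits) {
            if (digits < 1 || digits > 18) {
                throw new IllegalArgumentException("digits must be between 1 and 18");
            }
            this.digits = digits;
            long m = 1;
            for (int i = 0; i < digits; i++) {
                m *= 10;
            }
            this.modulus = m;
        }

        /** Assumed MOVE semantics: the sign is dropped and only the low-order digits are kept. */
        void move(long newValue) {
            value = Math.abs(newValue % modulus);
        }

        long get() {
            return value;
        }

        /** Output with leading zeros, as DISPLAY would show an unsigned numeric field. */
        String display() {
            return String.format("%0" + digits + "d", value);
        }
    }

    public static void main(String[] args) {
        // Naive mapping: a plain Java int keeps both digits.
        int plainInt = 12;
        System.out.println(plainInt);          // 12

        // Emulated one-digit field: the high-order digit is lost.
        UnsignedDisplayField oneDigit = new UnsignedDisplayField(1);
        oneDigit.move(12);
        System.out.println(oneDigit.display()); // 2
    }
}
```

A conversion that maps the field to `int` looks plausible and compiles, yet it prints a different value from the original programme under the assumed semantics. This is the kind of deviation that a type emulating the COBOL data description avoids.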
CONCLUSION
- To date, AI tools can be used in migration projects, but only as assistants. The IBM watsonx Code Assistant for Z already carries this role in its name. Such tools can be used, for example, to generate unit tests, which are time-consuming to create manually. They can also support code commenting, refactoring and programme documentation. There are certainly other possible areas of application.
- Currently, the actual conversion of programmes can only be supported to a limited extent by AI tools. Without training or manual adaptation, they deliver results at runtime that are not semantically equivalent to the original programme. Semantic equivalence, however, is an indispensable prerequisite for a successful project.
- Both training the AI tools and manual adaptation demand time and resources, so the overall effort approaches that of a new development.
- If a customer needs to change the programming language and/or platform within a reasonable project time and budget, a tool-supported, compiler-based software migration is currently the method of choice. AI tools can provide valuable, assistance-based support.