Everything you should know when designing a domain-specific language (Pt. 1): General-purpose vs. Domain-specific

Hello guys, this series discusses the design and implementation of a domain-specific language (DSL). This is not a new topic. While studying at university, my partner and I created a transpiled language based on Java to overcome the disadvantages of general-purpose and domain-specific languages (however, it's already dead 😒). For that project we did a ton of research on GPLs and DSLs from the internet, textbooks, and thesis papers. Because this material started out as a thesis, it contains a lot of words, so I hope you won't get bored following the series 😄. Now, let's rock and roll 🤟

Most programmers think of a programming language as a tool for building software. Many languages are built for generalized problems, that is, they can be used to write programs in many application domains. Popular application domains include desktop applications, web applications (websites, web services), mobile applications, business systems, operating systems, and games. These domains are dominated by languages such as C, C++, Java, C#, Python, Go, and JavaScript. Such languages are categorized as general-purpose languages (GPLs).

A general-purpose programming language is a programming language that can be applied to solve problems in a variety of application domains. It does this by offering generalized, non-specific language constructs that describe only the logic of the program, rather than instructions whose meaning is tied to the logic of one specific domain. These constructs are closely related to the execution steps required to complete an algorithm. In fact, this is the original purpose of programming languages: they were designed to describe algorithms and to specify calculations that computers can then perform.

Nowadays, writing code in a general-purpose programming language is straightforward, and one can build many types of software with it. These languages, along with many more, are used every day to build everything from websites and desktop applications to operating systems and embedded systems.
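To make the link between general-purpose constructs and execution steps a bit more concrete, here is a small Python sketch. It assumes the standard CPython interpreter, and the function names `classify` and `jump_opcodes` are made up for this illustration. It disassembles an ordinary if-else and lists the jump instructions the interpreter produces for it. The exact opcode names differ between Python versions, so treat the printed output as illustrative only.

```python
import dis


def classify(x):
    # An ordinary if-else, the kind of construct every GPL offers.
    if x > 0:
        return "positive"
    else:
        return "non-positive"


def jump_opcodes(func):
    """Return the names of the jump instructions in func's bytecode."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if "JUMP" in ins.opname
    ]


if __name__ == "__main__":
    # Print the full bytecode listing, then just the jump instructions.
    dis.dis(classify)
    print(jump_opcodes(classify))
```

Nothing in the bytecode says anything about what `x` means in the application. The construct only tells the machine which step to run next.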

As mentioned, general-purpose languages are packed with universal language constructs that can be understood as logical instructions to a computer. For example, the conditional statement if-else is a well-known construct that translates to jump instructions in the compiled program. This means that the constructs in a general-purpose programming language are closer to machine logic than to application or business logic. This is beneficial for programmers, because many kinds of application logic can be composed out of machine logic, but it also buries the real application logic under layers of general-purpose source code. For example, solving a quadratic equation takes the following code in C#, a modern high-level general-purpose programming language:

using System;
class QuadraticFormulaSolver
{
    static void Main()
    {
        int a, b, c;
        double d, x1, x2;
        // code to retrieve the values for a, b, c
        a = 1; b = -2; c = 1;
        d = b * b - 4 * a * c;
        if (d == 0)
        {
            x1 = x2 = -b / (2.0 * a);
        }
        else if (d > 0)
        {
            x1 = (-b + Math.Sqrt(d)) / (2 * a);
            x2 = (-b - Math.Sqrt(d)) / (2 * a);
        }
        else
        {
            x1 = x2 = Double.NaN;
        }
    }
}

This is a simplified version with a lot of boilerplate code removed, not to mention that it is prone to severe loss of significance in its floating-point arithmetic. Even so, it is hard to grasp at first glance what the code is trying to do, and it would be much harder without the descriptive identifier names. Here is MATLAB code that performs the same task:

syms x
eqn = 1*x^2 - 2*x + 1 == 0;
sol = solve(eqn)

MATLAB is a programming language for numerical computations. The three-line MATLAB version is clearly much easier to understand and to write, because the language is specifically designed to convey application logic (mathematical logic in this case) rather than machine logic. This gap between machine logic and application logic is what gave rise to the domain-specific language (DSL).

A domain-specific language is a language designed to express statements in a particular problem space, or domain. DSLs often drop language constructs that are standard in GPLs and add statements and expressions that are only meaningful in the context of the problem they try to solve. This is both a good thing and a bad thing. On the good side, since DSLs contain constructs that are closer to application and business logic than to machine logic, they simplify coding by giving the source code an apparent high-level meaning. They also encourage writing less code that does more, since one piece of application logic usually corresponds to a whole series of machine-level steps.

Despite their seemingly limited use, domain-specific languages date back as far as computer languages themselves and are still very widely adopted in computing today. Almost all fourth-generation languages are domain-specific languages. ABAP, short for Advanced Business Application Programming, is a proprietary enterprise programming language for large-scale business applications. SQL (Structured Query Language) is the most popular query language for accessing data in technologies such as databases, datastores, and filesystems. Shell scripting languages such as Bash and Batch are also domain-specific programming languages.

Beyond programming languages, there are other types of computer languages that are domain-specific, and they appear all over the Internet: XML (Extensible Markup Language) as a data exchange format, HTML (Hypertext Markup Language) for documents, and WebAssembly, or wasm, the new proposal for logic execution on the World Wide Web.

Not only are these languages efficient at solving their respective problems, there are almost no general-purpose languages that can reach the same level in those domains. For example, JavaScript has been the standardized way to execute logic on the Web. WebAssembly, however, is proposed as a new portable format that is more efficient not only in execution but also in parsing, compiling, and delivery, up to 8 times faster in certain tests.

The bad side is that, due to their nature, domain-specific languages are almost never used outside of their problem spaces. Inside their respective domains, though, they are the dominant languages, and no general-purpose language can easily take their place.
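To illustrate the same contrast with SQL, here is a minimal Python sketch using the standard-library `sqlite3` module. The `orders` data, the table layout, and the function names are all hypothetical and invented for this example. Both functions compute the total spent per customer and keep only the customers above a threshold: the first spells out every step in general-purpose code, while the second hands a declarative SQL query to the database engine.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration.
orders = [
    ("alice", 30.0),
    ("bob", 12.5),
    ("alice", 20.0),
    ("carol", 7.5),
]


def totals_imperative(rows):
    """General-purpose style: spell out every step of the aggregation."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    # Keep customers who spent more than 10, sorted by name.
    return sorted((c, t) for c, t in totals.items() if t > 10)


def totals_sql(rows):
    """DSL style: state what result is wanted, let the engine decide how."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            HAVING SUM(amount) > 10
            ORDER BY customer
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_imperative(orders))
    print(totals_sql(orders))
```

Both functions return the same rows for this data. The SQL version reads almost like the requirement itself, but it is of little use once the problem is no longer about querying tabular data.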

(To be continued)