Hello there, @Thelightside.
You have a very ambitious goal, to say the least.
For a quick intro, I’m a senior full-stack developer with well over a decade of programming experience. I also provide mentoring for juniors and CS students. Just to be clear, I have no official association with CodeCombat, apart from being a forum moderator and occasional contributor in the CodeCombat and Aether repositories.
To be perfectly honest, I have not posted in this thread thus far as I may not manage to express what I have to say in a clear enough manner to be of use to you. In any case, you have asked for feedback, and I believe I have some insight that may be of use to you, so let’s get to it.
(golly! That was a pretty big intro)
Let’s start by delineating the scope of the problem: you have a set of programming languages, and you want to freely translate code written in one of these programming languages to another. So far so good.
Now just a small addendum: Nick has provided an excellent, terse and objective answer above—I would only change “It is not very easy to do this!” to “It is extremely hard to do this!”. I’ll now present a slightly more technical viewpoint in order to hopefully make this clearer.
It looks like you have put a lot of thought into the problem, but not so much into the solution. This reminds me of one of CodeCombat’s quotes/loading tips: “think of the solution, not the problem.” (Terry Goodkind). I guess you couldn’t really think of the solution because no one pointed you in the right direction, right? Let’s fix that.
A good starting point would be looking at CodeCombat itself. As I believe you have noticed by now, CodeCombat runs code in several languages—Python, JavaScript, CoffeeScript, Clojure, Lua, and soon it will be able to run Java (yes, Java and JavaScript are two completely distinct programming languages. That is a different topic, though).
What you probably don’t know is that CodeCombat (more specifically, Aether) converts all inputted code (independent of the programming language) to JavaScript, which is the only programming language universally supported inside modern browsers. This is done in order to run your code inside the browser for the game simulation and error checking (yes, all this magic happens inside your browser!). Note that Aether actually does far more than just converting other programming languages’ code to JavaScript, but that is a topic for another day.
The important part here is that Aether is able to convert other languages’ code to JavaScript (e.g. Python -> JavaScript, Lua -> JavaScript etc.). This process usually consists of parsing source code into an Abstract Syntax Tree (henceforth AST), applying transformations to this AST and then piping it through a code generation tool, often accompanied by the injection of runtime libraries. Now let’s walk briefly through each of these steps.
You start off with some source code, that is, the code you have typed in the text editor. From the computer’s perspective, it is just a bunch of characters (or just a bunch of bytes, or just a bunch of zeros and ones, depending on how low-level you want to go). The computer can’t do much with these; it is just some text. It may look like valid and logical code to us, but the computer only sees a bunch of characters with no special meaning.
Next, we parse that source code into an AST, which, in short, is a tree data structure representing all the statements, control flow, declarations, and everything else that our original code contained. Now, the source code that used to be a bunch of characters that neither we nor the computer could do much with has become logically structured data that we can much more easily manage and transform to our needs. Though, from the computer’s perspective, these are just some arrays, objects and strings floating in memory that carry no special meaning.
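To make the parsing step a bit more concrete, here is a minimal sketch using Python’s standard `ast` module (not one of the JavaScript parsers named in this post, and not what Aether uses). The sample line of source code is made up for illustration, and `ast.dump(..., indent=2)` assumes Python 3.9 or newer.

```python
import ast

# A made-up line of source code; to the computer it is only text so far.
source = "total = len(items) + 1"

# Parse the text into a tree of nodes. Nothing is executed here.
tree = ast.parse(source)

# Show the whole tree as indented text.
print(ast.dump(tree, indent=2))

# Walk the tree and collect the type name of every node.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

The printed tree contains nodes such as `Assign`, `BinOp` and `Call`: the same information as the text, but as structured data that a program can inspect.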
There are several tools that can parse source code into ASTs. For the JavaScript programming language there are Esprima, Acorn and Espree to name a few parsers off the top of my head. The CodeCombat team has developed parsers for other languages, such as Filbert for parsing Python code.
Now that you have an AST in hand, you can apply the transformations necessary to convert one language to another (that is, converting constructs from the source language that are not supported or work differently in the destination language etc.). You can either do this by traversing the AST manually or by using a tool such as Estraverse or Falafel, or a more complete code transformation architecture such as Babel (see also the Babel Plugin Handbook, which gives a pretty nice overview of ASTs and code transformation as well).
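As a rough illustration of an AST transformation, the sketch below again uses Python’s standard `ast` module rather than the JavaScript tools above. It assumes a destination language that has no power operator, so every `a ** b` is rewritten into a function call `pow(a, b)`. The class name `PowerToCall` and the sample line are hypothetical, and `ast.unparse` needs Python 3.9 or newer.

```python
import ast

class PowerToCall(ast.NodeTransformer):
    """Rewrite `a ** b` into `pow(a, b)`."""

    def visit_BinOp(self, node):
        # Transform nested operands first.
        self.generic_visit(node)
        if isinstance(node.op, ast.Pow):
            call = ast.Call(
                func=ast.Name(id="pow", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node

tree = ast.parse("area = 3 * radius ** 2")
new_tree = ast.fix_missing_locations(PowerToCall().visit(tree))

# Turn the transformed tree back into text so the change can be seen.
result = ast.unparse(new_tree)
print(result)  # area = 3 * pow(radius, 2)
```

Only the power operation changes; the multiplication and the assignment around it are left alone, because the transformer works on the tree structure rather than on the text.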
And now the final step: turn the AST into code again, this time in the destination language. For this you will need a code generation tool that can turn the AST into your destination language’s code. If you want to output JavaScript code, you could use Escodegen, for example.
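To give a feel for what a code generator does, here is a toy sketch that walks a Python AST and emits JavaScript text. It is not Escodegen and not how Aether works; the function `to_javascript` and the sample program are hypothetical, and it only understands single assignments, names, plain numbers, calls and the four basic arithmetic operators.

```python
import ast

# Map Python operator node types to JavaScript operator text.
BINARY_OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_javascript(node):
    """Emit JavaScript text for a very small subset of Python nodes."""
    if isinstance(node, ast.Module):
        return "\n".join(to_javascript(statement) for statement in node.body)
    if isinstance(node, ast.Assign) and len(node.targets) == 1:
        # Simplification: every assignment becomes a `var` declaration.
        return f"var {to_javascript(node.targets[0])} = {to_javascript(node.value)};"
    if isinstance(node, ast.Expr):
        return to_javascript(node.value) + ";"
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPERATORS:
        left = to_javascript(node.left)
        right = to_javascript(node.right)
        # Always parenthesise so operator precedence never matters.
        return f"({left} {BINARY_OPERATORS[type(node.op)]} {right})"
    if isinstance(node, ast.Call) and not node.keywords:
        arguments = ", ".join(to_javascript(argument) for argument in node.args)
        return f"{to_javascript(node.func)}({arguments})"
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    # Anything else is outside this toy subset.
    raise NotImplementedError(f"unsupported construct: {type(node).__name__}")

javascript = to_javascript(ast.parse("distance = speed * (time + 1)\nmove(distance)"))
print(javascript)
```

For the two sample lines this prints `var distance = (speed * (time + 1));` followed by `move(distance);`. Everything outside the tiny subset raises an error, which hints at how much work a complete generator would be.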
Note: CoffeeScript was designed as a compile-to-JavaScript language, so you don’t really need to care about the parsing, AST transformation or code generation steps when compiling CoffeeScript to JavaScript. You can just use the official CoffeeScript compiler (which outputs JavaScript).
So, from a very brief and high-level viewpoint, this covers the basics of syntax transformation. And yes, I did mean the basics. I hate to break it to you, but the above is only the very beginning. Programming languages differ not only in syntax, but at much more fundamental levels in how they work.
For instance, Python has a built-in (native) function called len, which does not exist as such in JavaScript and many other languages. How do you translate it? Well, CodeCombat (Aether) appends a runtime library to the generated code, which is basically a bunch of functions ported from the original language to the destination language. That is, you have to re-create the behavior of the original language’s native functions by writing them in the target programming language.
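To keep the examples in this post in one language, here is a sketch of the same idea in the opposite direction: a tiny, hypothetical runtime helper written in Python for code translated from JavaScript. It imitates JavaScript’s `Array.prototype.indexOf`, which returns -1 when the item is missing, whereas Python’s own `list.index` raises an error. The name `js_index_of` and the sample data are made up, and the comparison is simplified.

```python
def js_index_of(items, target):
    """Mimic JavaScript's Array.prototype.indexOf: return -1 when absent.

    Python's own list.index raises ValueError instead, so a translated
    program that checks for -1 would break without this helper.
    """
    for position, item in enumerate(items):
        if item == target:  # simplification: JavaScript compares with ===
            return position
    return -1

# What a translator might emit for the JavaScript line
#   var found = names.indexOf("Anya") !== -1;
names = ["Tharin", "Anya"]
found = js_index_of(names, "Anya") != -1
missing = js_index_of(names, "Hattori")
print(found, missing)  # True -1
```

A real runtime library would need one such helper for every native function the translated programs are allowed to use, each matching the original behavior as closely as possible.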
CodeCombat only cares about translating languages to JavaScript, which is necessary to run the code inside a browser, so they only have to maintain at most one runtime library per language (original language -> JavaScript). You are asking to translate any language to any language, so you’d have to maintain n²-n runtime libraries for n programming languages (one for each ordered origin -> destination pair), which becomes very impractical as the number of languages grows.
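The n²-n figure can be checked by simply listing every ordered origin -> destination pair. The short sketch below does that for the five languages mentioned earlier in this post and then shows how the count grows; the variable names are only for illustration.

```python
from itertools import permutations

languages = ["Python", "JavaScript", "CoffeeScript", "Clojure", "Lua"]

# Every ordered (origin, destination) pair of two different languages.
pairs = list(permutations(languages, 2))
n = len(languages)
print(len(pairs), n * n - n)  # 20 20

# How the count grows as more languages are added.
growth = {count: count * count - count for count in (2, 5, 10, 20)}
print(growth)  # {2: 2, 5: 20, 10: 90, 20: 380}
```

Five languages already need 20 runtime libraries, and twenty languages would need 380.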
And still, we are only scratching the surface. There are several other fundamental differences between programming languages, such as scoping rules, inheritance, support for functions as first-class citizens, automatic/manual memory allocation, and so forth. You cannot easily translate these kinds of language-level features.
You may have noticed that I have hinted a few times at the computer having no idea what it is doing. And that is the point: no computer has any awareness of what it is supposed to do. We, the programmers, issue a bunch of orders and the computer mindlessly carries them out, without any thought about what it actually is or should be doing.
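Even code that looks identical in two languages can behave differently. One small example (my own pick, not one of the features listed above) is the `%` operator with a negative number: Python gives the result the sign of the divisor, while JavaScript gives it the sign of the dividend. The helper `js_remainder` below is hypothetical and only imitates the JavaScript behavior for ordinary finite numbers.

```python
import math

def js_remainder(dividend, divisor):
    """Imitate JavaScript's % for finite numbers and a non-zero divisor."""
    # math.fmod keeps the sign of the dividend, like JavaScript's % operator.
    return math.fmod(dividend, divisor)

python_result = -7 % 3                 # Python: sign of the divisor
javascript_like = js_remainder(-7, 3)  # JavaScript-style: sign of the dividend
print(python_result, javascript_like)  # 2 -1.0
```

A translator that copied `-7 % 3` over unchanged would produce a program that runs fine and silently gives a different answer.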
If you want a program to translate between programming languages, you first have to teach your computer all the rules about the original and destination programming languages as well as the logic necessary to translate one statement from one language to another language’s equivalent—essentially, you have to teach your computer to code.
So, yes, this is no easy task. A 100% accurate translation is often not possible, because languages operate in fundamentally different ways. However, if we consider a restricted environment where only the basic features of the languages are used (like in CodeCombat), then you can likely give it a try and obtain satisfactory results.
If you or anyone feels like giving it a try, feel free to follow through the links in this post and do your own research. I’m aware this is an extremely complicated task for the general public of this forum (damn, it is even for me!) but here is the thing: in order to be a good programmer, you must not be afraid to fail—as long as you learn something, that is what counts!
For those that are up to the challenge, you don’t necessarily need to follow my instructions. You can do your own research or pull something directly out of your head too. I guess an inexperienced programmer would initially try to do some string replacements, perhaps using regular expressions (henceforth regex). That’s clearly the wrong tool for the job: most programming languages are far too complex to be properly parsed by regex, and translating between languages this way would require a different set of replacements for every origin/destination pair (n²-n of them), while with ASTs and code generation you need only one parser and one code generator per language (2n tools in total), assuming they all share a common AST format. Well, you don’t actually have to read or understand what I’m saying here. If you feel like you have a solution in mind, by all means, go ahead! The worst that can happen is learning the hard way why that was a bad idea, which is, in my humble opinion, a pretty good way to learn and remember things. And as I said, as long as you learn something new, it is a win.
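Here is a small sketch of why plain text replacement goes wrong, compared with working on the AST. It tries to turn Python’s `print(...)` into `console.log(...)` in a made-up line of code. The class name `RenamePrint` is hypothetical, `ast.unparse` needs Python 3.9 or newer, and the second result is still Python syntax that merely uses the new name, not real generated JavaScript.

```python
import ast
import re

source = 'print("use print(x) to show x")'

# Naive approach: textual replacement cannot tell code from string contents.
naive = re.sub(r"\bprint\(", "console.log(", source)
print(naive)  # console.log("use console.log(x) to show x")

# AST approach: only real call nodes are renamed; the string literal is data.
class RenamePrint(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "print":
            node.func = ast.Attribute(
                value=ast.Name(id="console", ctx=ast.Load()),
                attr="log",
                ctx=ast.Load(),
            )
        return node

tree = ast.fix_missing_locations(RenamePrint().visit(ast.parse(source)))
structured = ast.unparse(tree)
print(structured)
```

The regex version also rewrites the text inside the string, changing what the program displays. The AST version renames only the actual function call and leaves the message untouched.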
As a side note, human brains are much better suited than computers to learning programming language semantics and translating between programming languages.