Python, Java, CoffeeScript, Clojure and Lua language translator?

Thelightside · February 25, 2016, 10:40pm

I have been noticing some people trying to learn how to code in a different language. Sometimes people write code correctly in one language but do not know how to write the same thing in another language, because learning more than one language is a lot to learn! So the reason for this topic is to suggest that we should have a coding language translator designed by our fellow members of CodeCombat. If you agree with me, like this comment and leave any info down below!!! Even if it is hard, it is for the CodeCombat community!

Cheers

Thelightside · March 1, 2016, 11:59pm

Well, it seems like no one wants this project to take place because no one is looking at this, so it’s time to invite @ChronistGilver and @nick and also @Luke10.

Luke10 · March 2, 2016, 10:29pm

Hey @Thelightside, I think that’s a great idea. Hey @nick, what do you think?

nick · March 3, 2016, 7:32pm

It is not very easy to do this! You could get the simple parts working, like automatically replacing this and self and such things, but beyond that the languages just behave differently, so automatically and correctly translating between them is like… automatically translating between human languages, and we all know that Google Translate isn’t perfect, plus it took a lot of work to build.

Thelightside · March 3, 2016, 8:14pm

So I am guessing it is a no, @nick… Well, let’s see what @ChronistGilver thinks about this idea. For now, let’s just wait.

ChronistGilver · March 3, 2016, 11:49pm

I’m glad you have such faith in me. However, the fact remains that Nick is one of the prime workers on CodeCombat, while I am a lowly forum moderator.

UltCombo · March 4, 2016, 8:32am

Hello there, @Thelightside.

You have a very ambitious goal, to say the least.

For a quick intro, I’m a senior full-stack developer with well over a decade of programming experience. I also provide mentoring for juniors and CS students. Just to be clear, I have no official association with CodeCombat, apart from being a forum moderator and an occasional contributor to the CodeCombat and Aether repositories.

To be perfectly honest, I have not posted in this thread thus far as I may not manage to express what I have to say in a clear enough manner to be of use to you. In any case, you have asked for feedback, and I believe I have some insight that may be of use to you, so let’s get to it.

(golly! That was a pretty big intro)

Let’s start by delineating the scope of the problem: you have a set of programming languages, and you want to freely translate code written in one of these programming languages to another. So far so good.

Now just a small addendum: Nick has provided an excellent, terse and objective answer above—I would only change “It is not very easy to do this!” to “It is extremely hard to do this!”. I’ll now present a slightly more technical viewpoint in order to hopefully make this clearer.

It looks like you have put a lot of thought into the problem, but not so much into the solution. This reminds me of one of CodeCombat’s quotes/loading tips: “think of the solution, not the problem.” (Terry Goodkind). I guess you couldn’t really think of the solution because no one pointed you in the right direction, right? Let’s fix that.

A good starting point would be looking at CodeCombat itself. As I believe you have noticed by now, CodeCombat runs code in several languages—Python, JavaScript, CoffeeScript, Clojure, Lua, and soon it will be able to run Java (yes, Java and JavaScript are two completely distinct programming languages. That is a different topic, though).

What you probably don’t know is that CodeCombat (more specifically, Aether) converts all inputted code (independent of the programming language) to JavaScript, which is the only programming language universally supported inside modern browsers. This is done in order to run your code inside the browser for the game simulation and error checking (yes, all this magic happens inside your browser!). Note that Aether actually does far more than just converting other programming languages’ code to JavaScript, but that is a topic for another day.

The important part here is that Aether is able to convert other languages’ code to JavaScript (e.g. Python -> JavaScript, Lua -> JavaScript etc.). This process usually consists of parsing source code into an Abstract Syntax Tree (henceforth AST), applying transformations to this AST and then piping it through a code generation tool, often accompanied by the injection of runtime libraries. Now let’s walk briefly through each of these steps.

You start off with some source code, that is, the code you have typed in the text editor. From the computer’s perspective, it is just a bunch of characters (or just a bunch of bytes, or just a bunch of zeros and ones, depending on how low-level you want to go), and the computer can’t do much with these: it is just some text. It may look like valid and logical code to us, but the computer only sees a bunch of characters with no special meaning.
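As a small illustration (a Python sketch; the one-line program is made up for the example), here is the same piece of source code seen at each of those levels. Nothing in it knows that the text is a program:

```python
# One line of "source code" is, to the computer, only a string.
source = "print(1)"

# Level 1: a sequence of characters.
chars = list(source)

# Level 2: a sequence of bytes (UTF-8 encoding).
data = source.encode("utf-8")

# Level 3: the bits behind each byte, 8 bits per byte.
bits = " ".join(f"{byte:08b}" for byte in data)

print(chars)       # ['p', 'r', 'i', 'n', 't', '(', '1', ')']
print(list(data))  # [112, 114, 105, 110, 116, 40, 49, 41]
print(bits)
```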

Next, we parse that source code into an AST, which, in short, is a tree data structure representing all statements, control flow, declarations, and everything else that our original code contained. Now, the source code that used to be a bunch of characters that neither we nor the computer could do much with has become logically structured data that we can much more easily manage and transform to our needs. Still, from the computer’s perspective, these are just some arrays, objects and strings floating in memory that carry no special meaning.
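To see what an AST looks like in practice, here is a sketch using Python’s built-in ast module. It is a stand-in chosen for convenience, not what CodeCombat runs in the browser (CodeCombat uses its own parsers there, such as Filbert), and the example statement is hypothetical:

```python
import ast

source = "total = price * 2 + 1"

# Parse the text into a tree of node objects.
tree = ast.parse(source)

# The single statement is an assignment node.
assign = tree.body[0]
print(type(assign).__name__)             # Assign
print(assign.targets[0].id)              # total
print(type(assign.value).__name__)       # BinOp (the "+")
print(type(assign.value.left).__name__)  # BinOp (the "*", nested one level deeper)

# A full textual dump of the tree (the indent argument needs Python 3.9+).
print(ast.dump(tree, indent=2))
```

Note how operator precedence is already resolved by the tree shape: the multiplication sits below the addition, so no later step has to re-parse the text.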

There are several tools that can parse source code into ASTs. For the JavaScript programming language there are Esprima, Acorn and Espree to name a few parsers off the top of my head. The CodeCombat team has developed parsers for other languages, such as Filbert for parsing Python code.

Now that you have an AST in hand, you can apply the transformations necessary to convert one language to another (that is, converting constructs of the source language that are not supported, or that work differently, in the destination language). You can do this either by traversing the AST manually or by using a tool such as Estraverse, Falafel or a more complete code transformation architecture such as Babel (see also the Babel Plugin Handbook, which gives a pretty nice overview of ASTs and code transformation as well).
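Here is a minimal sketch of the transformation step, using Python’s ast.NodeTransformer in place of the JavaScript tools above. The two rewrites (self to this, and len(x) to x.length) are illustrative assumptions. The second one is only correct when x is a string or a list, which is exactly the kind of thing a real translator cannot assume blindly:

```python
import ast


class SelfToThis(ast.NodeTransformer):
    """Rename every variable called self to this."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == "self":
            return ast.copy_location(ast.Name(id="this", ctx=node.ctx), node)
        return node


class LenToLength(ast.NodeTransformer):
    """Rewrite Python-style len(x) calls into JavaScript-style x.length.

    Illustrative only: assumes every len() argument is a string or list.
    """

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # transform nested calls first
        if (
            isinstance(node.func, ast.Name)
            and node.func.id == "len"
            and len(node.args) == 1
            and not node.keywords
        ):
            return ast.Attribute(value=node.args[0], attr="length", ctx=ast.Load())
        return node


def transform(source: str) -> str:
    """Parse, apply both rewrites, and print the tree back as text."""
    tree = ast.parse(source)
    tree = SelfToThis().visit(tree)
    tree = LenToLength().visit(tree)
    ast.fix_missing_locations(tree)  # give new nodes line/column info
    return ast.unparse(tree)         # ast.unparse needs Python 3.9+


print(transform("n = len(self.items) + len(name)"))
# n = this.items.length + name.length
```

The printed result is still produced by Python’s own unparser; it merely happens to look like JavaScript for this tiny input. Emitting real target-language syntax is the job of the code generation step.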

And now the final step: turn the AST into code again, this time in the destination language. For this you will need a code generation tool that can turn the AST into your destination language’s code. If you want to output JavaScript code, you could use Escodegen, for example.
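As a sketch of this last step, here is a hypothetical, deliberately tiny code generator that walks a Python AST and prints JavaScript. It only handles assignments, if/else, arithmetic, single comparisons and calls; anything else raises NotImplementedError rather than guessing. Mapping print to console.log is an assumption that ignores formatting differences. It wraps every operation in parentheses instead of tracking precedence, and it declares all variables with let at the top, because Python variables belong to the whole module or function, not to the if block where they are first assigned:

```python
import ast
import json

BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
# == -> === is only an approximation: [1] == [1] is True in Python,
# but [1] === [1] is false in JavaScript.
CMP_OPS = {ast.Lt: "<", ast.Gt: ">", ast.Eq: "===", ast.NotEq: "!=="}
NAME_MAP = {"print": "console.log"}  # assumption: ignores formatting differences


def gen_expr(node: ast.expr) -> str:
    """Return JavaScript source for a single Python expression node."""
    if isinstance(node, ast.Constant):
        if type(node.value) is bool:
            return "true" if node.value else "false"
        if type(node.value) in (int, float):
            return repr(node.value)
        if type(node.value) is str:
            return json.dumps(node.value)  # a JSON string is also a valid JS string
    if isinstance(node, ast.Name):
        return NAME_MAP.get(node.id, node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        op = BIN_OPS[type(node.op)]  # always parenthesize, never track precedence
        return f"({gen_expr(node.left)} {op} {gen_expr(node.right)})"
    if isinstance(node, ast.Compare) and len(node.ops) == 1 and type(node.ops[0]) in CMP_OPS:
        op = CMP_OPS[type(node.ops[0])]
        return f"({gen_expr(node.left)} {op} {gen_expr(node.comparators[0])})"
    if isinstance(node, ast.Call) and not node.keywords:
        args = ", ".join(gen_expr(arg) for arg in node.args)
        return f"{gen_expr(node.func)}({args})"
    raise NotImplementedError(f"no JavaScript output for {type(node).__name__}")


def gen_block(stmts: list, indent: str = "") -> list:
    """Return JavaScript lines for a list of Python statements."""
    lines = []
    for stmt in stmts:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            lines.append(f"{indent}{stmt.targets[0].id} = {gen_expr(stmt.value)};")
        elif isinstance(stmt, ast.Expr):
            lines.append(f"{indent}{gen_expr(stmt.value)};")
        elif isinstance(stmt, ast.If):
            lines.append(f"{indent}if ({gen_expr(stmt.test)}) {{")
            lines += gen_block(stmt.body, indent + "  ")
            if stmt.orelse:
                lines.append(f"{indent}}} else {{")
                lines += gen_block(stmt.orelse, indent + "  ")
            lines.append(f"{indent}}}")
        else:
            raise NotImplementedError(f"no JavaScript output for {type(stmt).__name__}")
    return lines


def to_javascript(source: str) -> str:
    """Translate a tiny subset of Python into JavaScript source text."""
    tree = ast.parse(source)
    # Declare every assigned name once, up front (Python-style scoping).
    names = sorted({node.id for node in ast.walk(tree)
                    if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)})
    header = [f"let {', '.join(names)};"] if names else []
    return "\n".join(header + gen_block(tree.body))


print(to_javascript("x = 2 + 3 * 4\nif x > 10:\n    print('big')\nelse:\n    print('small')"))
```

The output is valid JavaScript, but the doubled parentheses in if ((x > 10)) show how quickly generated code stops looking hand-written.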

Note: CoffeeScript was designed as a compile-to-JavaScript language, so you don’t really need to care about the parsing, AST transformation or code generation steps when compiling CoffeeScript to JavaScript; you can just use the official CoffeeScript compiler (which outputs JavaScript).

So, from a very brief and high-level viewpoint, this covers the basics of syntax transformation. And yes, I did mean the basics. I hate to break it to you, but the above is only the very beginning. Programming languages differ not only in syntax, but at much more fundamental levels in how they work.

For instance, Python has a built-in (native) function called len, which has no direct equivalent in JavaScript (there, you would typically read a string’s or array’s length property instead). How do you translate it? Well, CodeCombat (Aether) appends a runtime library to the generated code, which is basically a bunch of functions ported from the original language to the destination language. That is, you have to re-create the behavior of the original language’s native functions by writing these functions in the target programming language.
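One way to picture the runtime-library idea is sketched below. The helper names and the JavaScript len implementation are hypothetical, not Aether’s actual code: the translator scans the Python program for calls to built-ins it knows about and prepends only the JavaScript helpers that this program needs:

```python
import ast

# Hypothetical runtime library: JavaScript re-implementations of Python
# built-ins, stored as JavaScript source text. Only len is sketched here,
# assuming Python dicts are translated to plain JavaScript objects.
RUNTIME_JS = {
    "len": (
        "function len(x) {\n"
        "  if (x instanceof Map || x instanceof Set) return x.size;\n"
        "  if (typeof x === 'string' || Array.isArray(x)) return x.length;\n"
        "  if (x !== null && typeof x === 'object') return Object.keys(x).length;\n"
        "  throw new TypeError('object has no len()');\n"
        "}"
    ),
}


def builtins_used(python_source: str) -> list:
    """Names of runtime-library functions that the Python code calls."""
    tree = ast.parse(python_source)
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return sorted(called.intersection(RUNTIME_JS))


def with_runtime(python_source: str, generated_js: str) -> str:
    """Prepend only the helpers the program actually needs."""
    helpers = [RUNTIME_JS[name] for name in builtins_used(python_source)]
    return "\n\n".join(helpers + [generated_js])


python_code = "n = len(items)"
js_code = "let n;\nn = len(items);"  # pretend output of an earlier generation step
print(with_runtime(python_code, js_code))
```

Even this helper is only approximate: JavaScript’s length counts UTF-16 code units, so a string holding a single emoji has length 2 in JavaScript, while Python’s len returns 1 for it.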

CodeCombat only cares about translating languages to JavaScript, which is necessary to run the code inside of a browser, so they only have to maintain at most one runtime library per language (original language -> JavaScript). You are requesting to be able to translate any language to any language, so you’d have to maintain n²-n runtime libraries for n programming languages (one for each possible origin -> destination pair), which becomes very impractical as the number of languages grows.
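The arithmetic behind that claim, checked in Python for the six languages mentioned in this thread:

```python
from itertools import permutations

# The languages mentioned in this thread (JavaScript is CodeCombat's target).
languages = ["Python", "JavaScript", "CoffeeScript", "Clojure", "Lua", "Java"]
n = len(languages)

# Any-to-any: one runtime library per ordered (origin, destination) pair.
pairs = list(permutations(languages, 2))

# CodeCombat-style: everything is translated to JavaScript only.
to_js_only = [(src, "JavaScript") for src in languages if src != "JavaScript"]

print(n, len(pairs), n * n - n, len(to_js_only))  # 6 30 30 5
```

Adding a seventh language would add only one more library in the JavaScript-only setup, but twelve more (7 × 6 - 6 × 5) in the any-to-any setup.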

And still, we are only scratching the surface. There are several other fundamental differences between programming languages, such as scoping rules, inheritance, support for functions as first-class citizens, automatic/manual memory allocation, and so forth. You cannot easily translate these kinds of language-level features.
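To give one concrete example of scoping rules, the two Python functions below behave differently from a line-by-line JavaScript translation that uses let loop variables. The JavaScript behavior is described only in the comments; the code itself is plain Python:

```python
def loop_variable_after_loop():
    for i in range(3):
        pass
    # Python: i is still visible after the loop and holds its last value.
    # JavaScript: with `for (let i = 0; ...)`, i would not exist here.
    return i


def closures_created_in_loop():
    callbacks = []
    for i in range(3):
        callbacks.append(lambda: i)  # captures the variable i, not its current value
    # Python: all three closures share one i, so each returns 2.
    # JavaScript: `let i` gets a fresh binding per iteration, giving 0, 1, 2.
    return [f() for f in callbacks]


print(loop_variable_after_loop())  # 2
print(closures_created_in_loop())  # [2, 2, 2]
```

A translator that maps each Python loop to the "obvious" JavaScript loop would silently change the result of the second function.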

You may have noticed that I have hinted a few times at the computer having no idea what it is doing. And that is the point: no computer is aware of what it is supposed to do. We, the programmers, issue a bunch of orders and the computer mindlessly carries them out, without any thought about what it is actually doing or should be doing.

If you want a program to translate between programming languages, you first have to teach your computer all the rules about the original and destination programming languages as well as the logic necessary to translate one statement from one language to another language’s equivalent—essentially, you have to teach your computer to code.

So, yes, this is no easy task. Doing a 100% accurate translation is often not possible, because languages operate fundamentally differently. However, if we consider a restricted environment where only the basic features of the languages are explored (like in CodeCombat), then you can likely give it a try and obtain satisfactory results.
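For such a restricted environment, one simple approach is to check up front which constructs a program uses and to refuse anything outside the supported subset, instead of producing a wrong translation. This is a sketch, and the allowlist below is a hypothetical choice, not CodeCombat’s:

```python
import ast

# Hypothetical allowlist for a small, CodeCombat-like subset of Python.
# The names are node classes from Python's own ast module.
ALLOWED = {
    "Module", "Expr", "Assign", "AugAssign", "If", "While", "Pass", "Break",
    "Name", "Load", "Store", "Constant", "Call", "Attribute",
    "BinOp", "Add", "Sub", "Mult", "Div",
    "Compare", "Lt", "Gt", "LtE", "GtE", "Eq", "NotEq",
    "BoolOp", "And", "Or", "UnaryOp", "Not", "USub",
}


def unsupported_features(source: str) -> list:
    """Return the AST node types in source that the subset does not cover."""
    found = {type(node).__name__ for node in ast.walk(ast.parse(source))}
    return sorted(found - ALLOWED)


print(unsupported_features("while True:\n    hero.attack(enemy)\n    x = x + 1"))  # []
print(unsupported_features("with open('f') as fh:\n    data = [line for line in fh]"))
```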

If you or anyone else feels like giving it a try, feel free to follow up on the tools and links mentioned in this post and do your own research. I’m aware this is an extremely complicated task for the general public of this forum (damn, it is even for me!), but here is the thing: in order to be a good programmer, you must not be afraid to fail. As long as you learn something, that is what counts!

For those who are up to the challenge, you don’t necessarily need to follow my instructions. You can do your own research or pull something directly out of your head too. I guess an inexperienced programmer would initially try to do some string replacements, perhaps using regular expressions (henceforth regex). That’s clearly the wrong tool for the job: most programming languages are way too complex to be properly parsed by regex, and translating a given language to different languages would also require a different set of replacements for each origin and destination pair (n²-n), while with ASTs and code generation you have exactly one tool to parse and one tool to generate code for each language (2n). Well, you don’t actually have to read or understand what I’m saying here; if you feel like you have a solution in mind, by all means, go ahead! The worst that can happen is learning the hard way why that was a bad idea, which is, in my humble opinion, a pretty good way to learn and remember things. And as I said, as long as you learn something new, it is a win.
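To make the regex pitfall concrete, here is a small Python comparison (the source snippet is made up). A word-boundary regex also rewrites text inside a string literal, while a replacement driven by Python’s own tokenizer (the tokenize module, one step below a full AST) only touches identifiers:

```python
import io
import re
import tokenize

source = 'print("self-destruct in 3")\nself.attack(myself)\n'

# Naive approach: a regex with word boundaries. It correctly skips "myself",
# but \b also matches between "self" and "-", so the string literal changes.
regex_result = re.sub(r"\bself\b", "this", source)


def rename_identifier(code: str, old: str, new: str) -> str:
    """Replace only NAME tokens, leaving strings and comments alone."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME and tok.string == old:
            tok = tok._replace(string=new)
        tokens.append(tok)
    return tokenize.untokenize(tokens)


token_result = rename_identifier(source, "self", "this")

print(regex_result)  # the string literal now reads "this-destruct in 3"
print(token_result)  # the string literal is untouched
```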

As a side note, human brains are much better suited than computers to learning programming language semantics and translating between programming languages.

Thelightside · March 5, 2016, 11:34pm

(well this will be the stupidest idea ever but whatever)
How about if someone enters their code somewhere and, ASAP, someone (no one specific yet) gives them the translated code?

nick · March 7, 2016, 3:18pm

Nice explanation, @UltCombo! The other thing about transforming code into other languages is that often the computer makes it look very messy; that part might even be harder than getting the basics of code generation and runtime libraries working for other languages.
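A quick way to see this in Python (a sketch using ast.unparse, available from Python 3.9): even a round trip through the AST with no translation at all changes how the code looks, because comments and redundant parentheses are not part of the tree:

```python
import ast

original = """
# Attack the nearest enemy twice.
enemy = hero.findNearestEnemy()
damage = (base * 2)   # parentheses kept for readability
"""

# Parse and regenerate the same language: no translation involved.
regenerated = ast.unparse(ast.parse(original))
print(regenerated)
```

Both comments disappear and the parentheses are dropped; a cross-language translator starts from the same lossy tree.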

@Thelightside: that could work! We could provide that service if there was enough demand from people willing to pay for it to have someone available to translate code examples. Or, if there were enough people that wanted to use it, we could build a system for crowdsourcing it out to the players. But I think that would be a little specific, since most of the time players just want help on their level, not necessarily translating from one language to another. So the systems we are thinking about building are just general paid mentoring system and crowdsourced help-me-with-my-level-right-now systems.

UltCombo · March 8, 2016, 1:10am

Excellent point!
A good example of this is Babel, a JavaScript compiler whose main goal is transforming modern JavaScript code into JavaScript code that older environments (browsers/Node.js) are able to run. In the very beginning, Babel focused on outputting human-readable code, but several users noticed that it was not following the specification strictly. Nowadays, Babel abides by the specification as closely as possible, but the code it outputs is nearly unintelligible!

I believe we don’t need to build a new system; there are some existing ones that should do the job. For example, you can open a GitHub repository dedicated to this code translation service. The repository does not have to contain any code, just a README file explaining what the repository is for. Then, people can use the issue tracker to post code translation requests and fund these requests through Bountysource, for example.

Thelightside · March 14, 2016, 9:30pm

So basically, all I want to know now is: are any of my ideas gonna work???

George_Christovich · March 15, 2016, 7:13pm

Thanks,

I was coming to ask a question about some unexpected behavior from a piece of code. This topic and particularly UltCombo’s contribution answered my question nicely.

I find these survey topics very useful.

Thelightside · March 15, 2016, 10:36pm

Oh in that case @George_Christovich do u think that this topic would help others??

Thelightside · March 20, 2016, 3:05pm

So basically, is it a yes or a no?? Just asking out of curiosity!!!

Neel_Sharma · March 23, 2016, 5:45pm

I personally like the ideas you posted, but I’m not sure how they are going to manage it.

Thelightside · March 23, 2016, 9:48pm

Oh, thanks @Neel_Sharma, I really appreciate it!
