A Pointless to JS Transpiler
Hey, I've been following the Pointless documentation for a day now and it seems extremely interesting. I found Pointless through r/ProgrammingLanguages. I visited your website, and since I like to make programming languages myself, I was wondering: what if we made a Pointless to JS transpiler to improve Pointless's performance? (I read about the performance concern on Pointless's website.)
I can see ways to simulate many of Pointless's features in JS, such as partial application and the pipeline operator. Any specific reasons why this couldn't be done would be appreciated, and if you do decide to make a JS transpiler, I'd certainly try to help as much as I can.
Hey, thanks for the message. I think that a Pointless to JS transpiler could be a cool project - it's one that I've thought about before myself. I agree that most of the features in Pointless would translate well to JS - the biggest difficulty, I think, would be figuring out how to accommodate tail-call optimization, which JS doesn't support natively but which Pointless requires. I know that Elm, which transpiles to JS, handles tail-recursion optimization, but a transpiler for Pointless would have to optimize non-recursive tail calls as well. I don't have any leads on how this problem might be solved - but I'd be happy to discuss any ideas you have!
I think tail-call optimization is more of an optimization than an innate feature of a language that needs implementing once the core is there. We could have a transpiler that transpiles without TCO, and then use a Babel plugin like this to transform the JS that the Pointless transpiler produces. The aforementioned plugin works unless Pointless does tail-call optimization on mutually recursive functions.
In general, I don't think TCO is something that needs to be prioritized to such a degree, because mainstream functional languages like Clojure still do very naive TCO and get away with it. Another way to speed things up could be to rewrite the interpreter in something more low-level like Nim. You could rewrite it in C, but then garbage collection would be a problem. It could be a huge speed-up, and I know enough Nim that I might be able to help if you decide to go down that route too.
I'm saying all of this because one of my languages had major performance concerns (by virtue of being written in Python). It was used by some relatively small local institutes, mainly for educational purposes, because it was based on my native language. I was able to improve its performance by compiling it to JS, and I'm currently writing an interpreter for it in Nim to get a faster yet equally user-friendly ecosystem.
Btw, is there a Pointless BNF?
No BNF yet I'm afraid - the parser code is fairly well commented, but a real grammar spec would certainly be good to have.
With regard to tail-recursion, the one concern is that since Pointless doesn't have normal looping structures, functions end up relying on tail-recursion pretty heavily. Removing tail-recursion optimization would cause many of the list-handling functions in the standard library to stop working for lists with length > 1000 (the call depth limit). Assuming that we have tail-recursion working, though, that might be good enough, even without full tail-call optimization. I'd have to look through the prelude to see if any functions currently use mutual tail-recursion.
You mention the possibility of re-implementing Pointless in a language like Nim or C - as it happens, Pointless was implemented in C up until recently (see the commit history up to c1117c). The C version was much faster than the current Dart implementation, in part because it was written as a bytecode compiler / VM instead of an interpreter, but I found that the time it took to make modifications to the C implementation made it hard to experiment with new language features.
That being said, using Nim for implementation could provide a good balance between performance and ease-of-use. You should take a look at the old C implementation and tell me if you think it would translate well to Nim!
I'll start by commenting on tail-recursion. I understand that Pointless itself doesn't have any loop structures and that some list functions might break without tail-recursion optimization, but compiling to JS would require a standard prelude written in JS anyway - so why not re-write the standard prelude and runtime by hand, using loops? The only problem I see is that it may be considered a bit "unethical", or a non-functional approach, but the language itself would still be largely functional, so I don't see that as too big a concern.
As for the C implementation, I just went through the seemingly relevant parts of it, and it looks like it would translate well and would result in significantly shorter Nim code. All the structs would become Nim types, the stack size can be unlimited with Nim sequences, the GC wouldn't need to exist since Nim does that work for you, most of the tokenizer would remain the same, and the compiler would translate straightforwardly. I nonetheless think that if you decide to go the VM route, it will still have the same problem of being hard to experiment with.
I think that if keeping the implementation simple enough to experiment with is a goal, then a translation of the Dart implementation to Nim isn't too far-fetched either. The dispatch function in the interpreter (rather than some weird visitor pattern) would translate well to Nim, and all of the values would translate to Nim's version of tagged unions, which are very similar to C's but way more powerful. Nodes themselves could either translate to another tagged union visited by the aforementioned dispatch function (way more efficient than the alternative) or to mutually recursive type definitions visited by Nim's multi-methods (far more elegant and readable than the alternative).
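Just to make the tagged-union idea concrete, here's a rough sketch of what a value type plus a single dispatch proc could look like as a Nim object variant - the names here are made up for illustration and aren't the actual Pointless / Dart types:

```nim
# Sketch only -- illustrative names, not the actual Pointless value types.
type
  ValueKind = enum
    vkNumber, vkString, vkBool

  Value = object
    case kind: ValueKind        # Nim object variant = tagged union
    of vkNumber: num: float
    of vkString: str: string
    of vkBool:   bln: bool

proc toStr(value: Value): string =
  ## one operation for every value kind, handled in a single place
  ## by dispatching on the tag (the "dispatch function" style)
  case value.kind
  of vkNumber: result = $value.num
  of vkString: result = value.str
  of vkBool:   result = $value.bln

echo toStr(Value(kind: vkNumber, num: 42.0))   # prints "42.0"
```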
Yeah, the question of how "pure" to keep the implementation when translating to JS isn't something I have a good answer to. I'm more inclined to focus on improving the interpreter / compiler situation than going down the transpiling path, although if that's something you'd like to explore you're welcome to!
If the goal here is to have a faster Pointless implementation, then going from a Dart-based interpreter to a Nim-based interpreter probably won't do much (I tried writing a bytecode compiler / VM in Dart and it wasn't any faster than the interpreter!). So for the work to be fruitful I think that writing a new compiler / VM in Nim would be the way to go. I'm working right now to make examples for Pointless and improve the language itself, so a re-implementation would be a project for the future, but again if that's something you'd like to start looking into I'm happy to provide any help / advice that I can.
Yes, I was leaning more towards the idea of implementing it in Nim anyway. I'd like to implement the language with either an interpreter or a bytecode compiler and VM. I just started working on the tokenizer yesterday and have almost got it done - I'll share it with you as soon as I'm finished.
Would an interpreter in C be fast? I ask because Nim is more or less equivalent in speed to C, with little differences here and there that may or may not matter. I'd like to make an informed, thought-out decision about whether to go VM-based or interpreter-based, and I think I'll make that decision once I'm done with the parser, to keep things a bit saner.
Edit: Got the tokenizer fully working.
Awesome! Can't wait to see it!
I think there would be a substantial slowdown going from a C VM to a C interpreter -- more importantly, I think it might be harder to implement an interpreter in C than a VM. I'd recommend going the VM route.
I'll put together some local test cases that I have for the tokenizer and send them to you if you want to run them against your implementation.
I've also been working on a Pointless tutorial / example that you might find interesting: https://ptls.dev/tutorials/factorsVM.html
edit: oops, didn't mean to close the issue
Definitely send the tests over - they'll be a huge help. I ran mine against a bunch of tutorial examples, found some bugs, and fixed them; now it runs fine with all the tutorial examples I've tried. Yeah, I think the VM route seems more interesting. We'll see when the parser gets completed, but it'll most probably be a VM. My PC stopped working for a day, but I've got it fixed now, so I'll hopefully get most of the parser implemented tomorrow.
Check out this fork for the tests: https://github.com/averynortonsmith/pointless
^ also contains an updated makefile -- you should be able to run `make test` to run the tests
The test inputs are what's important -- the expected outputs are just the current output from the parser / tokenizer. That means, for example, that if your implementation uses a different string representation for AST nodes, you'll get different outputs.
Thanks for the tests - they worked out really well for me. I finally got all the tests passing for the parser and the tokenizer. I was thinking: why not implement both the interpreter and the VM / compiler? I have the front-end done, so I don't think it would amount to twice the work, and the good news is that we could experiment on one implementation (the interpreter) while keeping the other (the VM and compiler) stable. We would test out a feature in the interpreter version, and once the feature is complete and final, we could implement it in the VM version.
I think I'll get around to making the interpreter first - it should be done in a day or two (at most, a week) - then I'll upload that implementation and start working on the VM / compiler.
edit: Tests really do make life easier - do you have some for the interpreter?
Sounds good! Take your time too -- I'm sure you've got other things on your plate and you don't want to get burned out!
I have some old interpreter tests lying around. I'll update them and send them to you tomorrow.
Here are some files from a previous attempt at a bytecode VM in Dart https://github.com/averynortonsmith/dartPtlsVM. I abandoned that approach when I did some preliminary performance tests and realized that it wasn't any faster than the interpreter I'd written. I think that Dart is too high-level for the performance benefits of a bytecode VM to really come through. Although I didn't finish the Dart VM, the compiler has some substantial improvements over the C implementation (in particular updates to the instruction set and handling of closures). Could be helpful when working on a new compiler, although I agree that doing the interpreter first makes sense.
You should also know that there are a few features of the current Dart Pointless implementation that I never implemented in the compilers / VMs I prototyped. These include, off the top of my head: try/catch/throw, reading lines from stdin as a lazy list, and the REPL; though I have ideas about how all of these could be done.
You might also want to think about this when writing the interpreter: #9
Ok here are some interpreter tests: https://github.com/averynortonsmith/pointless/tree/master/tests/interpreter
They don't cover all language features (and I tried to remove tests that covered removed language features, but may have missed a few), but they should still be helpful.
Once again, thanks for the tests. I got the basic interpreter working (except for imports and the prelude - I'll add those once I've tested everything out). I'm fixing bugs and testing it right now, and hopefully it'll be done pretty soon. Before reading the new issue you sent, I had already implemented the error tracking, but I don't really think it's a particularly great idea: it could probably be improved, but IMO it would still be way too much overhead, and if a user is going into a dynamic language, they probably know what they're getting themselves into from a debugging point of view.
There are also many examples of successful languages that don't bother, such as Python. Improving the current state of this feature would also require many substantial design decisions, such as where to track objects (and where not to), when it's worth it for "Objects" and "Dicts" to be tracked, and other similar questions.
It's definitely doable, but if I were you, I'd rather focus on improving the language's core right now than go down a rabbit-hole of weird design decisions - the former is what I consider the more important choice for a relatively young programming language. Edit: Just a question - where is the reduce function that's used by the prelude defined? NVM, figured it out.
Glad the tests are helpful! Let me know if you need any more around a specific feature.
Ok, I think I'm convinced to set aside the object location tracking for now. I'll take it out of the current interpreter implementation.
I'm pretty much done with the entire interpreter, along with the module system, and am currently improving the errors a bit. Should I upload it as a fork of this repository or as a separate repository?
Maybe a fork with your changes on a separate branch?
I can't wait to try it out!
There are a few features yet to be implemented, such as ~~caching imports and~~ (just added that) automated inclusion of the prelude, but here it is.
Another thing I've had on my mind recently is binary infix functions - should I add them? If so, I think I should do a PR here too. The idea came from another language I'm working on; it's very similar to Pointless in concept, except its purpose is to teach functional programming to people who already know other imperative programming languages.
The Nim implementation looks awesome! You're now one of two people in the world who understands Pointless top-to-bottom.
A few notes I have from trying out the implementation:

- I see that you've created different types in Nim for the different types of ASTNode. It's a smart move, and probably something I should adopt for the Dart version as well, since it provides more type-safety.

- It's interesting to see how the Nim interpreter structures the code: all of the handlers for different value types (accessing indices, getting string reps, accessing special fields, ...) are grouped together, as opposed to the Dart version where each class manages its own set of handlers. It really demonstrates the functional / ADT way of structuring code vs the OOP way, and the pros and cons of both.

- I see that you've added some explicit internal type-checks in the interpreter ("You called a label method on something that's not a label") -- great!

- Right now the interpreter evaluates the entire output list using `eagerList` before displaying the output. If you instead have the interpreter evaluate the output lazily as each output element is calculated, then the interpreter will be able to produce real-time output (see the sketch after this list).

- I fixed a few small bugs (I made pull requests for these), including:
  - making the debug handler print out the debug strings
  - making `.!getType` for numbers return `PtlsNumber`
  - making the 2D array syntax give an array-of-arrays, rather than a list-of-arrays
  - `import NodeTypes` -> `import nodeTypes`
  - adding code to the parser to unescape newlines in string literals (Nim's `unescape` doesn't do this)
  - changing the interpreter to calculate `mod` Python-style
  - changing the output driver to print output elements on separate lines

- I ran into an error where the objects resulting from different import statements seem to get mixed up. For example, in this code:

  ```
  import "examples/chart/chart.ptls" as chart
  import "examples/beer/beer.ptls" as beer

  output = [
    chart.showBottle -- should throw an error, but works
  ]
  ```

  the chart object ends up with the same fields as the beer object.

- I made a test script which bundles all of the prelude defs, and it seems like it works. However, it runs quite slowly. I did a little digging and found that almost all of the time is spent in the tokenizer / parser, rather than the interpreter. I wonder whether this has to do with the fact that strings in Nim are passed by value, rather than by reference? I see that there are functions in the tokenizer / parser that pass strings as arguments, and there might be a lot of copying going on there that's slowing things down when the text of the program is large.
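To illustrate the lazy-output point from the list above, here's a rough sketch (with invented names, not your actual interpreter API) of printing each output element as soon as its thunk is forced, rather than building the whole list first:

```nim
# Sketch only: output elements modeled as thunks (nullary closures).
proc makeThunk(n: int): proc (): string =
  ## defer the computation of one output element
  result = proc (): string = "element " & $n

proc runOutput(thunks: seq[proc (): string]) =
  ## force and print each element as it is computed,
  ## so output appears in real time
  for t in thunks:
    echo t()

runOutput(@[makeThunk(1), makeThunk(2), makeThunk(3)])
```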
I'm super impressed -- I don't think that I could have pulled off a re-write like this as quickly as you did. Bravo!
For binary infix functions, you mean like how Nim has `a mod b`? I haven't considered custom operators / infix functions for Pointless yet (although I guess that's essentially what the pipe operator `|>` is). How difficult are they to implement?
I'd be interested to hear more about your language project! Is it public yet?
> You're now one of two people in the world who understands Pointless top-to-bottom.
Or top-down if you will. Also, I think I could attempt to write a BNF now, for anyone implementing this in the future.
Thanks for the PRs - I accepted and merged them yesterday. When you work on something with so many moving parts, it's easy to miss stuff like this. About the import error: I was somehow able to fix it by calling addDefName instead of evaluating the code in the for loop from the dispatch function. I still don't know why that was a problem, but I tested it like this:
-- x.ptls
xo = 5
-- y.ptls
yo = 6
-- main.ptls
import "x.ptls" as x
import "y.ptls" as y
output = [
x.xo,
y.yo
]
used to cause an error but now outputs
5
6
I think you're right about a large prelude file being slow because of value copying in Nim, but I think it's more subtle than that: the Nim compiler rarely compiles immutable function parameters to be copied, though it does copy `var` parameters on a function call. So I think using more `ref` strings and sequences instead of plain strings and sequences could very well speed it up. Also, I noticed a decent speed-up running your aforementioned test script on my machine by using the compiler flags `nim compile -d:danger -d:release --opt:speed interpreter.nim` instead of `nim c interpreter.nim`. It was in fact so significant on my (quite dated) machine that the runtime dropped from 14 seconds to just under a second - I think this is because the latter command preserves stack traces while the former does not, and instead focuses on optimizing the program for speed.
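For what it's worth, here's a rough sketch of the kind of change I have in mind (invented names, not the actual tokenizer code): store the program text once behind a ref object and let the tokenizer procs share it, instead of passing string values around:

```nim
# Sketch only -- not the real tokenizer, just the shape of the idea.
type
  Source = ref object
    text: string   # full program text, stored once and shared by reference
    pos: int       # current scan position

proc peek(src: Source): char =
  ## look at the current character without copying the program text
  if src.pos < src.text.len:
    result = src.text[src.pos]
  else:
    result = '\0'

proc advance(src: Source): char =
  ## consume and return one character
  result = peek(src)
  inc src.pos

let src = Source(text: "output = 1 + 2", pos: 0)
var count = 0
while peek(src) != '\0':
  discard advance(src)
  inc count
echo count, " characters scanned without copying the source"
```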
As a side question: I read this in the Pointless docs, and I quote,
> For implementation reasons, comparison of other data structures (lists, arrays, sets, dicts, objects, and tuples not fitting the criteria above) using the basic equality operators compares these objects according to their location in memory.
What were the implementation reasons for that? I'm not an expert in Dart - in fact, rewriting this Dart implementation is the first thing I've done that's even vaguely related to Dart, which is why I may have missed something - but the aforementioned limitation is removed in the Nim implementation, because I didn't see a good reason to keep it.
As for the infix functions, I was thinking more in terms of how Haskell does it, like this:
add a b = a+b
1 `add` 2
The implementation can be done in multiple ways. The first, and the one I used in Closkell, is to make an infix token, just like you make a string token, and check that it's composed of a valid identifier. Then, in the parser, simply add it to an existing precedence level or create a new precedence level for it. A little restructuring would then be required, since right now the interpreter expects a TokenType instead of a token; it would instead have to take the token and check its type. If it's an infix token, we'd look up the identifier, make sure it's a function, and supply it with both the LHS and the RHS - so 1 `add` 2 would be the equivalent of saying add(1, 2).
Since the end goal is to make it equivalent to a function call, there's another method that I haven't personally tried but that should work: making it entirely syntactic sugar that's handled at parse time. In this approach, you'd do the same parsing as before, but change the resulting node to a Call with both sides as arguments, rather than making it a BinOp node. That would make 1 `add` 2 syntactically equivalent to add(1, 2).
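If it helps, here's a rough Nim sketch of that parse-time desugaring - the node kinds and constructors are invented for illustration and don't match the real Pointless AST:

```nim
# Sketch only: invented AST types, just to show the desugaring step.
type
  NodeKind = enum
    nkNumber, nkName, nkCall

  Node = ref object
    case kind: NodeKind
    of nkNumber: num: float
    of nkName:   name: string
    of nkCall:
      fn: Node
      args: seq[Node]

proc desugarInfix(lhs: Node, opName: string, rhs: Node): Node =
  ## 1 `add` 2  ==>  add(1, 2): build a Call node instead of a BinOp node
  result = Node(kind: nkCall,
                fn: Node(kind: nkName, name: opName),
                args: @[lhs, rhs])

let call = desugarInfix(
  Node(kind: nkNumber, num: 1.0),
  "add",
  Node(kind: nkNumber, num: 2.0))
assert call.kind == nkCall and call.args.len == 2
```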
About the language I'm working on, Closkell (the name is derived from Haskell and Clojure, and a nod to them being its primary inspirations): it's been a month since I changed anything in it, and it wasn't public when you asked because I still have to do a little write-up on how to use it so anyone can, but its alpha implementation is public now - here it is.
I was also wondering: have you considered transpilation to Lua? It seems to be able to perform full TCO, and there appear to be very few features of Pointless that would be hard to represent in Lua. I already have a tokenizer and a parser, so do you recommend going down this route? Why or why not?
> What were the implementation reasons for that?
Good question. The implementation constraint came from the old VM implementation. Checking structural equality is an inherently recursive operation, and I never found a good way to implement it in the context of a VM. You're right that structural equality is simple enough to implement in an interpreter, where you can call eval recursively (through eagerList() or getValue).
When I wrote the interpreter I kept the old behavior to make potential VM-based implementations easier in the future. I also think it's better when possible to implement features at the language-level, rather than the interpreter-level (deepEq.ptls in the prelude defines structural equality functions). Adding infix functions could solve this problem if the operator == was made to call the prelude function deepEq, and the current shallow equality operator was changed to === or maybe replaced with a new language field like .!getEquals.
I do think that structural equality is a better default behavior. I might work on making the switch today.
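To illustrate why the recursion is easy in a tree-walking interpreter, here's a rough sketch of a deep-equality check over an invented tagged value type (not the actual interpreter types or deepEq.ptls):

```nim
# Sketch only: invented value type, just to show the recursive shape.
type
  ValKind = enum
    vNum, vList

  Val = ref object
    case kind: ValKind
    of vNum:  num: float
    of vList: elems: seq[Val]

proc deepEq(a, b: Val): bool =
  ## structural equality recurses into nested values -- easy with a call
  ## stack, awkward to flatten into VM bytecode
  if a.kind != b.kind:
    return false
  case a.kind
  of vNum:
    result = a.num == b.num
  of vList:
    if a.elems.len != b.elems.len:
      return false
    result = true
    for i in 0 ..< a.elems.len:
      if not deepEq(a.elems[i], b.elems[i]):
        return false

let a = Val(kind: vList, elems: @[Val(kind: vNum, num: 1.0)])
let b = Val(kind: vList, elems: @[Val(kind: vNum, num: 1.0)])
assert deepEq(a, b)   # equal by structure, even though a and b are distinct refs
```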
I've been thinking that infix functions might make it easier to do operator overloading as well. For example, if objA + objB was syntactic sugar for objA.add(objB) then you could make objects that could utilize existing operators. It's an interesting idea.
I wonder whether adding this feature would improve Pointless. Since Pointless is designed with education in mind, there's always a balance between adding features and keeping the language simple. You mentioned that Closkell has similar design goals -- what made you choose to include infix calls in Closkell?
Switched Pointless to use structural equality: 6885802f3267b0ebdd1bb58aca5817635e773206
Also do you want me to put a link to your Nim fork in the Pointless readme?
I don't know much about Lua, although TCO makes it a promising compilation target. Do you have thoughts on how lazy variable definitions would be handled? It'd be awesome to see Pointless running at LuaJIT speeds!
Congratulations on making the switch to structural equality. I just updated my repository to remove deepEq from the automatic prelude and added new fields to comply with the latest changes surrounding the new language fields for strings. Also, I added a BNF - here it is - and I think you should add it to your repository too, for anyone who attempts an implementation in the future. Regarding adding the Nim implementation's link to Pointless's readme: I'd be glad if you did, but I also have another idea - what if you instead had an implementations page (like the one you have for why_pointless on the main repo)? I'm fairly sure this won't be the last implementation of Pointless, and that way others, including myself, would be able to create more implementations and link them in Pointless's main repository through PRs.
> Do you have thoughts on how lazy variable definitions would be handled?
Yes, I do. I think they can be handled more or less the same way they're handled in the interpreter: each variable would just be a nullary function, or a thunk. Some relevant trivia that you might already know about Lua, JavaScript, and friends: in these languages, if you have a function like this
addx = function(n) return n+x end
and x isn't defined anywhere before the function is defined, then as long as x is defined before the call, like this
addx = function(n) return n+x end
x = 2
print(addx(5))
the last call will still succeed. Exploiting this feature, I could do something like this:
local fac
local output
local obj
fac = function() return function(n) return n()==1 and 1 or n()*fac()(function() return n()-1 end) end end
output = function() return fac()(obj().value) end
obj = function()
  return {
    value = function() return 5 end
  }
end
Now everything in the snippet above works out for us: we get 120 as the answer, with lazy evaluation, and the order of the definitions doesn't matter either. I'm working on a sort of proof-of-concept compiler for this. It's a bit messy right now - it just does basic stuff and doesn't do semantic checking or object tracking - but the best part is that there are very few nodes that don't translate well. Here's a gist that shows the basic compiler. The worst offender here is WithNode - do you have any suggestions as to how that could be translated?
As for operator overloading, I think that desugaring a + b to a.add(b) is more of an OO thing than a functional one, since methods get coupled with data. I can't say whether that's good or bad, but it's definitely something to take into consideration when internally desugaring to a.add(b). A more functional approach would be, if Pointless had polymorphism, to write add(a, b) and have the most appropriate method selected.
> what made you choose to include infix calls in Closkell?
There were many reasons, but one of the biggest was trees. They're something every beginner should learn about as soon as they can, because they're a good introduction to recursion, traversal, and similar topics - but building a tree by hand while only using prefix function calls is a pain. Compare something along the lines of this
AddNode(MultiplyNode(2, AddNode(2, 2)), MultiplyNode(5, 2))
to something like this
(2 `MultiplyNode` (2 `AddNode` 2)) `AddNode` (5 `MultiplyNode` 2)
You can see how one is clearly more intuitive than the other, and would help beginners master trees quicker, since they can construct them by hand with more ease.