Every time someone posts a link to some new or popular database claiming some superior performance, I think of this "fragment". One individual took this idea and made a database system that beats them all. You never see it included in the benchmark comparisons for these other databases posted about on HN. I wonder why?
Oft forgotten: All source code looks "cryptic" to someone who has no previous exposure to programming.
As a newcomer to programming I would rather be saddled with trying to understand this "fragment" than with trying to understand the beginnings of the source code for some other language like Python. Fewer lines of code to figure out and, if I were successful in understanding it, the payoff would be much greater... in more ways than one.
Isn't there an often-repeated statistic about the average number of lines of code a C programmer writes in a day?
Wouldn't it be nice if just one or a few lines were actually a complete program?
Once you get past the formatting and the cryptic variable names it's actually a pretty straightforward program; have a look at one of the older threads if you are interested in understanding it:
$ ./j
warning: this program uses gets(), which is unsafe.
5 + 10
Segmentation fault: 11
$ ./j
warning: this program uses gets(), which is unsafe.
5 + 10 20 30
j(70679,0x7fffa46de3c0) malloc: *** error for object 0x7f7fc74001b8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
I did run it back in the day and it works; I think it just happens to assume a 32-bit OS. You can fix the gist by changing this:
I *ma(n){ R (I*) malloc(n*4); }
to:
I *ma(n){ R (I*) malloc(n*8); }
Note that it's a very tiny interpreter for a very tiny, simple language. It does not have any error handling (it still WILL segfault if you enter "incorrect" programs, where "correct" is whatever doesn't crash the interpreter :)), the letters a-z are the only available variable names, etc. - this is why it is so short. Once you get over the APLish way of writing the C code, and once you deduce what the operators are intended to do, it's a pretty straightforward C program.
Here is a sample session:
a=1,2,3
3
1 2 3
b=4,5,6
3
4 5 6
a+b
3
5 7 9
This piece of the gist is the "symbol table" for binary and unary operators, respectively; you can use it to get started understanding the language:
C vt[]="+{~<#,";
A(*vd[])()={0,plus,from,find,0,rsh,cat},
 (*vm[])()={0,id,size,iota,box,sha,0};
I don't remember or haven't figured out how to dereference a boxed value, or what this can be used for.
The struct A has members:
t: The "type": 0=integer, 1=boxed reference
r: The "rank": size of array d
d: The "dimensions": Multiply to get the size of array p
p: The actual data. Can be integers or pointers to boxed data (cast to a 32-bit integer!)
I'd like to be awed and amazed, but the frightening thought of encountering such a thing in production and under tight deadline kind of doesn't let me :)
It's a tiny interpreter. You can create vectors, add them, concatenate them, store them to variables, count their lengths, and so on. An example session:
To understand the source, the key arrays are `C vt[]`, `A(vd[])()`, and `A(vm[])()`, which map each verb token to its dyad and monad respectively. `V1` functions are monads, `V2` are dyads.
I don't think so. It's easier to see whether something is monadic or dyadic with the macros, and you don't need to parse the function head to know what arguments it takes and of what types.
Also a common naming scheme in music, which is where APL got the terms from: one note is a monad, two simultaneously is a dyad, three is a triad, four is a tetrad, etc.
Fwiw APL's use of monad in this sense predates the FP use of the term by several decades.
Monads, in the FP sense, come from the same construction in Category Theory. According to Wikipedia, this was first "invented" in 1958, but the term "monad" was coined later by Saunders Mac Lane.
Even if the details are a bit… inscrutable, I do like that figuring out the structure of this interpreter is quite easy as it's all on one page and easy to see all the “moving parts”. It perhaps looks a bit scary, but it is, IMO, much easier to understand than 1000 lines of Java spread out across 5 different files. In a way, there's less mental overhead.
https://news.ycombinator.com/item?id=8533843
Unfortunately, I tried getting it to run and it simply doesn't work. Unless the expected behavior of every input is to crash the interpreter.
EDIT: In fact, you can see for yourself using https://gist.github.com/piotrklibert/4d32c8cc6fcf20643a257a2... (thanks, klibertp!)
From http://code.jsoftware.com/wiki/Studio/TasteofJPart1, `5 + 10` should be a valid J program, but it crashes. Is there any input that doesn't crash it?
It works! I compiled it with `gcc j.c -m32 -o j` and it runs now. Awesome.
Thanks for the helpful example program.
EDIT: Sort of. `5+10` crashes, as does `a=5 \n b=10`. But single digit numbers work, so this is pretty cool.
When I last looked at this I found:
plus a+b: addition
from a{b: selects the a-th element from b
find a~b: Not implemented? Or maybe this exploits some old-style C behavior?
rsh a#b: repeat a items of b
cat a,b: concatenate item a onto the list b
id +a: identity
size {a: get the size (in the first dimension) of a
iota ~a: enumerates all integers from 0 to a-1
box <a: Put a into an indirection box
sha #a: get the shape (size in all dimensions) of a
Wow. This is genuinely incredible. The interpreter works, and all of your examples run.
Thank you for detailing all of this.
EDIT: I've edited the code a bit so that it doesn't look like PERL or J itself https://gist.github.com/piotrklibert/4d32c8cc6fcf20643a257a2...
I got as far as trying to build and run it, but it bugged out, thusly:
Ah well, it's still cute to try to read and understand...
The code was written on paper. Without the usual compile/debug cycles during development.
The amazing part is how little cleanup work is required to make the code work. It's an entire programming language after all!
I got it to compile but what is it even supposed to do?
Should start a REPL. The `main` function looks like this:
main(){C s[99];while(gets(s))pr(ex(wd(s)));}
which is a very direct expression of read-execute-print-loop.
It's not 64-bit clean. If you compile it as 32-bit it will run.
J's commercially aligned cousin, K, is routinely encountered in financial/trading backends. Part of job security, I presume.
I know some K, but I am not a particular fan of it. I admire the efficiency, though.
A really helpful edit would be the V1 and V2 macro expansions; then it's fairly straightforward C with an odd naming scheme.
> monadic or dyadic
Also known as unary and binary, TIL. They refer to arity, not FP monads.
Adicity is an alternate naming scheme for arity, based on Greek rather than Latin roots.
You could also run the preprocessor to make it clearer.
That's actually about as readable as the full J interpreter source code.
https://github.com/jsoftware/jsource
Jeezzz!!
/aside: Reminded me of "6 instances of SHA-3 and SHAKE" in 9 tweets:
http://keccak.noekeon.org/tweetfips202.html
https://twitter.com/TweetFIPS202