Every time someone posts a link to some new or popular database claiming some superior performance, I think of this "fragment". One individual took this idea and made a database system that beats them all. You never see it included in the benchmark comparisons for these other databases posted about on HN. I wonder why?
Oft forgotten: All source code looks "cryptic" to someone who has no previous exposure to programming.
As a newcomer to programming I would rather be saddled with trying to understand this "fragment" than with trying to understand the beginnings of the source code for some other language like Python. Fewer lines of code to figure out and, if I were successful in understanding it, the payoff would be much greater... in more ways than one.
Isn't there an often-repeated statistic about the average number of lines of code a C programmer writes in a day?
Wouldn't it be nice if just one or a few lines were actually a complete program?
Once you get past the formatting and the cryptic variable names it's actually a pretty straightforward program; have a look at one of the older threads if you are interested in understanding it:
$ ./j
warning: this program uses gets(), which is unsafe.
5 + 10
Segmentation fault: 11
$ ./j
warning: this program uses gets(), which is unsafe.
5 + 10 20 30
j(70679,0x7fffa46de3c0) malloc: *** error for object 0x7f7fc74001b8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
I did run it back in the day and it works; I think it just happens to assume a 32-bit OS. You can fix the gist by changing this:
I *ma(n){ R (I*) malloc(n*4); }
to:
I *ma(n){ R (I*) malloc(n*8); }
Note that it's a very tiny interpreter for a very tiny, simple language. It does not have any error handling (it still WILL segfault if you enter "incorrect" programs, where "correct" is whatever doesn't crash the interpreter :)), the letters a-z are the only available variable names, etc. - this is why it is so short. Once you get over the APLish way of writing the C code, and once you deduce what the operators are intended to do, it's a pretty straightforward C program.
Here is a sample session:
a=1,2,3
3
1 2 3
b=4,5,6
3
4 5 6
a+b
3
5 7 9
This piece of the gist is the "symbol table" for binary and unary operators, respectively; you can use it to get started understanding the language:
C vt[]="+{~<#,";
A(*vd[])()={0,plus,from,find,0,rsh,cat},
 (*vm[])()={0,id,size,iota,box,sha,0};
I don't remember or haven't figured out how to dereference a boxed value, or what this can be used for.
The struct A has members:
t: The "type": 0=integer, 1=boxed reference
r: The "rank": size of array d
d: The "dimensions": Multiply to get the size of array p
p: The actual data. Can be integers or pointers to boxed data (cast to a 32-bit integer!)
I'd like to be awed and amazed, but the frightening thought of encountering such a thing in production and under tight deadline kind of doesn't let me :)
It's a tiny interpreter. You can create vectors, add them, concatenate them, store them to variables, count their lengths, and so on. An example session:
To understand the source, the key arrays are `C vt[]`, `A(vd[])()`, and `A(vm[])()`, which map each verb token to its dyad and monad respectively. `V1` functions are monads, `V2` are dyads.
I don't think so. It's easier to see whether something is monadic or dyadic with the macros, and you don't need to parse the function head to know what arguments it takes and of what types.
Also a common naming scheme in music, which is where APL got the terms from: one note is a monad, two simultaneously is a dyad, three is a triad, four is a tetrad, etc.
Fwiw APL's use of monad in this sense predates the FP use of the term by several decades.
Monads, in the FP sense, come from the same construction in Category Theory. According to Wikipedia, this was first "invented" in 1958, but the term "monad" was coined later by Saunders Mac Lane.
Even if the details are a bit… inscrutable, I do like that figuring out the structure of this interpreter is quite easy as it's all on one page and easy to see all the “moving parts”. It perhaps looks a bit scary, but it is, IMO, much easier to understand than 1000 lines of Java spread out across 5 different files. In a way, there's less mental overhead.
https://news.ycombinator.com/item?id=8533843
Unfortunately, I tried getting it to run and it simply doesn't work. Unless the expected behavior of every input is to crash the interpreter.
EDIT: In fact, you can see for yourself using https://gist.github.com/piotrklibert/4d32c8cc6fcf20643a257a2... (thanks, klibertp!)
From http://code.jsoftware.com/wiki/Studio/TasteofJPart1, `5 + 10` should be a valid J program, but it crashes. Is there any input that doesn't crash it?
It works! I compiled it with `gcc j.c -m32 -o j` and it runs now. Awesome.
Thanks for the helpful example program.
EDIT: Sort of. `5+10` crashes, as does `a=5 \n b=10`. But single digit numbers work, so this is pretty cool.
When I last looked at this I found:
plus a+b: addition
from a{b: selects the a-th element from b
find a~b: Not implemented? Or maybe this exploits some old-style C behavior?
rsh a#b: repeat a items of b
cat a,b: concatenate item a onto the list b
id +a: identity
size {a: get the size (in the first dimension) of a
iota ~a: enumerates all integers from 0 to a-1
box <a: Put a into an indirection box
sha #a: get the shape (size in all dimensions) of a
Wow. This is genuinely incredible. The interpreter works, and all of your examples run.
Thank you for detailing all of this.
EDIT: I've edited the code a bit so that it doesn't look like PERL or J itself https://gist.github.com/piotrklibert/4d32c8cc6fcf20643a257a2...
I got as far as trying to build and run it, but it bugged out, thusly:
Ah well, it's still cute to try to read and understand...
The code was written on paper. Without the usual compile/debug cycles during development.
The amazing part is how little cleanup work is required to make the code work. It's an entire programming language after all!
I got it to compile but what is it even supposed to do?
Should start a REPL. The `main` function looks like this:
main(){C s[99];while(gets(s))pr(ex(wd(s)));}
which is a very direct expression of read-execute-print-loop.
It's not 64-bit clean. If you compile it as 32-bit it will run.
J's commercially aligned cousin, K, is routinely encountered in financial/trading backends. Part of job security, I presume.
I know some K, but I am not a particular fan of it. I admire the efficiency, though.
A really helpful edit would be the V1 and V2 macro expansions; then it's fairly straightforward C with an odd naming scheme.
> monadic or dyadic
Also known as unary and binary, TIL. They refer to arity, not FP monads.
Adicity is an alternate naming scheme for arity, based on Greek rather than Latin roots.
You could also run the preprocessor to make it clearer.
That's actually about as readable as the full J interpreter source code.
https://github.com/jsoftware/jsource
Jeezzz!!
/aside: Reminded me of "6 instances of SHA-3 and SHAKE" in 9 tweets:
http://keccak.noekeon.org/tweetfips202.html
https://twitter.com/TweetFIPS202