This is amazing! I love seeing FRACTRAN-shaped things on the homepage :) This reminds me of how 1-bit stacks are encoded in binary:
A stack of zeros and ones can be encoded in a single number using bit-shifting and incrementing.
Pushing a 0 onto the stack is equivalent to doubling the number.
Pushing a 1 is equivalent to doubling and adding 1.
Popping is equivalent to dividing by 2, where the remainder is the popped bit.
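The push/pop rules above can be sketched in a few lines (the use of 1 as an empty-stack sentinel is my addition, so that pushed zeros stay distinguishable from an empty stack):

```python
# 1-bit stack encoded in a single integer.
def push(stack: int, bit: int) -> int:
    return stack * 2 + bit            # double, then add the pushed bit

def pop(stack: int) -> tuple[int, int]:
    return stack // 2, stack % 2      # halve; the remainder is the popped bit

s = 1                 # empty stack (sentinel bit)
s = push(s, 0)        # stack: [0]
s = push(s, 1)        # stack: [0, 1]
s, bit = pop(s)       # pops the 1, leaving [0]
```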
I use something not too far off for my daily programming, based on a similar idea:
Rejoice is a concatenative programming language in which data is encoded as multisets that compose by multiplication. Think Fractran, without the rule-searching, or Forth without a stack.
You missed the part where the number is being used as a 1-bit stack. They never claimed novelty, it's just a neat technique that might be unfamiliar to most people.
This isn't unique, or even the cheapest-to-compute way to do it. For example, let f(x,y) = 1/(x-y). This too is universal. I think there's a theorem stating that for any finite set of binary operators there is a single one that can replace them all.
write x#y for 1/(x-y).
x#0 = 1/(x-0) = 1/x, so you get reciprocals.
Then (x#y)#0 = 1/((1/(x-y)) - 0) = x-y, so subtraction.
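These two identities (plus an addition built from them) are easy to sanity-check numerically; a minimal sketch:

```python
# Sketch: h(x, y) = 1/(x - y) generates reciprocal, subtraction, and
# (from those) addition, per the derivation above.
def h(x, y):
    return 1 / (x - y)

def recip(x):   return h(x, 0)             # x#0 = 1/x
def sub(x, y):  return h(h(x, y), 0)       # (x#y)#0 = x - y
def add(x, y):  return sub(x, sub(0, y))   # x - (0 - y) = x + y
```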
It's a common exercise to show that in (insert various algebraic structures here) the reciprocal and subtraction give all four elementary operations.
Do you claim all things you don’t understand are LLMs? This is what I mean by these and many of your other comments being extremely poor quality to the point of deliberate ignorance.
The paper above was published in 2012 [1], so that’s quite a feat for an LLM. This takes about zero effort to check.
Put some thought or effort into your claims; they’ll look less silly.
Ooh, that 2nd link has a nice construction by Terry Tao giving a clear way to show infinitely many such functions exist for pretty much any set of operations.
The exp and ln are infinite series. Exp is roughly the infinite series for cos AND the infinite series for sin. Hiding that every op is an infinite series behind a name doesn’t make things free. It just makes even trivial ops like 1+2 vastly more work.
They are not infinite series per se. They can be represented by infinite series in several ways, but there are standard ways to define them that do not involve infinite series. The logarithm in particular cannot even be represented by a single infinite series (in the form of a Taylor expansion) on the whole complex plane. And knowledge/use of trigonometric functions greatly precedes such infinite-series representations.
Moreover, the point is not always numerical computation. I don’t think anybody argues that eml sounds like an efficient way to compute elementary functions numerically. It may or may not still be useful for symbolic computations.
The article is about producing all elementary functions, which 1/(x-y) clearly doesn't, as it doesn't produce any transcendental function. Like many such universality-style results it may not have practical applications, but it may still be interesting in its own right.
Any transcendental function can be produced by arithmetic, since it's complete for ℝ.
Go ahead and show how to compute exp or ln without an infinite series without circular reasoning. You can’t, since they’re transcendental.
There are infinitely many ways to make these binary operators. Picking extremely high compute cost ones really doesn’t make a good basis for computation.
> Any transcendental function can be produced by arithmetic, since its complete for R.
Not without some form of limit process or construction. You can approximate e with the basic arithmetic operations but not actually get an exact form in finite steps. And you definitely cannot traverse an infinite binary tree, so the main point of the result in the article is missed by your arguments.
Again, you are mixing separate things. Nobody said that eml is some way to approximate elementary functions more efficiently. It is a way to express elementary functions in a finite amount of operations. Meaning, computing symbolically, not numerically. Eg I may care that exp(3)*exp(2)=exp(5) without caring to approximate exp(5) numerically. The paper is literally under "Computer Science > Symbolic Computation", not "numerical analysis" or "engineering" after all.
And to be precise:
> Go ahead and show how to compute exp or ln without an infinite series without circular reasoning. You can’t, since they’re transcendental.
You don't necessarily need "infinite series", you need some limit process. A basic example is that exp(x) can be approximated by (1 + x/n)^n for large n. For the logarithm you can use a formula involving the arithmetic–geometric mean which you can approximate using an iterative process/recursion without infinite series. You can also approximate the exponential by using Newton's method together with that, see [0].
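Both kinds of limit process are easy to demonstrate. A sketch: (1 + x/n)^n is the formula from the comment above; the repeated-square-root formula ln y = lim n·(y^(1/n) − 1) is a classic companion I'm adding for symmetry, not something the comment mentions:

```python
import math

def exp_limit(x, n=10**7):
    # Compound-interest limit: (1 + x/n)^n -> e^x as n -> infinity.
    return (1 + x / n) ** n

def ln_limit(y, k=20):
    # ln y = lim n*(y^(1/n) - 1); take the 2^k-th root via k square roots.
    r = y
    for _ in range(k):
        r = math.sqrt(r)
    return (r - 1) * 2**k
```

Neither converges quickly, but neither is an explicit infinite series: each is a finite expression parameterized by n (or k) whose limit is the target function.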
I don't think this can do any of the "standard" constants or what we generally consider to be closed-form expressions, though! (E.g., no e, pi, exp, log, etc.)
Yes it can, by using the same infinite series that exp and ln use to compute. This one just costs less in money, hardware, energy, and is faster for basically every basic op.
It’s only finite by putting the infinite series into an operation.
And the basic monomial basis is not a single binary operation capable of reproducing the set of basic arithmetic ops. If you want trivial and basic, pick Peano postulates. But that’s not what this thread was about.
well, the statement is: is there a single operation, built from elementary operations, such that all _other_ elementary operations have finite representations.
this preprint answers that in the affirmative
otoh, (x, y) -> 1/(x-y) does not answer this question at all. you can argue that the preprint does so "via the infinite series in an operation" (which I have no idea what that means; surely if exp(x) qualifies then so must 1/(x-y) if we pick a monomial basis?) but ¯\_(ツ)_/¯
now, do I think that this is groundbreaking magical research (as I'm currently seeing on twitter) no... But it's neat!
> This includes constants such as e, pi, and i; arithmetic operations including addition, subtraction, multiplication, division, and exponentiation as well as the usual transcendental and algebraic functions.
And those come from the infinite series needed to compute exp and ln. They’re just as much work either way. The exp and ln way are vastly costlier for every op, including simply adding 1 and 2.
It's not about being costly or not, this is completely irrelevant to the point being made. eml is just some abstract function, that maps ℝ² to ℝ. Same as every other mathematical function it is only really defined by the infinite set of correspondences from one value to some other value. It is NOT exp(x) - ln(y), same as exp is not a series (as you wrongfully stated in another comment). exp can be expressed (and/or defined) as a series to a mathematician familiar with a notion of series, and eml can be expressed as exp(y) - ln(y) to a mathematician familiar with exp and ln. They can also be expressed/defined multiple other ways.
I am not claiming this is better than 1/(x-y) in any way (I have no idea, maybe it isn't if you look closely enough), but you are simply arguing against the wrong thing. Author didn't claim eml to be computationally efficient (it even feels weird to say that, since computational efficiency is not a trait of a mathematical function, but of a computer architecture implementing some program) or anything else, only that (eml, 1) are enough to produce every number and function that (admittedly, somewhat vaguely defined) a scientific calculator can produce.
However, I want to point out that it's weird 1/(x-y) didn't appear on that graph in Figure 1, since if it's as powerful as eml, it should have all the same connections as eml, and it's a pity Odrzywołek's paper misses it.
I’m fairly certain that the difference between the approaches is that the f(x,y) function you mentioned requires limits to represent certain concepts while the eml approach is essentially a tree or a chain of computations meant to represent a model of a system.
I understand your point. The paper is more about the depth of the tree to represent and audit a model versus the raw CPU clock cycles. It takes the exponential and logarithm as given, since for all practical purposes, in a scientific context, they are.
To represent something like sin(x) with f(x,y) requires infinitely many steps. Conversely, with eml you get an exact result in around 4 steps using identities and such.
One could argue that we do Taylor Series approximations on the hardware to represent trigonometric functions, but that highlights the key aspect of the eml approach. You can write a paper with those four steps that describes an exact model, here sin(x). And people can take that paper and optimize the result.
This paper is about an auditable grammar that you can compute with.
They're not series, that's just a convenient way to think about defining and calculating them. I've never found it particularly useful to deal with the series definitions either, and none of the (good) approximation methods I'm aware of actually take that approach.
Moreover, EML is complete in a way that your suggested function isn't: if you take a finite combination of basis functions, can it build periodic functions? Hardy proved over a century ago that real (+,-,/,*,exp,ln) can't do this (which also answers the paper's unresolved question about similar real-valued functions in the negative). EML being able to build periodic functions is a lot less surprising for obvious reasons, but still pretty neat.
You can reduce all Boolean logic to NAND, but that doesn't actually mean that semiconductor fabs translate their HDL to NAND gates, because it is possible to build complex gates that directly implement higher level operations in a single gate.
Your "cost of computation" objection can be easily resolved by adding more operators, which makes it boring from a research perspective.
Meanwhile the loss of expressivity can only be compensated by encoding algorithms directly into the expression tree. Your objection that an infinite series is a bad thing rings hollow, since you now introduce the concept of an infinitely sized expression tree. That sounds much more impractical than implementing an algorithm for the exponential and logarithm functions.
Show me a way to physically compute exp or ln that is less gates than add. More gates means costlier in $, more energy in compute, and for these functions, higher latency.
You don’t get to make up free ops, claim there is no cost in reality, and hand wave away reality.
There are infinitely many ways to do what the paper did. There’s no gain other than it’s pretty. It loses on every practical front to simply using current ops and architectures.
The feasibility of memristor analog circuits is evident, and I believe this paper represents a valuable early exploration. We've been constrained by Boolean logic for quite some time now.
The world has had many types of logic before and after Boolean logic was created, many used in computing. Boolean logic isn’t a constraint; it’s used where it’s useful, and others are used where they’re useful.
Conceptual elegance is worth something, isn't it? I don't mean just aesthetic pleasure, as in recreational mathematics, but there's often (not always) value in being able to succinctly describe a wide range of phenomena with a small number of primitives. It could open up new ways of understanding or working that weren't possible before. Not saying this specific discovery fits the description, but it seems too early to dismiss the idea entirely based on its impracticality compared to existing solutions.
Aren't there examples in the history of mathematics, where a new idea was criticized for being impractical, then later found to have applications or implications, possibly unexpected even to the person who discovered it?
No, it approximates exp poorly over an infinitesimally small interval compared to exp. Resistors and capacitors are nowhere near ideal components, which is why they have spec sheets showing how quickly they diverge.
If we’re making sloppy approximations to a tiny range of exp, then I too can do it with a few terms.
> A calculator with just two buttons, EML and the digit 1, can compute everything a full scientific calculator does
Reminds me of the Iota combinator, one of the smallest formal systems that can be combined to produce a universal Turing machine, meaning it can express all of computation.
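A minimal sketch of Iota with curried Python lambdas, using the standard encodings I = ιι, K = ι(ι(ιι)), S = ι(ι(ι(ιι))):

```python
# iota(f) = f(S)(K): a single combinator from which the whole SK basis
# (and hence all of computation) can be rebuilt.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x
iota = lambda f: f(S)(K)

I  = iota(iota)                      # behaves like the identity
K2 = iota(iota(iota(iota)))          # behaves like K
S2 = iota(iota(iota(iota(iota))))    # behaves like S
```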
All possible 36 distinct level-2 eml functions of one variable (the first 18 of them with entirely Real outputs, the other 18 with "intermediate" complex-valued components):
It would be fun to catalogue all one-variable functions that can be represented as binary trees of fixed depth in this way and then encode the trees in binary. Lots of these functions in old math book tables would look very different with a plain hash lookup and the identities in such books would prove themselves.
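A sketch of such a catalogue, under two assumptions of mine: that eml(a, b) = exp(a) − ln(b) (consistent with the identity x − y = eml(ln x, exp y) quoted elsewhere in the thread), and that "level-2" means each argument of the top-level eml is either a leaf (1 or x) or an eml of two leaves, which is the reading that reproduces the count of 36 (6 × 6 trees):

```python
import cmath

def eml(a, b):
    # Assumed definition: exp(a) - ln(b), over the complex numbers.
    return cmath.exp(a) - cmath.log(b)

def level2_values(x):
    leaves = [1, x]
    # 6 possible arguments per side: 2 leaves + 4 emls of two leaves.
    args = leaves + [eml(a, b) for a in leaves for b in leaves]
    return [eml(p, q) for p in args for q in args]   # 6 * 6 = 36 trees

vals = level2_values(2.0)
```

Evaluating each tree at a handful of sample points would be the cheap way to hash and deduplicate them for the catalogue idea above.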
EDIT: please change the article link to the most recent version (as of now still v2), it is currently pointing to the v1 version which misses the figures.
I'm still reading this, but if this checks out, this is one of the most significant discoveries in years.
Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?
Got a multidimensional and multivariate function to model (with random samples or a full map)? Just do gradient descent and convert it to approximant EML trees.
Perform gradient descent on EML function tree "phi" so that the derivatives in the Schroedinger equation match.
But as I said, still reading, this sounds too good to be true, but I have witnessed such things before :)
> Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?
Same reason all boolean logic isn't performed with combinations of NAND – it's computationally inefficient. Polynomials are (for their expressivity) very quick to compute.
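For instance, Horner's rule evaluates a degree-n polynomial in n multiplies and n adds, which is part of why polynomial bases are so cheap:

```python
def horner(coeffs, x):
    # coeffs in order [a_n, ..., a_1, a_0]
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

# 2x^2 - 3x + 1 at x = 2
value = horner([2, -3, 1], 2.0)
```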
Single transistors aren't yet logic gates by themselves; they are amplifiers with a very specific gain function that makes it possible to use them as switches. Logic gates usually consist of at least two transistors. See https://en.wikipedia.org/wiki/CMOS for an example of how it is done in CMOS technology.
Moreover, the amplifying function must exist at least in some gates, to restore the logic levels, but there is no need for it to exist in all gates.
For instance, any logic circuit can be implemented using AND gates and/or OR gates made with diodes, which have no gain, i.e. no amplifying function, together with 1-transistor inverters, which provide both the NOT function and the gain needed to restore the logic levels.
Logic functions such as NAND can be expressed in several ways using simpler components, which correspond directly to the possible hardware implementations.
Nowadays, the most frequent method of logic implementation is by using parallel and series connections of switches, for which the MOS transistors are well suited.
Another way to express the logic functions is by using the minimum and maximum functions, which correspond directly with diode-based circuits.
All the logic functions can also be expressed using the 2 operations of the binary finite field, addition (a.k.a. XOR) and multiplication (a.k.a. AND).
This does not lead to simpler hardware implementations, but it can simplify some theoretical work, by using algebraic results.

Actually this kind of expressing logic is the one that should have been properly named Boolean logic, as the contribution of George Boole was precisely replacing "false" and "true" with "0" and "1" and reinterpreting the classic logic operations as arithmetic operations.

It is very weird to see in some programming languages a data type named Boolean, whose possible values are "false" and "true", instead of the historically correct Boolean values, "0" and "1", which can also be much more useful in programming than "false" and "true", by simplifying expressions, especially in array operations (which is why array-oriented languages like APL use "0" and "1", not "false" and "true").
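The GF(2) view is a two-liner to check: XOR is addition mod 2, AND is multiplication, and the other gates fall out as small polynomials. A sketch:

```python
# Boolean logic as arithmetic in GF(2).
def NOT(a):     return a ^ 1            # 1 + a    (mod 2)
def OR(a, b):   return a ^ b ^ (a & b)  # a + b + a*b  (mod 2)
def NAND(a, b): return (a & b) ^ 1      # 1 + a*b  (mod 2)

# Full truth table for inspection.
table = [(a, b, OR(a, b), NAND(a, b)) for a in (0, 1) for b in (0, 1)]
```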
"Pass transistors" are transistors being operated in pass/impedance switch mode.
Pass logic. Digital. [0]
This is extremely basic digital circuit design. You can create digital circuits as compositions of gates. But you can often implement the same logic, with fewer transistors, using pass logic.
Pass logic is also great for asynchronous digital design.
NAND gates also use both N- and P-channel transistors, and have to use at least the same number of transistors, but frequently many more for anything non-trivial.
But as you point out, as a computational basis, pass transistors have two variants.
> Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?
Because the EML basis makes simple functions (like +) hard to express.
Consider how a wavefunction might be stored for numerically searching for the ground state of a molecule or a crystal lattice. Humans appreciate 2D imagery of about 1 kilopixel x 1 kilopixel, so expanding this to 3 dimensions means the wavefunction would have to store 10^9 complex numbers (easily 4 GB at 16-bit precision for the real and imaginary components, so 4 bytes per complex number). Do we really believe that a DAG variant of the EML construction would need more than that to represent the analytically correct solution? Do we really believe that a 4 GB DAG variant of EML would produce a less accurate representation (i.e. less fidelity with the Schroedinger equation)? If the ground-state hydrogen atom is any indication, my money is on EML-style constructions, not naive 3D arrays modelling the wavefunction by brute force.
This also re-opens a lot of "party pooper" results in mathematics: impossibility of representing solutions to general quintic (fine print: if we restrict ourselves to arithmetic and roots/radicals). In mathematics and physics there have been a lot of "party pooper" results which later found more profound and interesting positive results by properly rephrasing the question. A negative result for a myopic question isn't very informative on its own.
> This also re-opens a lot of "party pooper" results in mathematics: impossibility of representing solutions to general quintic (fine print: if we restrict ourselves to arithmetic and roots/radicals).
Solving the quintic in terms of transcendental functions has already been done.
From my experience of working in this problem domain for the last year, I'd say it is pretty powerful, but the "too good to be true" part comes from the fact that EML buys elegance through exponential expression blow-up. Multiplication alone requires depth-8 trees with 41+ leaves, i.e. a minimal operator vocabulary trades off against expression length. There's likely an information-theoretic sweet spot between these extremes.
It's interesting to see his EML approach whereas mine was more on generating a context sensitive homoiconic grammar.
I've had lots of success combining spectral neural nets (GNNs, FNOs, Neural Tangent Kernels) with symbolic regression and using Operad Theory and Category Theory as my guiding mathematical machinery
> Multiplication alone requires depth-8 trees with 41+ leaves i.e. minimal operator vocabulary trades off against expression length.
That is sort of comparable to how NAND simplifies scaling.
Division is hell on gates.
The single component was the reason scaling went like it did.
There was only one gate structure which had to improve to make chips smaller - if a chip used 3 different kinds, then the scaling would've required more than one parallel innovation to go (sort of like how LED lighting had to wait for blue).
If you need two or more components, then you have to keep switching tools instead of hammer, hammer, hammer.
I'm not sure what you mean by this? It's true that any Boolean operation can be expressed in terms of two-input NAND gates, but that's almost never how real IC designers work. A typical standard cell library has lots of primitives, including all common gates and up to entire flip-flops and RAMs, each individually optimized at a transistor level. Realization with NAND2 and nothing else would be possible, but much less efficient.
Efficient numerical libraries likewise contain lots of redundancy. For example, sqrt(x) is mathematically equivalent to pow(x, 0.5), but sqrt(x) is still typically provided separately and faster. Anyone who thinks that eml() function is supposed to lead directly to more efficient computation has missed the point of this (interesting) work.
Yeah, what you're going to get is more efficient proofs: you can do induction on one case to get results about elementary functions. Not sure where anyone's getting computational efficiency thoughts from this.
In my experience this exponential expression blow-up is less a result of decomposing into a minimum of primitives, and more a result of repetition in expression trees:
If we draw the analogy to Bertrand Russell's Principia Mathematica: he derived fully expanded expressions, i.e. trees whose leaves may only refer to the same literals. Everyone claimed this madness underscored how formal verification of natural mathematics was a fool's errand, but nevertheless we see successful projects like metamath (us.metamath.org) where this exponential blow-up does not occur. It is easy to see why: instead of representing proofs as full trees, the proofs are represented as DAGs. The same optimization would be required for EML to prevent exponential blow-up.
Put differently: if we allow extra buttons besides {1, EML}, for example to capture unary functions, the authors mentally add an 'x' button, so now the RPN calculator has {1, EML, x}; but if you want multivariate functions it becomes an RPN calculator with extra buttons {1, EML, x, y, z}, for example.
But why stop there? In metamath, proofs are compressed: if an expression or wff was proven before in the same proof, its first subproof is given a number, and any subsequent invocations of this Nth subproof refer to that number. Why only recall input parameters x, y, z but not recall earlier computed values/functions?
In fact every proof in metamath's set.mm that uses this DAG compressibility could be split into the main proof plus the repeatedly used substatements, automatically converted into explicitly separate lemma proofs, in which case metamath could dispose of the single-proof DAG compression (but it would force proofs to be split up into lemmas plus a main proof).
None of the proven theorems in metamath's set.mm displays the feared exponential blowup.
Yes, metamath uses a large collection of specialized but reusable building blocks, so it doesn't blow up exponentially. However, if you want to "just do gradient descent" on general trees built from a single universal primitive, you now have to rediscover all those building blocks on the fly. And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.
Or you could accept that there's already a large collection of known useful special functions, and work with shallower trees of those instead, e.g. https://arxiv.org/abs/1905.11481
> And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.
That's not really necessary. Imagine somewhere near the top of the binary tree a leaf ("1" or "x" or ...); with the current brute-force method that's a whole binary subtree whose parameters go unused.
One could just as well use that whole culled binary subtree as a DAG node.
It does require more complex routing, but selecting input nodes is a sparse task, so those routing parameters can use sparse virtual parameters, say inner products of dense vectors in some vector space, so it doesn't need to take up much memory...
It's a way to make mathematical formulas completely unreadable.
It's a way to spend more time computing functions like log (3 emls required) while using more precision.
It's a way to blow the minds of muggles reading Hacker News.
Where do you see exponential blow-up? If you replace every function in an expression tree with a tree of eml functions, that is a size increase by a constant factor. And the factor does not seem unreasonable, but in the range of 10 to 100.
But that is not an increase in the expression size; that is the effort of searching for an expression tree that fits some target function. And that is no different from searching for an expression based on common functions, which is of course also exponential in the expression-tree height. The difference is that an eml-based tree will have a larger height - by some constant factor - than a tree based on common functions. On the other hand, each vertex in an eml-tree can only be eml, one, or an input variable, whereas in a tree based on common functions each vertex can be any of the supported basis functions - counting constants and input variables as nullary functions.
Ah I see I misunderstood your point, thanks for clarifying.
I think you are right, each node in some other expression tree formed of primitives in some set Prim would be increased by at most max(nodes(expr)) for expr in Prim.
That's essentially what the EML compiler is doing from what I understand.
Yes, and even this search doesn't actually require trillions of parameters, since the switching parameters will be sparse, which means you can apply a FakeParameter trick: suppose I want a trillion sparse parameters; that's a million by a million. Let's just model those parameters as inner products of a million vectors, each of some dimension N. Now it's in the regime of megabytes or a GB.
For extreme regularization, one can even go down to 10 arbitrary precision numbers: if we have a single vector of 10 dimensions, we can re-order the components 10! different ways.
10! = 3 628 800
so we can retrieve ~3M vectors from it, and we can form about 10 T inner products.
I agree: I don't understand what happened, but the first "View PDF" resulted in a PDF where the hyperlinks to the figures didn't work. Upon closer inspection it wasn't v1 at all; that's a PNAS article. I am unable to remove the EDIT:... line in my original comment at this time...
While I'm really enjoying this paper, I think you are way overstating the significance here. This is mathematically interesting, and conceptually elegant, but there is nothing in this paper that suggests a competitive regression or optimisation approach.
I might have misunderstood, but from the two "Why do X when you can do just Y with EML" sentences, I think you are describing symbolic regression, which has been around for quite some time and is a serious grown-up technique these days. But even the best symbolic regression tools do not typically "replace" other regression approaches.
This isn't all that significant to anyone who has done Calculus 2 and knows about Taylor series.
All this really says is that the Taylor expansions of e^x and ln x are sufficient to express trig functions, which is trivially true from Euler's formula as long as you're in the complex domain.
Arithmetic operations follow from the fact that e^x and ln x are inverses, in particular that e^ln(x) = x.
Taylor series seem a bit like magic when you first see them, but then you get to Real Analysis and find out there are whole classes of functions that they can't express.
This paper is interesting but it's not revolutionary.
There is a huge number of people who understand Taylor series, know how to compute them, and know the things you can do with Taylor (and other kinds of) expansions. Yet none of us identified that this one binary operation spans all of them, though I'm willing to read references to predating observations of the same kind. The author does mention alternative systems (incomplete in some specific sense) in the paper.
I did however keep thinking there was a lot of attention on trying to include special constants, even though we don't know that much about these constants yet, while comparatively little attention went to, say, elliptic integrals etc.
When aiming for a wide catch, you'd include some of those esoteric functions, or erf() etc...
I also wished they had attempted to find a representation for derivatives and integrals.
Given this amazing work, an efficient EML operator HW implementation could revolutionize a bunch of things, so that might be the natural next step.
The compute, energy, and physical cost of this versus a simple x+y is easily an order of magnitude. It will not replace anything in computing, except maybe fringe experiments.
derivation of -x seems wrong. we can look at the execution trace on a stack machine, but it's actually not hard to see. starting from the last node before the output, we see that the tree has the form
and the claim is that, after it's expanded, z will be such that this whole thing is equal to -x. but with some algebra, this is happening only if
e^z = 0,
and there is no complex number z that satisfies this equation. indeed if we laboriously expand the given formula for z (the left branch of the tree), we see that it goes through ln(0), and compound expressions.
x^-1 has the same problem.
both formulae work ...sort of... if we allow ln(0) = Infinity and some other moxie, such as x / Infinity = 0 for all finite x.
> EML-compiled formulas work flawlessly in symbolic Mathematica and IEEE754 floating-point… This is because some formulas internally might rely on the following properties of extended reals: ln 0 = −∞, e^(−∞) = 0.
And then follows with:
> But EML expressions in general do not work ‘out of the box’ in pure Python/Julia or numerical Mathematica.
Thus, the paper's completeness claim depends on a non-standard arithmetic convention (ln(0) = -∞), not just complex numbers as it primarily advertises. While the paper is transparent about this, it is, however, buried on page 11 rather than foregrounded as a core caveat. Your comment deserves credit for flagging it.
I would not call ln(0) = -∞ a "non-standard arithmetic convention".
This is the standard convention when doing operations in the extended real number line, i.e. in the set of the real numbers completed with positive and negative infinities.
When the overflow exception is disabled, any modern CPU implements the operations with floating-point numbers as operations in the extended real number line.
So in computing this convention has been standard for more than 40 years, while in mathematics it has been standard for a couple of centuries or so.
As always in mathematics, when computing expressions, i.e. when computing any kind of function, you must be aware very well which are the sets within which you operate.
If you work with real numbers (i.e. in a computer you enable the FP overflow exception), then ln(0) is undefined. However, if you work with the extended real number line, which is actually the default setting in most current programming languages, then ln(0) is well defined and it is -∞.
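This split is visible directly in Python: float arithmetic follows IEEE 754 (extended-real-like behavior), while math.log enforces the real-line domain. A small sketch:

```python
import math

# exp(-inf) = 0 and 1/inf = 0 hold in IEEE 754 float arithmetic...
assert math.exp(-math.inf) == 0.0
assert 1.0 / math.inf == 0.0

# ...but math.log(0) raises rather than returning -inf, which is the
# "doesn't work out of the box in pure Python" caveat quoted earlier.
raised = False
try:
    math.log(0)
except ValueError:
    raised = True
```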
> using EML trees as trainable circuits ..., I demonstrate the feasibility of exact recovery of closed-form elementary functions from numerical data at shallow tree depths up to 4
That's awesome. I always wondered if there is some way to do this.
I was curious about that too. Gemini actually gave a decent list. Trig functions come from Euler's identity:
e^ix = cos x + i sin x
which means:
e^-ix = cos -x + i sin -x
= cos x - i sin x
so adding them together:
e^ix + e^-ix = 2 cos x
cos x = (e^ix + e^-ix) / 2
So I guess the real part of that.
Multiplication, division, addition and subtraction are all straightforward. So are hyperbolic trig functions. All other trig functions can be derived as per above.
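A quick numerical check of these identities with Python's `cmath` (note the plus sign in the cosine formula, and the division by 2i for sine):

```python
import cmath

def cos_via_exp(x: float) -> complex:
    # cos x = (e^{ix} + e^{-ix}) / 2
    return (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2

def sin_via_exp(x: float) -> complex:
    # sin x = (e^{ix} - e^{-ix}) / (2i)
    return (cmath.exp(1j * x) - cmath.exp(-1j * x)) / (2 * 1j)
```

For real x the imaginary parts cancel to (numerically) zero, so taking the real part works.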
I think what you want is the supplementary information, part II "completeness proof sketch" on page 12. You already spotted the formulas for "exp" and real natural "L"og; then x - y = eml(L(x), exp(y)) and from there apparently it is all "standard" identities. They list the arithmetic operators then some constants, the square root, and exponentials, then the trig stuff is on the next page.
You can find this link on the right side of the arxiv page:
Didn't read the paper, but it was easy for me to derive constants 0, 1, e and functions x + y, x - y, exp(x), ln(x), x * y, x / y. So seems to be enough for everything. Very elegant.
Although x + y is surprisingly more complicated than you'd expect at first. The construction first goes for exp(x) and ln(x) then to x - y and finally uses -y to get to x + y.
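A tiny real-valued sanity check of two such identities in Python, cheating by using the host exp/log to construct the arguments rather than deriving everything from the constant 1:

```python
import math

def eml(x: float, y: float) -> float:
    # eml(x, y) = exp(x) - ln(y); restricted here to real arguments, y > 0.
    return math.exp(x) - math.log(y)

# exp comes for free, since ln(1) = 0:  eml(x, 1) = exp(x)
assert abs(eml(2.0, 1.0) - math.exp(2.0)) < 1e-12

# subtraction, once ln and exp are available:  x - y = eml(ln(x), exp(y))
x, y = 5.0, 3.0
assert abs(eml(math.log(x), math.exp(y)) - (x - y)) < 1e-12
```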
For completeness, there is also Peirce’s arrow, aka the NOR operation, which is functionally complete. A fun application: IIRC the VMProtect copy-protection system has an internal VM based on NOR.
That’s boolean functional completeness, which is kind of a trivial result (NAND, NOR). It mirrors this one insofar as the EDL operator is also a combination of a computation and a negation in the widest senses.
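For the curious, the NOR-only construction is a few lines to verify (a sketch over 0/1 ints, not any particular VM's encoding):

```python
def nor(a: int, b: int) -> int:
    # Peirce's arrow: true only when both inputs are false.
    return 1 - (a | b)

def NOT(a):    return nor(a, a)
def OR(a, b):  return nor(nor(a, b), nor(a, b))   # NOT(NOR) = OR
def AND(a, b): return nor(NOT(a), NOT(b))          # De Morgan via NOR
```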
1. One should also add absolute value (as sqrt(x*x)?) as a desired function and from that min, max, signum in the available functions. Since the domain is complex some of them will be a bit weird, I am not sure.
2. I think, for any bijective function f(x) which, together with its inverse, is expressible using eml(), we can obtain another universal basis eml(f(x),f(y)) with the added constant f^-1(1). Interesting special case is when f=exp or f=ln. (This might also explain the EDL variant.)
3. The eml basis uses natural logarithm and exponent. It would be interesting to see if we could have a basis with function 2^x - log_2(y) and constants 1 and e (to create standard mathematical functions like exp,ln,sin...). This could be computationally more feasible to implement. As a number representation, it kinda reminds me of https://en.wikipedia.org/wiki/Elias_omega_coding.
4. I would like to see an algorithm for finding derivatives of the eml() trees. This could yield a rather clear proof of why some functions do not have indefinite integrals in symbolic form.
5. For some reason, extending the domain to complex numbers made me think about fuzzy logics with complex truth values. What would be the logarithm and exponential there? It could unify the Lukasiewicz and product logics.
I made a fun marimo notebook to try and derive these myself. I structured each cell in order based on the diagram at the end of the paper. It uses Sympy to determine if the function is correct or not.
Built a JS implementation today — monogate
https://explorer-taupe-five.vercel.app · npm install monogate
The interesting engineering problem was negation. The paper's SI gives one construction; we independently derived a two-regime approach — tower formula for y≤0, shift formula for y>0 — that stays stable to |y|<708 in IEEE 754.
We also extended to ℂ. After seeing pveierland's result in this thread (i constructible from {1} in K=75 under extended-reals convention), we investigated and documented the distinction: under strict principal-branch ln where ln(0) throws, whether {1} alone generates i remains open. Under the extended-reals convention used in the paper, their construction holds. Two different grammars, not contradictory results.
109 tests. MIT. github.com/almaguer1986/monogate
Can someone explain how this is different from lambda calculus? It seems like you can derive the same in both. I don't understand either well enough, hence the question.
Lambda calculus talks about computable functions, where the types of the inputs are typically something discrete, like `Bool` or `Nat`. Here, the domain is the real numbers.
The short answer is that the lambda calculus computes transformations on digital values while this is for building functions that can transform continuous (complex) values.
Any lambda term is equivalent to a combinatory term over a one-point basis (like λxλyλz. x z (y (λ_.z)) [1]). One difference is that lambda calculus doesn't distinguish between functions and numbers, and in this case no additional constant (like 1) is needed.
Lambda calculus kind of does this in an analogous form, but does not allow you to derive this particular binary expression as a basis for elementary functions. There is a related concept with Iota [1], which allows you to express every combinatoric SKI term and in turn every lambda-definable function. But similar to this particular minimalist scientific function expression, it is mostly of interest for reductionist enthusiasts and not for any practical purpose.
Depending on your lambda calculus! From a categorical perspective a lambda calculus is just a nice syntax for Cartesian closed categories (or similar, e.g. *-autonomous categories for linear lambda calculus) so you can use it to reason about anything you can fit into that mould. For example, Paul Taylor likes to do exactly this: https://www.paultaylor.eu/ASD/analysis#lamcra
Both BJTs and FETs have intrinsic exponential/logarithmic behaviors (at low biases) due to charge density being given by the Fermi-Dirac distribution since electrons are fermions.
Stupid question maybe (I am no mathematician), but aren't exp and ln really primitives? Aren't they implemented in terms of +,-,/,* etc? Or do we assume that we have an infinite lookup table for all possible inputs?
> aren't exp and ln really primitives? Aren't they implemented in terms of +,-,/,* etc?
They're primitive in the sense that you can't compute exp(x) or log(x) using a finite combination of other elementary functions for any x.
If you allow infinite many operations, then you can easily find infinite sums or products of powers, or more complicated expressions to represent exp and log and other elementary functions.
> Or do we assume that we have an infinite lookup table for all possible inputs?
Essentially yes, you don't necessarily need an "implementation" to talk about a function, or more generally you don't need to explicitly construct an object from simpler pieces: you can just prove it satisfies some properties and that it is has to exist.
For exp(x), you could define the function as the solution to the differential equation df/dx = f(x) with initial condition f(0) = 1. Then you would establish that the solution exists and is unique (it follows from the properties of the differential equation), call f = exp, and there you have it. You don't necessarily know how to compute it for any x, but you can assume exp(x) exists and is a real number.
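To make the "you don't need an implementation" point concrete: any actual implementation is a truncation of something infinite, e.g. a partial sum of the Taylor series for exp:

```python
def exp_series(x: float, terms: int = 30) -> float:
    # Partial sum of exp(x) = sum_{n>=0} x^n / n!  -- any concrete
    # "implementation" of exp is a finite truncation like this one.
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)
    return total
```

Thirty terms is plenty near x = 1; the abstract function exp is defined independently of any such choice of cutoff.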
I think the point here is to explore the reduction of these functions to finite binary trees using a single binary operator and a single stopping constant. The operator used could be arbitrarily complex; the objective is to prove that other expressions in a certain family — in this case, the elementary functions — can be expanded as a finite (often incomplete) binary tree of that same operation.
In other words, this result does not aim to improve computability or bound the complexity of calculating the numerical value. Rather, it aims to exhibit this uniform, finite tree structure for the entire family of elementary expressions.
I think there is still an implicit restriction on the complexity of the operator for this to be interesting. Otherwise you could design an operator which accepts a pair x,y and performs one of 2^k elementary binary operations by reading off the first k bits of x and applying the specified operation on the remainder of x and y. (This is kind of like how real-valued computational models become too powerful for complexity theory to work if you allow bitwise operations.)
Exactly! If you didn't strictly limit the operator's complexity, you could just smuggle a Turing machine in via bitwise logic and turn the whole thing into a parlor trick. The beauty here is that eml(x,y) is a pure, continuous analytical function with no hidden branching whatsoever.
To clarify my earlier point: the author isn't trying to build a practical calculator or generate human-readable algebra. Using exp and ln isn't a cheat code because the goal is purely topological. The paper just proves that this massive, diverse family of continuous math can be mapped perfectly onto a uniform binary tree, without secretly burying a state machine inside the operator.
I agree; as the sibling comment says, there are two different things that are named "branches". Anyway, to get the principal branch in the microprocessor it's necessary to implement "atan2", which has a lot of special cases.
For example, IIRC ln(-inf.0 + y * i) = +inf.0 + i * pi * sign(y)
You have a gate (called here "eml") that takes x and y and gives `exp(x) - log(y)`. Then you implement all other operations and elementary functions, including addition, multiplication etc, using only compositions of this gate/function (and the constant 1). You don't have addition as you start, you only have eml and 1. You define addition in terms of those.
In rpn notation you just put the input on the stack, right? The encodings seems like they could get pretty big, and encodings certainly wouldn't be unique, but you should be able to encode pretty much any constant you could think of.
Not sure it really compares to NAND() and the likes.
Simply because bool algebra doesn't have that many functions and all of them are very simple to implement.
A complex bool function made out of NANDs (or the likes) is only a little more complex than the same made out of the other operators.
Implementing even simple real functions out of eml() seems to me to add a lot of computational complexity even with both exp() and ln() implemented in hardware in O(1).
I'm thinking about stuff like sum(), div() and mod().
Of course, I might be badly wrong as I am not a mathematician (not even by far).
But I don't see, at the moment, the big win on this.
This has no use for numeric computations, but it may be useful in some symbolic computations, where it may provide expressions with some useful properties, e.g. regarding differentiability, in comparison with alternatives.
eh, I didn't find that paragraph very helpful. It just restates what it means to decompose an expression into another one relying only on eml, and vaguely gestures at what this could mean. I was hoping for something more specific.
> Everyone learns many mathematical operations in school: fractions, roots, logarithms, and trigonometric functions (+, −, ×, /, sqrt, sin, cos, log, …), each with its own rules and a dedicated button on a scientific calculator. Higher mathematics reveals that many of these are redundant: for example, trigonometric ones reduce to the complex exponential. How far can this reduction go? We show that it goes all the way: a single operation, eml(x, y), replaces every one of them. A calculator with just two buttons, EML and the digit 1, can compute everything a full scientific calculator does. This is not a mere mathematical trick. Because one repeatable element suffices, mathematical expressions become uniform circuits, much like electronics built from identical transistors, opening new ways to encoding, evaluating, and discovering formulas across scientific computing.
Actually we have known this for a long time. The universal approximation theorem states that an arbitrary function can be modelled through nonlinear basis functions so long as the capacity is big enough.
The practical bit here is that so many functions can be expressed exactly with just two primitives (one operator and the constant 1). That’s new!
This could have some interesting hardware implications as well - it suggests that a large dedicated silicon instruction set could accelerate any mathematical algorithm provided it can be mapped to this primitive. It also suggests a compiler/translation layer should be possible as well as some novel visualization methods for functions and methods.
I'm not too familiar with the hardware world, but does EML look like the kind of computation that's hardware-friendly? Would love for someone with more expertise to chime in here.
Yes actually, it is very regular, which usually lends itself to silicon implementations - the paper even talks about this briefly.
I think the bigger question is whether it will be more energy-optimal or silicon density-optimal than math libraries that are currently baked into these processors (FPUs).
There are also some edge cases, like "exp(exp(x))" and infinities, that seem to result in something akin to "division by zero", where you need more than standard floating-point representations to compute - but these edge cases seem like compiler workarounds vs silicon issues.
A similar function operating on the real domain for powers and logs of 2 would be extremely hardware friendly. You can build it directly out of the floating point format. First K significand bits index a LUT. Do that for each argument and subtract them.
It gets a bit more difficult for the complex domain because you need rotation.
This paper seems to suggest that a chip with 10 pipeline stages of EML units could evaluate any elementary function (table 4) in a single pass.
I'm curious how this would compare to the dedicated sse or xmx instructions currently inside most processor's instruction sets.
Lastly, you could also create a 5-depth or 6-depth EML tree in hardware (FPGA most likely) and use it in lieu of the Rust implementation to discover weight-optimal eml formulas for input functions much quicker; those could then feed into a "compiler" that would allow it to run on a similar-scale interpreter on the same silicon.
In simple terms: you can imagine an EML co-processor sitting alongside a CPU's standard math coprocessor(s): XMX, SSE, AMX would do the multiplication/tile math they're optimized for, and would then call the EML coprocessor to do exp, sin, log calls, which are processed by reconfiguring the EML trees internally at single-cycle speed instead of relaying them back to the main CPU to do that math in generalized instructions - likely something that takes many cycles to achieve.
You could also make an analog EML circuit in theory, using electrical primitives that have been around since the 60s. You could build a simple EML evaluator on a breadboard. Things like trig functions would be hard to reproduce, but you could technically evaluate output in electrical realtime (the time it takes the electrical signal to travel though these 8-10 analog amplifier stages).
I'm way too unschooled to say if it's important or not, but what really excites me is the Catalan structure ("Every EML expression is a binary tree [...] isomorphic to well-studied combinatorial objects like full binary trees and Catalan objects").
So, what happens if you take say the EML expression for addition, and invert the binary tree?
This looks interesting. I haven't looked in detail, but my first thought is - why hasn't this been found in the past? Surely, people have been interested in this kind of question for a while?
E.g. ln is a rather complicated construct - it's not even a function. That's because for complex numbers, e^x is not bijective, and thus its inverse ain't a function.
So using that complicated construct to define something simpler like addition invites extra complexity.
Like when the Apollo guidance computer was made, the bottleneck was making integrated chips so they only made one, the NOR gate, and a whackton of routing to build out an entire CPU. Horribly complex routing, very simplified integrated circuit construction
> No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations.
I was taught that these were all hypergeometric functions. What distinction is being drawn here?
Hypergeometric functions are functions with 4 parameters.
When you have a function with many parameters it becomes rather trivial to express simpler functions with it.
You could find a lot of functions with 4 parameters that can express all elementary functions.
Finding a binary operation that can do this, like in TFA, is far more difficult, which is why it has not been done before.
A function with 4 parameters can actually express not only any elementary function, but an infinity of functions with 3 parameters, e.g. by using the 4th parameter to encode an identifier for the function that must be computed.
> Hypergeometric functions are functions with 4 parameters.
Granted, but the claim in the abstract says:
>> computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations
And I don't see how this is true as to hypergeometric functions in a way that isn't shared by the approach in the paper.
> Finding a binary operation that can do this, like in TFA, is far more difficult, which is why it has not been done before.
> A function with 4 parameters can actually express not only any elementary function, but an infinity of functions with 3 parameters, e.g. by using the 4th parameter to encode an identifier for the function that must be computed.
These statements seem to be in direct conflict with each other; you can use the second parameter of a binary function to identify a unary function just as you can use the fourth parameter of a quaternary function to identify a trinary one.
There is an essential difference between binary functions and unary functions.
With binary functions you can compose them using a very complex composition graph.
With unary functions you can compose them only linearly, so in general it is impossible to make a binary function with unary functions.
You can make binary functions from unary functions only by using at least one other binary function. For instance, you can make multiplication from squaring, but only with the help of binary addition/subtraction.
So the one function that can be used to generate the others by composition must be at least binary, in order to be able to generate functions with an arbitrary number of parameters.
This is why in mathematics there are many domains where the only required primitives are a small number of binary functions, but there is none where strictly unary functions are sufficient. (However, it may be possible to restrict the binary functions to very simple functions, e.g. making a tuple from components, for instance the CONS function of LISP I.)
I think that you may have replied before I saved my entire response, so I am not sure how much of it you had read before replying yourself.
I have replied to your last statement:
> "you can use the second parameter of a binary function to identify a unary function just as you can use the fourth parameter of a quaternary function to identify a trinary one."
As I have explained above, what you propose does not work. It works in functions with 3 or more parameters, but it does not work in binary functions, because you cannot make binary functions from unary functions (without using some auxiliary binary functions).
> As I have explained above, what you propose does not work. It works in functions with 3 or more parameters, but it does not work in binary functions, because you cannot make binary functions from unary functions (without using some auxiliary binary functions).
I have no idea what you're trying to say. If you can use one parameter to identify a desired function, then obviously you can use a function of arity n+1 to define as many functions of arity n as you want, and it doesn't matter what the value of n is.
For example:
selector(3, "sin") = sin 3
selector(3, "log2") = log₂ 3
This works going from arity 4 to arity 3, and it also works going from arity 2 to arity 1. Your "response" talks about going from arity 1 to arity 2, a non sequitur.
The subject of the parent article is expressing all the "elementary functions".
This requires expressing binary functions, like addition and multiplication.
You cannot do this by using only the set of unary functions, which can indeed be generated by a function with 2 parameters, one of which selects an unary function.
He is saying that if you reserve the second argument of a binary operator as a "function selection indicator", then you have restricted yourself to an alphabet of unary functions. This means you could at most model some unary functions, but not arbitrary expressions involving +, ×, -, /, ^, etc.
Unless you had hit upon a very magical binary function where certain special values of the second parameter happen to coincide with useful unary functions, without those values trampling on a useful binary mode or region of your binary function. But the search space for such a special binary function is so large that you shouldn't demand we disprove its existence; rather, given your non-surprise at the EML result, we challenge you to present such a binary function, so we can then challenge you to demonstrate how it captures binary functions like addition, products, and exponentiation with arbitrary base.
So, can we see your construction, or if you refuse to present one, we may conclude you have implicitly reconsidered your position and understand the theoretical elegance this EML (and presumably many other) basis brings?
If you've never worked through a derivation/explanation of the Y combinator, definitely find one (there are many across the internet) and work through it until the light bulb goes off. It's pretty incredible, it almost seems like "matter ex nihilo" which shouldn't work, and yet does.
It's one of those facts that tends to blow minds when it's first encountered, I can see why one would name a company after it.
Halfway through I was imagining aliens to whom this operator comes naturally and our math is weird. By the end I found out that we might be those aliens.
I couldn't find any information on this, but given how nicely exponentials and logarithms differentiate and integrate, is it possible that this operator may be useful for simplifying the search for symbolic solutions to integrals and derivatives?
Agreed on integrals, but the derivative is relatively simple?
If f(x) = exp(x) - ln(x) then f’(x) = exp(x) - 1/x, which is representable in eml form as well.
To the overall point though, I don't think it makes derivatives easier. Refactoring a function into eml's is far more work than refactoring it into something that's trivially differentiable with the product rule and chain rule.
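A quick numerical check of that derivative claim, taking f(x) = eml(x, x) = exp(x) - ln(x) over the reals (function names are mine):

```python
import math

def f(x: float) -> float:
    return math.exp(x) - math.log(x)   # eml(x, x), real x > 0

def f_prime(x: float) -> float:
    return math.exp(x) - 1 / x         # the claimed closed form

# Central-difference check at an arbitrary point agrees with the closed form.
h, x0 = 1e-6, 2.0
numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)
assert abs(numeric - f_prime(x0)) < 1e-5
```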
Dreadfully slow for integer math but probably some similar performance to something like a CORDIC for specific operations. If you can build an FPU that does exp() and ln() really fast, it's simple binary tree traversal to find the solution.
You already have an FPU that approximates exp() and ln() really fast, because float<->integer conversions approximate the power 2 functions respectively. Doing it accurately runs face-first into the tablemaker's dilemma, but you could do this with just 2 conversions, 2 FMAs (for power adjustments), and a subtraction per. A lot of cases would be even faster. Whether that's worth it will be situational.
Anything discussing fast inverse sqrt will go over the logarithmic side [0], as it's the key insight behind that code. Exp is just the other direction. It's not widely documented in text otherwise, to my knowledge.
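A rough sketch of that bit-reinterpretation trick for IEEE 754 doubles (my own names; the approximation is piecewise-linear in the mantissa, with max error around 0.086 bits, exact at powers of two, and valid only for finite x > 0):

```python
import struct

def approx_log2(x: float) -> float:
    # Reinterpret the double's bits as an integer: the biased-exponent
    # layout makes the bit pattern an approximation of log2(x).
    bits = struct.unpack('<q', struct.pack('<d', x))[0]
    return bits / 2**52 - 1023  # rescale and remove the exponent bias

def approx_exp2(y: float) -> float:
    # The reverse reinterpretation approximates 2**y.
    bits = int((y + 1023) * 2**52)
    return struct.unpack('<d', struct.pack('<q', bits))[0]
```

This is the same insight behind fast inverse sqrt: the float format is already a crude log table.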
It would almost always be much, much worse. Practical numerical libraries (whether implemented in hardware or software) contain lots of redundancy, because their goal is to give you an optimized primitive as close as possible to the operation you actually want. For example, the library provides an optimized tan(x) to save you from calling sin(x)/cos(x), because one nasty function evaluation (as a power series, lookup table, CORDIC, etc.) is faster than two nasty function evaluations and a divide.
Of course the redundant primitives aren't free, since they add code size or die area. In choosing how many primitives to provide, the designer of a numerical library aims to make a reasonable tradeoff between that size cost and the speed benefit.
This paper takes that tradeoff to the least redundant extreme because that's an interesting theoretical question, at the cost of transforming commonly-used operations with simple hardware implementations (e.g. addition, multiplication) into computational nightmares. I don't think anyone has found a practical application for their result yet, but that's not the point of the work.
Traditional processors, even highly dedicated ones like TMUs in gpus, still require being preconfigured substantially in order to switch between sin/cos/exp2/log2 function calls, whereas a silicon implementation of an 8-layer EML machine could do that by passing a single config byte along with the inputs. If you had a 512-wide pipeline of EML logic blocks in modern silicon (say 5nm), you could get around 1 trillion elementary function evaluations per second on 2.5ghz chip. Compare this with a 96 core zen5 server CPU with AVX-512 which can do about 50-100 billion scalar-equivalent evaluations per second across all cores only for one specific unchanging function.
Take the fastest current math processors: TMUs on a modern gpu: it can calculate sin OR cos OR exp2 OR log2 in 1 cycle per shader unit... but that is ONLY for those elementary functions and ONLY if they don't change - changing the function being called incurs a huge cycle hit, and chaining the calculations also incurs latency hits. An EML coprocessor could do arcsinh(x² + ln(y)) in the same hardware block, with the same latency as a modern cpu can do a single FMA instruction.
So to clarify, you think that replacing every multiplication with 24 transcendental function evaluations (12 eml(x, y), each of which evaluates exp(x) and ln(y) plus the subtraction; see the paper's Fig 2) is somehow a win?
The fact that addition, subtraction, and multiplication run quickly on typical processors isn't arbitrary--those operations map well onto hardware, for roughly the same reasons that elementary school students can easily hand-calculate them. General transcendental functions are fundamentally more expensive in time, die area, and/or power, for the same reasons that elementary school students can't easily hand-calculate them. A primitive where all arithmetic (including addition, subtraction, or negation) involves multiple transcendental function evaluations is not computationally faster, lower-power, lower-area, or otherwise better in any other practical way.
The comments here are filled with people who seem to be unaware of this, and it's pretty weird. Do CS programs not teach computer arithmetic anymore?
For basic arithmetic, this is not required, nor would it be faster; it is also unlikely to be advantageous for bulk static transcendental functions. Where this becomes interesting is when combining or chaining them, where today the calls must come back out to the main processor for reconfiguration and then be re-issued.
Practical terms:
Jacobian (heavily used in weather and combustion simulation): The transcendental calls, mostly exp(-E_a/RT), are the actual clock-cycle bottleneck. The GPU's SFU computes one exp2 at a time per SM. The ALU then has to convert it (exp(x) = exp2(x × log2(e))), multiply by the pre-exponential factor, and accumulate partial derivatives. It's a long serial chain for each reaction rate.
The core of this is the Arrhenius rate, (A × T^n × exp(-E_a/(R×T))), which involves an exponentiation, a division, a multiplication, and an exponential. On a GPU, that's multiple SFU calls chained with ALU ops. In an EML tree, the whole expression compiles to a single tree that flows through the pipeline in one pass.
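For illustration, here are the two evaluation strategies for the Arrhenius rate written out in plain Python (the log-domain form is the exp/ln chaining described above, valid for A, T > 0; function names are mine):

```python
import math

def arrhenius(A, n, Ea, R, T):
    # Direct form: one pow, one divide, two multiplies, one exp.
    return A * T**n * math.exp(-Ea / (R * T))

def arrhenius_log_domain(A, n, Ea, R, T):
    # Same rate as a single exp over a log-domain sum:
    # A * T^n * exp(-Ea/(R*T)) = exp(ln A + n ln T - Ea/(R*T))
    return math.exp(math.log(A) + n * math.log(T) - Ea / (R * T))
```

Both give the same rate; the second trades the pow and multiplies for extra ln/exp calls, which is the structural trade an EML tree makes everywhere.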
GPU (PreJacGPU) is currently the state of the art for speed on these simulations - a moderate width 8-depth EML machine could process a very complex Jacobian as fast as the gpu can evaluate one exp(). Even on a sub-optimal 250mhz FPGA, an entire 50x50 Jacobian would be about 3.5 microseconds vs 50 microseconds PER Jacobian on an A100.
If you put that same logic path into an ASIC, you'd be about 20x the FPGA's speed - in the nanoseconds per round. And it's not like you're building one function into an ASIC; it's general purpose. You just feed it a compiled tree configuration and run your data through it.
For anything like linear algebra math, which is also used here, you'd delegate that to the dedicated math functions on the processor - it wouldn't make sense to do those in this.
> The core of this is the Arrhenius rate, (A × T^n × exp(-E_a/(R×T))), which involves an exponentiation, a division, a multiplication, and an exponential. On a GPU, that's multiple SFU calls chained with ALU ops. In an EML tree, the whole expression compiles to a single tree that flows through the pipeline in one pass.
I think you're missing the reason why the GPU kicks you out of the fast path when you need that special function. The special function evaluation is fundamentally more expensive in energy, whether that cost is paid in area or time. Evaluation of the special functions with throughput similar to the arithmetic throughput would require much more area for the special functions, which for most computation isn't a good tradeoff. That's why the GPU's designers chose to make your exp2 slow.
Replacing everything with dozens of cascaded special functions makes everything uniform, but it's uniformly much worse. I feel like you're assuming that by parallelizing your "EML tree" in dedicated hardware that problem goes away; but area isn't free in either dollars or power, so it doesn't.
There is a huge market for "it's faster" at the cost of efficiency, but I don't buy the claim that an EML hardware block would be inherently more efficient than the same workload running on a GPU. If you think it would be, back it up with some numbers.
A 10-stage EML pipeline would be about the size of an AVX-512 instruction block on a modern CPU, in the realm of ~0.1mm² on a 5nm process node (including the FMA units behind it), in its entirety about 1% of the CPU die. None of this suggests that even a ~500-wide 10-stage EML pipeline would consume anywhere near the power of a modern datacenter GPU (which wastes a lot of its energy moving things from memory to ALU to shader core...).
Not sure if you're arguing from a hypothetical position or practical one but you seem to be narrowing your argument to "well for simple math it's less efficient" but that's not the argument being made at all.
> you seem to be narrowing your argument to "well for simple math it's less efficient" but that's not the argument being made at all.
What? Unless the thing you want to compute happens to be exactly that eml() function (no multiplication, no addition, no subtraction unless it's an exponential minus a log, etc.) or almost so, it is unquestionably less efficient. If you believe otherwise, then please provide the eml() implementation of a practically useful function of your choice (e.g. that Arrhenius rate). Then we can count the superfluous transcendental function evaluations vs. a conventional implementation, and try to understand what benefit could outweigh them.
> A 10-stage EML pipeline would be about the size of an avx-512 instruction block on a modern CPU
Can you explain where you got that conclusion? And what do you think a "10-stage EML pipeline" would be useful for? Remember that the multiply embedded in your Arrhenius rate is already 8 layers and 12 operations.
Also, can you confirm whether you're working with an LLM here? You're making a lot of unsupported and oddly specific claims that don't make sense to me, and I'm trying to understand where they're coming from.
It's a kind of superposition representation à la Kolmogorov-Arnold: a learnable functional basis in which elementary functions take the form g(x,y) = f(x) - f^{-1}(y), here with f = exp.
On my side I like direct semantic connections, but find convoluted indirections hidden behind lazy acronyms strongly repulsive. I can appreciate an acronym that makes both a direct connection and a playful indirect reference to the expanded terms.
With NAND gates you can make any discrete system, but you can only approximate a continuous system.
This work is about continuous systems, even if the reduction of many kinds of functions to compositions of a single kind of function is analogous to the reductions of logic functions to composing a few kinds or a single kind of logic functions.
I actually do not value the fact that the logic functions can be expressed using only NAND. For understanding logic functions it is much more important to understand that they can be expressed using either only AND and NOT, or only OR and NOT, or only XOR and AND (i.e. addition and multiplication modulo 2).
Using just NAND or just NOR is a trick that does not provide any useful extra information. There are many things, including classes of mathematical functions or instruction sets for computers, which can be implemented using a small number of really independent primitives.
In most or all such cases, after you arrive at the small set of maximally simple and independent primitives, you can reduce them to only one primitive.
However that one primitive is not a simpler primitive, but it is a more complex one, which can do everything that the maximally simple primitives can do, and it can recreate those primitives by composition with itself.
Because of its higher complexity, it does not actually simplify anything. Moreover, generating the simpler primitives by composing the more complex primitive with itself leads to redundancies, if implemented thus in hardware.
This is a nice trick, but like I have said, it does not improve in any way the understanding of that domain or its practical implementations in comparison with thinking in terms of the multiple simpler primitives.
For instance, one could take a CMOS NAND gate as the basis for implementing a digital circuit, but you understand better how CMOS logic actually works when you understand that the AND function and the NOT function are localized in distinct parts of that CMOS NAND gate. This understanding is necessary when you have to design other gates for a gate library, because even if using a single kind of gate is possible, the performance of this approach is quite sub-optimal, so you almost always have to design separately, e.g., a XOR gate, instead of making it from NAND gates or NOR gates.
In CMOS logic, NAND gates and NOR gates happen to be the simplest gates that can restore at output the same logic levels that are used at input. This confuses some people to think that they are the simplest CMOS gates, but they are not the simplest gates when you remove the constraint of restoring the logic levels. This is why you can make more complex logic gates, e.g. XOR gates or AND-OR-INVERT gates, which are simpler than they would be if you made them from distinct NAND gates or NOR gates.
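The redundancy point above is easy to see in a toy sketch (pure Python, nothing to do with any real gate library): building NOT, AND, and OR from NAND alone multiplies the gate count.

```python
def nand(a: int, b: int) -> int:
    # universal gate: NOT(a AND b)
    return 1 - (a & b)

def not_(a: int) -> int:          # 1 NAND
    return nand(a, a)

def and_(a: int, b: int) -> int:  # 2 NANDs
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:   # 3 NANDs, vs. one native OR gate
    return nand(not_(a), not_(b))
```

OR costs three NANDs here, which is exactly the kind of composition redundancy that makes "NAND only" a trick rather than a design methodology.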
Got curious to see whether SymPy could be used to evaluate the expressions, so I used Claude Code to build a quick evaluator. Numeric and symbolic results appear to agree:
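I don't have their exact evaluator, but a minimal SymPy sketch along the same lines (assuming eml(x, y) = exp(x) − ln(y) as defined in the paper) confirms a few of the identities symbolically:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

def eml(a, b):
    # the paper's operator: eml(a, b) = exp(a) - ln(b)
    return sp.exp(a) - sp.log(b)

exp_x = sp.simplify(eml(x, 1))                         # exp(x), since ln(1) = 0
ln_y  = sp.simplify(eml(1, sp.exp(sp.E - sp.log(y))))  # ln(y), the eq. (5) construction
sub   = sp.simplify(eml(sp.log(x), sp.exp(y)))         # x - y
print(exp_x, ln_y, sub)
```

The `positive=True` assumptions matter: they let SymPy collapse log(exp(...)) and exp(log(...)) without worrying about branch cuts.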
It's basically using the "-" embedded in the definition of the eml operator.
Table 4 shows the "size" of the operators when fully expanded to "eml" applications, which is quite large for +, -, ×, and /.
Here's one approach which agrees with the minimum sizes they present:
eml(x, y ) = exp(x) − ln(y) # 1 + x + y
eml(x, 1 ) = exp(x) # 2 + x
eml(1, y ) = e - ln(y) # 2 + y
eml(1, exp(e - ln(y))) = ln(y) # 6 + y; construction from eq (5)
ln(1) = 0 # 7
After you have ln and exp, you can invert their applications in the eml function
eml(ln x, exp y) = x - y # 9 + x + y
Using a subtraction-of-subtraction to get addition leads to the cost of "27" in Table 4; I'm not sure what formula leads to 19 but I'm guessing it avoids the expensive construction of 0 by using something simpler that cancels:
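The constructions above are also easy to sanity-check numerically (a quick sketch, again taking eml(x, y) = exp(x) − ln(y)):

```python
import math

def eml(x, y):
    # eml(x, y) = exp(x) - ln(y)
    return math.exp(x) - math.log(y)

x, y = 0.7, 2.3
exp_x = eml(x, 1)                               # exp(x), since ln(1) = 0
ln_y  = eml(1, math.exp(math.e - math.log(y)))  # ln(y), the eq. (5) construction
sub   = eml(math.log(x), math.exp(y))           # x - y
```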
The problem with symbolic regression is ln(y) is undefined at 0, so you can't freely generate expressions with it. We need to guard it with something like ln(1+y*y) or ln(1+|y|) or return undefined.
I don't mean to shit on their interesting result, but exp or ln are not really that elementary themselves... it's still an interesting result, but there's a reason that all approximations are done using series of polynomials (taylor expansion).
Sorry, re-reading this, I should have said "most". As the other reply mentions, Pade approx. are also well liked for numerical methods.
I personally mostly do my everyday work using taylor expansion (mostly explicit numerical methods in comp. EM because they're cheaper these days and it's simpler to write down) so it's what first comes to mind.
A quick meta-take here: it is hard to assess the level of expertise on HN. Some might be just tangentially interested; others might have degrees in the specific topic. Others might maintain a scientific computing library. Domains vary too: embedded systems, robotics, spacecraft navigation, materials modeling, or physics simulation. Until/unless people step up and fill the gaps somehow, we have little notion of identity or credentialing, for better and for worse.*
So it really helps when people explain (1) their context** and (2) their reasoning. Communicating well is harder than people think. Many comments are read by hundreds or more (thousands?) of people, most of whom probably have no idea who we are, what we know, or what we do with our brains on a regular basis. It is generous and considerate to other people to slow down and really explain where we're coming from.
So, when I read "most people use Taylor approximations"...
1. my first question is "on what basis can someone say this?"
2. for what domains might this be somewhat true? False?
3. but the bigger problem is that claims like the above don't teach. i.e. When do Taylor series methods fall short? Why? When are the other approaches more useful?
Here's my quick take... Taylor expansions tend to work well when you are close to the expansion point and the function is analytic, and less well when those assumptions don't hold. More broadly, they don't tend to give uniform accuracy across a range, so Taylor approximations are usually only local. Other methods (Padé, minimax, etc.) are worth reaching for when other constraints matter.
* I think this is a huge area we're going to need to work on in the age where anyone can sound like an expert.
** In the case above, does "comp. EM" mean "computational electromagnetics" or something else? The paper talks about "EML" so it makes me wonder if "EM" is a typo. All of these ambiguities add up and make it hard for people to understand each other.
I do computational electromagnetism, specifically plasma simulation. In the field solver and particle pushers (I mainly do explicit codes meaning we just approximate the derivatives numerically) we only do taylor expansion so that the derivatives are essentially second order accurate. We don't bother going further, although I can, because in my domain, being more "accurate" as a function of step size (dx in approximating f(x)->f(x+dx)) yields less of a profit vs just decreasing step sizes and grid sizes (or increasing resolution), and even then, the numerical accuracy pales in comparison to say setting up the physical problem wrong (the focus of a simulated laser pulse being ten wavelengths out of focus).
Replying to some of your questions (1 and 2): this is from the perspective of a computational scientist, and a less theoretical type who works closely with experimentalists. Thus I am closer to a user of codes to model experiments than to someone who does a lot of analytic or fundamental theory, although my experience and perspective is probably close to others who are computational-ish in other domains like engineering, for the reasons I'll explain below.
For 3, most physically useful simulations that are not merely theoretical exercises (that is, simulations that are more predictive or explanatory of actual experiments scientists want to do) will not consist of analytic functions you can write down. First, suppose the initial conditions in a problem have an aspect that is analytic (me setting my laser profile as a Gaussian pulse); once the interaction with a plasma target occurs, the result you obtain (and thus the predictions a simulation will make that can be compared to experiment) will not be Gaussian but will evolve due to the complex physics modeled in the simulation. And a Gaussian as an initial condition is already an approximation to an actual experiment. An "easy sim" for me is doing a best fit to the waist from a profile they'll read from a power meter and using a Gaussian that closely matches it, while a more realistic simulation would be me directly taking the data they have in an Excel sheet and feeding it into the simulation as an initial condition. In most real-world scenarios, most ICs already aren't analytic and must be solved numerically. By the way, this isn't that different from how engineers use computational codes. Not many airplane wings are spheres or cylinders; you'd likely have to import the design for a wing from a CAD file into, say, an aerodynamics fluid code.
So in all these cases, the bottleneck isn't really approximating analytic functions you can write down either in closed form or even in series form to the nth degree. Many people in the computational domain do not need accuracy beyond two or three terms in a Taylor series. This is because it is usually easier to just cut down dx and do more steps in total rather than using a large dx and requiring more terms... and this is before using any more sophisticated approximations. No code I know uses Padé approximations; I just know that some libraries for special functions (which may account for one or two function calls in a code I use) use them.
Also, just a quick example you can try. Let's look at exp, for small argument (this only really works for small argument, you obviously can't do taylor expansion well for large argument). Consider the following:
>>> np.exp(0.4231)
np.float64(1.5266869570289792)
I will see how many terms I need to get 4 digits of accuracy (note that I had four digits in my input, so even ignoring sig figs I probably shouldn't expect better than 4 digits in the result; numpy itself is numerical too, so it shouldn't be considered exact, though I'll trust its first four digits here)
>>> x = 0.4231
>>> 1
1
>>> 1 + x
1.4231
>>> 1 + x + x**2/2
1.512606805
>>> 1 + x + x**2/2 + x**3/(3*2)
1.5252302480651667
>>> 1 + x + x**2/2 + x**3/(3*2) + x**4/24
1.5265654927553847
Note that by 3 terms (x**3) I'm already off in the last digit by 1; by 4 terms, it's converged enough. Given an fp register, you can reuse powers of x from the last term, so this is already dangerously cheap. Why do you even need better than this in a practical sense? Most results I see in the wild do not even require this level of precision in a single simulation. I am a scientist, however; it's possible engineers need more precision.
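The reuse of powers mentioned above is just the usual incremental-term evaluation; a quick sketch that reproduces the REPL session:

```python
import math

def exp_taylor(x, n_terms):
    # each new term reuses the previous one: term_k = term_{k-1} * x / k,
    # so no powers or factorials are recomputed
    total, term = 1.0, 1.0
    for k in range(1, n_terms):
        term *= x / k
        total += term
    return total

# 5 terms (up to x**4/24) matches the last line of the session above
approx = exp_taylor(0.4231, 5)
```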
For us, more sophisticated methods on the "calculating transcendental functions" level are not really required, which is why they don't appear in codes I usually see. What we need better are things that make the actual elemental operations, like fma and the like, faster. Things like avx512 are far more interesting to me, for example.
In numerical analysis, elementary function membership, like special function membership, is ambiguous. In many circumstances, it’s entirely reasonable to describe the natural logarithm as a special function.
I don't think this is ever making it past the editor of any journal, let alone peer review.
> Elementary functions such as exponentiation, logarithms and trigonometric functions are the standard vocabulary of STEM education. Each comes with its own rules and a dedicated button on a scientific calculator;
What?
> No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, √, and log has always required multiple distinct operations. Here we show that a single binary operator
Yeah, this is done by using tables and series. His method does not actually facilitate the computation of these functions.
There is no such thing as "continuous mathematics". Maybe he meant to say continuous functions?
Looking at page 14, it looks like he reinvented the concept of the vector valued function or something. The whole thing is rediscovering something that already exists.
This preprint was written by a researcher at an accredited university with a PhD in physics. I'm sure they know what a vector valued function is.
The point of this paper is not to revolutionize how a scientific calculator functions overnight; it's to establish a single binary operation that can reproduce the rest of the typical continuous elementary operations via repeated application, analogous to how a NAND or NOR gate creates all of the discrete logic gates. Hence "continuous mathematics," as opposed to discrete mathematics. It seems to me you're being overly negative without solid reasoning.
> it's to establish a single binary operation that can reproduce the rest of the typical continuous elementary operations via repeated application,
But he didn't show this though. I skimmed the paper many times. He creates multiple branches of these trees in the last page, so it's not truly a single nested operation.
Well, it is still the case, even if not explicitly shown. Personally I think it almost boils down to school math, with some details around complex logarithms; the rest seems to be simpler.
Some of us had the wondrous epiphany as children that we could build any digital device we could dream of (yes, up to and including a full computer, CPU, RAM, peripherals, and all) out of SN7400 NAND gates that we could buy down at the local Radio Shack, if only we could scrape together enough change to buy the requisite number of parts and sockets and Tefzel wire.
Obviously, I can't speak for all of us who had that epiphany, but I strongly suspect that most of us who had that epiphany would find this result joyous.
The principal result is "all elementary functions can be represented by this function and constant 1". I'm not sure if this was known before. Applications are another matter, but I suspect interesting ones do exist.
According to Gemini 3.1 Pro this would shoot the current weather forecasting power through the roof (and math processing in general):
The plan is to use this new "structurally flawless mathematical primitive" EML (this is all beyond me, I was just having some fun trying to make it cook things together) in TPUs made out of logarithmic-number-system circuits. EML would have DAGs to help with the exponential bloat problem, with the tiny, fast "hardcoded models" CERN uses as an inspiration. All this would be bounded by the deductive causality of Pedro Domingos's Tensor Logic, and all of it would einsum like a mf. I hope it does.
I understand enough for its arguments' symmetry to have an impact. I used Deep Research; it had the paper from the link as input, plus some previous discussions about Tensor Logic and the new hardcoded neuroweb-like processors. Didn't make those up either.
"I just want to get this out of my hands in case I made the model stumble upon something important. It's reasoning seems solid but I'm no expert. Here it is, go crazy:"
This is amazing! I love seeing FRACTRAN-shaped things on the homepage :) This reminds me of how 1-bit stacks are encoded in binary:
A stack of zeros and ones can be encoded in a single number by bit-shifting and incrementing: pushing a 0 doubles the number, pushing a 1 doubles it and adds 1, and popping divides by 2, with the remainder being the popped bit.
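In code, the whole scheme is two one-liners (a minimal sketch; the sentinel 1 at the bottom is one way to keep track of leading zeros):

```python
def push(stack: int, bit: int) -> int:
    # doubling shifts left; the new bit lands in the ones place
    return stack * 2 + bit

def pop(stack: int) -> tuple[int, int]:
    # the remainder mod 2 is the bit most recently pushed
    return stack // 2, stack % 2

s = 1             # sentinel 1 marks the bottom of the stack
s = push(s, 0)
s = push(s, 1)
s, top = pop(s)   # pops the 1 back off
```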
I use something not too far off for my daily programming, based on a similar idea:
Rejoice is a concatenative programming language in which data is encoded as multisets that compose by multiplication. Think Fractran, without the rule-searching, or Forth without a stack.
https://wiki.xxiivv.com/site/rejoice
Did you just explain base-2 numbers on the HN forums as if it were novel?
It's dry humor?
You missed the part where the number is being used as a 1-bit stack. They never claimed novelty, it's just a neat technique that might be unfamiliar to most people.
I never claimed it was novel, chill.
adding a zero to the left of a binary integer doesn't double it
Wouldn't you also need to keep track of the stack's size, to know if there are leading zeros?
For trailing zeros yeah, or if you care for stack overflow/underflow. Here's a few primitives if you wanna try it out:
https://paste.sr.ht/~rabbits/cd2369cc7c72bfad0fcd83e27682095...
This isn't unique, or even the lowest-compute way to do this. For example, let f(x,y) = 1/(x-y). This too is universal. I think there's a theorem stating that for any finite set of binary operators there is a single one replacing it.
write x#y for 1/(x-y).
x#0 = 1/(x-0) = 1/x, so you get reciprocals. Then (x#y)#0 = 1/((1/(x-y)) - 0) = x-y, so subtraction.
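Spelled out in code (a sketch of the construction above):

```python
def h(x, y):
    # the proposed universal operator: h(x, y) = 1/(x - y)
    return 1.0 / (x - y)

x, y = 5.0, 2.0
recip = h(x, 0.0)        # x#0 = 1/x: reciprocals for free
sub   = h(h(x, y), 0.0)  # (x#y)#0 = x - y: reciprocal of the reciprocal
```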
it's a common exercise to show that in any (insert algebraic structure here) inverses and subtraction give all 4 elementary ops.
I haven't checked this carefully, but this note seems to give a short proof (modulo knowing some other items...) https://dmg.tuwien.ac.at/goldstern/www/papers/notes/singlebi...
yes, but are you currently experiencing both hypergraphia and chatbot AI induced psychosis while also thinking about this problem?
It's math. You can check it yourself instead of this (and many other) thoughtless posts.
i'm mocking the LLM-generated scientific article you're replying to, not you. i'm agreeing with you haha
Do you claim all things you don’t understand are LLMs? This is what I mean by these and many of your other comments being extremely poor quality to the point of deliberate ignorance.
The paper above was published in 2012 [1], so that’s quite a feat for an LLM. This takes about zero effort to check.
Put some thought or effort into your claims; they’ll look less silly.
[1] https://orcid.org/0000-0002-0438-633X
Good find.
It cites a paper from 1935: https://www.pnas.org/doi/10.1073/pnas.21.5.252
Here is a bit more: https://mathoverflow.net/questions/57465/can-we-unify-additi...
Ooh, that 2nd link has a nice construction by Terry Tao giving a clear way to show infinitely many such functions exist for pretty much any set of operations.
> This too is universal
Could that be used to derive trigonometric functions with single distinct expressions?
The exp and ln are infinite series. Exp is roughly the infinite series for cos AND the infinite series for sin. Hiding that every op is an infinite series behind a name doesn’t make things free. It just makes even trivial ops like 1+2 vastly more work.
They are not infinite series per se. They can be represented by infinite series in several ways, but there are standard ways to define them that do not involve infinite series. The logarithm in particular is not even represented by an infinite series (in the form of a Taylor expansion) over the whole complex plane. And the knowledge/use of trigonometric functions greatly precedes such infinite-series representations.
Moreover, the point is not always numerical computation. I don’t think anybody argues that eml sounds like an efficient way to compute elementary functions numerically. It may or may not still be useful for symbolic computations.
The article is about producing all elementary functions, which 1/(x-y) clearly doesn't, as it doesn't produce any transcendental function. Like many such universality-style results it may not have practical applications, but it may still be interesting in its own right.
Any transcendental function can be produced by arithmetic, since it's complete for R.
Go ahead and show how to compute exp or ln without an infinite series without circular reasoning. You can’t, since they’re transcendental.
There are infinitely many ways to make these binary operators. Picking extremely high compute cost ones really doesn’t make a good basis for computation.
> Any transcendental function can be produced by arithmetic, since it's complete for R.
Not without some form of limit process or construction. You can approximate e with the basic arithmetic operations but not actually get an exact form in finite steps. And you definitely cannot traverse an infinite binary tree, so the main point of the result in the article is missed by your arguments.
Again, you are mixing separate things. Nobody said that eml is some way to approximate elementary functions more efficiently. It is a way to express elementary functions in a finite amount of operations. Meaning, computing symbolically, not numerically. Eg I may care that exp(3)*exp(2)=exp(5) without caring to approximate exp(5) numerically. The paper is literally under "Computer Science > Symbolic Computation", not "numerical analysis" or "engineering" after all.
And to be precise:
> Go ahead and show how to compute exp or ln without an infinite series without circular reasoning. You can’t, since they’re transcendental.
You don't necessarily need "infinite series", you need some limit process. A basic example is that exp(x) can be approximated by (1 + x/n)^n for large n. For the logarithm you can use a formula involving the arithmetic–geometric mean which you can approximate using an iterative process/recursion without infinite series. You can also approximate the exponential by using Newton's method together with that, see [0].
[0] Fast Computations of the Exponential Function https://link.springer.com/chapter/10.1007/3-540-49116-3_28
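The (1 + x/n)^n route is easy to see numerically (a sketch; convergence is only O(1/n), so it's illustrative rather than practical):

```python
import math

def exp_limit(x, n):
    # exp(x) = lim_{n -> inf} (1 + x/n)**n -- arithmetic only, no series
    return (1.0 + x / n) ** n

approx = exp_limit(1.0, 10**6)  # should be close to e
```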
You win the internet today!
I don't think this can do any of the "standard" constants or what we generally consider to be closed-form expressions, though ! (E.g., no e, pi, exp, log, etc.)
Yes it can, by using the same infinite series that exp and ln use to compute. This one just costs less in money, hardware, energy, and is faster for basically every basic op.
I think the point is that it is _finite_. If you allow infinite expressions, then the basic monomial basis or quotients thereof are "even simpler".
It’s only finite by putting the infinite series into an operation.
And the basic monomial basis is not a single binary operation capable of reproducing the set of basic arithmetic ops. If you want trivial and basic, pick Peano postulates. But that’s not what this thread was about.
well, the statement is: is there a single operation, built from elementary operations, such that all _other_ elementary operations have finite representations.
this preprint answers that in the affirmative
otoh, (x, y) -> 1/(x-y) does not answer this question at all. you can argue that the preprint does so "via the infinite series in an operation" (which I have no idea what that means; surely if exp(x) qualifies then so must 1/(x-y) if we pick a monomial basis?) but ¯\_(ツ)_/¯
now, do I think that this is groundbreaking magical research (as I'm currently seeing on twitter) no... But it's neat!
I think this is the novel bit:
> This includes constants such as e, pi, and i; arithmetic operations including addition, subtraction, multiplication, division, and exponentiation as well as the usual transcendental and algebraic functions.
And those come from the infinite series needed to compute exp and ln. They’re just as much work either way. The exp and ln way are vastly costlier for every op, including simply adding 1 and 2.
It's not about being costly or not; that is completely irrelevant to the point being made. eml is just some abstract function that maps ℝ² to ℝ. Like every other mathematical function, it is only really defined by the infinite set of correspondences from one value to another. It is NOT exp(x) - ln(y), just as exp is not a series (as you wrongly stated in another comment). exp can be expressed (and/or defined) as a series to a mathematician familiar with the notion of series, and eml can be expressed as exp(x) - ln(y) to a mathematician familiar with exp and ln. They can also be expressed/defined in multiple other ways.
I am not claiming this is better than 1/(x-y) in any way (I have no idea, maybe it isn't if you look closely enough), but you are simply arguing against the wrong thing. Author didn't claim eml to be computationally efficient (it even feels weird to say that, since computational efficiency is not a trait of a mathematical function, but of a computer architecture implementing some program) or anything else, only that (eml, 1) are enough to produce every number and function that (admittedly, somewhat vaguely defined) a scientific calculator can produce.
However, I want to point out that it's weird 1/(x-y) didn't appear on that graph in Figure 1, since if it's as powerful as eml, it should have all the same connections as eml, and it's a pity Odrzywołek's paper misses it.
His paper misses infinitely many such functions.
I'm fairly certain that the difference between the approaches is that the f(x,y) function you mentioned requires limits to represent certain concepts, while the eml approach is essentially a tree or chain of computations meant to represent a model of a system.
Computing exp or ln is an infinite series, and vastly more compute. Hiding series behind a name doesn’t make them free to compute.
I understand your point. The paper is more about the depth of the tree to represent and audit a model versus the raw CPU clock cycles. It takes the exponent and logarithm as given since for all practical purposes, in a scientific context, they are.
Representing something like sin(x) with f(x,y) requires infinitely many steps. Conversely, with eml you get an exact result in around 4 steps using identities and such.
One could argue that we do Taylor Series approximations on the hardware to represent trigonometric functions, but that highlights the key aspect of the eml approach. You can write a paper with those four steps that describes an exact model, here sin(x). And people can take that paper and optimize the result. This paper is about an auditable grammar that you can compute with.
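For concreteness, here's a hedged numeric sketch of sin via complex eml applications (the i·x scaling and the final division are ordinary complex arithmetic here, not themselves written in eml form):

```python
import cmath

def eml(x, y):
    # the operator extended over the complex numbers, as the paper does
    return cmath.exp(x) - cmath.log(y)

x = 0.8
# sin(x) = (exp(ix) - exp(-ix)) / 2i, with each exp obtained as eml(., 1)
sin_x = (eml(1j * x, 1) - eml(-1j * x, 1)) / 2j
```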
They're not series, that's just a convenient way to think about defining and calculating them. I've never found it particularly useful to deal with the series definitions either, and none of the (good) approximation methods I'm aware of actually take that approach.
Moreover, EML is complete in a way that your suggested function isn't: If you take a finite combination of basis functions, can it build periodic functions? Hardy proved over a century ago that real (+,-,/,*,exp,ln) can't do this (and answering the paper's unresolved question about similar real-valued functions in the negative). EML being able to build periodic functions is a lot less surprising for obvious reasons, but still pretty neat.
You're completely missing the point here.
You can reduce all Boolean logic to NAND, but that doesn't actually mean that semiconductor fabs translate their HDL to NAND gates, because it is possible to build complex gates that directly implement higher level operations in a single gate.
Your "cost of computation" objection can be easily resolved by adding more operators, which makes it boring from a research perspective.
Meanwhile the loss of expressivity can only be compensated by encoding algorithms directly into the expression tree. Your objection that an infinite series is a bad thing rings hollow, since you now introduce the concept of an infinitely sized expression tree. That sounds much more impractical than implementing an algorithm for the exponential and logarithm functions.
Show me a way to physically compute exp or ln with fewer gates than add. More gates means costlier in $, more energy in compute, and, for these functions, higher latency.
You don’t get to make up free ops, claim there is no cost in reality, and hand wave away reality.
There are infinitely many ways to do what the paper did. There’s no gain other than it’s pretty. It loses on every practical front to simply using current ops and architectures.
The feasibility of memristor analog circuits is evident, and I believe this paper represents a valuable early exploration. We've been constrained by Boolean logic for quite some time now.
The world has had many types of logic before and after Boolean logic was created, many used in computing. Boolean logic isn’t a constraint; it’s used where it’s useful, and others are used where they’re useful.
> no gain other than it’s pretty
Conceptual elegance is worth something, isn't it? I don't mean just aesthetic pleasure, as in recreational mathematics, but there's often (not always) value in being able to succinctly describe a wide range of phenomena with a small number of primitives. It could open up new ways of understanding or working that wasn't possible before. Not saying this specific discovery fits the description, but it seems too early to dismiss the idea entirely based on its im/practicality compared to existing solutions.
Aren't there examples in the history of mathematics, where a new idea was criticized for being impractical, then later found to have applications or implications, possibly unexpected even to the person who discovered it?
> Show me a way to physically compute exp or ln that is less gates than add.
IIRC a resistor in series to a capacitor does the trick, for exp.
No, it approximates exp poorly outside an infinitesimally small interval. Resistors and capacitors are nowhere near ideal components, which is why they have spec sheets showing how quickly they diverge.
If we’re making sloppy approximations to a tiny range of exp, then I too can do it with a few terms.
This makes a good benchmark for LLMs:
``` look at this paper: https://arxiv.org/pdf/2603.21852
now please produce 2x+y as a composition on EMLs ```
Opus (paid) - claimed that "2" is circular. Once I told it that ChatGPT had already done this, it finished successfully.
ChatGPT(free) - did it from the first try.
Grok - produced estimation of the depth of the formula.
Gemini - success
Deepseek - Assumed some pre-existing knowledge on what EML is. Unable to fetch the pdf from the link, unable to consume pdf from "Attach file"
Kimi - produced long output, stopped and asked to upgrade
GLM - looks ok
> Once I told it that ChatGPT have already done this, finished successfully.
TIL you can taunt LLMs. I guess they exhibit more competitive spirit than I thought.
Opus seems to be wired currently to get you to spend more money. Once you tell it "Stop defrauding me, just get to the right solution" it often gets it.
I am like "Yeah ok, use the Arcee Trinity models!" and its like, you got it boss, 3 opus agents in parallel, got it!
I always start the chat with "we have been going in circles" before giving any context.
So what is the correct answer?
I copy and pasted the abstract into DeepSeek and asked your question. It's a bit unfair to penalise it for not knowing PDFs.
It got a result.
I changed the prompt to this:
""" Consider a mathematical function EML defined as `eml(x,y)=exp(x)−ln(y)`
Please produce `sin(x)/x` as a composition on EMLs and constant number 1 (one). """
If you like creating such things, consider contributing to Terminal Bench Science, https://www.tbench.ai/news/tb-science-announcement.
meta.ai in instant mode gets it first try too (I think?)
``` 2x + y = \operatorname{eml}\Big(1,\; \operatorname{eml}\big(\operatorname{eml}(1,\; \operatorname{eml}(\operatorname{eml}(1,\; \operatorname{eml}(\operatorname{eml}(L_2 + L_x, 1), 1) \cdot \operatorname{eml}(y,1)),1)\big),1\big)\Big) ```
for me Gemini hallucinated EML to mean something else despite the paper link being provided: "elementary mathematical layers"
This should be a tangential proof for the dying bunch of people who still believe that LLMs are just parrots. EML is literally a new invention.
> A calculator with just two buttons, EML and the digit 1, can compute everything a full scientific calculator does
Reminds me of the Iota combinator, one of the smallest formal systems that can be combined to produce a universal Turing machine, meaning it can express all of computation.
https://en.wikipedia.org/wiki/Iota_and_Jot
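For the curious, Iota is tiny to play with. A minimal Python sketch over curried S/K combinators (the names `S`, `K`, and `iota` are my own; the definition ι f = f S K is the standard one from the linked article):

```python
# Curried S and K combinators, and the Iota combinator built on them.
K = lambda x: lambda y: x                      # K x y   = x
S = lambda x: lambda y: lambda z: x(z)(y(z))   # S x y z = x z (y z)
iota = lambda f: f(S)(K)                       # iota f  = f S K

# Applying iota to itself reduces to the identity combinator:
I = iota(iota)
print(I(42))  # 42
```

From `iota` alone you can also recover S and K, and hence all of SKI, which is what makes the one-combinator basis universal.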
All possible 36 distinct level-2 eml functions of one variable (the first 18 of them with entirely Real outputs, the other 18 with "intermediate" complex-valued components):
https://imgur.com/a/K7AoOFi
It would be fun to catalogue all one-variable functions that can be represented as binary trees of fixed depth in this way and then encode the trees in binary. Lots of these functions in old math book tables would look very different with a plain hash lookup and the identities in such books would prove themselves.
EDIT: please change the article link to the most recent version (as of now still v2), it is currently pointing to the v1 version which misses the figures.
I'm still reading this, but if this checks out, this is one of the most significant discoveries in years.
Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?
Got a multidimensional and multivariate function to model (with random samples or a full map)? Just do gradient descent and convert it to approximant EML trees.
Perform gradient descent on EML function tree "phi" so that the derivatives in the Schroedinger equation match.
But as I said, still reading, this sounds too good to be true, but I have witnessed such things before :)
> Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?
Same reason all boolean logic isn't performed with combinations of NAND – it's computationally inefficient. Polynomials are (for their expressivity) very quick to compute.
They are done with transistors though. Transistors form an efficient, single element, universal digital basis.
And are a much less arbitrary choice than NAND, vs. NOR, XOR, etc.
Using transistors as conceptual digital logic primitives, where power dissipation isn't a thing, Pass Logic is "The Way".
Single transistors aren't yet logic gates by themselves; they are amplifiers with a very specific gain function that makes it possible to use them as switches. Logic gates usually consist of at least two transistors. See https://en.wikipedia.org/wiki/CMOS for an example of how it is done in CMOS technology.
That is correct.
Moreover, the amplifying function must exist at least in some gates, to restore the logic levels, but there is no need for it to exist in all gates.
For instance, any logic circuit can be implemented using AND gates and/or OR gates made with diodes, which have no gain, i.e. no amplifying function, together with 1-transistor inverters, which provide both the NOT function and the gain needed to restore the logic levels.
The logic functions such as NAND can be expressed in several ways using simpler components, which correspond directly with the possible hardware implementations.
Nowadays, the most frequent method of logic implementation is by using parallel and series connections of switches, for which the MOS transistors are well suited.
Another way to express the logic functions is by using the minimum and maximum functions, which correspond directly with diode-based circuits.
All the logic functions can also be expressed using the 2 operations of the binary finite field, addition (a.k.a. XOR) and multiplication (a.k.a. AND).
This does not lead to simpler hardware implementations, but it can simplify some theoretical work, by using algebraic results.

Actually this kind of expressing logic is the one that should have been properly named Boolean logic, as the contribution of George Boole was precisely replacing "false" and "true" with "0" and "1" and reinterpreting the classic logic operations as arithmetic operations.

It is very weird to see in some programming languages a data type named Boolean whose possible values are "false" and "true", instead of the historically correct Boolean values, "0" and "1". The latter can also be much more useful in programming than "false" and "true", by simplifying expressions, especially in array operations (which is why array-oriented languages like APL use "0" and "1", not "false" and "true").
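The GF(2) reading above is easy to play with: XOR acts as field addition and AND as multiplication, and the other gates fall out algebraically. A toy sketch (function names are mine):

```python
# In GF(2), XOR is addition and AND is multiplication. Other gates
# follow algebraically: NOT(a) = a + 1, OR(a, b) = a + b + a*b.
def NOT(a):   return a ^ 1
def OR(a, b): return a ^ b ^ (a & b)

print([NOT(a) for a in (0, 1)])                    # [1, 0]
print([OR(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 1]
```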
"Pass transistors" are transistors being operated in pass/impedance switch mode.
Pass logic. Digital. [0]
This is extremely basic digital circuit design. You can create digital circuits as compositions of gates. But you can often implement the same logic, with fewer transistors, using pass logic.
Pass logic is also great for asynchronous digital design.
[0] https://en.wikipedia.org/wiki/Pass_transistor_logic
> Transistors form an efficient, single element, universal digital basis
But transistors can be N or P-channel, so it’s not a single logical primitive, like e.g. NAND-gates.
True.
NAND gates also use both N- and P-channel transistors, and have to use at least the same number, but frequently many more, for anything non-trivial.
But as you point out, as a computational basis, pass transistors have two variants.
The Cray 1 was built 100% from NOR gates and SRAM.
50 years ago.
Correct. Your point being? Digital logic didn't change.
> I'm still reading this, but if this checks out, this is one of the most significant discoveries in years.
It seems like a neat parlour trick, indeed. But significant discovery?
> Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?
Because the EML basis makes simple functions (like +) hard to express.
Not to diminish this very cool discovery!
Consider how a wavefunction might be stored for numerically searching the ground state of a molecule or a crystal lattice. Humans appreciate 2D imagery of about 1 kilopixel x 1 kilopixel, so expanding this to 3 dimensions means the wavefunction would have to store 10^9 complex numbers (easily 4 GB at 16-bit precision for the real and imaginary components, i.e. 4 bytes per complex number). Do we really believe that a DAG variant of the EML construction would consume more than that to represent the analytically correct solution? Do we really believe that the 4 GB DAG variant of EML would produce a less accurate representation (i.e. less fidelity with the Schroedinger equation)? If the ground-state hydrogen atom is any indication, my money is on EML-style constructions, not naive 3D arrays modelling the wavefunction by brute force.
This also re-opens a lot of "party pooper" results in mathematics: impossibility of representing solutions to general quintic (fine print: if we restrict ourselves to arithmetic and roots/radicals). In mathematics and physics there have been a lot of "party pooper" results which later found more profound and interesting positive results by properly rephrasing the question. A negative result for a myopic question isn't very informative on its own.
> This also re-opens a lot of "party pooper" results in mathematics: impossibility of representing solutions to general quintic (fine print: if we restrict ourselves to arithmetic and roots/radicals).
Solving the quintic in terms of transcendental functions has already been done
https://en.wikipedia.org/wiki/Bring_radical#The_Hermite%E2%8...
From my experience of working in this problem domain for the last year, I'd say it is pretty powerful, but the "too good to be true" part comes from the fact that EML buys elegance through exponential expression blow-up. Multiplication alone requires depth-8 trees with 41+ leaves, i.e. a minimal operator vocabulary trades off against expression length. There's likely an information-theoretic sweet spot between these extremes.
It's interesting to see his EML approach whereas mine was more on generating a context sensitive homoiconic grammar.
I've had lots of success combining spectral neural nets (GNNs, FNOs, Neural Tangent Kernels) with symbolic regression and using Operad Theory and Category Theory as my guiding mathematical machinery
Link to your work?
> Multiplication alone requires depth-8 trees with 41+ leaves i.e. minimal operator vocabulary trades off against expression length.
That is sort of comparable to how NAND gates simplify scaling.
Division is hell on gates.
The single component was the reason scaling went like it did.
There was only one gate structure which had to improve to make chips smaller - if a chip used 3 different kinds, then the scaling would've required more than one parallel innovation to go (sort of like how LED lighting had to wait for blue).
If you need two or more components, then you have to keep switching tools instead of hammer, hammer, hammer.
I'm not sure what you mean by this? It's true that any Boolean operation can be expressed in terms of two-input NAND gates, but that's almost never how real IC designers work. A typical standard cell library has lots of primitives, including all common gates and up to entire flip-flops and RAMs, each individually optimized at a transistor level. Realization with NAND2 and nothing else would be possible, but much less efficient.
Efficient numerical libraries likewise contain lots of redundancy. For example, sqrt(x) is mathematically equivalent to pow(x, 0.5), but sqrt(x) is still typically provided separately and faster. Anyone who thinks that eml() function is supposed to lead directly to more efficient computation has missed the point of this (interesting) work.
Yeah, what you're going to get is more efficient proofs: you can do induction on one case to get results about elementary functions. Not sure where anyone's getting computational efficiency thoughts from this.
Are you under the impression that CPUs are made exclusively from NAND gates? You can't be serious.
Might’ve gotten mixed up with CMOS dominance, or I’m ignorant.
I believe you're not ignorant. But many folks probably lack the process knowledge (CMOS) required to understand why :-)
https://en.wikipedia.org/wiki/Mead%E2%80%93Conway_VLSI_chip_...
I'm guessing is what they're really talking about. Which is not about NAND gates.
Just to add a bit, but modern digital circuits are almost exclusively MOS, but even the "complementary" bit isn't universal in a large IC.
In my experience this exponential expression blow-up is less a result of decomposing into a minimum of primitives than a result of repetition in expression trees:
Consider the analogy with Bertrand Russell's Principia Mathematica: he derived fully expanded expressions, i.e. trees whose leaves may only be literals. Everyone claimed this madness underscored how formal verification of natural mathematics was a fool's errand, but nevertheless we see successful projects like metamath (us.metamath.org) where this exponential blow-up does not occur. It is easy to see why: instead of representing proofs as full trees, the proofs are represented as DAGs. The same optimization would be required for EML to prevent exponential blow-up.
Put differently: we could allow extra buttons besides {1, EML}. For example, to capture unary functions the authors mentally add an 'x' button, so now the RPN calculator has {1, EML, x}; but wait, if you want multivariate functions it becomes an RPN calculator with extra buttons {1, EML, x, y, z}, for example.
But why stop there? In metamath, proofs are compressed: if an expression or wff was proven before in the same proof, its first subproof is given a number, and any subsequent invocation of that Nth subproof refers to this number. Why recall only the input parameters x, y, z but not earlier computed values/functions?
In fact every proof in metamath set.mm that uses this DAG compressibility could be split into the main proof and the repeatedly used substatements, with the latter automatically converted to explicitly separate lemma proofs, in which case metamath could dispose of the single-proof DAG compression (but it would force proofs to be split up into lemmas plus a main proof).
None of the proven theorems in metamath's set.mm displays the feared exponential blowup.
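The tree-vs-DAG point can be sketched with hash-consing, where identical subtrees are stored exactly once. This is an illustrative toy, not metamath's actual compression format:

```python
# Hash-consing an expression tree: every structurally identical
# subtree maps to a single canonical node id in `pool`.
def intern(node, pool):
    """Return a canonical id for `node`, creating one if unseen."""
    if node not in pool:
        pool[node] = len(pool)
    return pool[node]

def build(expr, pool):
    """`expr` is a leaf string or a tuple (op, left, right)."""
    if isinstance(expr, str):
        return intern(('leaf', expr), pool)
    op, left, right = expr
    return intern((op, build(left, pool), build(right, pool)), pool)

# (x+y)*(x+y): 7 nodes as a tree, but only 4 unique DAG nodes.
pool = {}
root = build(('mul', ('add', 'x', 'y'), ('add', 'x', 'y')), pool)
print(len(pool))  # 4
```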
Yes, metamath uses a large collection of specialized but reusable building blocks, so it doesn't blow up exponentially. However, if you want to "just do gradient descent" on general trees built from a single universal primitive, you now have to rediscover all those building blocks on the fly. And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.
Or you could accept that there's already a large collection of known useful special functions, and work with shallower trees of those instead, e.g. https://arxiv.org/abs/1905.11481
> And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.
That's not really necessary. Imagine a leaf ("1" or "x" or ...) somewhere near the top of the binary tree; with the current brute-force method that's a whole binary subtree whose parameters go unused.
One could just as well use that whole culled binary subtree as a DAG node.
It does require more complex routing, but selecting input nodes is a sparse task, so those routing parameters can use sparse virtual parameters, say inner products of dense vectors in some vector space, so it doesn't need to take up much memory...
Yeah, seems like classic representation vs traversal complexity trade-off a la David Marr.
It's not too good to be true...
It's a way to make mathematical formulas completely unreadable. It's a way to spend more time computing functions like log (3 emls required) while needing more precision. It's a way to blow the minds of muggles reading Hacker News.
ikrima, Do you have any links to your research or paper titles?
Where do you see exponential blow-up? If you replace every function in an expression tree with a tree of eml functions, that is a size increase by a constant factor. And the factor does not seem unreasonable but in the range 10 to 100.
The exponential blowup is in the symbolic regression section, where to search among depth K trees requires 2^K parameters.
As an example, searching for sqrt(x) would require a tree of depth ~40 which is in the trillion-parameter regime.
But that is not an increase in the expression size; that is the effort of searching for an expression tree that fits some target function. And that is no different from searching for an expression based on common functions, which is of course also exponential in the expression tree height. The difference is that an eml-based tree will have a larger height - by some constant factor - than a tree based on common functions. On the other hand, each vertex in an eml-tree can only be eml, one, or an input variable, whereas in a tree based on common functions each vertex can be any of the supported basis functions - counting constants and input variables as nullary functions.
Ah I see I misunderstood your point, thanks for clarifying.
I think you are right, each node in some other expression tree formed of primitives in some set Prim would be increased by at most max(nodes(expr)) for expr in Prim.
That's essentially what the EML compiler is doing from what I understand.
Yes, and even this search doesn't actually require trillions of parameters, since the switching parameters will be sparse, which means you can apply a FakeParameter trick: suppose I want a trillion sparse parameters; that's a million by a million. Let's just model those parameters as inner products of a million vectors, each of some dimension N. Now it's in the regime of megabytes or a GB.
For extreme regularization, one can even go down to 10 arbitrary precision numbers: if we have a single vector of 10 dimensions, we can re-order the components 10! different ways.
10! = 3 628 800
so we can retrieve ~3M vectors from it, and we can form about 10 T inner products.
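A toy sketch of that virtual-parameter trick, scaled down so it runs instantly (the names `U`, `V`, and `param` are mine): a huge parameter matrix is never materialized; each entry is the inner product of two small dense vectors.

```python
import random

# N*N "virtual" parameters backed by only 2*N*D stored numbers.
N, D = 1_000, 16
U = [[random.random() for _ in range(D)] for _ in range(N)]
V = [[random.random() for _ in range(D)] for _ in range(N)]

def param(i, j):
    """Virtual parameter (i, j): inner product of U[i] and V[j]."""
    return sum(u * v for u, v in zip(U[i], V[j]))

print(N * N)      # 1000000 virtual parameters
print(2 * N * D)  # 32000 stored numbers
```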
What URL should we change it to?
The URL is already pointing at v2, which apparently is the newest one requested by the comment above.
> Submitted on 23 Mar 2026 (v1), last revised 4 Apr 2026 (this version, v2)
Thanks! that's what it looked like to me too.
I agree: I don't understand what happened, but the first "View PDF" resulted in a PDF where the hyperlinks to the figures didn't work. Upon closer inspection it wasn't v1 at all; that's a PNAS article. I am unable to remove the EDIT:... line in my original comment at this time...
No worries! We definitely want the best link so such comments are helpful.
IMO arxiv links should pretty much always be to the abstract (ie .../abs/...) as opposed to particular versions of the html or pdf.
While I'm really enjoying this paper, I think you are way overstating the significance here. This is mathematically interesting, and conceptually elegant, but there is nothing in this paper that suggests a competitive regression or optimisation approach.
I might have misunderstood, but from the two "Why do X when you can do just Y with EML" sentences, I think you are describing symbolic regression, which has been around for quite some time and is a serious grown-up technique these days. But even the best symbolic regression tools do not typically "replace" other regression approaches.
I can't say I'm surprised at this result at all, in fact I'm surprised something like this wasn't already known.
This isn't all that significant to anyone who has done Calculus 2 and knows about Taylor series.
All this really says is that the Taylor expansions of e^x and ln x are sufficient to express trig functions, which is trivially true from Euler's formula as long as you're in the complex domain.
Arithmetic operations follow from the fact that e^x and ln x are inverses, in particular that e^ln(x) = x.
Taylor series seem a bit like magic when you first see them, but then you get to Real Analysis and find out there are whole classes of functions that they can't express.
This paper is interesting but it's not revolutionary.
There is a huge number of people who understand Taylor series, know how to compute them, and know the things you can do with Taylor (and other kinds of) expansions. Yet none of us identified that this binary operation spans the lot of them, but I'm willing to read references to predating observations of the same kind. The author does mention alternative systems (incomplete in some specific sense) in the paper.
I did however keep thinking there was a lot of attention to trying to include special constants even though we don't know that much about these constants yet, while comparatively little attention went to say elliptic integrals etc.
When aiming for a wide catch, you'd include some of those esoteric functions, or erf() etc...
I also wished they had attempted to find a representation for derivative and integrals.
Given this amazing work, an efficient EML operator HW implementation could revolutionize a bunch of things. So the next thing might be an efficient EML HW implementation.
You should mark sarcasm as subtle as this.
The compute, energy, and physical cost of this versus a simple x+y is easily an order of magnitude. It will not replace anything in computing, except maybe fringe experiments.
The derivation of -x seems wrong. We can look at the execution trace on a stack machine, but it's actually not hard to see: starting from the last node before the output, we see the form of the tree,
and the claim is that, after it's expanded, z will be such that this whole thing is equal to -x. But with some algebra, that happens only if z satisfies a certain equation,
and there is no complex number z that satisfies that equation. Indeed, if we laboriously expand the given formula for z (the left branch of the tree), we see that it goes through ln(0) and compound expressions.
x^-1 has the same problem.
Both formulae work ...sort of... if we allow ln(0) = -Infinity and some other moxie, such as x / Infinity = 0 for all finite x.
Ah, the paper acknowledges this. My bad for jumping to the diagrams!
On page 11, the paper explicitly states:
> EML-compiled formulas work flawlessly in symbolic Mathematica and IEEE754 floating-point… This is because some formulas internally might rely on the following properties of extended reals: ln 0 = −∞, e^(−∞) = 0.
And then follows with:
> But EML expressions in general do not work ‘out of the box’ in pure Python/Julia or numerical Mathematica.
Thus, the paper’s completeness claim depends on a non-standard arithmetic convention (ln(0) = -∞), not just the complex numbers it primarily advertises. While the paper is transparent about this, it is, however, buried on page 11 rather than foregrounded as a core caveat. Your comment deserves credit for flagging it.
The author does address this further on page 14 of SI and provides an alternative of:
−z = 1 − (e − ((e − 1) − z))
I would not call a "non-standard arithmetic convention" that ln(0) = -∞.
This is the standard convention when doing operations in the extended real number line, i.e. in the set of the real numbers completed with positive and negative infinities.
When the overflow exception is disabled, any modern CPU implements the operations with floating-point numbers as operations in the extended real number line.
So in computing this convention has been standard for more than 40 years, while in mathematics it has been standard for a couple of centuries or so.
As always in mathematics, when computing expressions, i.e. when computing any kind of function, you must be aware very well which are the sets within which you operate.
If you work with real numbers (i.e. in a computer you enable the FP overflow exception), then ln(0) is undefined. However, if you work with the extended real number line, which is actually the default setting in most current programming languages, then ln(0) is well defined and it is -∞.
Apparently Python throws an exception. This surprised me; I expected it to only throw for integers. Throwing for floats is weird and unsafe.
though if you use numpy floats you only get a warning:
JavaScript works as expected:
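For reference, a quick stdlib check of the CPython behavior described above (numpy warns and returns -inf, and JavaScript's `Math.log(0)` returns `-Infinity`, as the comments note):

```python
import math

# CPython's math.log(0.0) raises instead of returning -inf:
try:
    math.log(0.0)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# The extended-real value itself is perfectly representable:
print(float('-inf'))  # -inf
```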
Yeah, it's annoying that the author talks about RPN notation but only gives the found formulas in the form of images.
looks like it computes ln(1)=0, then computes e-ln(0)=+inf, then computes e-ln(+inf)=-inf
> using EML trees as trainable circuits ..., I demonstrate the feasibility of exact recovery of closed-form elementary functions from numerical data at shallow tree depths up to 4
That's awesome. I always wondered if there is some way to do this.
> For example, exp(x)=eml(x,1), ln(x)=eml(1,eml(eml(1,x),1)), and likewise for all other operations
I read the paper. Is there a table covering all other math operations translated to eml(x,y) form?
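The two identities quoted above are easy to check numerically. A sketch using the paper's definition eml(x,y) = exp(x) − ln(y):

```python
import math

def eml(x, y):
    return math.exp(x) - math.log(y)

# exp(x) = eml(x, 1), since ln(1) = 0:
print(abs(eml(2.0, 1.0) - math.exp(2.0)) < 1e-12)  # True

# ln(x) = eml(1, eml(eml(1, x), 1)):
#   eml(1, x) = e - ln x; exponentiating gives e^e / x;
#   then e - ln(e^e / x) = e - (e - ln x) = ln x.
def ln_via_eml(x):
    return eml(1.0, eml(eml(1.0, x), 1.0))

print(abs(ln_via_eml(3.0) - math.log(3.0)) < 1e-9)  # True
```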
The last page of the PDF has several trees that represent a few common math functions.
I was curious about that too. Gemini actually gave a decent list. Trig functions come from Euler's formula:

e^(ix) = cos(x) + i sin(x)

which means:

e^(-ix) = cos(x) - i sin(x)

so adding them together:

e^(ix) + e^(-ix) = 2 cos(x)
So I guess the real part of that.
Multiplication, division, addition and subtraction are all straightforward. So are hyperbolic trig functions. All other trig functions can be derived as per above.
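The cos(x) = (e^(ix) + e^(-ix))/2 identity mentioned above is quick to confirm with cmath:

```python
import cmath
import math

# cos(x) equals the (real) average of e^(ix) and e^(-ix):
x = 0.7
via_euler = ((cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2).real
print(abs(via_euler - math.cos(x)) < 1e-12)  # True
```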
I think what you want is the supplementary information, part II "completeness proof sketch" on page 12. You already spotted the formulas for "exp" and real natural "L"og; then x - y = eml(L(x), exp(y)) and from there apparently it is all "standard" identities. They list the arithmetic operators then some constants, the square root, and exponentials, then the trig stuff is on the next page.
You can find this link on the right side of the arxiv page:
https://arxiv.org/src/2603.21852v2/anc/SupplementaryInformat...
Didn't read the paper, but it was easy for me to derive constants 0, 1, e and functions x + y, x - y, exp(x), ln(x), x * y, x / y. So seems to be enough for everything. Very elegant.
Although x + y is surprisingly more complicated than you'd expect at first. The construction first goes for exp(x) and ln(x) then to x - y and finally uses -y to get to x + y.
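A sketch of that construction in plain floating point, using the negation formula −z = 1 − (e − ((e − 1) − z)) from the paper's SI to avoid hitting ln(0). Helper names are mine, and the real-valued version is only valid where the intermediate logs stay positive:

```python
import math

def eml(x, y):
    return math.exp(x) - math.log(y)

# Everything below is built only from eml and the constant 1:
def exp_(x):   return eml(x, 1.0)
def ln_(x):    return eml(1.0, eml(eml(1.0, x), 1.0))
def sub(x, y): return eml(ln_(x), exp_(y))   # x - y = e^(ln x) - ln(e^y), needs x > 0

e = exp_(1.0)
# SI's negation, which never takes ln of a non-positive number here:
def neg(z):    return sub(1.0, sub(e, sub(sub(e, 1.0), z)))
def add(x, y): return sub(x, neg(y))         # x + y = x - (-y), needs x > 0

print(round(sub(5.0, 3.0), 9))  # 2.0
print(round(add(5.0, 3.0), 9))  # 8.0
```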
For completeness, there is also Peirce's arrow, a.k.a. the NOR operation, which is functionally complete. Fun application: IIRC the VMProtect copy-protection system has an internal VM based on NOR.
A quick Google search brings up https://github.com/pr701/nor_vm_core, which has the basic idea
That’s boolean functional completeness, which is kind of a trivial result (NAND, NOR). It mirrors this one insofar as the EDL operator is also a combination of a computation and a negation in the widest senses.
That's quite interesting.
Few ideas that come to my mind when reading this:
1. One should also add absolute value (as sqrt(x*x)?) as a desired function and from that min, max, signum in the available functions. Since the domain is complex some of them will be a bit weird, I am not sure.
2. I think, for any bijective function f(x) which, together with its inverse, is expressible using eml(), we can obtain another universal basis eml(f(x),f(y)) with the added constant f^-1(1). Interesting special case is when f=exp or f=ln. (This might also explain the EDL variant.)
3. The eml basis uses natural logarithm and exponent. It would be interesting to see if we could have a basis with function 2^x - log_2(y) and constants 1 and e (to create standard mathematical functions like exp,ln,sin...). This could be computationally more feasible to implement. As a number representation, it kinda reminds me of https://en.wikipedia.org/wiki/Elias_omega_coding.
4. I would like to see an algorithm how to find derivatives of the eml() trees. This could yield a rather clear proof why some functions do not have indefinite integrals in a symbolic form.
5. For some reason, extending the domain to complex numbers made me think about fuzzy logics with complex truth values. What would be the logarithm and exponential there? It could unify the Lukasiewicz and product logics.
I made a fun marimo notebook to try and derive these myself. I structured each cell in order based on the diagram at the end of the paper. It uses Sympy to determine if the function is correct or not.
https://gist.github.com/CGamesPlay/9d1fd0a9a3bd432e77c075fb8...
I made https://github.com/nullwiz/emlvm/tree/main yesterday, for fun :^)
Built a JS implementation today — monogate: https://explorer-taupe-five.vercel.app · npm install monogate

The interesting engineering problem was negation. The paper's SI gives one construction; we independently derived a two-regime approach — tower formula for y≤0, shift formula for y>0 — that stays stable to |y|<708 in IEEE 754. We also extended to ℂ.

After seeing pveierland's result in this thread (i constructible from {1} in K=75 under the extended-reals convention), we investigated and documented the distinction: under strict principal-branch ln where ln(0) throws, whether {1} alone generates i remains open. Under the extended-reals convention used in the paper, their construction holds. Two different grammars, not contradictory results.

109 tests. MIT. github.com/almaguer1986/monogate
Can someone explain how is this different from lambda calculus, it seems like you can derive the same in both. I don't understand both well enough and hence the question.
Lambda calculus talks about computable functions, where the types of the inputs are typically something discrete, like `Bool` or `Nat`. Here, the domain is the real numbers.
The short answer is that the lambda calculus computes transformations on digital values while this is for building functions that can transform continuous (complex) values.
Any lambda term is equivalent to a combinatory term over a one-point basis (like λxλyλz. x z (y (λ_.z)) [1]). One difference is that lambda calculus doesn't distinguish between functions and numbers, and in this case no additional constant (like 1) is needed.
[1] https://github.com/tromp/AIT/blob/master/ait/minbase.lam
Lambda calculus kind of does this in an analogous form, but does not allow you to derive this particular binary expression as a basis for elementary functions. There is a related concept with Iota [1], which allows you to express every combinatoric SKI term and in turn every lambda-definable function. But similar to this particular minimalist scientific function expression, it is mostly of interest for reductionist enthusiasts and not for any practical purpose.
[1] https://en.wikipedia.org/wiki/Iota_and_Jot
Lambda calculus is about discrete computations, this is about continuous functions. You can’t reason about continuous functions in lambda calculus.
Depending on your lambda calculus! From a categorical perspective a lambda calculus is just a nice syntax for Cartesian closed categories (or similar, e.g. *-autonomous categories for linear lambda calculus) so you can use it to reason about anything you can fit into that mould. For example, Paul Taylor likes to do exactly this: https://www.paultaylor.eu/ASD/analysis#lamcra
How would physical EML gates be implemented in reality?
Posts like these are the reason i check HN every day
probably with op-amps
Both BJTs and FETs have intrinsic exponential/logarithmic behaviors (at low biases) due to charge density being given by the Fermi-Dirac distribution since electrons are fermions.
Stupid question maybe (I am no mathematician), but aren't exp and ln really primitives? Aren't they implemented in terms of +,-,/,* etc? Or do we assume that we have an infinite lookup table for all possible inputs?
> aren't exp and ln really primitives? Aren't they implemented in terms of +,-,/,* etc?
They're primitive in the sense that you can't compute exp(x) or log(x) using a finite combination of other elementary functions for any x. If you allow infinitely many operations, then you can easily find infinite sums or products of powers, or more complicated expressions, to represent exp and log and other elementary functions.
> Or do we assume that we have an infinite lookup table for all possible inputs?
Essentially yes, you don't necessarily need an "implementation" to talk about a function, or more generally you don't need to explicitly construct an object from simpler pieces: you can just prove it satisfies some properties and that it has to exist.
For exp(x), you could define the function as the solution to the differential equation df/dx = f(x) with initial condition f(0) = 1. Then you would establish that the solution exists and is unique (it follows from the properties of the differential equation), call exp = f, and there you have it. You don't necessarily know how to compute it for any x, but you can assume exp(x) exists and is a real number.
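That ODE characterization can be made concrete with a crude Euler integration. This is a sketch of the definition, not an efficient way to compute exp:

```python
import math

# Euler's method for f' = f with f(0) = 1, stepped out to x.
def exp_via_ode(x, steps=100_000):
    h = x / steps
    f = 1.0                  # initial condition f(0) = 1
    for _ in range(steps):
        f += h * f           # f(t + h) ≈ f(t) + h * f'(t) = f(t) + h * f(t)
    return f

print(abs(exp_via_ode(1.0) - math.e) < 1e-4)  # True
```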
I think the point here is to explore the reduction of these functions to finite binary trees using a single binary operator and a single stopping constant. The operator used could be arbitrarily complex; the objective is to prove that other expressions in a certain family — in this case, the elementary functions — can be expanded as a finite (often incomplete) binary tree of that same operation.
In other words, this result does not aim to improve computability or bound the complexity of calculating the numerical value. Rather, it aims to exhibit this uniform, finite tree structure for the entire family of elementary expressions.
I think there is still an implicit restriction on the complexity of the operator for this to be interesting. Otherwise you could design an operator which accepts a pair x,y and performs one of 2^k elementary binary operations by reading off the first k bits of x and applying the specified operation on the remainder of x and y. (This is kind of like how real-valued computational models become too powerful for complexity theory to work if you allow bitwise operations.)
Exactly! If you didn't strictly limit the operator's complexity, you could just smuggle a Turing machine in via bitwise logic and turn the whole thing into a parlor trick. The beauty here is that eml(x,y) is a pure, continuous analytical function with no hidden branching whatsoever.
To clarify my earlier point: the author isn't trying to build a practical calculator or generate human-readable algebra. Using exp and ln isn't a cheat code because the goal is purely topological. The paper just proves that this massive, diverse family of continuous math can be mapped perfectly onto a uniform binary tree, without secretly burying a state machine inside the operator.
> The beauty here is that eml(x,y) is a pure, continuous analytical function with no hidden branching whatsoever.
They use the complex version of logarithm, that has a lot of branching problems.
Different sense of “branching”
Yep.
Well, the paper explicitly takes the principal branch to solve this.
So it isn't exploiting the branching for computation.
I agree; as the sibling comment says, there are two different things that are named "branches". Anyway, to get the principal branch in the microprocessor it's necessary to implement "atan2", which has a lot of special cases.
For example, IIRC ln(-inf.0 + y*i) = +inf.0 + pi*sign(y)*i
You have a gate (called here "eml") that takes x and y and gives `exp(x) - log(y)`. Then you implement all other operations and elementary functions, including addition, multiplication etc, using only compositions of this gate/function (and the constant 1). You don't have addition as you start, you only have eml and 1. You define addition in terms of those.
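A quick numeric sanity check of that bootstrapping, in Python (the identities are the ones quoted elsewhere in this thread, not copied from the paper's tables; `cmath.log` is the principal branch):

```python
import cmath

def eml(x, y):
    # the single primitive: exp(x) - ln(y), principal branch of ln
    return cmath.exp(x) - cmath.log(y)

def exp_(a):
    # exp(a) = eml(a, 1), since ln(1) = 0
    return eml(a, 1)

def ln_(a):
    # ln(a) = eml(1, eml(eml(1, a), 1))
    return eml(1, eml(eml(1, a), 1))

def sub(x, y):
    # x - y = exp(ln x) - ln(exp y)
    return eml(ln_(x), exp_(y))

zero = eml(1, eml(eml(1, 1), 1))  # = ln_(1) = 0

print(sub(7.0, 3.0))  # ≈ 4.0
```

Principal-branch caveats apply: sub() as written only works where ln(exp(y)) = y, i.e. real y, and where x stays off ln's branch cut.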
So, like brainf*ck (the esoteric programming language), but for maths?
But even tighter. With eml and 1 you could encode a function in RPN as bits.
Although you also need to encode where to put the input.
The real question is what emoji to use for eml when written out.
> The real question is what emoji to use for eml when written out.
Some Emil or another, I suppose. Maybe the one from Ratatouille, or maybe this one: https://en.wikipedia.org/wiki/Emil_i_L%C3%B6nneberga
Not brainf*ck. This is the SUBLEQ equivalent of math https://en.wikipedia.org/wiki/One-instruction_set_computer#S...
Did you maybe mean to respond to the parent of my comment?
So brainf*ck in binary?
I'm kidding, of course. You can encode anything in bits this way.
In RPN notation you just put the input on the stack, right? The encodings seem like they could get pretty big, and they certainly wouldn't be unique, but you should be able to encode pretty much any constant you could think of.
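For fun, here's what such a bit encoding could look like in Python: 0 means "push the constant 1", 1 means "pop two values and push eml of them". (A toy encoding invented for this comment, not anything from the paper.)

```python
import cmath

def eml(x, y):
    return cmath.exp(x) - cmath.log(y)

def eval_bits(bits):
    """Evaluate an eml expression encoded in RPN:
    0 = push 1, 1 = pop y, pop x, push eml(x, y)."""
    stack = []
    for b in bits:
        if b == 0:
            stack.append(1)
        else:
            y = stack.pop()
            x = stack.pop()
            stack.append(eml(x, y))
    (result,) = stack  # well-formed programs leave exactly one value
    return result

# eml(1, eml(eml(1, 1), 1)) = 0, i.e. "1 1 1 eml 1 eml eml":
print(abs(eval_bits([0, 0, 0, 1, 0, 1, 1])))  # ≈ 0
```

A well-formed program has exactly one more 0 than 1s, with every prefix push-heavy, which is the ballot-sequence view of the Catalan structure mentioned elsewhere in the thread.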
More like lambda calculus, but for continuous functions.
Not sure it really compares to NAND() and the likes.
Simply because bool algebra doesn't have that many functions and all of them are very simple to implement.
A complex boolean function made out of NANDs (or the like) is only a little more complex than the same function made out of the other operators.
Implementing even simple real functions out of eml() seems to me to add a lot of computational complexity, even with both exp() and ln() implemented in hardware in O(1). I'm thinking about stuff like sum(), div(), and mod().
Of course, I might be badly wrong as I am not a mathematician (not even by far).
But I don't see, at the moment, the big win on this.
This has no use for numeric computations, but it may be useful in some symbolic computations, where it may provide expressions with some useful properties, e.g. regarding differentiability, in comparison with alternatives.
This is neat, but could someone explain the significance or practical (or even theoretical) utility of it?
Seconded; please help us laypeople here.
Read the paper. On the third page is a "Significance statement".
eh, i didn't find that paragraph very helpful. it just restates what it means to decompose an expression into another one relying only on eml, and vaguely gestures at what this could mean. i was hoping for something more specific.
From the paper:
> Everyone learns many mathematical operations in school: fractions, roots, logarithms, and trigonometric functions (+, −, ×, /, sqrt, sin, cos, log, …), each with its own rules and a dedicated button on a scientific calculator. Higher mathematics reveals that many of these are redundant: for example, trigonometric ones reduce to the complex exponential. How far can this reduction go? We show that it goes all the way: a single operation, eml(x, y), replaces every one of them. A calculator with just two buttons, EML and the digit 1, can compute everything a full scientific calculator does. This is not a mere mathematical trick. Because one repeatable element suffices, mathematical expressions become uniform circuits, much like electronics built from identical transistors, opening new ways to encoding, evaluating, and discovering formulas across scientific computing.
Actually we've known this for a long time. The universal approximation theorem states that an arbitrary function can be modelled through nonlinear basis functions so long as capacity is big enough. The practical bit here is knowing that those basis functions can be expressed with two operators. That's new!
This could have some interesting hardware implications as well - it suggests that a large dedicated silicon instruction set could accelerate any mathematical algorithm provided it can be mapped to this primitive. It also suggests a compiler/translation layer should be possible as well as some novel visualization methods for functions and methods.
I'm not too familiar with the hardware world, but does EML look like the kind of computation that's hardware-friendly? Would love for someone with more expertise to chime in here.
Yes actually, it is very regular, which usually lends itself to silicon implementations - the paper even talks about this briefly.
I think the bigger question is whether it will be more energy-optimal or silicon density-optimal than math libraries that are currently baked into these processors (FPUs).
There are also some edge cases "exp(exp(x))" and infinities that seem to result in something akin to "division by zero" where you need more than standard floating-point representations to compute - but these edge cases seem like compiler workarounds vs silicon issues.
A similar function operating on the real domain for powers and logs of 2 would be extremely hardware friendly. You can build it directly out of the floating point format. First K significand bits index a LUT. Do that for each argument and subtract them.
It gets a bit more difficult for the complex domain because you need rotation.
This paper seems to suggest that a chip with 10 pipeline stages of EML units could evaluate any elementary function (table 4) in a single pass.
I'm curious how this would compare to the dedicated sse or xmx instructions currently inside most processor's instruction sets.
Lastly, you could also create 5-depth or 6-depth EML tree in hardware (fpga most likely) and use it in lieu of the rust implementation to discover weight-optimal eml formulas for input functions much quicker, those could then feed into a "compiler" that would allow it to run on a similar-scale interpreter on the same silicon.
In simple terms: you can imagine an EML co-processor sitting alongside a CPU's standard math coprocessor(s). XMX, SSE, and AMX would do the multiplication/tile math they're optimized for, and would then call the EML coprocessor for exp, sin, and log calls, which are processed by reconfiguring the EML trees internally to handle them at single-cycle speed, instead of relaying them back to the main CPU to do that math in generalized instructions - likely something that takes many cycles.
You could also make an analog EML circuit in theory, using electrical primitives that have been around since the 60s. You could build a simple EML evaluator on a breadboard. Things like trig functions would be hard to reproduce, but you could technically evaluate output in electrical realtime (the time it takes the electrical signal to travel through these 8-10 analog amplifier stages).
I'm way too unschooled to say if it's important or not, but what really excites me is the Catalan structure ("Every EML expression is a binary tree [...] isomorphic to well-studied combinatorial objects like full binary trees and Catalan objects").
So, what happens if you take say the EML expression for addition, and invert the binary tree?
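The Catalan structure is easy to poke at directly: the number of distinct EML expression shapes with n leaves (occurrences of the constant 1) is the number of full binary trees with n leaves, i.e. Catalan(n-1). A quick check in Python (my own sketch):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def shapes(n):
    """Full binary trees with n leaves: split the leaves between
    the left and right subtree in every possible way."""
    if n == 1:
        return 1
    return sum(shapes(k) * shapes(n - k) for k in range(1, n))

print([shapes(n) for n in range(1, 8)])
# [1, 1, 2, 5, 14, 42, 132], i.e. the Catalan numbers C(2n-2, n-1)/n
```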
This looks interesting. I haven't looked in detail, but my first thought is - why hasn't this been found in the past? Surely people have been interested in this kind of question for a while?
I like this guy. https://th.if.uj.edu.pl/~odrzywolek/homepage/index.html
Oh my god the PC history is hilarious
https://th.if.uj.edu.pl/~odrzywolek/homepage/personal/pc/pc_...
Not sure why this is "hilarious", but it's very nice. I almost wish I was keeping this history too, even though I never really even had a "PC" as this separate major thing, I just have a bunch of various devices that serve different purposes, and most my desk "PCs" are just laptops.
Definitely prefer his old-school page to the vibe-coded-design page:
https://th.if.uj.edu.pl/~odrzywolek/
Interesting, but is the required combination of EML gates less complex than using other primitives?
In general, no.
It's about symbolic computation more than calculations.
Yes, and it's not all that useful there either.
E.g. ln is a rather complicated construct; it's not even a function. That's because for complex numbers, e^x is not bijective, and thus its inverse ain't a function.
So using that complicated construct to define something simpler like addition invites extra complexity.
Depends on how you define complexity?
Like when the Apollo guidance computer was made, the bottleneck was making integrated chips so they only made one, the NOR gate, and a whackton of routing to build out an entire CPU. Horribly complex routing, very simplified integrated circuit construction
Reminds me a bit of the coolest talk I ever got to see in person: https://youtu.be/FITJMJjASUs?si=Fx4hmo77A62zHqzy
It’s a derivation of the Y combinator from ruby lambdas
Have you gone through The Little Schemer?
More on topic:
> No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations.
I was taught that these were all hypergeometric functions. What distinction is being drawn here?
Hypergeometric functions are functions with 4 parameters.
When you have a function with many parameters it becomes rather trivial to express simpler functions with it.
You could find a lot of functions with 4 parameters that can express all elementary functions.
Finding a binary operation that can do this, like in TFA, is far more difficult, which is why it has not been done before.
A function with 4 parameters can actually express not only any elementary function, but an infinity of functions with 3 parameters, e.g. by using the 4th parameter to encode an identifier for the function that must be computed.
> Hypergeometric functions are functions with 4 parameters.
Granted, but the claim in the abstract says:
>> computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations
And I don't see how this is true as to hypergeometric functions in a way that isn't shared by the approach in the paper.
> Finding a binary operation that can do this, like in TFA, is far more difficult, which is why it has not been done before.
> A function with 4 parameters can actually express not only any elementary function, but an infinity of functions with 3 parameters, e.g. by using the 4th parameter to encode an identifier for the function that must be computed.
These statements seem to be in direct conflict with each other; you can use the second parameter of a binary function to identify a unary function just as you can use the fourth parameter of a quaternary function to identify a ternary one.
There is an essential difference between binary functions and unary functions.
With binary functions you can compose them using a very complex composition graph.
With unary functions you can compose them only linearly, so in general it is impossible to make a binary function with unary functions.
You can make binary functions from unary functions only by using at least one other binary function. For instance, you can make multiplication from squaring, but only with the help of binary addition/subtraction.
So the one function that can be used to generate the others by composition must be at least binary, in order to be able to generate functions with an arbitrary number of parameters.
This is why in mathematics there are many domains where the only required primitives are a small number of binary functions, but there is none where strictly unary functions are sufficient. (However, it may be possible to restrict the binary functions to very simple functions, e.g. making a tuple from components, for instance the CONS function of LISP I.)
What are you responding to?
I think that you may have replied before I saved my entire response, so I am not sure how much of it you had read before replying yourself.
I have replied to your last statement:
> "you can use the second parameter of a binary function to identify a unary function just as you can use the fourth parameter of a quaternary function to identify a trinary one."
As I have explained above, what you propose does not work. It works in functions with 3 or more parameters, but it does not work in binary functions, because you cannot make binary functions from unary functions (without using some auxiliary binary functions).
> As I have explained above, what you propose does not work. It works in functions with 3 or more parameters, but it does not work in binary functions, because you cannot make binary functions from unary functions (without using some auxiliary binary functions).
I have no idea what you're trying to say. If you can use one parameter to identify a desired function, then obviously you can use a function of arity n+1 to define as many functions of arity n as you want, and it doesn't matter what the value of n is.
For example:
selector(3, "sin") = sin 3
selector(3, "log2") = log₂ 3
This works going from arity 4 to arity 3, and it also works going from arity 2 to arity 1. Your "response" talks about going from arity 1 to arity 2, a non sequitur.
The subject of the parent article is expressing all the "elementary functions".
This requires expressing binary functions, like addition and multiplication.
You cannot do this by using only the set of unary functions, which can indeed be generated by a function with 2 parameters, one of which selects a unary function.
he is saying that if you reserve the second argument of a binary operator as a "function selection indicator", then you have restricted yourself to an alphabet of unary functions. This means that you could at most effectively model some unary functions, but not arbitrary expressions involving +, ×, -, /, ^, etc.
Unless you had hit upon a very magical binary function where certain special values of the second parameter happen to coincide with useful unary functions, without those values trampling on a useful binary mode or region of your binary function. But the search space for such a special binary function is so large that you shouldn't demand we disprove its existence; rather, given your non-surprise at the EML result, we challenge you to present such a binary function, so we can challenge you to demonstrate how it captures binary functions like addition, products, and exponentiation with arbitrary base.
So, can we see your construction? Or, if you refuse to present one, may we conclude you have implicitly reconsidered your position and understand the theoretical elegance this EML basis (and presumably many others) brings?
If you've never worked through a derivation/explanation of the Y combinator, definitely find one (there are many across the internet) and work through it until the light bulb goes off. It's pretty incredible, it almost seems like "matter ex nihilo" which shouldn't work, and yet does.
It's one of those facts that tends to blow minds when it's first encountered, I can see why one would name a company after it.
Halfway through I was imagining aliens to whom this operator comes naturally and our math is weird. By the end I found out that we might be those aliens.
I couldn't find any information on this, but is it possible that given how nicely exponentiation and logarithms differentiate and integrate, is it possible that this operator may be useful to simplify the process of finding symbolic solutions to integrals and derivatives?
It transforms a simple expression like x+y into a long chain of "eml" applications, so:
Derivatives: No. Exercise: Write the derivative of f(x)=eml(x,x)
Integrals: No. No. No. Integrals of composition are a nightmare, and here they use long composition chain like g(x)=eml(1,eml(eml(1,x),1)).
Agreed on integrals, but the derivative is relatively simple?
If f(x) = exp(x) - ln(x) then f’(x) = exp(x) - 1/x, which is representable in eml form as well.
To the overall point though, I don't think it helps make derivatives easier. To refactor a function into eml's is far more work than refactoring into something that's trivially differentiable with the product rule and chain rule.
You mean
and I still have to macroexpand a few
but I got really bored
How would an architecture with a highly-optimized hardware implementation of EML compare with a traditional math coprocessor?
Dreadfully slow for integer math, but probably similar performance to something like a CORDIC for specific operations. If you can build an FPU that does exp() and ln() really fast, it's simple binary tree traversal to find the solution.
You already have an FPU that approximates exp() and ln() really fast, because float<->integer conversions approximate the power 2 functions respectively. Doing it accurately runs face-first into the tablemaker's dilemma, but you could do this with just 2 conversions, 2 FMAs (for power adjustments), and a subtraction per. A lot of cases would be even faster. Whether that's worth it will be situational.
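A sketch of that bit trick in Python, for float32 (function names are mine; this is the same insight behind the fast inverse square root):

```python
import struct

def bits_of(x):   # float32 -> uint32
    return struct.unpack('<I', struct.pack('<f', x))[0]

def float_of(b):  # uint32 -> float32
    return struct.unpack('<f', struct.pack('<I', b))[0]

def approx_log2(x):
    # the raw float32 bit pattern, rescaled, is a piecewise-linear log2
    return bits_of(x) / 2**23 - 127

def approx_exp2(y):
    return float_of(int((y + 127) * 2**23))

print(approx_log2(8.0))   # exactly 3.0 (exact on powers of two)
print(approx_log2(10.0))  # 3.25, vs true log2(10) ≈ 3.32
```

The worst-case error of the linear segment is about 0.086; the FMAs mentioned above would be for polishing that up (and handling scaling), which is where the tablemaker's-dilemma pain comes in.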
What are some of your favorite sources to dig into this field as a whole?
Anything discussing fast inverse sqrt will go over the logarithmic side [0], as it's the key insight behind that code. Exp is just the other direction. It's not widely documented in text otherwise, to my knowledge.
[0] https://github.com/francisrstokes/githublog/blob/main/2024/5...
Right, I meant to say fast and accurate. You could run into quite a lot of floating point error using this for everything.
It would almost always be much, much worse. Practical numerical libraries (whether implemented in hardware or software) contain lots of redundancy, because their goal is to give you an optimized primitive as close as possible to the operation you actually want. For example, the library provides an optimized tan(x) to save you from calling sin(x)/cos(x), because one nasty function evaluation (as a power series, lookup table, CORDIC, etc.) is faster than two nasty function evaluations and a divide.
Of course the redundant primitives aren't free, since they add code size or die area. In choosing how many primitives to provide, the designer of a numerical library aims to make a reasonable tradeoff between that size cost and the speed benefit.
This paper takes that tradeoff to the least redundant extreme because that's an interesting theoretical question, at the cost of transforming commonly-used operations with simple hardware implementations (e.g. addition, multiplication) into computational nightmares. I don't think anyone has found a practical application for their result yet, but that's not the point of the work.
I actually don't think this is true -
Traditional processors, even highly dedicated ones like TMUs in GPUs, still require being preconfigured substantially in order to switch between sin/cos/exp2/log2 function calls, whereas a silicon implementation of an 8-layer EML machine could do that by passing a single config byte along with the inputs. If you had a 512-wide pipeline of EML logic blocks in modern silicon (say 5nm), you could get around 1 trillion elementary function evaluations per second on a 2.5GHz chip. Compare this with a 96-core Zen 5 server CPU with AVX-512, which can do about 50-100 billion scalar-equivalent evaluations per second across all cores, and only for one specific unchanging function.
Take the fastest current math processors, the TMUs on a modern GPU: they can calculate sin OR cos OR exp2 OR log2 in 1 cycle per shader unit... but that is ONLY for those elementary functions and ONLY if they don't change - changing the function being called incurs a huge cycle hit, and chaining the calculations also incurs latency hits. An EML coprocessor could do arcsinh(x² + ln(y)) in the same hardware block, with the same latency as a modern CPU can do a single FMA instruction.
So to clarify, you think that replacing every multiplication with 24 transcendental function evaluations (12 eml(x, y), each of which evaluates exp(x) and ln(y) plus the subtraction; see the paper's Fig 2) is somehow a win?
The fact that addition, subtraction, and multiplication run quickly on typical processors isn't arbitrary--those operations map well onto hardware, for roughly the same reasons that elementary school students can easily hand-calculate them. General transcendental functions are fundamentally more expensive in time, die area, and/or power, for the same reasons that elementary school students can't easily hand-calculate them. A primitive where all arithmetic (including addition, subtraction, or negation) involves multiple transcendental function evaluations is not computationally faster, lower-power, lower-area, or otherwise better in any other practical way.
The comments here are filled with people who seem to be unaware of this, and it's pretty weird. Do CS programs not teach computer arithmetic anymore?
For basic arithmetic, this is not required, nor would it be faster; it's not likely advantageous for bulk static transcendental functions either. Where this becomes interesting is when combining them OR chaining them, where today they must come back out to the main process for reconfiguration and then be re-issued.
Practical terms: Jacobian (heavily used in weather and combustion simulation): The transcendental calls, mostly exp(-E_a/RT), are the actual clock-cycle bottleneck. The GPU's SFU computes one exp2 at a time per SM. The ALU then has to convert it (exp(x) = exp2(x × log2(e))), multiply by the pre-exponential factor, and accumulate partial derivatives. It's a long serial chain for each reaction rate.
The core of this is the Arrhenius rate, (A × T^n × exp(-E_a/(R×T))), which involves an exponentiation, a division, a multiplication, and an exponential. On a GPU, that's multiple SFU calls chained with ALU ops. In an EML tree, the whole expression compiles to a single tree that flows through the pipeline in one pass.
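For concreteness, the rate expression in question (the values below are made up; what matters is the serial dependency chain of operations):

```python
import math

def arrhenius(A, n, Ea, T, R=8.314):
    # k = A * T^n * exp(-Ea / (R*T)):
    # one pow, one divide, one negation, one exp, and two multiplies,
    # each feeding the next
    return A * T**n * math.exp(-Ea / (R * T))

print(arrhenius(A=1e10, n=0.5, Ea=5e4, T=600.0))
```

On a GPU each of those steps lands on different units (SFU for the exp2, ALU for the rest), which is the serialization being described; the EML claim is that the whole chain becomes one tree flowing through one pipeline.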
GPU (PreJacGPU) is currently the state of the art for speed on these simulations - a moderate-width 8-depth EML machine could process a very complex Jacobian as fast as the GPU can evaluate one exp(). Even on a sub-optimal 250MHz FPGA, an entire 50x50 Jacobian would take about 3.5 microseconds vs 50 microseconds PER Jacobian on an A100.
If you put that same logic path into an ASIC, you'd be about 20x the FPGA's speed - in the nanoseconds per round. And it's not like you're building one function into an ASIC; it's general purpose. You just feed it a compiled tree configuration and run your data through it.
For anything like linear algebra math, which is also used here, you'd delegate that to the dedicated math functions on the processor - it wouldn't make sense to do those in this.
> The core of this is the Arrhenius rate, (A × T^n × exp(-E_a/(R×T))), which involves an exponentiation, a division, a multiplication, and an exponential. On a GPU, that's multiple SFU calls chained with ALU ops. In an EML tree, the whole expression compiles to a single tree that flows through the pipeline in one pass.
I think you're missing the reason why the GPU kicks you out of the fast path when you need that special function. The special function evaluation is fundamentally more expensive in energy, whether that cost is paid in area or time. Evaluation of the special functions with throughput similar to the arithmetic throughput would require much more area for the special functions, which for most computation isn't a good tradeoff. That's why the GPU's designers chose to make your exp2 slow.
Replacing everything with dozens of cascaded special functions makes everything uniform, but it's uniformly much worse. I feel like you're assuming that by parallelizing your "EML tree" in dedicated hardware that problem goes away; but area isn't free in either dollars or power, so it doesn't.
There is a huge market for "it's faster" at the cost of efficiency, but I don't buy your claim that an EML hardware block would be inherently more efficient than the same workload running on a GPU. If you think it would be, back it up with some numbers.
A 10-stage EML pipeline would be about the size of an AVX-512 instruction block on a modern CPU, in the realm of ~0.1 mm² on a 5nm process node (including the FMA units behind it), in its entirety about 1% of the CPU die. None of this suggests that even a ~500-wide 10-stage EML pipeline would consume anywhere near the power of a modern datacenter GPU (which wastes a lot of its energy moving things from memory to ALU to shader core...).
Not sure if you're arguing from a hypothetical position or a practical one, but you seem to be narrowing your argument to "well for simple math it's less efficient" but that's not the argument being made at all.
> you seem to be narrowing your argument to "well for simple math it's less efficient" but that's not the argument being made at all.
What? Unless the thing you want to compute happens to be exactly that eml() function (no multiplication, no addition, no subtraction unless it's an exponential minus a log, etc.) or almost so, it is unquestionably less efficient. If you believe otherwise, then please provide the eml() implementation of a practically useful function of your choice (e.g. that Arrhenius rate). Then we can count the superfluous transcendental function evaluations vs. a conventional implementation, and try to understand what benefit could outweigh them.
> A 10-stage EML pipeline would be about the size of an avx-512 instruction block on a modern CPU
Can you explain where you got that conclusion? And what do you think a "10-stage EML pipeline" would be useful for? Remember that the multiply embedded in your Arrhenius rate is already 8 layers and 12 operations.
Also, can you confirm whether you're working with an LLM here? You're making a lot of unsupported and oddly specific claims that don't make sense to me, and I'm trying to understand where they're coming from.
> eml(x,y)=exp(x)-ln(y)
Exp and ln, isn't the operation its own inverse depending on the parameter? What a neat find.
> isn't the operation its own inverse depending on the parameter?
This is a function from ℝ² to ℝ. It can't be its own inverse; what would that mean?
eml(1,eml(x,1)) = eml(eml(1,x),1) = exp(ln(x)) = ln(exp(x)) = x
But f(x) = eml(1, x) and g(x) = eml(x, 1) are different operations. What operation are you saying is supposed to be its own inverse?
eml(1,eml(x,1)) = e + x
and
eml(eml(1,x),1) = e^e * x
Okay, I'm tired. Not quite inverse, but per the title, there must be a way.
I was mistaken above in the first identity, it is
eml(1,eml(x,1)) = e - x
Which then if you iterate gives x (ie is inverse of itself).
eml(1,eml(eml(1,eml(x,1)),1)) = x
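Numerically this checks out (principal-branch caveat: ln(exp(z)) = z only while the argument stays in the principal strip, which it does for small real x):

```python
import cmath, math

def eml(x, y):
    return cmath.exp(x) - cmath.log(y)

def g(x):
    # g(x) = eml(1, eml(x, 1)) = e - ln(exp(x)) = e - x
    return eml(1, eml(x, 1))

x = 0.75
print(g(x))     # ≈ e - 0.75 ≈ 1.968
print(g(g(x)))  # ≈ 0.75: g is an involution
print(eml(1, eml(eml(1, eml(x, 1)), 1)))  # ≈ 0.75 as well
```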
It's a kind of superposition representation a la Kolmogorov-Arnold: a learnable functional basis for elementary functions, g(x,y) = f(x) - f^{-1}(y), here with f = exp.
eml(x, y) pronounced... "email"?
Interesting!
One thing I wonder now: NAND is symmetric while this isn't, could something similar be found where function(x, y) = function(y, x)?
Very nice, though I'm not fond of the name.
What comes to my mind as an alternative, which I would subjectively find finer, is "axe". Think axiom or axiology.
Anyone with other suggestions? Or even remarks on this one?
i think eml is fine, names should be connected to the thing they represent so 'exponential minus log' makes sense to me
Tastes and colors are hard to reconcile.
For my part, I like direct semantic connections, but find convoluted indirection through lazy initialisms strongly repulsive. I can appreciate an acronym that makes both a direct connection and a playful indirect reference to its expanded terms.
Is this the same as saying everything can be made from NAND gates?
With NAND gates you can make any discrete system, but you can only approximate a continuous system.
This work is about continuous systems, even if the reduction of many kinds of functions to compositions of a single kind of function is analogous to the reductions of logic functions to composing a few kinds or a single kind of logic functions.
I actually do not value the fact that the logic functions can be expressed using only NAND. For understanding logic functions it is much more important to understand that they can be expressed using either only AND and NOT, or only OR and NOT, or only XOR and AND (i.e. addition and multiplication modulo 2).
Using just NAND or just NOR is a trick that does not provide any useful extra information. There are many things, including classes of mathematical functions or instruction sets for computers, which can be implemented using a small number of really independent primitives.
In most or all such cases, after you arrive at the small set of maximally simple and independent primitives, you can reduce them to only one primitive.
However that one primitive is not a simpler primitive, but it is a more complex one, which can do everything that the maximally simple primitives can do, and it can recreate those primitives by composition with itself.
Because of its higher complexity, it does not actually simplify anything. Moreover, generating the simpler primitives by composing the more complex primitive with itself leads to redundancies, if implemented thus in hardware.
This is a nice trick, but like I have said, it does not improve in any way the understanding of that domain or its practical implementations in comparison with thinking in terms of the multiple simpler primitives.
For instance, one could take a CMOS NAND gate as the basis for implementing a digital circuit, but you understand better how CMOS logic actually works when you understand that the AND function and the NOT function are localized in distinct parts of that CMOS NAND gate. This understanding is necessary when you have to design other gates for a gate library, because even if using a single kind of gate is possible, the performance of this approach is quite sub-optimal, so you almost always have to design gates separately, e.g. an XOR gate, instead of making it from NAND gates or NOR gates.
In CMOS logic, NAND gates and NOR gates happen to be the simplest gates that can restore at output the same logic levels that are used at input. This confuses some people to think that they are the simplest CMOS gates, but they are not the simplest gates when you remove the constraint of restoring the logic levels. This is why you can make more complex logic gates, e.g. XOR gates or AND-OR-INVERT gates, which are simpler than they would be if you made them from distinct NAND gates or NOR gates.
I’d be really interested in an analysis of tau in light of this discovery. Would tau fit more naturally here than pi, as it does in other examples?
The construction so far uses ln(-1) to get to pi - so far no easy way to tau.
So, to follow up I'd say in the context of EML I think ln(-1) = pi*i is the natural fit. Pi and Tau are both more advanced constructs.
Got curious to see whether SymPy could be used to evaluate the expressions, so I used Claude Code to build a quick evaluator. Numeric and symbolic results appear to agree:
“ Elementary functions, for many students epitomized by the dreaded sine and cosine, ” dreaded?
I hope this was presented at SIGBOVIK.
Zero will also be handy in definitions: `0=eml(1,eml(eml(1,1),1))`.
And i is obviously `sqrt(-1)`
Looks like he brute-forced all combinations of two mathematical operations, no?
I wonder how this combines with Richardson's Theorem.
Next step is to build an analog scientific calculator with only EML gates
Could this be used to prove e+pi is transcendental?
there ought to be a special section on HN entitled "things that will make you feel thoroughly inadequate".
How does one actually add with this?
Well, once you've derived unary exp and ln you can get subtraction, which then gets you unary negation and you have addition.
And then by using the fact that the exponential turns addition into multiplication, you get multiplication (and subtraction gives division).
I don't know about adding, but multiplication has a diagram on the last page of the PDF.
xy = eml(eml(1, eml(eml(eml(eml(1, eml(eml(1, eml(1, x)), 1)), eml(1, eml(eml(1, eml(y, 1)), 1))), 1), 1)), 1)
From Table 4, I think addition is slightly more complicated?
x+y = ln(exp(x) * exp(y))
exp(a) = eml(a, 1), and ln(a) = eml(1, eml(eml(1, a), 1))
Plugging those in is an exercise for the reader
might need to turn the paper sideways
You use multiplication in the first line, which you have not expressed through eml yet.
Because of how exp and log turn addition into multiplication and vice versa, once you have the one, you get the other easily.
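A quick numeric sketch of this chain in Python, assuming the reading eml(x, y) = exp(x) - ln(y) (consistent with exp(a) = eml(a, 1) and the ln construction quoted above). The safe_ln clamp and all of the helper names are mine, not from the paper:

```python
import math

def safe_ln(y):
    # extended arithmetic, as noted elsewhere in the thread: ln(0) = -inf
    # (tiny round-off from computing 0 = e - e is clamped the same way)
    return -math.inf if y <= 0 else math.log(y)

def eml(x, y):
    # assumed reading of the primitive: eml(x, y) = exp(x) - ln(y)
    return math.exp(x) - safe_ln(y)

def EXP(a): return eml(a, 1)                   # exp(a) = eml(a, 1)
def LN(a):  return eml(1, eml(eml(1, a), 1))   # ln(a)
def SUB(x, y): return eml(LN(x), EXP(y))       # x - y (needs x > 0 here)

E    = EXP(1)
ZERO = SUB(E, E)                               # 0 = e - e

def NEG(y): return SUB(ZERO, y)                # -y = 0 - y (uses ln(0) = -inf)
def ADD(x, y): return SUB(x, NEG(y))           # x + y = x - (-y)
def MUL(x, y): return EXP(ADD(LN(x), LN(y)))   # xy = exp(ln x + ln y)
```

With this, MUL(5, 7) lands within floating-point noise of 35, matching the 35.00000000000001 reported below in the ECMAScript version.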
Thanks for posting that. You had a transcribing typo which was corrected in the ECMAScript below. Here's the calculation for 5 x 7:
> 35.00000000000001
For larger or negative inputs you get a NaN because ECMAScript has limited precision and doesn't handle imaginary numbers.
> For larger or negative inputs you get a NaN because ECMAScript has limited precision and doesn't handle imaginary numbers.
This also shows why EML is not practical for computation.
It's basically using the "-" embedded in the definition of the eml operator.
Table 4 shows the "size" of the operators when fully expanded to "eml" applications, which is quite large for +, -, ×, and /.
Here's one approach which agrees with the minimum sizes they present:
After you have ln and exp, you can invert their applications in the eml function
Using a subtraction-of-subtraction to get addition leads to the cost of "27" in Table 4; I'm not sure what formula leads to 19 but I'm guessing it avoids the expensive construction of 0 by using something simpler that cancels:
The problem with symbolic regression is ln(y) is undefined at 0, so you can't freely generate expressions with it. We need to guard it with something like ln(1+y*y) or ln(1+|y|) or return undefined.
The article uses extended arithmetic where ln(0) = -∞.
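One way to do that guarding, sketched in Python. The ln(0) = -inf convention follows the extended arithmetic mentioned above; returning None for the other partial cases is my own choice for pruning randomly generated expression trees, not anything from the paper:

```python
import math

def guarded_eml(x, y):
    # eml(x, y) = exp(x) - ln(y) with the extended convention ln(0) = -inf;
    # cases that would need a complex log (y < 0) or overflow exp (huge x)
    # return None so an expression generator can prune them instead of crashing
    if y < 0:
        return None
    ln_y = -math.inf if y == 0 else math.log(y)
    try:
        return math.exp(x) - ln_y
    except OverflowError:
        return None
```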
This is like Subleq, where you can run EForth and EForth itself self-bootstraps:
https://github.com/howerj/muxleq
PS: you don't need gforth to compile:
Edit muxleq.fth, add some goodies by editing these values:
Here are my settings.
Then, create a new ".dec" file (subleq program)
Now, to run EForth every time:
add these further down in the muxleq.fth code to have them:
The ideal place would be just below the ': dabs ' definition.
Totally gives nerd sniping vibes:
https://xkcd.com/356/
I guess you folks don't know about iota & jot: https://en.wikipedia.org/wiki/Iota_and_Jot
The paper somehow seems to be missing the most interesting part, i.e. the optimal constructions of functions from eml in a readable format.
Here is my attempt. I think they should be optimal up to around 15 eml nodes; beyond that they might not be:
# 0
1=1
# 1
exp(x)=eml(x,1)
e-ln(x)=eml(1,x)
e=exp(1)
# 2
e-x=e-ln(exp(x))
# 3
0=e-e
ln(x)=e-(e-ln(x))
exp(x)-exp(y)=eml(x,exp(exp(y)))
# 4
id(x)=e-(e-x)
inf=e-ln(0)
x-ln(y)=eml(ln(x),y)
# 5
x-y=x-ln(exp(y))
-inf=e-ln(inf)
# 6
-ln(x)=eml(-inf,x)
ln(ln(x))=ln(ln(x))
# 7
-x=-ln(exp(x))
-1=-1
x^-1=exp(-ln(x))
ln(x)+ln(y)=e-((e-ln(x))-ln(y))
ln(x)-ln(y)=ln(x)-ln(y) # using x - ln(y)
# 8
xy=exp(ln(x)+ln(y))
x/y=exp(ln(x)-ln(y))
# 9
x + y = ln(exp(x))+ln(exp(y))
2 = 1+1
# 10
i*pi = ln(-1)
# 13
-i*pi = -ln(-1)
x^y = exp(ln(x)*y)
# 16
1/2 = 2^-1
# 17
x/2 = x/2
x*2 = x*2
# 20
ln(sqrt(x)) = ln(x)/2
# 21
sqrt(x) = exp(ln(sqrt(x)))
# 25
sqrt(xy) = exp((ln(x)+ln(y))/2)
# 27
ln(i)=ln(sqrt(-1))
# 28
i = sqrt(-1)
-pi^2 = (i*pi)(i*pi)
# 31
pi^2 = (i*pi)(-i*pi)
# 37
exp(x*i) = exp(x*i)
# 44
exp(-x*i) = exp(-(x*i))
# 46
pi = (i*pi)/i
# 90+x?
2cos(x) = exp(x*i) + exp(-x*i)
# 107+x?
cos(x) = (2cos(x))/2
# 118+x?
2sin(x) = (exp(x*i) - exp(-x*i))/i # using exp(x)-exp(y)
# 145+x?
sin(x) = (2sin(x))/2
# 217+3x?
tan(x) = 2sin(x)/(2cos(x))
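If it helps to sanity-check the list, a few of the low-cost real-valued entries can be verified numerically (Python; assumes the eml(x, y) = exp(x) - ln(y) reading used throughout):

```python
import math
e = math.e

def eml(x, y):
    # the reading used throughout the list: eml(x, y) = exp(x) - ln(y)
    return math.exp(x) - math.log(y)

# spot-check a few of the low-cost identities at arbitrary sample points
x, y = 1.3, 0.7
assert abs(eml(x, 1) - math.exp(x)) < 1e-12                  # exp(x) = eml(x, 1)
assert abs(eml(1, x) - (e - math.log(x))) < 1e-12            # e - ln(x) = eml(1, x)
assert abs(eml(math.log(x), y) - (x - math.log(y))) < 1e-12  # x - ln(y) = eml(ln x, y)
assert abs(eml(x, math.exp(math.exp(y)))
           - (math.exp(x) - math.exp(y))) < 1e-12            # exp(x) - exp(y)
```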
I don't mean to shit on this result, but exp and ln are not really that elementary themselves... it's still an interesting result, but there's a reason that all approximations are done using series of polynomials (Taylor expansion).
> but there's a reason that all approximations are done using series of polynomials (taylor expansion).
"All" is a tall claim. Have a look at https://perso.ens-lyon.fr/jean-michel.muller/FP5.pdf for example. Jump to slide 18:
> Forget about Taylor series
> Taylor series are local best approximations: they cannot compete on a whole interval.
There is no need to worry about "sh-tt-ng" on their result when there is so much to learn about other approximation techniques.
Padé approximations are not discussed as much, but they are much more stable than Taylor series approximations.
Sorry, re-reading this, I should have said "most". As the other reply mentions, Pade approx. are also well liked for numerical methods.
I personally mostly do my everyday work using Taylor expansion (mostly explicit numerical methods in comp. EM, because they're cheaper these days and it's simpler to write down), so it's what first comes to mind.
A quick meta-take here: it is hard to assess the level of expertise here on HN. Some might be just tangentially interested, other might have degrees in the specific topic. Others might maintain a scientific computing library. Domains vary too: embedded systems, robotics, spacecraft navigation, materials modeling, or physics simulation. Until/unless people step up and fill the gaps somehow, we have little notion of identity nor credentialing, for better and for worse.*
So it really helps when people explain (1) their context** and (2) their reasoning. Communicating well is harder than people think. Many comments are read by hundreds or more (thousands?) of people, most of whom probably have no idea who we are, what we know, or what we do with our brains on a regular basis. It is generous and considerate to other people to slow down and really explain where we're coming from.
So, when I read "most people use Taylor approximations"...
1. my first question is "on what basis can someone say this?"
2. for what domains might this be somewhat true? False?
3. but the bigger problem is that claims like the above don't teach. i.e. When do Taylor series methods fall short? Why? When are the other approaches more useful?
Here's my quick take... Taylor expansions tend to work well when you are close to the expansion point and the function is analytic, and less well when these assumptions don't hold. More broadly, they don't tend to give uniform accuracy across a range, so Taylor approximations are usually only local. Other methods (Padé, minimax, etc.) are worth reaching for when other constraints matter.
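A tiny illustration of "Taylor approximations are usually only local" (Python sketch; the 5-term polynomial and the sample points are arbitrary choices of mine):

```python
import math

def taylor_exp(x, n):
    # n-term Taylor polynomial of exp about 0: 1 + x + x^2/2! + ...
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return total

# the error is tiny near the expansion point and blows up away from it
errors = {x: abs(taylor_exp(x, 5) - math.exp(x)) for x in (0.1, 1.0, 3.0)}
```

At x = 0.1 the 5-term error is below 1e-6; at x = 3 the same polynomial is off by more than 1.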
* I think this is a huge area we're going to need to work on in the age where anyone can sound like an expert.
** In the case above, does "comp. EM" mean "computational electromagnetics" or something else? The paper talks about "EML" so it makes me wonder if "EM" is a typo. All of these ambiguities add up and make it hard for people to understand each other.
I do computational electromagnetism, specifically plasma simulation. In the field solvers and particle pushers (I mainly do explicit codes, meaning we just approximate the derivatives numerically) we only do Taylor expansion so that the derivatives are essentially second-order accurate. We don't bother going further, although I could, because in my domain being more "accurate" as a function of step size (dx in approximating f(x) -> f(x+dx)) yields less of a profit than just decreasing step sizes and grid sizes (or increasing resolution), and even then, the numerical accuracy pales in comparison to, say, setting up the physical problem wrong (the focus of a simulated laser pulse being ten wavelengths out of focus).
Replying to some of your questions (1 and 2): this is from the perspective of a computational scientist, and a less theoretical type who works closely with experimentalists. Thus I am closer to a user of codes to model experiments than to someone who does a lot of analytic or fundamental theory, although my experience and perspective are probably close to others who are computational-ish in other domains like engineering, for the reasons I'll explain below.
For 3, most physically useful simulations that are not merely theoretical exercises (that is, simulations that are predictive or explanatory of actual experiments scientists want to do) will not consist of analytic functions you can write down. First, suppose the initial conditions of a problem have an analytic aspect (me setting my laser profile as a Gaussian pulse): once the interaction with a plasma target occurs, the result you obtain (and thus the predictions a simulation will make that can be compared to experiment) will not be Gaussian but will evolve due to the complex physics modeled in the simulation. And a Gaussian as an initial condition is already an approximation to an actual experiment. An "easy sim" for me is doing a best fit to the waist from a profile they'll read from a power meter and using a Gaussian that closely matches it, while a more realistic simulation would be me taking the data they have in an Excel sheet and feeding it directly into the simulation as an initial condition. In most real-world scenarios, most ICs already aren't analytic and must be solved numerically. By the way, this isn't that different from how engineers use computational codes: not many airplane wings are spheres or cylinders; you'd likely have to import the design for a wing from a CAD file into, say, an aerodynamics fluid code.
So in all these cases, the bottleneck isn't really approximating analytic functions you can write down, either in closed form or in series form to the nth degree. Many people in the computational domain do not need accuracy beyond two or three terms in a Taylor series, because it is usually easier to just cut down dx and do more steps in total rather than use a large dx and require more terms... and this is before using any more sophisticated approximations. No code I know uses Padé approximations; I just know that some libraries for special functions (which may be one or two function calls in a code I use) use them.
Also, just a quick example you can try. Let's look at exp for small argument (this only really works for small arguments; you obviously can't do Taylor expansion well for a large argument). Consider the following:
>>> np.exp(0.4231)
np.float64(1.5266869570289792)
I will see how many terms I need to get 4 digits of accuracy. (Note that I had four digits in my input, so even ignoring sig figs I probably shouldn't expect better than 4 digits in the result; numpy itself is numerical too, so it shouldn't be considered exact, although I'll trust its first four digits here.)
>>> x = 0.4231
>>> 1
1
>>> 1 + x
1.4231
>>> 1 + x + x**2/2
1.512606805
>>> 1 + x + x**2/2 + x**3/(3*2)
1.5252302480651667
>>> 1 + x + x**2/2 + x**3/(3*2) + x**4/24
1.5265654927553847
Note that by 3 terms (x**3) I'm already off in the last digit by 1; by 4 terms it has converged enough. Given an fp register, you can reuse powers of x from the last term, so this is already dangerously cheap; why would you even need better than this in a practical sense? Most results I see in the wild do not even require this level of precision in a single simulation. I am a scientist, however; it's possible engineers need more precision.
For us, more sophisticated methods on the "calculating transcendental functions" level are not really required, which is why they don't appear in codes I usually see. What we need better are things that make the actual elemental operations, like fma and the like, faster. Things like avx512 are far more interesting to me, for example.
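For what it's worth, the "reuse powers of x" trick above is usually written in Horner form, which is one multiply-add per term, exactly the shape fma hardware likes (sketch; the 5-term cutoff mirrors the session above):

```python
import math

# Horner form of the 5-term Taylor polynomial from the session above:
#   1 + x*(1 + (x/2)*(1 + (x/3)*(1 + x/4)))
# each nesting level is a single multiply-add
def exp_taylor5(x):
    return 1.0 + x*(1.0 + (x/2)*(1.0 + (x/3)*(1.0 + x/4)))

x = 0.4231
err = abs(exp_taylor5(x) - math.exp(x))   # ~1e-4, matching the session
```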
Thanks for sharing that! Carry on :)
Elementary function is a technical term that this paper uses correctly, not a generic prescription of simplicity.
See https://en.wikipedia.org/wiki/Elementary_function.
Then this is a good math paper. Everyone asking for "eml gates" (if such a thing is even possible or efficient) ought to relax a bit.
In numerical analysis, elementary function membership, like special function membership, is ambiguous. In many circumstances, it’s entirely reasonable to describe the natural logarithm as a special function.
Whoa, this is huge!
My dearest congrats to the author in case s/he shows up around this site ^^.
Judging by the title, I thought I would have a good laugh, like when the doctor discovered numerical integration and published a paper.
But no...
This is about continuous math, not ones and zeroes. Assuming peer review proves it out, this is outstanding.
I don't think this is ever making it past the editor of any journal, let alone peer review.
> Elementary functions such as exponentiation, logarithms and trigonometric functions are the standard vocabulary of STEM education. Each comes with its own rules and a dedicated button on a scientific calculator
What?
And:

> No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, √, and log has always required multiple distinct operations. Here we show that a single binary operator
Yeah, this is done by using tables and series. His method does not actually facilitate the computation of these functions.
There is no such thing as "continuous mathematics". Maybe he meant to say continuous functions?
Looking at page 14, it looks like he reinvented the concept of the vector valued function or something. The whole thing is rediscovering something that already exists.
This preprint was written by a researcher at an accredited university with a PhD in physics. I'm sure they know what a vector valued function is.
The point of this paper is not to revolutionize how a scientific calculator functions overnight; it's to establish a single binary operation that can reproduce the rest of the typical continuous elementary operations via repeated application, analogous to how a NAND or NOR gate creates all of the discrete logic gates. Hence "continuous mathematics" as opposed to discrete mathematics. It seems to me you're being overly negative without solid reasoning.
> it's to establish a single binary operation that can reproduce the rest of the typical continuous elementary operations via repeated application,
But he didn't show this though. I skimmed the paper many times. He creates multiple branches of these trees in the last page, so it's not truly a single nested operation.
Well, it is still the case, even if not explicitly shown. Personally I think it almost boils down to school math, with some details around complex logarithms; the rest seems to be simpler.
The formulas are provided in the supplementary information file, as mentioned in the paper. https://arxiv.org/src/2603.21852v2/anc/SupplementaryInformat... You want page 9.
> so it's not truly a single nested operation.
Some of us had the wondrous epiphany as children that we could build any digital device we could dream of (yes, up to and including a full computer, CPU, RAM, peripherals, and all) out of SN7400 NAND gates that we could buy down at the local Radio Shack, if only we could scrape together enough change to buy the requisite number of parts and sockets and Tefzel wire.
Obviously, I can't speak for all of us who had that epiphany, but I strongly suspect that most of us who had that epiphany would find this result joyous.
The principal result is "all elementary functions can be represented by this function and constant 1". I'm not sure if this was known before. Applications are another matter, but I suspect interesting ones do exist.
According to Gemini 3.1 Pro this would shoot the current weather forecasting power through the roof (and math processing in general):
The plan is to use this new "structurally flawless mathematical primitive" EML (this is all beyond me, I was just having some fun trying to make it cook things together) in TPUs made out of logarithmic number system circuits. EML would have DAGs to help with the exponential bloat problem, with CERN's tiny fast "hardcode models" as an inspiration. All this would be bounded by the deductive causality of Pedro Domingos's Tensor Logic, and all of this would einsum like a mf. I hope it does.
Behold, The Weather Dominator!
Congrats, you made a hallucination machine successfully hallucinate?
I understand enough for its arguments' symmetry to have an impact. I used Deep Research; it had the paper from the link as input plus some previous discussions about Tensor Logic and the new hardcoded neuroweb-like processors. It didn't make those up either.
I hope you get help. Mental health issues are not fun.
A quote from a blog post I just got out:
"I just want to get this out of my hands in case I made the model stumble upon something important. It's reasoning seems solid but I'm no expert. Here it is, go crazy:"
https://notes2self.bearblog.dev/the-weather-dominator/
Here's my NotebookLM podcast on the subject :-) Sick.
https://notebooklm.google.com/notebook/e0a54a54-c644-4c89-9d...