Not sure if anyone from the Malloy team is going to read this comment, but if so then please add a dedicated comparison page between equivalent SQL and Malloy codes, especially for complex queries. That will do more PR works than 100 pages of detailed introductory materials.
From technical documentations, I can see some potentials for significant simplifications on complex analytical queries but most people don't have such time or interest on new languages.
If this is what I think it is, that wouldn’t be pretty. I used looker heavily a few years ago and its generated SQL is pretty gnarly. Imagine a wall of COALESCE( COALESCE( COALESCE(…
Is it actually equivalent? I know the SQL is compiled from Malloy but I imagine the handwritten version, with comments, would look better in comparison.
I've certainly done plenty of complex sql querying in my time, but I don't think making nice-looking code for solving wordle bears much relation to normal analytical sql workflows.
Amongst many other things, the README tells me that `Malloy is immediately understandable by SQL users, and far easier to use and learn`. Having just looked at this code sample, I really don't agree with that README.
What's nest actually doing? Is it an inner join/lateral/apply? An outer join/lateral/apply? A correlated subquery? If so, what's the key? It doesn't look like it has a key, so it's equivalent to a cross join? That's hardly ever useful. Even this example query feels like it would just be better as 3 queries if it's just a cross join. I don't want every row to have a list of the exact same 20 records on it.
How does aggregate know what function to use? The examples on the aggregate page actually list aggregate functions, but everywhere else it's just... magically decided?
It supports TOP or LIMIT, but where's the windowing? Where's OFFSET 100 ROWS FETCH NEXT 25 ROWS ONLY for page 5?
What the heck does "order by: 2" mean? Even if it's an ordinal position, that's not a great example. Yes, SQL technically allows you to use ordinal positioning of columns in the ORDER BY, but it's horrible practice to use that because it's difficult to maintain. There's no way to tell what the original intent was, and the output columns change all the time when queries are reused.
My real question, however, is: "How is this actually easier?". Yes, sure, using symbols instead of English words is shorter, and a lot of programmers mistake a lower character count for being easier. However, it's not actually easier. It's just a bit less typing.
1) Nesting builds nested results (like GraphQL). This is particularly hard to do in SQL but is allows very large complex data sets to be returned in a single query.
You can pre-define calculations with 'measure:' or decalre them explicitly in a query.
> where's the windowing
It's missing with a bunch of other things (like union for example). Its coming of course. The goal is that everything represent able in SQL is represent able in Malloy.
> heck does "order by: 2" mean
Same thing as it does in SQL. We try and have reasonable defaults.
The goal here is to be able to create data models that are re-usable and compose-able and verifiable. Yeah, there is a learning curve as there is with anything powerful. There are many things expressible in Malloy that cannot be easily expressed in SQL.
I believe that declarative is typically better at letting the db compile it to something efficient while imperative is typically better when the author of the request knows exactly what they're doing.
> It compiles to SQL, so the actual execution plan is as arbitrary as the SQL this compiles to.
Yes, I'm purely talking syntax. I find the syntax of SQL to be arbitrary. Do I need a single quote or two? Do I need parentheses here? Or double parentheses? What does this bind, exactly?
I believe that there is lots to be gained by making the syntax more regular, without affecting semantics.
It almost is as if someone repurposed YAML as a query language, which I guess is doable since SQL is also a declarative language, which YAML is used for a lot as well.
But I can’t help but feel like this is basically a different way to describe the same constructs as we have in SQL. It almost feels like a query builder’s syntax.
To me, the real improvements over SQL are when they part ways with its constructs, such as seen in for example Datalog.
Thank you for making the effort the author's couldn't be bothered to.
It's horrible btw. You can't even tell if and when order in this syntax is important.
It's also not clear if it's declarative or operational syntax. But since they are compiling to SQL it has to be the first. Yet the syntax strongly misleads people into assuming a particular execution strategy.
Not even naming of things is consistent (why shorten "array" into "arr" but not minute into "min").
It just looks like random syntax choices, without a cohesive rationale made by people that think the declarative syntax of SQL looks worse than for-loops, completely ignoring that the database will not execute anything the way it is structured here.
Yeah I don't like it. And I am not trying to gate keep as SQL is one of the few things I am decent at ... it just looks clunky and way too verbose. SQL is far easier to read. To me this has the verbosity of C whereas SQL is clean like Python (non-typed python).
I share your enthusiasm for SQL. I've been using it for nearly 20 years now, and it's an important technology. But I've done a lot of SQL generation for tools that manage arbitrary tables and SQL definitely makes this hard problem virtually impossible (or at least too hard for yours truly). So I try to keep an open mind for anything that purports to solve these problems.
> it just looks clunky and way too verbose.
Surely you need to compare this with the equivalent SQL before claiming that the SQL is cleaner? What would be the equivalent SQL.
> To me this has the verbosity of C whereas SQL is clean like Python (non-typed python).
> Not even naming of things is consistent (why shorten "array" into "arr" but not minute into "min").
I don't believe it shortens "array" to "arr", does it? "Dep" and "arr" here seem to mean departure and arrival, which I'm reasonably sure is usual for the domain (flights).
You're correct although they did demonstrate just how incomprehensible this DSL is. At least with SQL it's easy to distinguish between your keywords, types and columns.
SQL is complete but ugly: everything is expressible, but nothing is reusable; simple ideas are complex to express; the language is verbose and lacks smart defaults.
It begs for examples of the ugly/verbose/complex SQL and the beautiful/succinct/simple Malloy. Looking at the samples folder did no work for me.
Having worked in a Coffeescript codebase for the past couple of years, its horrible to read. It just turns into a giant mess with multiple layers of nesting and everything just kind of mushed together. Coffeescript looks fun to play with, but having it in a large production codebase is an exercise in confusion and frustration. It lets people be clever with their code. Don't be clever.
Think of a sales orders table. Think of all the orders in there that should be considered "invalid" for some reason or another.
It would be awesome to define some logic, one time, in one place, for "invalid orders" and then have end users be able to say "from invalid orders", "stores having no invalid orders", "customers with multiple invalid orders", "top 3 reasons orders are invalid" ...
And of course "valid orders" might just be defined as "not invalid orders"
Instead this is probably multiple views, CTEs, UDFs, and still some logic in the main query to handle NULLs or joins or whatnot.
Even little things like hierarchies - I don't want to have separate sprocs for salesByCountry, salesByRegion. salesByState,and then an umbrella dynamic sproc to figure out which one to call.
It's ironic because SQL is such a human friendly declarative language that it has such poor ability to create meaningful shorthand expressions.
I've written SQL for nearly 30 years, I love it, but natural language transpilers have shown me some of its limits as an expressive tool.
It's a separate system, which has downsides. As an upside, by being a component of your system, the integrations with monitoring, alerting, quality checks, and dashboards is usually pretty easy.
Can you recommend any good implementations of the ideal state? I worry that SQL is too much to learn, but natural language is way too ambiguous to map to anything resembling logic. (The best we can do, for natural logic -> executable logic is legislative code... And the ambiguity there is famous.)
All these languages that compile to SQL seem to forget one fundamental problem, when this stuff fails (queries becomes slow etc). We are left to debug auto-generated SQL, which is a complete nightmare..
> All these languages that compile to SQL seem to forget one fundamental problem, when this stuff fails (queries becomes slow etc). We are left to debug auto-generated SQL, which is a complete nightmare..
My first inclination when hitting slow malloy (if I ever were to use it) wouldn't be to look at the auto-generated SQL, but to analyze a query plan directly, much like I would be doing for a slow SQL query.
The auto-generated SQL from Microsoft's DAX Language for Analysis Services and PowerBi is not bad. I think it comes down to how well developed the expression engine converts to SQL. Microsoft though has had many years fine tuning their expression engine (Vertipaq). It may take a while to develop a competant engine.
I am not really sure we need a 'better SQL' - SQL has been doing what SQL does in pretty much the same way for 50+/- years - if you work with relational data, learn SQL.
Very few technologies have had the staying power that SQL still holds.
Is that a good argument? C also does what it does for 50+- years. If you work with code, learn C.
SQL is a syntax for queries, and it is just bad at some sides. Global-only scoping, issues with common expressions, excessive roundtrips, inability to fetch hierarchical data effectively. Even syntactically it is an IDETIFICATION DIVISION-style spells, whose structure is hard to feel.
This tool is just another way to write those very same SQL queries - I have yet to see any tool that purports to write the SQL for you, output SQL that is better than someone that actually understands relational data and sql can do - it is usually not even close.
This isn't a new language, its a layer on top of a language that already works, and works well.
SQL is my least disliked technology of all the ones I use in anger. And reading one solid SQL book has served me well for 10+ years now. The only ask I would want is better errors. But even without it, I rarely spend much time debugging SQL or trying to figure out what Clever Programmer intended their code to do.
I think the SQL DML is far from optimal but it mostly works, even though it is quite difficult to use for some cases. It will be with us for a long time.
SQL DDL, on the other hand, is broken as a standard because it assumes implementation details of the database engine that were widely true when it SQL was designed but are nonsensical or increasingly divergent in modern database engine architectures e.g. the many tacit assumptions about index properties.
After having gone over the README, I still have no idea what a Malloy expression looks like.
The majority of it is taken up telling me what it is and how to install and configure their VS Code extension, but not a single “ah-ha that seems interesting” example to entice me to try it.
"Currently, all joins in Malloy are left outer joins, and must be between the primary key of one table and a foreign key in the explore being joined to."
>Malloy reads the schema so you don’t need to model everything. Malloy allows creation of re-usable metrics and logic, but there’s no need for boilerplate code that doesn’t add anything new.
That is the one thing that I miss in SQL. I find it extremely wasteful (developer hours) that every SQL query needs to redefine the whole relational model from scratch. The RDBM knows the model, it should use that knowledge to automate the necessary joins.
I get this, but I prefer being explicit over less typing. I have implemented schemas in the past involving multiple routes between tables, and self-joins, and getting an engine to work all that out would be a problem. Easier to just do more typing.
SQL is more like asm. It's the language of the databases.
In my opinion, you'll have to create a language that compiles to SQL that is better than handwritten SQL and is easier to write, just like writing in C is easier and gcc can create better asm than you.
The problem is that ASM_SQL is easy to learn and write and has a lot of features, so that this new C_SQL must be even easier to learn and write and/or a lot more powerful.
It's a high bar to overcome and reordering and renaming SQL things is not going to make it.
SQL is closer to C in this analogy. Most relational databases engines don't actually run SQL. They have a native internal programming language (closer to assembly code) that is much cleaner and sometimes more expressive, which SQL is translated into. This internal language is then either interpreted or JIT-ed into machine code. In rare cases, the internal language even has a nice expressive and composable text syntax that programmers can write directly. This makes everything easier and more flexible from perspective of people writing the database engine, even if the intent is to only ever run SQL queries at the public interface.
Most relational databases mostly don't run on SQL anymore, that is a legacy interface that is exported for compatibility. Unfortunately, that is still the only vaguely portable programming interface and everyone knows it, so that's what we use.
In my book the only thing a better sql needs is to start selects with the from clause. -‘from orders select orderid’ Which would make for better autocomplete among other things.
I lean towards this idea but it might make non trivially joined queries harder to understand, since it shifts the select to the middle or the end of the query.
So you change:
1. What I want
2. Which tables
3. Which filter
4. What grouping
Putting what you want further down the list might be more confusing rather than less.
IMO it also needs to default destructive statements (update, delete) to one row limit, and you have to explicitly choose to override that.
In the last 10 years of my career, many a dev & prod databases have been accidentally harmed by an overzealous query.
This could also add a warning for "left-over" non-updated rows, for example if the developer signals that only one row should be updated, but the WHERE clause has many rows, its usually a sign of a missing unique key, something I've seen before as well.
100% on restructuring to improve tooling. discoverability is the big thing we lack. from would be a little more verbose than necessary but otherwise yes!
That entire README and not a single line of Malloy to look at? And no comparison to existing SQL? I'd love to give this a try but the wall of text could be/should be replaced with some simple side by side examples and brief descriptions of why Malloy is "better" than SQL.
Not a big fan of SQL dsl's personally. If you feel you'd need a better query syntax and plotting for interactive use, I'd suggest looking at the dataframe libraries that have a larger community to discover how to solve things and have ironed out issues.
Judging by the examples it seems different but equivalent. It certainly doesn't seem obviously better. Personally it's not compelling enough for me to want to try it out.
Two issues: First, to be a modeling language, tools have to be able to discover entities and relationships by reading a model definition. Second, to be successful as a modeling language, users need to be able to author models easily. If it has to be hand-coded it will not be successful.
For the first issue, this language has a non-standard syntax. It doesn't use JSON or XML. So either I have to write a parser on my own, or Google has to provide parsers in every language. Also, models are expressed as "explores" [1] where they define the fields available, and the query to use to get the data. But that's not what I expect from a modeling language. The purpose of modeling languages is to enable users to write new ad-hoc queries on the fly. If the queries have to be pre-written into an "explore" then this is not really a modeling language, it is just a query language.
I don't see this taking off as a modeling language.
Looks like one of the benefits is the ability to query nested data sets. When i need that i usually do that by generating a single json response in the query using row_to_json, but that's not really standard sql, and requires an additional deserialization step. I wonder what sql statement they generate to support nested sets.
I work in a relational lang as hobby. I have seen as many sql replacements as I can, and most commit some or many problems. The biggest one, IMHO, is they stay as constrained to be "just a query lang" removing any hope to be useful at large.
for p in products ?where #price > 0.0 do
print(p)
end
let top10 = products ?limit 10
let positive = ?where #price > 0.0
for p in top10 ?positive do
print(p)
end
BTW, this is not new. This is more or less how was with FoxPro (with more verbose syntax, but same feel).
Instead this, like many other attempt, make "relational language" be embarrassed to be an ACTUAL programming language. And it looks more like a clone of GraphQL than of SQL, btw...
I agree with the other commenters, I don't see much from the query language itself. However, the plotting integration is very interesting for exploratory-type analyses or quickly figuring out what's going on with your data in a pinch, I found myself wishing for something like that on many occasions.
As far as the "next SQL" goes, my hunch is that we should learn our lesson from the various ETL tools that became popular in the last two decades and base it on infinitary stream algebras, it seems like it's the correct computational model to deal with generic, potentially non-uniform data. (I'm afraid I don't have the chops to argue the idea further, though --)
I think 2 things could make this awesome:
- built-in testing: testability for SQL still hasn’t been solved but it feels like Malloy could do it
- compile to Dataflow
I am getting Data Analysis Expressions ("DAX") vibes. Feels like what it tries to do is similar to the DAX language used by PowerBi and Analysis Services for Direct Query but expressed differently
I am honestly having a hard time reading their examples versus DAX.
Something missing that DAX has is a selected values feature. I imagine looker might have something similar?
Looks nicer but not sure if it brings enough functional benefit to overcome the lingua franca of data processing. Time will tell but if they can get a few major database engines to support it natively, maybe it can take off. I am somewhat skeptical though.
Not sure if anyone from the Malloy team is going to read this comment, but if so then please add a dedicated comparison page between equivalent SQL and Malloy codes, especially for complex queries. That will do more PR works than 100 pages of detailed introductory materials.
From technical documentations, I can see some potentials for significant simplifications on complex analytical queries but most people don't have such time or interest on new languages.
If this is what I think it is, that wouldn’t be pretty. I used looker heavily a few years ago and its generated SQL is pretty gnarly. Imagine a wall of COALESCE( COALESCE( COALESCE(…
Thanks, that is reasonable feedback. We'll work on updating the documentation.
Here is an example of Malloy solving today's worlde in 50 lines of Malloy
https://looker-open-source.github.io/malloy/documentation/ex...
And here is the equivalent SQL.
https://gist.github.com/lloydtabb/32f46e7ecbb2da1a443d1adbe9...
Is it actually equivalent? I know the SQL is compiled from Malloy but I imagine the handwritten version, with comments, would look better in comparison.
I've certainly done plenty of complex sql querying in my time, but I don't think making nice-looking code for solving wordle bears much relation to normal analytical sql workflows.
Example looks like this:
https://looker-open-source.github.io/malloy/documentation/ex...
Amongst many other things, the README tells me that `Malloy is immediately understandable by SQL users, and far easier to use and learn`. Having just looked at this code sample, I really don't agree with that README.
That's why they tell rather than show.
Which part of it you couldn’t understand?
I'll bite. What is that code example actually returning? I don't have a clue what fields I expect to get back from that.
I’d expect
And so on.
Scroll to the bottom of the page. It shows the returned JSON
https://looker-open-source.github.io/malloy/documentation/pa...
Here is the SQL we generate.
https://gist.github.com/lloydtabb/8c144d2dac978dda9bf3ec4d6b...
What's nest actually doing? Is it an inner join/lateral/apply? An outer join/lateral/apply? A correlated subquery? If so, what's the key? It doesn't look like it has a key, so it's equivalent to a cross join? That's hardly ever useful. Even this example query feels like it would just be better as 3 queries if it's just a cross join. I don't want every row to have a list of the exact same 20 records on it.
How does aggregate know what function to use? The examples on the aggregate page actually list aggregate functions, but everywhere else it's just... magically decided?
It supports TOP or LIMIT, but where's the windowing? Where's OFFSET 100 ROWS FETCH NEXT 25 ROWS ONLY for page 5?
What the heck does "order by: 2" mean? Even if it's an ordinal position, that's not a great example. Yes, SQL technically allows you to use ordinal positioning of columns in the ORDER BY, but it's horrible practice to use that because it's difficult to maintain. There's no way to tell what the original intent was, and the output columns change all the time when queries are reused.
My real question, however, is: "How is this actually easier?". Yes, sure, using symbols instead of English words is shorter, and a lot of programmers mistake a lower character count for being easier. However, it's not actually easier. It's just a bit less typing.
> What's nest actually doing?
1) Nesting builds nested results (like GraphQL). This is particularly hard to do in SQL but is allows very large complex data sets to be returned in a single query.
https://looker-open-source.github.io/malloy/documentation/la...
For example the dashboard on the page below is a single SQL query:
https://looker-open-source.github.io/malloy/documentation/ex...
> How does aggregate know what function to use?
You can pre-define calculations with 'measure:' or decalre them explicitly in a query.
> where's the windowing
It's missing with a bunch of other things (like union for example). Its coming of course. The goal is that everything represent able in SQL is represent able in Malloy.
> heck does "order by: 2" mean
Same thing as it does in SQL. We try and have reasonable defaults.
https://looker-open-source.github.io/malloy/documentation/la...
> "How is this actually easier?"
The goal here is to be able to create data models that are re-usable and compose-able and verifiable. Yeah, there is a learning curve as there is with anything powerful. There are many things expressible in Malloy that cannot be easily expressed in SQL.
Ouch, this syntax isn't a looker.
Well, this syntax certainly looks less arbitrary than SQL.
Feels that it's worth giving a try.
It compiles to SQL, so the actual execution plan is as arbitrary as the SQL this compiles to.
I love database where queries and commands are not declarative but imperative. Like RethinkDB or Mongo.
Databases with this type of syntax.
But to use this syntax for an SQL database is a horrible mistake, because it is misleading.
I believe that declarative is typically better at letting the db compile it to something efficient while imperative is typically better when the author of the request knows exactly what they're doing.
> It compiles to SQL, so the actual execution plan is as arbitrary as the SQL this compiles to.
Yes, I'm purely talking syntax. I find the syntax of SQL to be arbitrary. Do I need a single quote or two? Do I need parentheses here? Or double parentheses? What does this bind, exactly?
I believe that there is lots to be gained by making the syntax more regular, without affecting semantics.
It almost is as if someone repurposed YAML as a query language, which I guess is doable since SQL is also a declarative language, which YAML is used for a lot as well.
But I can’t help but feel like this is basically a different way to describe the same constructs as we have in SQL. It almost feels like a query builder’s syntax.
To me, the real improvements over SQL are when they part ways with its constructs, such as seen in for example Datalog.
I get a headache just looking at this. I don't think the SQL version would look any worse either.
I don't think there is a good way to return nested JSON using SQL, which this example does.
I think postgres' json syntax in the latest version isn't too bad for these things.
It would be nice to see how this query could look in Postgres.
Postgres has excellent syntax for dealing with JSON
https://www.postgresql.org/docs/14/functions-json.html
Thank you for making the effort the author's couldn't be bothered to.
It's horrible btw. You can't even tell if and when order in this syntax is important.
It's also not clear if it's declarative or operational syntax. But since they are compiling to SQL it has to be the first. Yet the syntax strongly misleads people into assuming a particular execution strategy.
Not even naming of things is consistent (why shorten "array" into "arr" but not minute into "min").
It just looks like random syntax choices, without a cohesive rationale made by people that think the declarative syntax of SQL looks worse than for-loops, completely ignoring that the database will not execute anything the way it is structured here.
Yeah I don't like it. And I am not trying to gate keep as SQL is one of the few things I am decent at ... it just looks clunky and way too verbose. SQL is far easier to read. To me this has the verbosity of C whereas SQL is clean like Python (non-typed python).
I share your enthusiasm for SQL. I've been using it for nearly 20 years now, and it's an important technology. But I've done a lot of SQL generation for tools that manage arbitrary tables and SQL definitely makes this hard problem virtually impossible (or at least too hard for yours truly). So I try to keep an open mind for anything that purports to solve these problems.
> it just looks clunky and way too verbose.
Surely you need to compare this with the equivalent SQL before claiming that the SQL is cleaner? What would be the equivalent SQL.
> To me this has the verbosity of C whereas SQL is clean like Python (non-typed python).
Is this just a reaction to curly braces?
> Not even naming of things is consistent (why shorten "array" into "arr" but not minute into "min").
I don't believe it shortens "array" to "arr", does it? "Dep" and "arr" here seem to mean departure and arrival, which I'm reasonably sure is usual for the domain (flights).
You're correct although they did demonstrate just how incomprehensible this DSL is. At least with SQL it's easy to distinguish between your keywords, types and columns.
Is it easy with SQL? Or are you just used to it?
It seems pretty straightforward to see what's a keyword vs not tbh
On the documentation site, they have syntax highlighting to help distinguish keywords
> not minute into "min"
That’s because MIN() is minimum?
This just looks like someone wanted to build something new regardless of whether it was better or not.
SQL is complete but ugly: everything is expressible, but nothing is reusable; simple ideas are complex to express; the language is verbose and lacks smart defaults.
It begs for examples of the ugly/verbose/complex SQL and the beautiful/succinct/simple Malloy. Looking at the samples folder did no work for me.
Agreed. I'm not a big fan but the Coffescript docs do this beautifully. https://coffeescript.org/#overview
Having worked in a Coffeescript codebase for the past couple of years, its horrible to read. It just turns into a giant mess with multiple layers of nesting and everything just kind of mushed together. Coffeescript looks fun to play with, but having it in a large production codebase is an exercise in confusion and frustration. It lets people be clever with their code. Don't be clever.
I think they were talking about the comparison, not Coffeescript itself.
Indeed, and i'm not buying the "nothing is reusable". Sql views are in my opinion the opposite, as they make every query reusable.
the downside with views it's that they are persistent so you can't reuse it only in the context of one connection
you can use CTEs but only with one query
> persistent so you can't reuse it only in the context of one connection
Don't e.g. Postgres temporary views cover this case?
"Temporary views are automatically dropped at the end of the current session."
¯\_(ツ)_/¯
I expect vendor specific stuff called something along the lines of temporary view would do this, yes
Unlike functions, views don't have arguments you can use to change the output.
(Edited for clarity.)
Yes they do, you can use them as the source for any sql query, so they have the exact same argument options as tables.
Without speaking to the merits of Malloy
Think of a sales orders table. Think of all the orders in there that should be considered "invalid" for some reason or another.
It would be awesome to define some logic, one time, in one place, for "invalid orders" and then have end users be able to say "from invalid orders", "stores having no invalid orders", "customers with multiple invalid orders", "top 3 reasons orders are invalid" ...
And of course "valid orders" might just be defined as "not invalid orders"
Instead this is probably multiple views, CTEs, UDFs, and still some logic in the main query to handle NULLs or joins or whatnot.
Even little things like hierarchies - I don't want to have separate sprocs for salesByCountry, salesByRegion. salesByState,and then an umbrella dynamic sproc to figure out which one to call.
It's ironic because SQL is such a human friendly declarative language that it has such poor ability to create meaningful shorthand expressions.
I've written SQL for nearly 30 years, I love it, but natural language transpilers have shown me some of its limits as an expressive tool.
This is the use case for a metric store. https://blog.transform.co/history-of-the-metrics-store/
It's a separate system, which has downsides. As an upside, by being a component of your system, the integrations with monitoring, alerting, quality checks, and dashboards is usually pretty easy.
Yeah I don't believe in a separate metrics store (or feature store - and I sell one!)
It should all be in one SQL-like DSL with a natural language transpiler on top.
Can you recommend any good implementations of the ideal state? I worry that SQL is too much to learn, but natural language is way too ambiguous to map to anything resembling logic. (The best we can do, for natural logic -> executable logic is legislative code... And the ambiguity there is famous.)
> SQL is complete but ugly
All these languages that compile to SQL seem to forget one fundamental problem, when this stuff fails (queries becomes slow etc). We are left to debug auto-generated SQL, which is a complete nightmare..
> All these languages that compile to SQL seem to forget one fundamental problem, when this stuff fails (queries becomes slow etc). We are left to debug auto-generated SQL, which is a complete nightmare..
My first inclination when hitting slow malloy (if I ever were to use it) wouldn't be to look at the auto-generated SQL, but to analyze a query plan directly, much like I would be doing for a slow SQL query.
> but to analyze a query plan directly
What query plan? The one belonging to the auto-generated SQL? Then fiddle with Malloy in hopes of it writing better SQL to avoid the bottlenecks?
The auto-generated SQL from Microsoft's DAX Language for Analysis Services and PowerBi is not bad. I think it comes down to how well developed the expression engine converts to SQL. Microsoft though has had many years fine tuning their expression engine (Vertipaq). It may take a while to develop a competant engine.
I am not really sure we need a 'better SQL' - SQL has been doing what SQL does in pretty much the same way for 50+/- years - if you work with relational data, learn SQL.
Very few technologies have had the staying power that SQL still holds.
Is that a good argument? C also does what it does for 50+- years. If you work with code, learn C.
SQL is a syntax for queries, and it is just bad at some sides. Global-only scoping, issues with common expressions, excessive roundtrips, inability to fetch hierarchical data effectively. Even syntactically it is an IDETIFICATION DIVISION-style spells, whose structure is hard to feel.
This tool is just another way to write those very same SQL queries - I have yet to see any tool that purports to write the SQL for you, output SQL that is better than someone that actually understands relational data and sql can do - it is usually not even close.
This isn't a new language, its a layer on top of a language that already works, and works well.
SQL is my least disliked technology of all the ones I use in anger. And reading one solid SQL book has served me well for 10+ years now. The only ask I would want is better errors. But even without it, I rarely spend much time debugging SQL or trying to figure out what Clever Programmer intended their code to do.
I think the SQL DML is far from optimal but it mostly works, even though it is quite difficult to use for some cases. It will be with us for a long time.
SQL DDL, on the other hand, is broken as a standard because it assumes implementation details of the database engine that were widely true when it SQL was designed but are nonsensical or increasingly divergent in modern database engine architectures e.g. the many tacit assumptions about index properties.
After having gone over the README, I still have no idea what a Malloy expression looks like.
The majority of it is taken up telling me what it is and how to install and configure their VS Code extension, but not a single “ah-ha that seems interesting” example to entice me to try it.
This link is buried in the readme: https://looker-open-source.github.io/malloy/documentation/la...
"Currently, all joins in Malloy are left outer joins, and must be between the primary key of one table and a foreign key in the explore being joined to."
https://looker-open-source.github.io/malloy/documentation/la...
that's a non-starter for my current project. We have some complex queries with non-trivial JOINs
Actually joins recently landed. Arbitrary patterns and computes aggregates correctly regardless of join patterns.
https://looker-open-source.github.io/malloy/documentation/la...
>Malloy reads the schema so you don’t need to model everything. Malloy allows creation of re-usable metrics and logic, but there’s no need for boilerplate code that doesn’t add anything new.
That is the one thing that I miss in SQL. I find it extremely wasteful (developer hours) that every SQL query needs to redefine the whole relational model from scratch. The RDBM knows the model, it should use that knowledge to automate the necessary joins.
I get this, but I prefer being explicit over less typing. I have implemented schemas in the past involving multiple routes between tables, and self-joins, and getting an engine to work all that out would be a problem. Easier to just do more typing.
SQL is more like asm. It's the language of the databases.
In my opinion, you'll have to create a language that compiles to SQL that is better than handwritten SQL and is easier to write, just like writing in C is easier and gcc can create better asm than you.
The problem is that ASM_SQL is easy to learn and write and has a lot of features, so that this new C_SQL must be even easier to learn and write and/or a lot more powerful.
It's a high bar to overcome and reordering and renaming SQL things is not going to make it.
SQL is closer to C in this analogy. Most relational databases engines don't actually run SQL. They have a native internal programming language (closer to assembly code) that is much cleaner and sometimes more expressive, which SQL is translated into. This internal language is then either interpreted or JIT-ed into machine code. In rare cases, the internal language even has a nice expressive and composable text syntax that programmers can write directly. This makes everything easier and more flexible from perspective of people writing the database engine, even if the intent is to only ever run SQL queries at the public interface.
Most relational databases mostly don't run on SQL anymore, that is a legacy interface that is exported for compatibility. Unfortunately, that is still the only vaguely portable programming interface and everyone knows it, so that's what we use.
Google owns Looker and already has this SQL replacement: https://opensource.google/projects/logica
I find the join syntax particularly ugly.
In SQL, "X INNER JOIN Y ON something" is a well-behaved table-type value that can directly replace "X" in a query and can be nested in further joins.
Here I must name a join, as if it is a property of a table, and it isn't clear what I'm joining and selecting exactly.
In my book the only thing a better sql needs is to start selects with the from clause. -‘from orders select orderid’ Which would make for better autocomplete among other things.
I lean towards this idea but it might make non trivially joined queries harder to understand, since it shifts the select to the middle or the end of the query.
So you change:
Putting what you want further down the list might be more confusing rather than less.
If that was the case, almost all programming languages had `where`, as in
But that is not less confusing, that is more.
To a programmer, sure, but SQL was designed as a natural language[0] for non programmers and the syntax reflects that i.e
Vs Yoda
The original use case has long failed, but here we are with SQL stockholm syndrome. I don't hate it, but strictly speaking it is a bit odd.
Edit: there was a linguist in the group that developed SQL to help with the goal of making it easier for non programmers.
[0] https://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Syste...
IMO it also needs to default destructive statements (update, delete) to one row limit, and you have to explicitly choose to override that.
In the last 10 years of my career, many a dev & prod databases have been accidentally harmed by an overzealous query.
This could also add a warning for "left-over" non-updated rows, for example if the developer signals that only one row should be updated, but the WHERE clause has many rows, its usually a sign of a missing unique key, something I've seen before as well.
100% on restructuring to improve tooling. discoverability is the big thing we lack. from would be a little more verbose than necessary but otherwise yes!
It's hard to take anything seriously that doesn't start with an actual code example.
Double so if they did bother to make many long claims about it being superior.
If there is anything here, the communication is really bad.
1. Start with a hello world example.
2. Add a concrete comparison (i.e. an example in both Malloy or SQL) for each claim.
For now, you are just wasting people's time releasing this like this.
All tell, no show? It's vaporware.
Here is the quick start.
https://looker-open-source.github.io/malloy/documentation/la...
That entire README and not a single line of Malloy to look at? And no comparison to existing SQL? I'd love to give this a try but the wall of text could be/should be replaced with some simple side by side examples and brief descriptions of why Malloy is "better" than SQL.
Sorry, the link to the documentation is buried.
https://looker-open-source.github.io/malloy/documentation/la...
They don't have any examples of the language in the README. And the examples buried in the links at the bottom don't look very attractive.
Not a big fan of SQL dsl's personally. If you feel you'd need a better query syntax and plotting for interactive use, I'd suggest looking at the dataframe libraries that have a larger community to discover how to solve things and have ironed out issues.
Specifically dataframe logic is easily mappable to dbplyr https://dbplyr.tidyverse.org/articles/sql-translation.html for those that know R and ibis https://ibis-project.org/ for Python/Pandas
Judging by the examples it seems different but equivalent. It certainly doesn't seem obviously better. Personally it's not compelling enough for me to want to try it out.
Two issues: First, to be a modeling language, tools have to be able to discover entities and relationships by reading a model definition. Second, to be successful as a modeling language, users need to be able to author models easily. If it has to be hand-coded it will not be successful.
For the first issue, this language has a non-standard syntax. It doesn't use JSON or XML. So either I have to write a parser on my own, or Google has to provide parsers in every language. Also, models are expressed as "explores" [1] where they define the fields available, and the query to use to get the data. But that's not what I expect from a modeling language. The purpose of modeling languages is to enable users to write new ad-hoc queries on the fly. If the queries have to be pre-written into an "explore" then this is not really a modeling language, it is just a query language.
I don't see this taking off as a modeling language.
[1] https://looker-open-source.github.io/malloy/documentation/la...
SQL is like an uncut diamond. Very valuable, but would be even more valuable with some tasteful adjustments. However, Mallory seems not to be that.
Looks like one of the benefits is the ability to query nested data sets. When i need that i usually do that by generating a single json response in the query using row_to_json, but that's not really standard sql, and requires an additional deserialization step. I wonder what sql statement they generate to support nested sets.
I work in a relational lang as hobby. I have seen as many sql replacements as I can, and most commit some or many problems. The biggest one, IMHO, is they stay as constrained to be "just a query lang" removing any hope to be useful at large.
In mine, the lang IS actually composable:
https://tablam.org
BTW, this is not new. This is more or less how was with FoxPro (with more verbose syntax, but same feel).
Instead this, like many other attempt, make "relational language" be embarrassed to be an ACTUAL programming language. And it looks more like a clone of GraphQL than of SQL, btw...
I agree with the other commenters, I don't see much from the query language itself. However, the plotting integration is very interesting for exploratory-type analyses or quickly figuring out what's going on with your data in a pinch, I found myself wishing for something like that on many occasions.
As far as the "next SQL" goes, my hunch is that we should learn our lesson from the various ETL tools that became popular in the last two decades and base it on infinitary stream algebras, it seems like it's the correct computational model to deal with generic, potentially non-uniform data. (I'm afraid I don't have the chops to argue the idea further, though --)
I think 2 things could make this awesome: - built-in testing: testability for SQL still hasn’t been solved but it feels like Malloy could do it - compile to Dataflow
I am getting Data Analysis Expressions ("DAX") vibes. Feels like what it tries to do is similar to the DAX language used by PowerBi and Analysis Services for Direct Query but expressed differently I am honestly having a hard time reading their examples versus DAX. Something missing that DAX has is a selected values feature. I imagine looker might have something similar?
SQL resisted attempts to improve upon it for many-many years. Kudos to Looker for not giving up!
Automatic generation of nested JSON results is very interesting:
https://looker-open-source.github.io/malloy/documentation/la...
This looks more like AST than something humans should write and read, and this is coming from someone that digs Lisp :)
AST as a rule should be easily constructed programmatically. Malloy has a syntax though.
Quickstart documentation (linked from readme), with many examples, is here: https://looker-open-source.github.io/malloy/documentation/la...
Looks nicer but not sure if it brings enough functional benefit to overcome the lingua franca of data processing. Time will tell but if they can get a few major database engines to support it natively, maybe it can take off. I am somewhat skeptical though.
Looks like Elixir's Ecto: https://hexdocs.pm/ecto/Ecto.Query.html#content
Ecto is inspired by LINQ from .NET
Disappointed that there wasnt a single snippet on their README.
"Better" is a dubious claim, and there wasn't much information to compel a passerby of the claim.
I have made something like this too , but its for database scripts:
https://text2db.com/
So it's a "CoffeeScript for SQL", sounds intriguing...
Though, some more examples with real-world use cases would be good.
We're hoping its more of a 'typescript' but yeah :)