summerlight 4 years ago

Not sure if anyone from the Malloy team is going to read this comment, but if so then please add a dedicated comparison page between equivalent SQL and Malloy codes, especially for complex queries. That will do more PR works than 100 pages of detailed introductory materials.

From technical documentations, I can see some potentials for significant simplifications on complex analytical queries but most people don't have such time or interest on new languages.

  • reilly3000 4 years ago

    If this is what I think it is, that wouldn’t be pretty. I used looker heavily a few years ago and its generated SQL is pretty gnarly. Imagine a wall of COALESCE( COALESCE( COALESCE(…

  • ltabb 4 years ago

    Thanks, that is reasonable feedback. We'll work on updating the documentation.

    Here is an example of Malloy solving today's worlde in 50 lines of Malloy

    https://looker-open-source.github.io/malloy/documentation/ex...

    And here is the equivalent SQL.

    https://gist.github.com/lloydtabb/32f46e7ecbb2da1a443d1adbe9...

    • hitekker 4 years ago

      Is it actually equivalent? I know the SQL is compiled from Malloy but I imagine the handwritten version, with comments, would look better in comparison.

    • seanhunter 4 years ago

      I've certainly done plenty of complex sql querying in my time, but I don't think making nice-looking code for solving wordle bears much relation to normal analytical sql workflows.

fouc 4 years ago

Example looks like this:

    query: sessionize is {
      group_by: flight_date is dep_time.day
      group_by: carrier
      aggregate: daily_flight_count is flight_count
      nest: per_plane_data is {
        top: 20
        group_by: tail_num
        aggregate: plane_flight_count is flight_count
        nest: flight_legs is {
          order_by: 2
          group_by: [
            tail_num
            dep_minute is dep_time.minute
            origin_code
            dest_code is destination_code
            dep_delay
            arr_delay
          ]
        }
      }
    }

https://looker-open-source.github.io/malloy/documentation/ex...

  • boomskats 4 years ago

    Amongst many other things, the README tells me that `Malloy is immediately understandable by SQL users, and far easier to use and learn`. Having just looked at this code sample, I really don't agree with that README.

    • civilized 4 years ago

      That's why they tell rather than show.

    • wruza 4 years ago

      Which part of it you couldn’t understand?

      • fphhotchips 4 years ago

        I'll bite. What is that code example actually returning? I don't have a clue what fields I expect to get back from that.

      • da_chicken 4 years ago

        What's nest actually doing? Is it an inner join/lateral/apply? An outer join/lateral/apply? A correlated subquery? If so, what's the key? It doesn't look like it has a key, so it's equivalent to a cross join? That's hardly ever useful. Even this example query feels like it would just be better as 3 queries if it's just a cross join. I don't want every row to have a list of the exact same 20 records on it.

        How does aggregate know what function to use? The examples on the aggregate page actually list aggregate functions, but everywhere else it's just... magically decided?

        It supports TOP or LIMIT, but where's the windowing? Where's OFFSET 100 ROWS FETCH NEXT 25 ROWS ONLY for page 5?

        What the heck does "order by: 2" mean? Even if it's an ordinal position, that's not a great example. Yes, SQL technically allows you to use ordinal positioning of columns in the ORDER BY, but it's horrible practice to use that because it's difficult to maintain. There's no way to tell what the original intent was, and the output columns change all the time when queries are reused.

        My real question, however, is: "How is this actually easier?". Yes, sure, using symbols instead of English words is shorter, and a lot of programmers mistake a lower character count for being easier. However, it's not actually easier. It's just a bit less typing.

        • ltabb 4 years ago

          > What's nest actually doing?

          1) Nesting builds nested results (like GraphQL). This is particularly hard to do in SQL but is allows very large complex data sets to be returned in a single query.

          https://looker-open-source.github.io/malloy/documentation/la...

          For example the dashboard on the page below is a single SQL query:

          https://looker-open-source.github.io/malloy/documentation/ex...

          > How does aggregate know what function to use?

          You can pre-define calculations with 'measure:' or decalre them explicitly in a query.

          > where's the windowing

          It's missing with a bunch of other things (like union for example). Its coming of course. The goal is that everything represent able in SQL is represent able in Malloy.

          > heck does "order by: 2" mean

          Same thing as it does in SQL. We try and have reasonable defaults.

          https://looker-open-source.github.io/malloy/documentation/la...

          > "How is this actually easier?"

          The goal here is to be able to create data models that are re-usable and compose-able and verifiable. Yeah, there is a learning curve as there is with anything powerful. There are many things expressible in Malloy that cannot be easily expressed in SQL.

  • orthoxerox 4 years ago

    Ouch, this syntax isn't a looker.

  • Yoric 4 years ago

    Well, this syntax certainly looks less arbitrary than SQL.

    Feels that it's worth giving a try.

    • ralfn 4 years ago

      It compiles to SQL, so the actual execution plan is as arbitrary as the SQL this compiles to.

      I love database where queries and commands are not declarative but imperative. Like RethinkDB or Mongo.

      Databases with this type of syntax.

      But to use this syntax for an SQL database is a horrible mistake, because it is misleading.

      • Yoric 4 years ago

        I believe that declarative is typically better at letting the db compile it to something efficient while imperative is typically better when the author of the request knows exactly what they're doing.

        > It compiles to SQL, so the actual execution plan is as arbitrary as the SQL this compiles to.

        Yes, I'm purely talking syntax. I find the syntax of SQL to be arbitrary. Do I need a single quote or two? Do I need parentheses here? Or double parentheses? What does this bind, exactly?

        I believe that there is lots to be gained by making the syntax more regular, without affecting semantics.

  • stingraycharles 4 years ago

    It almost is as if someone repurposed YAML as a query language, which I guess is doable since SQL is also a declarative language, which YAML is used for a lot as well.

    But I can’t help but feel like this is basically a different way to describe the same constructs as we have in SQL. It almost feels like a query builder’s syntax.

    To me, the real improvements over SQL are when they part ways with its constructs, such as seen in for example Datalog.

  • tjpnz 4 years ago

    I get a headache just looking at this. I don't think the SQL version would look any worse either.

    • ash 4 years ago

      I don't think there is a good way to return nested JSON using SQL, which this example does.

  • ralfn 4 years ago

    Thank you for making the effort the author's couldn't be bothered to.

    It's horrible btw. You can't even tell if and when order in this syntax is important.

    It's also not clear if it's declarative or operational syntax. But since they are compiling to SQL it has to be the first. Yet the syntax strongly misleads people into assuming a particular execution strategy.

    Not even naming of things is consistent (why shorten "array" into "arr" but not minute into "min").

    It just looks like random syntax choices, without a cohesive rationale made by people that think the declarative syntax of SQL looks worse than for-loops, completely ignoring that the database will not execute anything the way it is structured here.

    • gigatexal 4 years ago

      Yeah I don't like it. And I am not trying to gate keep as SQL is one of the few things I am decent at ... it just looks clunky and way too verbose. SQL is far easier to read. To me this has the verbosity of C whereas SQL is clean like Python (non-typed python).

      • throwaway894345 4 years ago

        I share your enthusiasm for SQL. I've been using it for nearly 20 years now, and it's an important technology. But I've done a lot of SQL generation for tools that manage arbitrary tables and SQL definitely makes this hard problem virtually impossible (or at least too hard for yours truly). So I try to keep an open mind for anything that purports to solve these problems.

        > it just looks clunky and way too verbose.

        Surely you need to compare this with the equivalent SQL before claiming that the SQL is cleaner? What would be the equivalent SQL.

        > To me this has the verbosity of C whereas SQL is clean like Python (non-typed python).

        Is this just a reaction to curly braces?

    • darrenf 4 years ago

      > Not even naming of things is consistent (why shorten "array" into "arr" but not minute into "min").

      I don't believe it shortens "array" to "arr", does it? "Dep" and "arr" here seem to mean departure and arrival, which I'm reasonably sure is usual for the domain (flights).

      • tjpnz 4 years ago

        You're correct although they did demonstrate just how incomprehensible this DSL is. At least with SQL it's easy to distinguish between your keywords, types and columns.

        • __float 4 years ago

          Is it easy with SQL? Or are you just used to it?

        • staticassertion 4 years ago

          It seems pretty straightforward to see what's a keyword vs not tbh

        • zorgmonkey 4 years ago

          On the documentation site, they have syntax highlighting to help distinguish keywords

    • bertil 4 years ago

      > not minute into "min"

      That’s because MIN() is minimum?

  • shepardrtc 4 years ago

    This just looks like someone wanted to build something new regardless of whether it was better or not.

aivarsk 4 years ago

SQL is complete but ugly: everything is expressible, but nothing is reusable; simple ideas are complex to express; the language is verbose and lacks smart defaults.

It begs for examples of the ugly/verbose/complex SQL and the beautiful/succinct/simple Malloy. Looking at the samples folder did no work for me.

  • jakey_bakey 4 years ago

    Agreed. I'm not a big fan but the Coffescript docs do this beautifully. https://coffeescript.org/#overview

    • shepardrtc 4 years ago

      Having worked in a Coffeescript codebase for the past couple of years, its horrible to read. It just turns into a giant mess with multiple layers of nesting and everything just kind of mushed together. Coffeescript looks fun to play with, but having it in a large production codebase is an exercise in confusion and frustration. It lets people be clever with their code. Don't be clever.

      • Oddskar 4 years ago

        I think they were talking about the comparison, not Coffeescript itself.

  • ako 4 years ago

    Indeed, and i'm not buying the "nothing is reusable". Sql views are in my opinion the opposite, as they make every query reusable.

    • cerved 4 years ago

      the downside with views it's that they are persistent so you can't reuse it only in the context of one connection

      you can use CTEs but only with one query

      • zimpenfish 4 years ago

        > persistent so you can't reuse it only in the context of one connection

        Don't e.g. Postgres temporary views cover this case?

        "Temporary views are automatically dropped at the end of the current session."

        • cerved 4 years ago

          ¯\_(ツ)_/¯

          I expect vendor specific stuff called something along the lines of temporary view would do this, yes

    • gwd 4 years ago

      Unlike functions, views don't have arguments you can use to change the output.

      (Edited for clarity.)

      • ako 4 years ago

        Yes they do, you can use them as the source for any sql query, so they have the exact same argument options as tables.

  • kthejoker2 4 years ago

    Without speaking to the merits of Malloy

    Think of a sales orders table. Think of all the orders in there that should be considered "invalid" for some reason or another.

    It would be awesome to define some logic, one time, in one place, for "invalid orders" and then have end users be able to say "from invalid orders", "stores having no invalid orders", "customers with multiple invalid orders", "top 3 reasons orders are invalid" ...

    And of course "valid orders" might just be defined as "not invalid orders"

    Instead this is probably multiple views, CTEs, UDFs, and still some logic in the main query to handle NULLs or joins or whatnot.

    Even little things like hierarchies - I don't want to have separate sprocs for salesByCountry, salesByRegion. salesByState,and then an umbrella dynamic sproc to figure out which one to call.

    It's ironic because SQL is such a human friendly declarative language that it has such poor ability to create meaningful shorthand expressions.

    I've written SQL for nearly 30 years, I love it, but natural language transpilers have shown me some of its limits as an expressive tool.

    • edmundsauto 4 years ago

      This is the use case for a metric store. https://blog.transform.co/history-of-the-metrics-store/

      It's a separate system, which has downsides. As an upside, by being a component of your system, the integrations with monitoring, alerting, quality checks, and dashboards is usually pretty easy.

      • kthejoker2 4 years ago

        Yeah I don't believe in a separate metrics store (or feature store - and I sell one!)

        It should all be in one SQL-like DSL with a natural language transpiler on top.

        • edmundsauto 4 years ago

          Can you recommend any good implementations of the ideal state? I worry that SQL is too much to learn, but natural language is way too ambiguous to map to anything resembling logic. (The best we can do, for natural logic -> executable logic is legislative code... And the ambiguity there is famous.)

bjarneh 4 years ago

> SQL is complete but ugly

All these languages that compile to SQL seem to forget one fundamental problem, when this stuff fails (queries becomes slow etc). We are left to debug auto-generated SQL, which is a complete nightmare..

  • phonebucket 4 years ago

    > All these languages that compile to SQL seem to forget one fundamental problem, when this stuff fails (queries becomes slow etc). We are left to debug auto-generated SQL, which is a complete nightmare..

    My first inclination when hitting slow malloy (if I ever were to use it) wouldn't be to look at the auto-generated SQL, but to analyze a query plan directly, much like I would be doing for a slow SQL query.

    • bjarneh 4 years ago

      > but to analyze a query plan directly

      What query plan? The one belonging to the auto-generated SQL? Then fiddle with Malloy in hopes of it writing better SQL to avoid the bottlenecks?

    • Anon_Admirer 4 years ago

      The auto-generated SQL from Microsoft's DAX Language for Analysis Services and PowerBi is not bad. I think it comes down to how well developed the expression engine converts to SQL. Microsoft though has had many years fine tuning their expression engine (Vertipaq). It may take a while to develop a competant engine.

ejb999 4 years ago

I am not really sure we need a 'better SQL' - SQL has been doing what SQL does in pretty much the same way for 50+/- years - if you work with relational data, learn SQL.

Very few technologies have had the staying power that SQL still holds.

  • wruza 4 years ago

    Is that a good argument? C also does what it does for 50+- years. If you work with code, learn C.

    SQL is a syntax for queries, and it is just bad at some sides. Global-only scoping, issues with common expressions, excessive roundtrips, inability to fetch hierarchical data effectively. Even syntactically it is an IDETIFICATION DIVISION-style spells, whose structure is hard to feel.

    • ejb999 4 years ago

      This tool is just another way to write those very same SQL queries - I have yet to see any tool that purports to write the SQL for you, output SQL that is better than someone that actually understands relational data and sql can do - it is usually not even close.

      This isn't a new language, its a layer on top of a language that already works, and works well.

  • cloverich 4 years ago

    SQL is my least disliked technology of all the ones I use in anger. And reading one solid SQL book has served me well for 10+ years now. The only ask I would want is better errors. But even without it, I rarely spend much time debugging SQL or trying to figure out what Clever Programmer intended their code to do.

  • jandrewrogers 4 years ago

    I think the SQL DML is far from optimal but it mostly works, even though it is quite difficult to use for some cases. It will be with us for a long time.

    SQL DDL, on the other hand, is broken as a standard because it assumes implementation details of the database engine that were widely true when it SQL was designed but are nonsensical or increasingly divergent in modern database engine architectures e.g. the many tacit assumptions about index properties.

daenney 4 years ago

After having gone over the README, I still have no idea what a Malloy expression looks like.

The majority of it is taken up telling me what it is and how to install and configure their VS Code extension, but not a single “ah-ha that seems interesting” example to entice me to try it.

cwillu 4 years ago

"Currently, all joins in Malloy are left outer joins, and must be between the primary key of one table and a foreign key in the explore being joined to."

https://looker-open-source.github.io/malloy/documentation/la...

SkeuomorphicBee 4 years ago

>Malloy reads the schema so you don’t need to model everything. Malloy allows creation of re-usable metrics and logic, but there’s no need for boilerplate code that doesn’t add anything new.

That is the one thing that I miss in SQL. I find it extremely wasteful (developer hours) that every SQL query needs to redefine the whole relational model from scratch. The RDBM knows the model, it should use that knowledge to automate the necessary joins.

  • marcus_holmes 4 years ago

    I get this, but I prefer being explicit over less typing. I have implemented schemas in the past involving multiple routes between tables, and self-joins, and getting an engine to work all that out would be a problem. Easier to just do more typing.

marcos100 4 years ago

SQL is more like asm. It's the language of the databases.

In my opinion, you'll have to create a language that compiles to SQL that is better than handwritten SQL and is easier to write, just like writing in C is easier and gcc can create better asm than you.

The problem is that ASM_SQL is easy to learn and write and has a lot of features, so that this new C_SQL must be even easier to learn and write and/or a lot more powerful.

It's a high bar to overcome and reordering and renaming SQL things is not going to make it.

  • jandrewrogers 4 years ago

    SQL is closer to C in this analogy. Most relational databases engines don't actually run SQL. They have a native internal programming language (closer to assembly code) that is much cleaner and sometimes more expressive, which SQL is translated into. This internal language is then either interpreted or JIT-ed into machine code. In rare cases, the internal language even has a nice expressive and composable text syntax that programmers can write directly. This makes everything easier and more flexible from perspective of people writing the database engine, even if the intent is to only ever run SQL queries at the public interface.

    Most relational databases mostly don't run on SQL anymore, that is a legacy interface that is exported for compatibility. Unfortunately, that is still the only vaguely portable programming interface and everyone knows it, so that's what we use.

HelloNurse 4 years ago

I find the join syntax particularly ugly.

In SQL, "X INNER JOIN Y ON something" is a well-behaved table-type value that can directly replace "X" in a query and can be nested in further joins.

Here I must name a join, as if it is a property of a table, and it isn't clear what I'm joining and selecting exactly.

cjwilliams 4 years ago

In my book the only thing a better sql needs is to start selects with the from clause. -‘from orders select orderid’ Which would make for better autocomplete among other things.

  • smackeyacky 4 years ago

    I lean towards this idea but it might make non trivially joined queries harder to understand, since it shifts the select to the middle or the end of the query.

    So you change:

      1. What I want
      2. Which tables
      3. Which filter
      4. What grouping
    

    Putting what you want further down the list might be more confusing rather than less.

    • wruza 4 years ago

      If that was the case, almost all programming languages had `where`, as in

        (a + b) where
          a = … * c,
          b = … * sqrt(2) + d
          c = …,
          d = … …;
      

      But that is not less confusing, that is more.

      • smackeyacky 4 years ago

        To a programmer, sure, but SQL was designed as a natural language[0] for non programmers and the syntax reflects that i.e

          I need eggs, flour, sugar and vanilla from the supermarket.
        

        Vs Yoda

          From the supermarket eggs, flour, sugar and vanilla I need.
        

        The original use case has long failed, but here we are with SQL stockholm syndrome. I don't hate it, but strictly speaking it is a bit odd.

        Edit: there was a linguist in the group that developed SQL to help with the goal of making it easier for non programmers.

        [0] https://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95-Syste...

  • bearjaws 4 years ago

    IMO it also needs to default destructive statements (update, delete) to one row limit, and you have to explicitly choose to override that.

    In the last 10 years of my career, many a dev & prod databases have been accidentally harmed by an overzealous query.

    This could also add a warning for "left-over" non-updated rows, for example if the developer signals that only one row should be updated, but the WHERE clause has many rows, its usually a sign of a missing unique key, something I've seen before as well.

  • jatone 4 years ago

    100% on restructuring to improve tooling. discoverability is the big thing we lack. from would be a little more verbose than necessary but otherwise yes!

ralfn 4 years ago

It's hard to take anything seriously that doesn't start with an actual code example.

Double so if they did bother to make many long claims about it being superior.

If there is anything here, the communication is really bad.

1. Start with a hello world example.

2. Add a concrete comparison (i.e. an example in both Malloy or SQL) for each claim.

For now, you are just wasting people's time releasing this like this.

All tell, no show? It's vaporware.

rco8786 4 years ago

That entire README and not a single line of Malloy to look at? And no comparison to existing SQL? I'd love to give this a try but the wall of text could be/should be replaced with some simple side by side examples and brief descriptions of why Malloy is "better" than SQL.

civilized 4 years ago

They don't have any examples of the language in the README. And the examples buried in the links at the bottom don't look very attractive.

hantusk 4 years ago

Not a big fan of SQL dsl's personally. If you feel you'd need a better query syntax and plotting for interactive use, I'd suggest looking at the dataframe libraries that have a larger community to discover how to solve things and have ironed out issues.

Specifically dataframe logic is easily mappable to dbplyr https://dbplyr.tidyverse.org/articles/sql-translation.html for those that know R and ibis https://ibis-project.org/ for Python/Pandas

seanhunter 4 years ago

Judging by the examples it seems different but equivalent. It certainly doesn't seem obviously better. Personally it's not compelling enough for me to want to try it out.

petilon 4 years ago

Two issues: First, to be a modeling language, tools have to be able to discover entities and relationships by reading a model definition. Second, to be successful as a modeling language, users need to be able to author models easily. If it has to be hand-coded it will not be successful.

For the first issue, this language has a non-standard syntax. It doesn't use JSON or XML. So either I have to write a parser on my own, or Google has to provide parsers in every language. Also, models are expressed as "explores" [1] where they define the fields available, and the query to use to get the data. But that's not what I expect from a modeling language. The purpose of modeling languages is to enable users to write new ad-hoc queries on the fly. If the queries have to be pre-written into an "explore" then this is not really a modeling language, it is just a query language.

I don't see this taking off as a modeling language.

[1] https://looker-open-source.github.io/malloy/documentation/la...

mongol 4 years ago

SQL is like an uncut diamond. Very valuable, but would be even more valuable with some tasteful adjustments. However, Mallory seems not to be that.

ako 4 years ago

Looks like one of the benefits is the ability to query nested data sets. When i need that i usually do that by generating a single json response in the query using row_to_json, but that's not really standard sql, and requires an additional deserialization step. I wonder what sql statement they generate to support nested sets.

mamcx 4 years ago

I work in a relational lang as hobby. I have seen as many sql replacements as I can, and most commit some or many problems. The biggest one, IMHO, is they stay as constrained to be "just a query lang" removing any hope to be useful at large.

In mine, the lang IS actually composable:

https://tablam.org

    for p in products ?where #price > 0.0 do
        print(p)
    end

    let top10 = products ?limit 10
    let positive = ?where #price > 0.0

    for p in top10 ?positive do
        print(p)
    end    

BTW, this is not new. This is more or less how was with FoxPro (with more verbose syntax, but same feel).

Instead this, like many other attempt, make "relational language" be embarrassed to be an ACTUAL programming language. And it looks more like a clone of GraphQL than of SQL, btw...

qsort 4 years ago

I agree with the other commenters, I don't see much from the query language itself. However, the plotting integration is very interesting for exploratory-type analyses or quickly figuring out what's going on with your data in a pinch, I found myself wishing for something like that on many occasions.

As far as the "next SQL" goes, my hunch is that we should learn our lesson from the various ETL tools that became popular in the last two decades and base it on infinitary stream algebras, it seems like it's the correct computational model to deal with generic, potentially non-uniform data. (I'm afraid I don't have the chops to argue the idea further, though --)

zainhoda 4 years ago

I think 2 things could make this awesome: - built-in testing: testability for SQL still hasn’t been solved but it feels like Malloy could do it - compile to Dataflow

Anon_Admirer 4 years ago

I am getting Data Analysis Expressions ("DAX") vibes. Feels like what it tries to do is similar to the DAX language used by PowerBi and Analysis Services for Direct Query but expressed differently I am honestly having a hard time reading their examples versus DAX. Something missing that DAX has is a selected values feature. I imagine looker might have something similar?

Guthur 4 years ago

This looks more like AST than something humans should write and read, and this is coming from someone that digs Lisp :)

  • bvrmn 4 years ago

    AST as a rule should be easily constructed programmatically. Malloy has a syntax though.

drivebycomment 4 years ago

Looks nicer but not sure if it brings enough functional benefit to overcome the lingua franca of data processing. Time will tell but if they can get a few major database engines to support it natively, maybe it can take off. I am somewhat skeptical though.

andrew_ 4 years ago

Disappointed that there wasnt a single snippet on their README.

"Better" is a dubious claim, and there wasn't much information to compel a passerby of the claim.

otar 4 years ago

So it's a "CoffeeScript for SQL", sounds intriguing...

Though, some more examples with real-world use cases would be good.

  • ltabb 4 years ago

    We're hoping its more of a 'typescript' but yeah :)