Mike: open-source legal AI

200 points by noleary 1 day ago

jbstack 22 hours ago

Legal professional here. This is NOT a replacement for proper legal AI assistants (e.g. Westlaw, in my jurisdiction). As far as I can tell, this is just a wrapper around regular LLMs i.e. nothing that you couldn't achieve yourself with the right prompting.

What legal professionals actually pay for, and that is virtually impossible to replicate unfortunately, is to give the AI access to a legal database of case law. Without case law, you can't do accurate legal research, and you are inviting disaster if you're doing things like drafting statements of case or skeleton arguments.

There's a reason why companies like Thomson Reuters have an oligopoly on these types of products, and can get away with charging thousands a year. They are the only ones with access to a comprehensive set of case law, and they've entrenched their position by having exclusive contracts with the law reporting companies. Without that, your model is just relying on publicly available cases that it can find on Google etc., and that's just a fraction of the full set.

With that said, these types of competitor products can be useful if you're just doing simple tasks like drafting letters or reviewing contracts and you accept that you need to do the legal research separately. But again, you can get that with just ChatGPT + a good prompt.

hn_throwaway_99 22 hours ago

> There's a reason why companies like Thomson Reuters have an oligopoly on these types of products, and can get away with charging thousands a year. They are the only ones with access to a comprehensive set of case law, and they've entrenched their position by having exclusive contracts with the law reporting companies.
I'm not in the legal field, but can someone explain that further? I would have expected that all case law is public access. Not necessarily easy access, but when a judge writes an opinion, why on Earth would that opinion be gated behind a corporation? What am I missing?
- eagleal 21 hours ago
  
  Do not know about the US, but some countries publish some type of high profile cases but _only_ after anonimization for obvious privacy reasons.
  To access the DB through the modern archive (well modern as new rules) you'd have to be an accredited professional passing through a few legal hardles and digital chancellor's office for each copy. It's like going to a bureaucracy^bureaucracy office.
  Some early companies given their initial foothold were not required these checks so they were able to get hold of bigger archives (it's also important to remember often legislation or conformity is done through consulting or lobbying done by these entrenched players).
  They can also build on the Data professionals themselves submit.
- jbstack 21 hours ago
  
  Yes, it sounds crazy and against the principle of open justice, but unfortunately this is the reality. Certainly in the UK which is my jurisdiction - and I believe in the US too although I don't know for sure.
  In theory, any member of the public can obtain a judgment by applying for one at the court and paying a fee. That's fine if you just need a one-off judgment, don't mind paying the fee, and you're not in a hurry. It also assumes that you know which case you need.
  For realistic legal research, you might need to wade through dozens of cases just to even know if any of them are relevant, you might have a deadline of tomorrow to get it done, and you might not want to pay that fee for a bunch of cases that you aren't going to end up needing. Only a company which already has a comprehensive copy of virtually every important case can help you here.
  A typical workflow for a complex piece of legal research might look like this:
  1. You need to research a legal topic.
  2. Do some Googling, or chat to your LLM, to get a rough overview and some pointers for further research (but don't completely rely on what you find).
  3. Read some professional content (e.g. Practical Law articles relevant to the topic, or a legal textbook).
  4. Read the relevant legislation.
  5. Use a legal database to download all the cases you found from steps 2 and 3 which seem like they might be relevant.
  6. Use a legal database to download all the cases which cite the relevant legislative provisions you found in step 4 and seem like they might be relevant.
  7. Use the legal database to confirm that those cases are still good law (not overridden or criticised by a later case).
  8. Skim read them, discard those that turned out to obviously not be relevant.
  9. Read the remaining ones more closely.
  10. Note any useful-looking cases which are cited in the ones from step 9, and recursively work your way through those cases as well.
  Relying on court-provided copies of judgments won't realistically help you with most of these steps.
  
  ericmcer 17 hours ago
  
  those steps are what a RAG LLM agent setup excels at.
  If tech companies invested 10% of what they have in AI assisted coding tools into AI assisted legal tools, they would be able to do those steps easily.
  It is definitely coming.
  
  hn_throwaway_99 14 hours ago
  
  Related question, then - what do judges use when they have to write opinions in the first place? Do they have to follow the same process and use Thomson Reuters?
  It's obviously even more important for judges (compared to lawyers) to be able to easily search all of the relevant case law to see which cases are controlling and would have precedence. Seems bizarre to me that this critical function would be gated behind a corporation.
  
  gadders 3 hours ago
  
  They have staff that do it for them :-)
  
  whattheheckheck 8 hours ago
  
  This all needs to be publicly accessible for free. Gonna see how blatently inconsistent laws and interpretations are
- jonners00 19 hours ago
  
  This is great on this topic: https://www.thebignewsletter.com/p/gatekeepers-of-law-inside...
dspillett 21 hours ago

> Legal professional here. This is NOT a replacement for proper legal AI assistants (e.g. Westlaw, in my jurisdiction). As far as I can tell, this is just a wrapper around regular LLMs i.e. nothing that you couldn't achieve yourself with the right prompting.
I'd not use generative AI for anything but a cursory check anyway⁰. Even if it is trained on clean up-to-date data rather than all the wrong information that is out there, it could still give a wrong answer and I have no leg to stand on if I rely upon it. At least if I pay a human and they trust the LLM too much, I'll hopefully have some call to pursue them for giving bad advice when it bites me.
--------
[0] Or at all… But even if I wasn't someone actively avoiding LLMs, the point would still stand
htrp 21 hours ago

What does legora, harvey, or crosby add here other than the default westlaw/TR/lexis integrations?
- david_shi 14 hours ago
  
  I'd imagine it's like using Cursor/Claude Code vs. a Jetbrains IDE plugin.
r0fl 12 hours ago

In Canada there is a database of a lot of case law here: https://www.canlii.org/
Theoretically speaking if someone scraped all of it and added it to something like this open source Mike project would that then be a much better tool for lawyers?

xrd 23 hours ago

IANAL, obviously!

One question I have about legal AI startups/products, is how do they maintain or improve upon billing practices of law firms?

Having worked with a bunch of lawyers, I know that I'm often paying $500/hr to that firm. That work is actually done by a paralegal who is being paid $40/hr, and then I'm being billed through the partner for an extra $460/hr. This is a gross oversimplification, but you get the point.

If the partner needs to bring in $5M a year, how does any addition of tech solve that?

If I'm the customer of the law firm, I would love to have a more cost efficient way to get legal advice. But, I don't understand how those incentives are matched by the partner? I don't really think they want a more efficient result for their customers, they want a better way to get more billable hours. Adding "tech efficiency solutions" does not solve that issue at all.

Inevitably, customers will use LLMs on their own, and as people have noted, lose attorney client privilege (and often get hallucinated bad advice). There will probably be some very comical court room dramas when people try to represent themselves with an LLM on their shoulder.

Am I misunderstanding something fundamental about the legal world that will make a major law firm adopt this tech? I feel like there are some strong reasons they will universally avoid moving in this direction. Long term it will win and there will be blood on the floor, but why would any large firm adopt this stuff right now?

piokoch 23 hours ago

My guess is that capable lawyer who will be able to spot hallucination and figure out key stuff missed by AI will be satisfied to earn $500K a year, not $5M, so he will charge less. Those who charge $500/hr will simply extinct or will be a luxury used by very rich people for not particular reason, like the ones who buy other overpriced goods.
- htrp 21 hours ago
  
  just a gentle reminder that most lawyers bill about 2200 to 2300 hours a year (at the top tier). Even at the crazy tier (no-life all work), lawyers generally don't exceed 3000 hours a year
- xrd 21 hours ago
  
  I'm not saying a partner earns $5M, but that he is responsible for bringing that into the firm.
  I just don't understand how decision makers at a big firm are going to say yes to tech solutions when those solutions will kill the goose roaming their hunting grounds.
lawtalkinghuman 22 hours ago

The answer is: the market will work it out eventually. Clients will push for more work to be fixed-fee/outcome-based rather than billed hourly. There'll be some small firms who'll successfully grab lots of lower value clients who are willing to use digital tools to handle their work and don't particularly care about having a big fancy office in London or New York if it means lower bills (and they can then basically use the relationship they've had providing the supervised online service to be the first point of call when said client wants something that's less off-the-shelf and needs more work).
Also, an interesting example: in English litigation (where, broadly, loser pays unlike America where each side pays), maximising billable hours is not always a viable strategy for anybody if those costs aren't recoverable on success. Someone involved in large-scale commercial litigation involving disclosure of millions of documents who doesn't use algorithmic document classification (now pretty broadly accepted as normal) potentially runs the risk of a judge determining that the costs of going through all the documents by hand isn't recoverable. Insurers/litigation funders aren't going to want to risk padding the costs so much that the judge prevents them from recovering their stake in the litigation.
Customers using their own LLMs: yep, they might do that. I think the pitch from the legal LLM providers is "we've got legally trained people doing RLHF to make it more accurate" mixed in with "also we've got a partnership with Lexis/Westlaw/etc. so we can do legal research that's better than what's on the open web", with a little bit of "if you get sued for professional negligence, 'I used the legal AI thing that's built into Westlaw' is gonna be more convincing to a judge and jury (and your insurance company) than 'I used ChatGPT, yes, like the app you've got on your phone'...".
yeehawtypebeat 20 hours ago

It really depends on the firm and what work they do. The firm I work for, we do not bill hours. We take a percentage of the recovered funds. It's high volume and many tasks are repetitive.
We don't have paralegals/attorneys handle cases from beginning to end. We have different positions handle different tasks. One person may only do scheduling, another does discovery, another handles reviewing releases.
For us, adopting tech to make us more efficient is a priority. Our setup is a bit unique, but I can see PI and collection firms adopting tech similar to this.

jcfrei 1 day ago

I believe this is the direction enterprise software is generally going. An open-source base with a very permissive license that then each company can adapt (with claude, codex, etc.) for it's own needs. It's either running it on it's own infrastructure or in hosted environment by the author. I've built a similarly extensible codebase for an ERP: https://github.com/lambdadevelopment/lambda-erp

somewhatgoated 1 day ago

How will developers of this software get paid in this model?
- ford 23 hours ago
  
  I think a more realistic model is not fully open source, but apps with extremely open/flexible APIs and data models that allow arbitrary front-ends (likely with a default one provided by whoever provides the API). Kind of like Stripe's model, but the audience of "developers" is bigger since anyone can be a "developer" with Claude Code
  Or maybe it will be the more established open source model where the code is free but the maintainers offer hosting/some default product
- jcfrei 23 hours ago
  
  good question - some thoughts I had: hosting the model and maybe some review process. for example: you have the customer's employees telling llms about new features and then a dedicated review cycle on the hosting side makes sure it doesnt break anything and is secure, etc.
ford 23 hours ago

I'm really interested in how LLMs will enable more customizable, personal software. Our PMs & Designers are writing a lot of code now, and our engineers are spending time figuring out how to make a system that's easy for PMs & designers to extend/add to.
It's not a big leap to apply that model to a company and its customers, where the company builds a well-abstracted, easily extensible base that 1) Customers can easily extend/customize for their workflows 2) Customers can self-host or run fully isolated, much easier (probably not quite there yet, but is a possible world)
- bluefirebrand 23 hours ago
  
  > Our PMs & Designers are writing a lot of code now, and our engineers are spending time figuring out how to make a system that's easy for PMs & designers to extend/add to
  Sounds like your developers are relegating themselves to being review monkeys instead of developers
  
  ford 23 hours ago
  
  In a post Claude Code world that's the job of engineers - the engineering is designing good abstractions, scalable systems, and things that are easy to contribute to. This is what the highest leverage senior engineers have always done, the audience has just changed
  Engineering has moved up another layer of abstraction (just like we moved past managing buffers & writing machine code)
fittingopposite 13 hours ago

This is AGPL.
- kavalg 6 hours ago
  
  How come? the github page says the license is MIT.
meekaaku 3 hours ago

This looks great. The demo is very fast. Is it static generated or is it reading the sql db?

reverius42 1 day ago

Presumably this is an issue for the commercial competitors too, but in light of the recent court ruling in United States v. Heppner that AI chatbots can break attorney-client privilege and/or work product doctrine, what kinds of things can this be safely used for? (I would assume you want to avoid sending anything with client-confidential information in it to a service provider like OpenAI or Anthropic.)

Potentially if used with a local LLM and not a service provider, this might protect attorney-client privilege?

robertritz 1 day ago

United States v. Heppner mentioned a public chatbot service. If a law firm (or specialized provider) offered a chatbot using their own servers and hosted the traces and other data on the law firms own servers it would almost certainly be protected. But another case would need to happen to determine that.
But that only applies for clients using the chatbot. If a lawyer is using the LLM it is definitely protected. No different if a lawyer searches something on Google or Lexis Nexis. The search itself is protected. I guess you could debate metadata but the content surely is protected.
debarshri 1 day ago

you can have dedicated deployment per customer per case, segregating it logically. I have seen this happen in larger law firms. It could be based on groups, teams, partners etc.
victorbjorklund 1 day ago

It’s not different from googling. If a non-lawyer googles legal advice (”how to give yourself an alibi after murdering someone”) it will not be protected by attorney-client privilege. Same if you ask OpenAI.
- llagerlof 1 day ago
  
  This. I am telling this since the boom of generative AI and promptly being ignored.
  
  mettamage 1 day ago
  
  Some people pay attention. I know I do. Thanks for mentioning it.
  
  alansaber 1 day ago
  
  You're right but lawyers are naturally looking for precedent to support this

nlh 22 hours ago

We're in this weird in-between phase of the tech world where projects like this can now be put together in a few hours/days, but the audience of us HN folk are still trained on the idea that this is the result of months or years of work.

We're going to have to re-train ourselves on what hard work looks like (and thus what should be upvoted here).

I don't know whether the project's creator (@willchen96?) is a lawyer, or if they work at a law firm that helped them shape this, or how much time and effort they put into this, or whether law firms even want or need a vibe-coded open source project for their legal AI stack, but we should be considering the totality of those things when looking at new projects these days.

There's a lot of red flags here.

solarkraft 22 hours ago

Your comment (maybe accidentally) encodes the notion that hard work is the thing to appreciate.
I don’t actually care that much about the work having been hard - I care about the result being good.
- nlh 22 hours ago
  
  Totally fair point!

kostarelo 1 day ago

For a moment I thought it was some open-source LLM trained on legal. It's not, it's a web app wrapping major LLM providers and streamlining legal workflows, uploading documents, and having the LLM providers interact with them.

Cool project regardless!

dahcryn 1 day ago

yeah I thought that was the USP of Legora and Harvey, so this is not the same thing at all, just surfing the brand recognition
- alansaber 1 day ago
  
  Harvey made it a point to FT ChatGPT models for a year or so but they were struggling to keep up with the pace of new model deployments and quit. They never went as far as Cursor AFAIK which produced its own routers/"composer" models.
- erispoe 23 hours ago
  
  Harvey doesn't have finetuned model anymore do they?

paultopia 1 day ago

I'm a little puzzled by what this actually is supposed to be. The marketing material on this website suggests that it's meant to be used with a firm's Gemini or Claude API keys. ("A chat interface that reads your documents, cites verbatim, runs multi-step workflows, and drafts and edits contracts end-to-end. Plug in your own Claude or Gemini keys, and keep full control of the models you use.").

If that's true, how does it actually achieve anything with respect to client confidentiality or anything else? (For example, there's the claim "the assistant keeps full context across every conversation and every document." --- but isn't that a function of the model one uses, which is on Anthropic or Google? Ditto the claim "Documents never leave your perimeter. Compliance, residency, and privilege stay under your control." But this is only true if you're not piping them to Anthropic or Google...) Is this just a user interface?

It would be nice if these product webpages included an easy way to find documentation so that one could figure out what the product actually does. I can't find any obvious way to discern if it can be easily used with a local model running via ollama or something, for e.g.

piker 23 hours ago

These firms have enterprise relationships that dictate all of that. This is presumably just a frontend that takes the key as an input and plugs into that infrastructure.
ford 23 hours ago

The "open source" part is the wrapper on top (up to you if you believe that's meaningful here)

kernalix7 1 day ago

Self-hostable legal AI as open source is a useful direction in principle. Hard to tell how mature the actual implementation is though, the repo is pretty fresh and the marketing site is doing a lot of heavy lifting compared to what's in the code right now. Will be more interesting to revisit in a few weeks.

0xbadcafebee 1 day ago

Rule of tech products: the nicer the splash page is, the worse the product is
- superfrank 1 day ago
  
  Apple would like a word...
  
  MrDrMcCoy 22 hours ago
  
  If we restrict the rule to just software in Apple's case, I think it still holds :P
  
  ergocoder 11 hours ago
  
  I mean, a rule can have some exceptions. That's fine.

sandreas 1 day ago

Cool project. What a pity it's not mikefoss.com, would match the soundex of Mike Ross from suits even better ;-)

buggy6257 12 hours ago

Especially since it’s “a competitor to Harvey”

embedding-shape 22 hours ago

Maybe it's just me, but seems strange to not include what countries legal system this is for, and having it prominently front and center?

Since this is HN, I guess it's fair to assume it's for the US, but since English is used in more countries than the US, wouldn't it be a good idea to say outright what countries legal systems this actually understands and supports? Or is it maybe meant to be country-agnostic somehow? If so, that isn't very clear either.

syntaxing 1 day ago

I always wondered if Justin Kan’s Atrium closed door prematurely by just 2-3 years. It would have been cool to see a “technology” driven law firm and how it would have adjusted to LLMs.

alansaber 1 day ago

There are loads of them now. Great for trivial work. Not so great to highly templatise more complex matters.

scosman 1 day ago

2 commits, 8 hours old....

albertgoeswoof 1 day ago

And yet 130 stars
- m4rkuskk 1 day ago
  
  No way they got that many stars in that little time. buy.fans must run a special right now.
- dalemhurley 1 day ago
  
  Amazing work, 130 stars is quite high for a niche product within hours!
- KingOfCoders 1 day ago
  
  Not saying they did, but buying a 100 starts is cheap.
  
  piker 1 day ago
  
  The post exploded on LinkedIn and the repo is likely being starred by hundreds of vibe coders. It’s legit, but may have a lower signal value.
georgespencer 1 day ago

OP's Github profile looks very fishy.

re_spond 1 day ago

Cool initiative. Is this fully separate from "legal Mike", the Dutch company that provides a similar solution, https://legalmike.ai/product/ ?

That may be confusing on the naming.

iot_devs 1 day ago

I thought it was named after the characters of Suits: Harvey and Mike

wps 1 day ago

This website is actually gorgeous. What do you call this style?

anon373839 1 day ago

Agreed; that's a beautiful site. The main design style apart from minimalism that I notice is glassmorphism. Well, that and a very well chosen Monet to set the tone.
NamlchakKhandro 1 day ago

It's called "We just discovered Claude Code and so we think Anthropic is Amazing so everything they do is godlike and thus their design choices must also be god like. Apple is Dead, Long Live Anthropic" style.
- anon373839 1 day ago
  
  Hm, I don't think this looks like Anthropic's design style. Anthropic is kind of doing a Chobanicore + Corporate Memphis design system that I personally find kind of creepy. But the website here just feels fresh and pleasant.
- rvz 1 day ago
  
  > Apple is Dead, Long Live Anthropic" style.
  Except that the font that it is using is EB Garamond and Apple was heavily using the Garamond font in the mid-1980s to 2000s.
  Given that almost everyone is copying both, it is now garbage.
- nipponese 1 day ago
  
  I think you mean this
  https://github.com/anthropics/claude-code/tree/main/plugins/...
elicash 1 day ago

Do you just mean the Monet at the top? I know little about art, but I assume impressionism.
That, plus an Anthropic-like logo.

trilogic 1 day ago

Why don´t you put a direct link that redirect users to some proprietary AI providers instead of making it look fancy. (If I ask whatever AI model will produce same outputs/forms, structured as you wish, and even locally). To qualify as some wrapper you need to add a layer of creativity by you on top of the existing ones.

campers 1 day ago

Interested to try it out! Some feedback on the homepage there's nothing above the fold, or directly below that says its a Legal AI platform. I would like a legal AI tool, but I'm not familiar with the space don't know what Harvey or Legora are. It was only the hackernews title "Mike: open-source legal AI" that gave the context.

PeterStuer 22 hours ago

Looks like a prototype thrown together in a weekend hackathon. At fist glance, flimsy wrapper around a model with a chat interface, a few prompts, and some very basic RAG. What did I miss?

williamtrask 19 hours ago

Sometimes I find joy in noticing the importance of comma placement:

Everything the incumbents ship, in an open codebase your firm owns.

Everything the incumbents ship in an open codebase, your firm owns.

jchurch1 23 hours ago

This is complete vibe garbage.

go look at the auth - it's a call to supabase.

go look at the migrations - it's like 5 tables.

There is a real need in the space and a real opportunity for a solution like this but this is a complete nothing burger of what exists in the underlying code.

The requirements for this kind of product are extensive and complex. The shape of the data layer is complex and nuanced. Absolutely none of this is considered or implemented in the project but it sure is blowing up.

AntiUSAbah 1 day ago

I find the name and presentation well chosen. For whatever reason "Mike" fits well in this legal context.

codechicago277 23 hours ago

Mike is Harvey’s genius mentee in “Suits”
- latortuga 22 hours ago
  
  I had a chuckle at the domain name, mikeoss.com
- AntiUSAbah 21 hours ago
  
  Aaahhh perhaps Legora broke my connection with suits :)
gigatree 23 hours ago

Besides the Suits connection?

oliwary 1 day ago

The name is really clever given that the character in Suits is called Mike Ross. :)

albertgoeswoof 1 day ago

How does this work with docx files? The screenshots only show pdfs?

timdim 1 day ago

LibreOffice for DOC/DOCX to PDF conversion
- albertgoeswoof 1 day ago
  
  how does the agent edit the docx files then? or does it convert all docx to pdf, parse the PDF into context, make edits and then save it back to docx?
  laywers live in docx not pdf

typeofhuman 1 day ago

Behold the continued tradition of AI products having logos that look like buttholes.

higginsniggins 1 day ago

Beautiful website.

DetroitThrow 21 hours ago

This doesn't have most of the important features of Harvey or Legora. This is just a vibe coded project that escaped showlim. Urgh.

kleiba2 1 day ago

I'm so tired of having to sign up to some new service even just to try it out.

robertritz 1 day ago

So open up your new product to every random agent and griefer on the internet? Why would you do that?
- kleiba2 1 day ago
  
  No, I mean just to try it out.
- alansaber 1 day ago
  
  There are guest accounts, you know.
  
  kleiba2 23 hours ago
  
  Where are they?

ebipaul5194 1 day ago

Is it safe to share details with AI for case points what happened when data is breached. Victims name will be reviled right?

voidUpdate 1 day ago

Because using LLMs for legal work has never gone wrong, and LLMs have never cited completely hallucinated cases

jbstack 22 hours ago

Humans make mistakes too. In many cases, more often than LLMs. Humans are still useful for doing work.
I use AI extensively in my legal work. But I check every citation myself, manually. That means that I read the entirety of every case that I plan to cite in my output, and I check on Westlaw that it hasn't been overridden by a later decision. If you're just producing the AI's output verbatim, then you have only yourself to blame when things go wrong in the courtroom.
- voidUpdate 22 hours ago
  
  Maybe you check every citation, but there have been several news stories recently of lawyers using LLMs and using completely fabricated cases without verifying they exist
  
  jbstack 22 hours ago
  
  Agreed, but that's the fault of the lawyer, not the LLM.