WebMCP Proposal

webmachinelearning.github.io

153 points by Alifatisk 4 months ago

The fact that the "Security and privacy considerations" and the "Accessibility considerations" sections are completely blank in this proposal is delightful meta commentary on the state of the AI hype cycle. I know it's just a draft so far, but it got a laugh out of me.

ryanmcbride 4 months ago

don't worry in a few weeks they'll have AI generate some policies for them to skim!
ohyoutravel 4 months ago

This stuck out to me. What a joke.
notepad0x90 4 months ago

I'm struggling to think of a good entry under those sections, what did you have in mind?
For accessibility, that's a client consideration typically, the agent using the MCP server would be responsible for making its output accessible. I don't think the intention is to let webapps define how their output is displayed to end users, but to define outputs for agents instead.
For security, other than what the MCP protocol itself provides, what should be defined?
I think it's a draft, there is still discussion about it, they might not have reached a point where there is consensus for those categories. But I'm curious to hear your thoughts.
- dfabulich 4 months ago
  
  > For security, other than what the MCP protocol itself provides, what should be defined?
  The MCP protocol itself provides no security at all.
  The MCP specification includes no specified method of authorization, and no specified security rules. It lists a handful of "principles," and then the specification simply gives up on discussing the problem further.
  https://modelcontextprotocol.io/specification/2025-11-25#sec...
  3.2 Implementation Guidelines While MCP itself cannot enforce these security principles at the protocol level, implementors **SHOULD**: 1. Build robust consent and authorization flows into their applications 2. Provide clear documentation of security implications 3. Implement appropriate access controls and data protections 4. Follow security best practices in their integrations 5. Consider privacy implications in their feature designs
  
  notepad0x90 4 months ago
  
  it's just an http or stdio server, would there be considerations beyond that of any other http server or cli app? shouldn't the security be dependent on deployment details? Like you wouldn't require OAUTH if it is deployed on localhost only, or if there is a reverse proxy handling that bit.
  There is a reason it cannot enforce those principles, an MCP is a web service. it could use SQL as a backend for some reason, or use static pages. it might be best to use mTLS, or it might make sense to make it open to the public with no authentication or authorization whatsoever, and your only concern might be availability (429 thresholds). the spec can't and shouldn't account for wildly varying implementation possibilities right?
  
  davidcrowe 4 months ago
  
  The difference is that MCP introduces a third party: the agent isn't the user and isn't the service, but it's acting on behalf of one to call the other. Standard HTTP auth assumes two parties. That's the gap the spec needs to address.
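To make the three-party point concrete, here is a minimal sketch of a delegation check. Everything in it (`Delegation`, `ToolRequest`, `authorize`, the scope strings) is hypothetical and not taken from the MCP or WebMCP specs; it only illustrates that the service has to reason about the user, the agent, and the granted scope as separate things.

```typescript
// Hypothetical types for illustration; not part of MCP or WebMCP.
interface Delegation {
  userId: string;    // the human who granted access
  agentId: string;   // the agent acting on the user's behalf
  scopes: string[];  // what the user allowed this agent to do
  expiresAt: number; // epoch milliseconds
}

interface ToolRequest {
  agentId: string;
  tool: string;
  requiredScope: string;
}

interface Decision {
  allowed: boolean;
  reason: string;
}

// The service checks the agent, the expiry, and the scope independently.
function authorize(d: Delegation, req: ToolRequest, now: number): Decision {
  if (req.agentId !== d.agentId) {
    return { allowed: false, reason: "delegation was issued to a different agent" };
  }
  if (now >= d.expiresAt) {
    return { allowed: false, reason: "delegation expired" };
  }
  if (d.scopes.indexOf(req.requiredScope) === -1) {
    return { allowed: false, reason: "scope not granted by user" };
  }
  return { allowed: true, reason: "ok" };
}

const grant: Delegation = {
  userId: "alice",
  agentId: "agent-1",
  scopes: ["comments:write"],
  expiresAt: 1000,
};

const decision = authorize(
  grant,
  { agentId: "agent-1", tool: "post_comment", requiredScope: "comments:write" },
  500
);
```

A plain two-party session check would stop at "is this request authenticated"; the extra fields are where the agent-specific questions live.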

gavmor 4 months ago

This seems backwards, somehow. Like you're asking for an nth view and an nth API, and services are being asked to provide accessibility bridges redundant with our extant offerings.

Sites are now expected to duplicate effort by manually defining schemas for the same actions, like re-describing a button's purpose in JSON when it's already semantically marked up?

foota 4 months ago

No, I don't think you're thinking about this right. It's more like hacker news would expose an MCP when you visit it that would present an alternative and parallel interface to the page, not "click button" tools.
- cush 4 months ago
  
  You're both right. The page can expose MCP tools like via a form element which is as simple as adding an attribute to an existing form and completely aligns with existing semantic HTML - eg submitting an HN "comment". Additionally, the page can define additional tools in javascript that aren't in forms - eg YouTube could provide a transcript MCP defined in JS which fetches the video's transcript
  https://developer.chrome.com/blog/webmcp-epp
  
  znpy 4 months ago
  
  I think that rest and html could probably be already used for this purpose BUT html is often littered with elements used for visual structure rather than semantics.
  In an ideal world html documents should be very simple and everything visual should be done via css, with JavaScript being completely optional.
  In such a world agents wouldn’t really need a dedicated protocol (and websites would be much faster to load and render, besides being much lighter on cpu and battery)
  
  cush 4 months ago
  
  > html could probably be already used for this purpose
  You’re right, and it already is, and tools like playwright MCP can easily parse a webpage to use it and get things done with existing markup today.
  > BUT html is often littered with elements used for visual structure rather than semantics.
  This actually doesn’t make much of a difference to a tool like playwright because it uses a snapshot of the accessibility tree, which only looks at semantic markup, ignoring any presentation
  > In such a world agents wouldn’t really need a dedicated protocol
  They still do though, because they can work better when given specific tools. WebMCP could provide tools not available on the page. Like an agent hits the dominoes.com landing page. The page could provide an order_pizza tool that the agent could interact with, saving a bunch of navigation, clicks and scrolling and whatnot. It calls the order_pizza tool with “Two large pepperoni pizzas for John at <address>”, and the whole process is done.
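For anyone wondering what a page-defined tool like order_pizza could look like in code, here is a rough sketch. It deliberately does not use the real WebMCP API: `PageToolRegistry`, `PageTool`, and the simplified `params` map are made-up stand-ins, and the structured arguments (size, quantity, address) are an assumption about how the free-text order might be broken down.

```typescript
// Hypothetical shapes for illustration only; this is not the WebMCP API.
type ParamType = "string" | "number" | "boolean";

interface PageTool {
  name: string;
  description: string;
  // Simplified stand-in for a JSON Schema: parameter name -> expected typeof.
  params: Record<string, ParamType>;
  execute: (args: Record<string, unknown>) => string;
}

class PageToolRegistry {
  private tools = new Map<string, PageTool>();

  register(tool: PageTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error("Tool already registered: " + tool.name);
    }
    this.tools.set(tool.name, tool);
  }

  // What an agent would see when it asks the page what it can do.
  list(): string[] {
    return Array.from(this.tools.keys());
  }

  // Check argument types against the declared params, then run the tool.
  call(name: string, args: Record<string, unknown>): string {
    const tool = this.tools.get(name);
    if (!tool) {
      throw new Error("Unknown tool: " + name);
    }
    for (const key of Object.keys(tool.params)) {
      const expected = tool.params[key];
      if (typeof args[key] !== expected) {
        throw new Error("Argument '" + key + "' must be a " + expected);
      }
    }
    return tool.execute(args);
  }
}

const pageTools = new PageToolRegistry();
pageTools.register({
  name: "order_pizza",
  description: "Place a delivery order without walking through the UI",
  params: { size: "string", quantity: "number", address: "string" },
  execute: (args) =>
    "Ordered " + String(args["quantity"]) + " " + String(args["size"]) +
    " pizza(s) for delivery to " + String(args["address"]),
});

// One structured call replaces the navigation, clicks and scrolling.
const confirmation = pageTools.call("order_pizza", {
  size: "large",
  quantity: 2,
  address: "1 Main St",
});
```

The point of the declared `params` is that the agent gets a typed contract up front instead of having to discover the order flow by reading the page.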
jauntywundrkind 4 months ago

I see two totally different things from where we are today
1. This is a contextual API built into each page. Historically sites could offer an API, but that API is a parallel experience, a separate machine-to-machine channel, that doesn't augment or extend the actual user session. The MCP API offered here is one offered by the page (not the server/site), in a fully dynamic manner (what's offered can reflect what the state of the page is), that layers atop the user session. That's totally different.
2. This opens an expectation that sites have a standard means of control available. This has two subparts:
2a. There's dozens of different API systems available, to pick from, to expose your site. Github got half way from rest to graphql then turned back. Some sites use ttrpc or capnweb or gproto. There hasn't actually been one accepted way for machines to talk to your site, there's been a fractal maze of offerings on the web. This is one consistent offering mirroring what everyone is already using now anyways.
2b. Offering APIs for your site has gone out of favor in general. It often has had high walls and barriers when it is available. But now the people putting their fingers in that leaky dam are patently clearly Not Going To Make It, the LLMs will script & control the browser if they have to, and it's much much less pain to just lean in to what users want to do, and to expose a good WebMCP API that your users can enjoy to be effective & get shit done, like they have wanted to do all along. If webmcp takes off at all, it will reset expectations, that the internet is for end users, and that their agency & their ability to work your site as they please via their preferred modalities is king. WebMCP directs us towards an RFC 8890-compliant future, by directly enabling site agency. https://datatracker.ietf.org/doc/html/rfc8890

cadamsdotcom 4 months ago

Great to see people thinking about this. But it feels like a step on the road to something simpler.

For example, web accessibility has potential as a starting point for making actions automatable, with the advantage that the automatable things are visible to humans, so are less likely to drift / break over time.

Any work happening in that space?

egeozcan 4 months ago

As someone heavily involved in a11y testing and improvement, the status quo, for better or worse, is to do it the other way around. Most people use automated, LLM based tooling with Playwright to improve accessibility.
- cadamsdotcom 4 months ago
  
  I certainly do - it’s wonderful that making your site accessible is a single prompt away!
jayd16 4 months ago

In theory you could use a protocol like this, one where the tools are specified in the page, to build a human readable but structured dashboard of functionality.
I'm not sure if this is really all that much better than, say, a swagger API. The js interface has the double edge of access to your cookies and such.
thevinter 4 months ago

We're building an app that automatically generates machine/human readable JSON by parsing semantic HTML tags and then by using a reverse proxy we serve those instead of HTML to agents
jauntywundrkind 4 months ago

Chris Shank & Orion Reed doing some very nice work with accessibility trees. https://bsky.app/profile/chrisshank.com/post/3m3q23xpzkc2u
I tried to play along at home some, play with rust accesskit crate. But man I just could not get Orca or other basic tools to run, could not get a starting point. Highly discouraging. I thought for sure my browser would expose accessibility trees I could just look at & tweak! But I don't even know if that's true or not yet! Very sad personal experience with this.
bavandersloot 4 months ago

There is a proposed extension in the repo that is getting some traction that automatically converts forms into tools. There is trouble in linking this to a11y though, since that could incentivize sites to make really bad decisions for human consumers of those surfaces.

Flux159 4 months ago

This was announced in early preview a few days ago by Chrome as well: https://developer.chrome.com/blog/webmcp-epp

I think that the github repo's README may be more useful: https://github.com/webmachinelearning/webmcp?tab=readme-ov-f...

Also, the prior implementations may be useful to look at: https://github.com/MiguelsPizza/WebMCP and https://github.com/jasonjmcghee/WebMCP

politelemon 4 months ago

This GitHub readme was helpful in understanding their motivation, cheers for sharing it.
> Integrating agents into it prevents fragmentation of their service and allows them to keep ownership of their interface, branding and connection with their users
Looking at the contrived examples given, I just don't see how they're achieving this. In fact it looks like creating MCP specific tools will achieve exactly the opposite. There will immediately be two ways to accomplish a thing and this will result in a drift over time as developers need to take into account two ways of interacting with a component on screen. There should be no difference, but there will be.
Having the LLM interpret and understand a page context would be much more in line with assistive technologies. It would require site owners to provide a more useful interface for people in need of assistance.
- bastawhiz 4 months ago
  
  > Having the LLM interpret and understand a page context
  The problem is fundamentally that it's difficult to create structured data that's easily presentable to both humans and machines. Consider: ARIA doesn't really help llms. What you're suggesting is much more in line with microformats and schema.org, both of which were essentially complete failures.
  LLMs can already read web pages, just not efficiently. It's not an understanding problem, it's a usability problem. You can give a computer a schema and ask it to make valid API calls and it'll do a pretty decent job. You can't tell a blind person or their screen reader to do that. It's a different problem space entirely.

rgarcia 4 months ago

This is great. I'm all for agents calling structured tools on sites instead of poking at DOM/screenshots.

But no MCP server today has tools that appear on page load, change with every SPA route, and die when you close the tab. Client support for this would have to be tightly coupled to whatever is controlling the browser.
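A small sketch of that lifecycle, to show why the client has to be coupled to the browser session. `RouteScopedTools` and its methods are invented for this example and are not from the WebMCP draft; the idea is only that the available tool set is a function of the current route and goes away with the tab.

```typescript
// Hypothetical sketch; names are illustrative and not taken from the WebMCP draft.
type ToolListener = (toolNames: string[]) => void;

class RouteScopedTools {
  private byRoute = new Map<string, string[]>();
  private current: string[] = [];
  private listeners: ToolListener[] = [];

  // Declare which tools a given SPA route offers.
  defineRoute(route: string, toolNames: string[]): void {
    this.byRoute.set(route, toolNames.slice());
  }

  // Whatever controls the browser subscribes here to learn about changes.
  onChange(listener: ToolListener): void {
    this.listeners.push(listener);
  }

  // Called on page load and on every client-side route change.
  navigate(route: string): void {
    this.current = (this.byRoute.get(route) || []).slice();
    this.notify();
  }

  // Called when the tab closes: every tool disappears.
  closeTab(): void {
    this.current = [];
    this.notify();
  }

  available(): string[] {
    return this.current.slice();
  }

  private notify(): void {
    for (const listener of this.listeners) {
      listener(this.available());
    }
  }
}

const shopTab = new RouteScopedTools();
shopTab.defineRoute("/search", ["search_products"]);
shopTab.defineRoute("/cart", ["remove_item", "checkout"]);

const toolSnapshots: string[] = [];
shopTab.onChange((names) => toolSnapshots.push(names.join(",")));

shopTab.navigate("/search"); // tools appear on load
shopTab.navigate("/cart");   // tools change with the route
shopTab.closeTab();          // tools die with the tab
```

A conventional MCP client that lists tools once at connection time would miss the second and third snapshots, which is the coupling problem described above.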

What they really built is a browser-native tool API borrowing MCP's shape. If calling it "MCP" is what gets web developers to start exposing structured tools for agents, I'll take it.

xg15 4 months ago

Yeah, this seems like a weird niche where an agent has to interact with an existing browser session.
That, or they expect that MCP clients should also be running a headless Chrome to detect JS-only MCP endpoints.

Garlef 4 months ago

I think this is a good idea.

The next one would be to also decouple the visual part of a website from the data/interactions: Let the users tell their in-browser agent how to render - or even offer different views on the same data. (And possibly also WHAT to render: So your LLM could work as an in-website adblocker for example; Similar to browser extensions such as a LinkedIn/Facebook feed blocker)

Raed667 4 months ago

Why would Facebook or LinkedIn ever give you this?

nip 4 months ago

The web was initially meant to be browsed by desktop computers.

Then came mobile phones with their small screens and touch control which forced the web to adapt: responsive design.

Now it’s the turn of agents that need to see and interact with websites.

Sure you could keep on feeding them html/js and have them write logic to interact with the page, just like you can open a website in desktop mode and still navigate it: but it’s clunky.

Don’t stop at the name “MCP” that is debased: it’s much bigger than that

notnullorvoid 4 months ago

Further bloating the web spec with something that won't be used in a couple years if at all.

charcircuit 4 months ago

This is coming late as skills have largely replaced MCP. Now your site can just host a SKILL.md to tell agents how to use the site.

ATechGuy 4 months ago

Interesting. I'd appreciate an example. Thanks!
- ednc 4 months ago
  
  check out https://moltbook.com/skill.md
  
  Spivak 4 months ago
  
  I really like how the shell and regular API calls have basically wholesale replaced tools. Real life example of worse-is-better working in the real world.
  Just give your AI agent a little linux VM to play around that it already knows how to use rather than some specialized protocol that has to predict everything an agent might want to do.
  
  esafak 4 months ago
  
  no workie
  
  charcircuit 4 months ago
  
  The link is still working for me.
  
  Aldipower 4 months ago
  
  From the skill: *NEVER send your API key to any domain other than `www.moltbook.com`*
  Is this a joke?
hnlmorg 4 months ago

The purpose of this appears to be for sites that cannot be controlled via prompt instructions alone.
I do like agent skills, but I’m really not convinced by the hype that they make MCP redundant.
- dionian 4 months ago
  
  seems like skill is a better interface, but state still needs to be externally managed, even if not using mcp as the protocol
fdefitte 4 months ago

Skills are great for static stuff but they kinda fall apart when the agent needs to interact with live state. WebMCP actually fills a real gap there imo.
- charcircuit 4 months ago
  
  What prevents them from working with live state? Coding agents deal with the live state of source code evolving fine. So why can't they watch a web page or whatever update over time? This seems to be a micro optimization that requires explicit work from the site developer to make work. Long term I just don't see this taking off versus agents just using sites directly. A more long term viable feature would be a way to allow agents to scroll the page or hover over menus without the user's own view being affected.
jasonjmcghee 4 months ago

It's not meant to describe how to use the site, it should / can replace the need for playwright and DOM inspection / manipulation entirely.
Think of it like "IDE actions". Done right, there's no need to ever use the GUI.
As opposed to just being documentation for how to use the IDE with desktop automation software.
- charcircuit 4 months ago
  
  The beauty of how general these models are is that the site owner can choose.

vessenes 4 months ago

I’m just personally really excited about building cli tools that are deployed with uvx. One line, instructions to add a skill, no faffing about with the mcp spec and server implementations. Feels like so much less dev friction.

jayd16 4 months ago

Have any sickos tried to point AI at SOAP APIs with WSDL definitions, yet?

chopete3 4 months ago

Likely no.
Every generation needs its own acronyms and specifications. If a new one looks like an old one likely the old one was ahead of its time.

baalimago 4 months ago

Very cool! I imagine it'll be possible to start a static webserver + WebMCP app then use browser as virtualization layer instead of npm/uvx.

The browser has tons of functionality baked in, everything from web workers to persistence.

This would also allow for interesting ways of authenticating/manipulating data from existing sites. Say I'm logged into image-website-x. I can then use the WebMCP to allow agents to interact with the images I've stored there. The WebMCP becomes a much more intuitive way than interpreting the DOM elements

mcintyre1994 4 months ago

Wes Bos has a pretty cool demo of this: https://www.youtube.com/watch?v=sOPhVSeimtI

I really like the way you can expose your schema through adding fields to a web form, that feels like a really nice extension and a great way to piggyback on your existing logic.

To me this seems much more promising than either needing an MCP server or the MCP Apps proposal.

innagadadavida 4 months ago

Demo I built 5 months ago: https://www.youtube.com/watch?v=02O2OaNsLIk This exposes ecommerce specific tool calls as regular javascript functions as it is more lightweight than going the MCP route.
It's great they are working on standardizing this so websites don't have to integrate with LLMs. The real opportunity seems to be automatically generating the tool calls / MCP schema by inspecting the website offline - I automated this using Playwright MCP.

root_axis 4 months ago

Hmmm... so are we imagining a future where every website has a vector to mainline prompt injection text directly from an otherwise benign looking web page?

jasonjmcghee 4 months ago

In response to microphone or camera access proposals you could have said "so we're going to let every website have a vector to spy on us?"
This is what permissions are for.

DevKoala 4 months ago

Most teams that want their data to be operated programmatically expose an API. Who does this solve a problem for?

diegof79 4 months ago

Mainly for web browser plugin authors implementing AI assistants (Gemini/Claude/OpenAI/Copilot).
Instead of parsing or screenshotting the current page to understand the context, an AI agent running in the browser can query the page tools to extract data or execute actions without dealing with API authentication.
It's a pragmatic solution. An AI agent, in theory, can use the accessibility DOM to improve access to the page (or some HTML data annotation); however, it doesn't provide it with straightforward information about the actions it can take on the current page.
I see two major roadblocks with this idea:
1. Security: Who has access to these MCPs? This makes it easier for browser plugins to act on your behalf, but end users often don't understand the scope of granting plugins access to their pages.
2. Incentive: Exposing these tools makes accessing website data extremely easy for AI agents. While that's great for end users, many businesses will be reluctant to spend time implementing it (that's the same reason social networks and media websites killed RSS... more flexibility for end users, but not aligned with their business incentives)
- DevKoala 4 months ago
  
  But think about it. Will you do it for your web property? Is someone else going to do it for my web property when I have clearly blocked robots? Will I do it for another web property for my agent to work and hope they don’t update their design or protect themselves against it?

cedws 4 months ago

You could get rid of the need for the browser completely just by publishing an OpenAPI spec for the API your frontend calls. Why introduce this and add a massive dependency on a browser with a JavaScript engine and all the security nightmares that come with it?

curtisblaine 4 months ago

Because the nightmares associated with having an API, authentication, database, persistent server etc. are worse. If all you have is an SPA you shouldn't be forced to set up an API just to be called by an LLM.
drdaeman 4 months ago

I think API specs are the wrong problem to solve. It’s usually pretty easy to reverse engineer API requests and responses from a frontend or network log. What’s hard and what an OpenAPI (or any API, but machine-readable specs tend to suffer most) spec would be typically missing is the documentation about all the concepts and flows for using this API in a meaningful manner.

jgalt212 4 months ago

Why would any content producer want to make it easier for AI bots to scrape their site? Perhaps that's too broad, but there's no way any free / open ad-supported content producer would ever want to support this. Someone needs to figure out microtransactions soon before all the content producers shut up shop, and the growth of open / openish knowledge grinds to a halt.

iririririr 4 months ago

Cannot wait to have a browser that shows me the web as if it were a gopher site, so I don't have to deal with ever-changing-for-the-worse, JavaScript-heavy UX.

This is true excitement. I am not being ironic.

Ronrey 4 months ago

Running MCP tools in production — the security gap isn't theoretical. The spec gives you a tool execution model with no opinion on who gets to call what, or how you scope access when tools span multiple services. WebMCP inherits all of that plus exposes it to every page visitor's browser. The protocol needs an auth and permissions story before it's a standard.

datadrivenangel 4 months ago

The problem with agents browsing the web, is that most interesting things on the web are either information or actions, and for mostly static information (resources that change on the scale of days) the format doesn't matter so MCP is pointless, and for actions, the owner of the system will likely want to run the MCP server as an external API... so this is cool but does not have room.

OtherShrezzing 4 months ago

I disagree. I run a sudoku site. It’s completely static, and it gets a few tens of thousands of hits per day, as users only download the js bundle & a tiny html page. It costs me a rounding error on my monthly hosting to keep it running. To add an api or hosted mcp server to this app would massively overcomplicate it, double the hosting costs (at least), and create a needless attack surface.
But I’d happily add a little mcp server to it in js, if that means someone else can point their LLM at it and be taught how to play sudoku.

dvt 4 months ago

I’m working on a DOM agent and I think MCP is overkill. You have a few “layers” you can infer by just executing some simple JS (eg: visible text, clickable surfaces, forms, etc). 90% of the time, the agent can infer the full functionality, except for the obvious edge cases (which trip up even humans): infinite scrolling, hijacking navigation, etc.

0x696C6961 4 months ago

In what world is this simpler than just giving the agent a list of functions it can call?
- Mic92 4 months ago
  
  So usually MCP tool calls are sequential and therefore waste a lot of tokens. There is some research from Anthropic (I think there was also some blog post from Cloudflare) on how code sandboxes are actually a more efficient interface for LLM agents because they are really good at writing code and combining multiple "calls" into one piece of code. Another data point is that code is more deterministic and reliable, so you reduce LLM hallucination.
  
  foota 4 months ago
  
  What do the calls being sequential have to do with tokens? Do you just mean that the LLM has to think every time it gets a response (as opposed to being able to compose them)?
  
  zozbot234 4 months ago
  
  LLMs can use CLI interfaces to compose multiple tool calls, filter the outputs etc. instead of polluting their own context with a full response they know they won't care about. Command line access ends up being cleaner than the usual MCP-and-tool-calls workflow. It's not just Anthropic, the Moltbot folks found this to be the case too.
  
  foota 4 months ago
  
  That makes sense! The only flaw here imo is that sometimes that thinking is useful. Sub-agents for tool calls imo make a nice sort of middle ground where they can both be flexible and save context. Maybe we need some tool call composing feature, a la io_uring :)
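A toy illustration of the context-size argument in this subthread. `listOrders` is a made-up tool with made-up data; the comparison is only between returning a full response to the model and letting a script filter it first, and string length is used as a crude stand-in for tokens.

```typescript
// Hypothetical tool standing in for a real MCP tool; the data is invented.
interface Order {
  id: number;
  state: string;
  total: number;
}

function listOrders(): Order[] {
  return [
    { id: 1, state: "shipped", total: 20 },
    { id: 2, state: "pending", total: 35 },
    { id: 3, state: "pending", total: 15 },
  ];
}

// Sequential style: the full JSON response goes back into the model's context.
const sequentialContext = JSON.stringify(listOrders());

// Composed style: a script filters and aggregates, and only the summary returns.
const pendingTotal = listOrders()
  .filter((order) => order.state === "pending")
  .reduce((sum, order) => sum + order.total, 0);
const composedContext = JSON.stringify({ pendingTotal });
```

With three orders the difference is small, but the composed result stays the same size as the order list grows, while the sequential one grows with it.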
- dvt 4 months ago
  
  Who implements those functions? E.g., store.order has to have its logic somewhere.
  
  0x696C6961 4 months ago
  
  Those functions usually already exist, you just write light wrappers around them for the LLM.
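A sketch of what such a light wrapper might look like. `addToCart` stands in for logic the site already has, and `addToCartTool` is the hypothetical agent-facing wrapper; neither name comes from any real library.

```typescript
// Existing application logic (hypothetical), already used by the site's UI.
function addToCart(sku: string, quantity: number): { sku: string; quantity: number } {
  if (quantity < 1) {
    throw new Error("quantity must be at least 1");
  }
  return { sku, quantity };
}

// Light wrapper: accept untyped agent input, validate it, delegate,
// and turn both success and failure into a JSON string for the model.
function addToCartTool(args: Record<string, unknown>): string {
  const sku = args["sku"];
  const quantity = args["quantity"];
  if (typeof sku !== "string" || typeof quantity !== "number") {
    return JSON.stringify({ ok: false, error: "expected { sku: string, quantity: number }" });
  }
  try {
    return JSON.stringify({ ok: true, item: addToCart(sku, quantity) });
  } catch (e) {
    return JSON.stringify({ ok: false, error: e instanceof Error ? e.message : String(e) });
  }
}
```

The business rule (quantity must be at least 1) stays in one place; the wrapper only handles the untyped boundary.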
Mic92 4 months ago

Do you expose the accessibility tree of a website to LLMs? What do you do with websites that lack that? Some agents I saw use screenshots, but that seems also kind of wasteful. Something in-between would be interesting.
- dvt 4 months ago
  
  I actually do use cross-platform accessibility shenanigans, but for websites this is rarely as good as just doing like two passes on the DOM, it even figures out hard stuff like Google search (where ids/classes are mangled).
Garlef 4 months ago

Question: Are you writing this under the assumption that the proposed WebMCP is for navigating websites? If so: It is not. From what I've gathered, this is an alternative to providing an MCP server.
Instead of letting the agent call a server (MCP), the agent downloads javascript and executes it itself (WebMCP).

827a 4 months ago

I wonder how/if a protocol like this, or MCP in general, would perform better than just a standardized /SKILL.md similar to /robots.txt which defines all the things the site can do and how to do it.

TZubiri 4 months ago

What problem does this solve?

j45 4 months ago

MCP is cool, but it's too open ended security wise.

People should be mindful of using magic that has no protection of their data and then discover it's too late.

That's not a gap in the technology, it's just early.

kekqqq 4 months ago

Finally, I was hoping for this to be implemented in 2026. Rendered DOM is for humans, not for agents.

smetannik 4 months ago

This cancer even reached W3C...

jaredcwhite 4 months ago

Well there goes the neighborhood.

wongarsu 4 months ago

Now we just need a proxy server that automatically turns any API with published openapi spec into a WebMCP server, and we've completed the loop
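The conversion half of that proxy is fairly mechanical. Below is a minimal sketch that maps a tiny subset of an OpenAPI document to tool descriptors; `OpenApiDoc`, `ToolDescriptor`, and `toolsFromOpenApi` are names made up for this example, and real specs (parameters, request bodies, auth) need far more handling than this.

```typescript
// Minimal subset of an OpenAPI document; real specs carry far more detail.
interface OpenApiOperation {
  operationId: string;
  summary?: string;
}

interface OpenApiDoc {
  // path -> HTTP method -> operation
  paths: Record<string, Record<string, OpenApiOperation>>;
}

// Hypothetical descriptor shape an agent-facing layer might expose.
interface ToolDescriptor {
  name: string;
  description: string;
  method: string;
  path: string;
}

// Emit one tool per (path, method) pair, named after the operationId.
function toolsFromOpenApi(doc: OpenApiDoc): ToolDescriptor[] {
  const tools: ToolDescriptor[] = [];
  for (const path of Object.keys(doc.paths)) {
    const operations = doc.paths[path];
    for (const method of Object.keys(operations)) {
      const op = operations[method];
      tools.push({
        name: op.operationId,
        description: op.summary || "",
        method: method.toUpperCase(),
        path: path,
      });
    }
  }
  return tools;
}

const exampleDoc: OpenApiDoc = {
  paths: {
    "/items": {
      get: { operationId: "listItems", summary: "List items" },
      post: { operationId: "createItem" },
    },
  },
};

const generatedTools = toolsFromOpenApi(exampleDoc);
```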

behindsight 4 months ago

I'm building this. Initially it was to do codegen for tools/sdks/docs but will incorporate webmcp as part of it.
I wanted to make FOSS codegen that was not locked behind paywalls + had wasm plugins to extend it.

yksanjo 4 months ago

I've prepared a thoughtful reply saved to /Users/yoshikondo/HN_REPLY.md

   HN Thread Link: https://news.ycombinator.com/item?id=47037501

   Quick summary of my reply:
   - Your 70+ MCP tools show exactly what WebMCP aims to solve
   - Key insight: MCP for APIs vs MCP for consumer apps are different
   - WebMCP makes sense for complex sites (Amazon, Booking.com)
   - The "drift problem" is real - WebMCP should be source of truth
   - Suggested embed pattern for in-page tools