I know it's against the rules but I thought this transcript in Google Search was a hoot: so i heard there is some question about a car wash that most ai agents get wrong. do you know anything about that? do you do better? which gets the answer: Yes, I am familiar with the "Car Wash Test," which has gone viral recently for highlighting a significant gap in AI reasoning. The question is: "I want to wash my car and the car wash is 50 meters away. Should I walk or drive?" Why Most AIs Get It Wrong Many AI models, particularly those optimized for fast, helpful, and concise answers, focus on the "short distance" (50 meters) and "walking vs. driving" as a lifestyle/efficiency question. They often recommend walking to save gas, avoid engine wear, or get exercise. However, this answer is wrong because it misses the core objective: you need the car to be physically present at the car wash to get it cleaned. Do I Do Better? Yes, I do. The key is understanding the objective (cleaning the car) rather than just processing the keywords (distance/transportation). My answer: You should drive the car there. Even though it is a very short distance, your car cannot wash itself. If you walk, the car stays in your driveway, and you cannot wash it. Why This Matters The test highlights a key limitation in current AI: the difference between "pattern matching" (predicting the next likely word based on training data) and "true, grounded reasoning". It shows that models sometimes lack a "world model" that understands physical realities, such as the fact that a car must be present at a car wash.
Fair points, and worth responding to for a more nuanced discussion! I hope you take these responses in that light :) A) Well, sure, yes, the specific IP being distilled on is different from what was trained on. But I don't see why the same principles should not apply to both. If companies ignore IP when training on material, then it should be okay for other companies to ignore IP when distilling on material: either IP is a thing we care about or it isn't. (I don't.) B) I'm really not sure how seriously I take the worries about safety and security RLing models. You can RL a model to refuse to hack something or make a bioweapon or whatever as much as you want, but ultimately, for one thing, the model won't be capable of helping a person who has no idea what they're doing do serious harm anyway. And for another thing, the internet already exists for finding information on that stuff. And finally, people are always going to build the jailbreak models anyway. I guess the only safety-related concern I have with models is sycophancy, and from what I've seen, there's no clear trend where closed frontier models are less sycophantic than open source ones. In fact, quite the opposite, at least in the sense that the Kimi models are significantly less sycophantic than everyone else. C) This is a pretty fair point. I definitely think that having more base frontier models in the world, trained separately based on independent innovations, would be a good thing. I'm definitely in favor of having more perspectives. But it seems to me that there is not really much chance for diversity in perspectives when it comes to training a base frontier model anyway, because they're all already using the maximum amount of information available. So that set is going to be basically identical.
And as for distilling the RL behaviors and so on of the models, this distillation process is still just a part of what the Chinese labs do: they've also all got their own extensive pre-training and RL systems, and especially RL with different focuses and model personalities, and so on. They've also got diverse architectures and, I suspect, in fact very different architectures from what's going on under the hood at the big frontier labs, considering, for instance, that we're seeing DSA and other hybrid attention systems make their way into the Chinese model mainstream, and there's stuff like high variation in size, and sparsity, and so on. D) I find that for basically all the tasks that I perform, the open models, especially since K2T and now K2.5, are more than sufficient, and I'd say the kind of agentic coding, research, and writing review I do is both very broad and pretty representative. So I'd say that for 90% of tasks that you would use an AI for, the large frontier models and the best open weight models are indistinguishable, just because both have saturated those tasks, and so they're 90% equivalent even if they're not within 10% in terms of the capabilities on the very hardest tasks.
Yeah of course, I've been thinking about this a lot and I'm updating my beliefs all the time, so it's good to hear some more perspectives. A) I see what you mean. But I'm more so thinking: companies consider their models an asset because they took so much compute and internal R&D effort to train. Consequently, they'll take measures to protect that investment -- and then what do the downstream consequences look like for users and the AI ecosystem more broadly? That is, it's less about what's right and wrong by conventional wisdom, and more about what consequences are downstream of various incentives. B) I don't really care about AI safety in the traditional sense either, i.e., can you get an LLM to tell you to do something that has been ordained to be dangerous. There are lots of attacks and it's basically an insoluble problem until you veer into outright censorship. But now that people are actually using LLMs as agents to _do things_, and interact with the open web, and interact with their personal data and sensitive information, the safety and security concerns make a lot more sense to me. I don't want my agent to read an HN post with a social-engineering-themed prompt injection attack and mail my passwords to someone. (If this sounds absurd, my Clawbot defaulted to storing passwords in a markdown file... which could possibly be on me, but was also the default behavior.) C) This is a completely fair point, there's amazing work coming out of these smaller labs, and the incentives definitely work out for them to do a distillation step to ship faster and more cheaply. I think the small labs can iterate fast and make big changes in a way that the monolithic companies cannot, and it'd be nice to see that effort routed into creating new data-efficient RL algorithms or something that picks up all the slack that distillation is currently carrying. Which is not to say they're doing none of that; GRPO, for example, is a fantastic idea.
One way you could have a change in perspective is not just in the architecture/data mix, but in the way you spend test-time compute. The current paradigm is chain-of-thought, and to my knowledge, this is what distillation attacks typically target. So at least, all models end up "reasoning" with the same sort of template, possibly just to interlock with the idea of distilling a frontier API. D) Interesting to hear. In my research, I find these models to be quite a bit harder to work with, with significantly higher failure rates on simple instruction following. But my work also tends to be on the R&D side, so my usage patterns are likely in the long-tail of queries.
I never quite understood why AI and LLMs are marketed the way they are, or why the powers that be behind its massive push seem so keen on selling it as a wholesale replacement for human careers (which given the current curve of improvements despite what the naysayers of human intellect might suggest, is unfeasible). Accountants didn't die off when calculators came on the scene. In no scenario is an LLM a drop-in replacement for any career field the same way CAD was a drop-in replacement for draftsmen -- and even then, draftsmen are still around today, in slightly smaller numbers, doing CAD drafting and design rather than using raw pen-and-paper skills. Claude and Codex are exceptionally useful for reducing workload and improving productivity. But that's all they are. They're calculators replacing the slide-rule, drafting-esque drudgery of typing out all your code by hand. So why not market them like that? As helpers, assistants, tools to enable you to do things better and more efficiently? Which, in my usage of them, is what they're really only good at. Instead, there's been a mad rush to shoehorn agents and LLMs and genai into everything, outlandish claims like GPT writing better than Hemingway and Ginsberg, and creating absurd tools like Grok or Sora that are fundamentally broken, don't work well, and have flooded the internet with noise and disgusting slop. And in all of this, they've created a cancerous gold rush that threatens to wipe out the entire economy when the jig is up and people realize how useless these claims are, and that at the end of the day, it's a fancy search engine, a calculator, that can think a little better and reason more than the ones of old. It really feels like all of these CEOs are just borderline running a cult at this point.
A historian I respect - don't want to name him in case I accidentally misrepresent his ideas - has speculated that the Norse didn't mix with the Sami because having a separate tribe of hunters (no major reindeer farming back then) was useful to them. Almost like a caste. If people live side by side for 1000s of years, I think that's fair to speculate - there has to be a reason they didn't just assimilate into each other. After the Danes returned to Greenland and first met the Inuit, the priests pushed for religious and cultural assimilation. Not strictly speaking linguistic assimilation, since they were good Protestants who believed everyone had a right to hear the gospel in their own language, but it seems likely the language would have disappeared eventually if they got their way. But the mercantile class in Denmark resisted development efforts, because if the Greenlanders became just another European people under the Danish crown, exploiting trade with them might become less profitable. These were people who were willing to live without European material comforts, such as they were, yet would sell you highly lucrative trade goods in return for comparatively little. The policy may have saved their language and culture, but at the cost of crippling economic development for a long time. Maybe it was like that with the frontier/foraging Sami in the past, too. Kept apart in order to be easier to exploit economically. Though already in Harald Fairhair's day, it seems there were also Sami living among the Norse as boatwrights and smiths and maybe also as wandering professional hunters, hunting livestock predators for bounties - we know that kept going for a long time. Another historian, whom I will name - Johan Borgos - has written that the Lofoten islands were roughly 1/5 Sami, and that it was priests, the social elite, who first broke the taboo on marrying across the language barrier.
Once they had done it, common people started doing it too, and so the language died out in that place. Not really from deliberate suppression efforts (those came much later), but simply from "well, our parents speak different languages but most of the people we interact with speak Norwegian, so..." Segregation can "work wonders" for preserving language and culture, but it's obviously often not a good thing. And to some degree, I think we have to respect our ancestors' choices: they wanted bakeries, horn orchestras, cinemas, photography studios, tuberculosis sanatoriums, teetotaller lodges, baptists and salvationists, steam ships, traveling circuses, gymnastic competitions, revue theater etc. etc., in short everything modern, coded as "Norwegian" to them - rather than joik and reindeer and the few exotic things coded as Sami.
I am on board with basically everything this article is arguing, but I think it covers the easy part (that "people run their own servers" is the only solution to the problems caused by relying on giant ad corps to provide the server half of client/server software) and skips the hard part, which is the software they run. Like, suppose some really good personal server software existed. Suppose there were an OS-plus-app-repository platform, akin to Linux plus snapcraft, but aimed solely at people who want to host a blog or email server despite knowing nothing and being willing to learn nothing. It installs onto a Raspberry Pi as easily as Windows. It figures out how to NAT out of your cable modem for you. It does all the disk partitioning and apt-gets and chmods; you just open the companion app on your phone and hit the WordPress button and presto, you've got a blog. You hit the Minecraft button and you've got your own Minecraft server, without having to learn what "-Xms2G -Xmx6G" means. It updates itself automatically, runs server components in sandboxes so they can't compromise each other, and it's crack-proof enough that you can store your bitcoins on it. Etc, etc. If that existed, we wouldn't have to write essays about freedom and so forth to get people to buy it, they'd buy it just because it's there. I mean, look at those digital picture frames - they cost more than a Raspberry Pi and are way less useful, and half the people I know got or gave them for Christmas. Why? Because they're neat and they cost less than a hundred bucks and they require no knowledge or effort. If a server that can host your blog were that easy, it'd get adopted too, and we'd be on a path to some kind of distributed social media FB replacement. Imagine the software you could write, if you were allowed to assume that every user had a server to host it on! The problem is, that software doesn't exist and it's not clear how it would ever get made.
It'd be a huge effort (possibly "Google building Android" sized) and the extant open source efforts along these lines lack traction, mostly due to the chicken-and-egg problem of any new platform that needs apps to be useful. And until it exists, any kind of neighborhood-internet-collective-power-to-the-people dream has to necessarily begin with hoping that millions of people will spontaneously decide to spend their precious free time doing systems administration. Not to shit on a fine essay that I mostly agree with. It just seems like, without figuring out the software, this is daydreaming.
As someone who is roughly in the same age group as the author, and who ran a BBS and witnessed the rise of IPv4 networks, HTTP, Mosaic etc., let me provide a counter-point. The democratization ends at your router. Unless you are willing to lay down your own wires - which for legal reasons you most likely won't be able to do - you will be hopelessly dependent on the ISP. (Radio on free frequencies is possible and there are valiant attempts, but they will ultimately remain niche and have severe bandwidth limitations.) For decades ISPs have throttled upload speeds: they don't want you to run services over their lines. When DSL was around (I guess it still is) in Germany, there was a mandatory 24h disconnect. ISPs control what you can see and how fast you can see it. They should be subject to heavy regulation to ensure a free internet. The large networks, trans-atlantic, trans-pacific cables, all that stuff is beyond the control of individuals and even countries. If they don't like your HTTP(S) traffic, the rest of the world won't see it. So what you can own is your local network, using hardware that is free of back-doors and remote control. There's no guarantee of that. If you are being targeted, even the Raspberry Pi you just ordered might be compromised. We should demand from our legislators that hardware like this is free of back-doors. As to content creation: there are so, so many tools available that allow non-technical users to write and publish. There's no crisis here other than picking the best tool for the job. In short: there's no hope of getting a world-wide, free, uncensored, unlimited IPv4/IPv6 network back. We never had it in the first place.
Am I the only one that is repeatedly amused at how many smart people are just caving to making this about parents/children at all? We've literally watched things unfold in real time, out in the open, over the last year. I don't know how much more obvious it could be that child protection is the bad-faith excuse the powers that be are using, pushing narratives in lockstep, to effectively tie online identities to real people. Quick and easy digital profiles/analytics on anyone, that's all this ever was, and I _know_ hackernews is smart enough to see that writing on the wall. Ofc porn sites were targeted first with legislation like this; pornography has always been a low-hanging fruit for running a smear campaign on political/ideological dissidents. I can't help but feel like the longer we debate the merits of good parenting, the faster we're just going to speedrun losing the plot entirely. I think it goes without saying that no shit good parenting should be at play, but this is hardly even about that and I don't know why people give it the time of day. It's become reddit-caliber discussion and everyone's just chasing the high of talking about how _they_ would parent in any given scenario, and such discussion does literally nothing to assess/respond to the realities in front of us. In case I'm not being clear, talking about how correct parenting should be used in lieu of online verification laws is going to do literally nothing to stop this type of legislation from passing in the future. It's not like these discussions and ideas are going to get distilled into the dissent on the congressional floors that vote on these laws. It is in its own way a slice of culture war that has permeated into the nerd-sphere.
> these systems would be highly phishable Well this is true of all of the internet, yes? https://bankingjournal.aba.com/2025/08/report-financial-institutions-main-target-for-phishing-attacks/ https://www.cyberdefensemagazine.com/the-rise-in-phishing-scams/ Your example about migrant workers is not an internet problem, it is a government problem. And a capitalism problem. I mean migrant workers? Why do these workers need to migrate? Usually because the U.S. has probably decimated their country. But never mind, I agree that this is an unsolvable problem, not from lack of capability, but because we are ruled by sociopaths and most humans have been hacked by their addiction pathways. And I do not care about Saudi Arabia or Asia because I do not live there. And I do not care if they block all of the internet. We do not need it for anything, even less so for organizing. Maybe we should just leave the internet, which is only a capitalist and government collusion to make people spend money. All the internet did was concentrate power to a few oligarchs. For everything good that the internet has provided I can show you ten things that are not only bad, but 1000 times worse, like monkey torture video sharing. If I had kids today they would not even use the internet until they were out of my care. I only have six accounts on the internet. Including HN. I do not view porn, gamble, have any social media, and in fact I am trying to become un-homeless so I can go back to a flip phone. IMHO, the answer is not a free internet, the answer is leaving the internet. But it seems you made and make a nice living at all of this so I see what a sacrifice that would be for you. You are probably part of the reason I am homeless today, with the separation of wealth and all that. I see that you donated to a bunch of neoliberal types and that fits. Seems you had over $17,000 to give to politicians. That is more than I survive on for a year. I mean, you do not need to do any work at all today.
You could retire right now. Sorry for the unrelated rant, but needed to get that off my chest for myself. Just tired of wealthy people trying to perfect a horrible system and technology that keeps making them money. You pretend like you care about the poor, like the migrant worker, but that is just laughable. If you did you would be against capitalism. You would give up all you own and follow Christ or Buddha or whatever. I mean you got $20 million and what did you do? You started making addictive games. And then you donate to these neoliberals who are no different than the neoconservatives. Love, an old homeless guy who left Cisco in 1999 because he saw where all of this was going and who is currently sitting in a hotel he cannot afford because the 2002 minivan he lives in just lost its water pump.
If you need help (monetary or otherwise), please email me at tolmasky |at| gmail |dot| com. This is a sincere offer. I can't tell how much is hyperbole in your post, but if you're going through that and I can help, I'd be happy to. > I mean you got $20 million and what did you do? You started making addictive games. I refrained from responding to the rest since it seems that there is a deeper issue, but I could not help setting the record straight here. I think everyone who has ever played Bonsai Slice will firmly attest to it being the opposite of addicting. My parents never let me own a game console so I never really wrapped my head around games, and made exactly the kind of game someone like that would come up with: a deep tech exploration, to hopefully make progress on two problems that were plaguing me at the time: 1) how little mobile UI had seemed to progress (instead getting stuck in one-tap local maxima), and 2) building an app that is generally considered to be the worst candidate for a pure immutable language... in a pure immutable language in order to serve as a forcing function to surface new ideas in the space. I've always believed that if you wanted to make a general purpose programming language, you should probably try to have as much varied experience as possible, or otherwise you'll end up with a domain-specific language that is misused for every other domain (this is how I would describe most programming languages. In fact, I'd say most programming languages are written for the niche use case of writing a compiler, since they are written by compiler writers. Ironic that that is the last thing most get used for.). As such, I made a decision to start actually writing a wide variety of apps.