I don't believe there is any greater journalistic malpractice than fabrication. Sure, there are worse cases of such malpractice in the world, given the low importance of this topic, but journalists should be reporting the truth on anything they deem important enough to write about. Cutting corners on the truth, of all things, is the greatest dereliction of their duty. It undermines trust in journalism altogether, which in turn undermines our society: we no longer work from a shared understanding of reality because we cannot trust the people who report on it. I've observed that journalists tend to have unbelievably inflated egos and tout themselves as the fourth estate that upholds all of free society, and yet their behaviour does not comport with that and is actively detrimental in the modern era. I also do not believe this was a genuine result of incompetence. I entertained the possibility, but that would be the most charitable view available, and I don't think the benefit of the doubt is earned in this case. They routinely cover LLM stories, the retracted article being about that very subject matter, so I have very little reason to believe they are ignorant of LLM hallucinations. If it were a political journalist or something, I would be more inclined to credit the ignorance defense, but as it is we have every reason to believe they know what LLMs are and still acted with intention, completely disregarding the duty they owe their readers to report facts.
> I don't believe there is any greater journalistic malpractice than fabrication. Sure, there are worse cases of such malpractice...
That's more or less what I mean. It was only a few notches above a listicle to begin with. I don't think they intended to fabricate quotes. I think they didn't take the necessary time because it's a low-stakes, low-quality article with a short shelf life, so it's only valuable if published quickly.
> I also do not believe this was a genuine result of incompetence.
So your hypothesis is that they intentionally made up quotes that were pretty obviously going to be spotted immediately and damage their career? I don't think you think that, but I don't understand what alternative you're proposing. I also feel compelled to point out that you've abandoned your claim that the article was generated. I get that you feel passionately about this, and you're right to be passionate about accuracy, but I think that may be leading you into ad-hoc argumentation rather than a more rational appraisal of the facts. I think there's a stronger and more coherent argument for your position that you've not taken the time to flesh out. That isn't really a criticism and it isn't my business, but I do think you ought to be aware of it. I really want to stress that I don't think you're wrong to feel as you do, and the author really did fuck up. I just feel we, as a community in this thread, are imputing things beyond what is in evidence, and I'm trying to push back on that.
The range requests are to offsets in the original file, so I would think that most cases of 'live' injection do not necessarily break it. If you download the page and the server injects a bunch of JS into the 'header' on the fly and the header is now 10,000 bytes longer, then it doesn't matter, since all of the ranges and offsets in the original file remain valid: the first JPG is still located starting at offset byte #123,456 in $URL, the second one is located starting at byte #456,789 etc, no matter how much spam got injected into it. Beyond that, depending on how badly the server is tampering with stuff, of course it could break the Gwtar, but then, that is true of any web page whatsoever (never mind archiving), which is why servers should be very careful when tampering, and generally shouldn't. Now you might wonder about 're-archiving': if the IA serves a Gwtar (perhaps archived from Gwern.net), and it injects its header with the metadata and timeline snapshot etc, is this IA Gwtar now broken? If you use a SingleFile-like approach to load it, properly force all references to be static and loaded, and serialize out the final quiescent DOM, then it should not be broken and it should look like you simply archived a normal IA-archived web page. (And then you might turn it back into a Gwtar, just now with a bunch of little additional IA-related snippets.) Also, note that the IA, specifically, does provide endpoints which do not include the wrapper, like APIs or, IIRC, the 'if_/' fragment. (Besides getting a clean copy to mirror, it's useful if you'd like to pop up an IA snapshot in an iframe without the header taking up a lot of space.)
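To make the offset argument concrete, here is a toy Python sketch, not the actual Gwtar loader. It assumes the scenario described above: range requests are answered from the original file's bytes, while injection only affects the full-page response. The function names, the placeholder asset bytes, and the in-memory "server" are all hypothetical.

```python
def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    return f"bytes={offset}-{offset + length - 1}"


def serve_range(original: bytes, header: str) -> bytes:
    """Toy server (assumption): answers a Range request from the original file."""
    start, end = header.split("=", 1)[1].split("-")
    return original[int(start):int(end) + 1]  # Range end is inclusive


def serve_full_page(original: bytes, injected: bytes) -> bytes:
    """Toy server (assumption): full-page response with bytes injected up front."""
    return injected + original


# Hypothetical archive: an HTML header followed by two placeholder "images".
original = b"<html>HEADER</html>" + b"JPEG-ONE" + b"JPEG-TWO"

# Offsets are recorded against the original file, once, at archive time.
index = {
    "one": (original.index(b"JPEG-ONE"), len(b"JPEG-ONE")),
    "two": (original.index(b"JPEG-TWO"), len(b"JPEG-TWO")),
}

# The page as delivered is longer because of the injected script...
page = serve_full_page(original, b"<script>/* injected */</script>")

# ...but the recorded offsets still resolve against the original file.
offset, length = index["two"]
asset = serve_range(original, range_header(offset, length))
```

The injected bytes change what the browser receives for the full page, but not what a range request against the original file returns. If a server rewrote the bytes that range requests are served from, this would stop holding, which is the "tampering" case above.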