User talk:Citation bot

You may want to increment {{Archive basics}} to |counter= 31 as User talk:Citation bot/Archive 30 is larger than the recommended 150Kb.

Archives

List of archives

Archive 0 (Early bug reports)
Archive 1 (May 2008 – Jun 2011)
Archive 2 (Jun 2011 – Nov 2015)
Archive 3 (Nov 2015 – Jul 2016)
Archive 4 (Jul 2016 – Oct 2016)
Archive 5 (Oct 2016 – Sep 2017)
Archive 6 (Sep 2017 – Oct 2017)
Archive 7 (Oct 2017 – Jul 2018)
Archive 8 (Jul 2018 – Aug 2018)
Archive 9 (Aug 2018 – Aug 2018)
Archive 10 (Sep 2018 – Oct 2018)
Archive 11 (Oct 2018 – Nov 2018)
Archive 12 (Nov 2018 – Jan 2019)
Archive 13 (Jan 2019 – Feb 2019)
Archive 14 (Feb 2019 – Mar 2019)
Archive 15 (Mar 2019 – Jun 2019)
Archive 16 (Jun 2019 – Jul 2019)
Archive 17 (Jul 2019 – Aug 2019)
Archive 18 (Aug 2019 – Oct 2019)
Archive 19 (Oct 2019 – Mar 2020)
Archive 20 (Mar 2020 – May 2020)
Archive 21 (May 2020 – Jul 2020)
Archive 22 (Jul 2020 – Sep 2020)
Archive 23 (Sep 2020 – Dec 2020)
Archive 24 (Dec 2020 – Apr 2021)
Archive 25 (Apr 2021 – Jun 2021)
Archive 26 (Jun 2021 – Aug 2021)
Archive 27 (Aug 2021 – Sep 2021)
Archive 28 (Oct 2021 – Dec 2021)
Archive 29 (Dec 2021 – Dec 2021)
Archive 30 (Dec 2021 – present)

This page has archives. Sections older than 90 days may be automatically archived when more than 4 sections are present.

Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot. Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter. A 503 error means that the bot is overloaded and you should try again later – wait at least an hour.

Please click here to report an error.

Or, for a faster response from the maintainers, submit a pull request with appropriate code fix on GitHub, if you can write the needed code.

Consistent spacing

Status: new bug
Reported by: Abductive (reasoning) 03:24, 2 August 2021 (UTC)[reply]

What happens: bot added a date parameter in a ref with a space before every pipe, but did not include a space
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=53W53&type=revision&diff=1036681818&oldid=1036681278
We can't proceed until: Feedback from maintainers

I know this is a minor bug, but it bugs me. I know that the bot is written to make an attempt to duplicate the formatting already present in the ref. How it could have failed here, I don't know. But more importantly, it should default to the consensus ref formatting: space,pipe,parametername,=,parametervalue. (Spaces before pipes, no spaces around the equals signs or anywhere else, except perhaps before the curly end brackets if there already was a space there.) Abductive (reasoning) 03:24, 2 August 2021 (UTC)[reply]

I agree. The default should be space,pipe,parametername,=,parametervalue. --BrownHairedGirl (talk) • (contribs) 15:27, 2 August 2021 (UTC)[reply]

Cannot fix since the bot already uses the existing citation template as a guide. Templates that are mixed in spacing such as these cannot be done in a way that makes everyone happy. AManWithNoPlan (talk) 16:45, 2 August 2021 (UTC)[reply]

But how to explain the example? The bot deviated from the format of the ref it edited? Abductive (reasoning) 16:59, 2 August 2021 (UTC)[reply]

I see, you want the bot to add spaces to existing parameters - in particular the last one. Interesting, the bot by default does not in any way modify spacing of existing parameters. That parameter has no trailing spaces. As far as the bot is concerned there are no spaces before pipes, just spaces at the end of parameters. AManWithNoPlan (talk) 17:14, 2 August 2021 (UTC)[reply]

The bot must have looked at the lack-of-space of the last parameter (before the end curly braces) to come to the conclusion that the ref was formatted that way. Perhaps it should look after the "cite xxxx" for the cue? Abductive (reasoning) 17:51, 2 August 2021 (UTC)[reply]

No, that is not what it did. It simply does not change the spacing of existing parameters. The existing final parameter has no ending space, so the bot does not add one. AManWithNoPlan (talk) 21:14, 2 August 2021 (UTC)[reply]

Ah, I see what you are saying. It slotted it in at the end. Well, I had hoped that the bot could have provided a cure to the annoying new habit of users removing all spaces from refs, making a wall of text for editors. Abductive (reasoning) 22:25, 2 August 2021 (UTC)[reply]

And creates annoyingly unpredictable line wraps. Does this format really have consensus? If so, bots (any bot) could create a cosmetic function for citations they edit. -- GreenC 17:04, 6 August 2021 (UTC)[reply]

There are some people who like the "crammed" format. I started a conversation about the formatting here, but I don't really understand what they were saying. Abductive (reasoning) 02:06, 7 August 2021 (UTC)[reply]

As Abductive suggests, what the bot should do ideally is to check if the first parameter's pipe following the template name is preceded by a space (or even better, if at least one of the parameters' pipe symbol is preceded by space) and if it is, it should add a space in front of pipe symbol of newly inserted parameters, no matter where they are inserted into the parameter list. If the template has no parameters yet, the bot should fall back to the "default" format "space, pipe, parameter name, equal sign, parameter value" we consistently use in all CS1/CS2 documentation and examples. (Well, IMO, this latter format would ideally be made the only format used at all, but that's a discussion beyond the scope of CB issues here.)

Yeah, it is only cosmetic, but like Abductive I too find it somewhat annoying when previously perfectly formatted citations become misaligned by bot edits.

--Matthiaspaul (talk) 13:34, 7 August 2021 (UTC)[reply]
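The heuristic suggested above can be sketched roughly as follows. This is a minimal illustration in Python, not Citation bot's actual code; the function names are hypothetical, and the sketch assumes a simple template with no nested templates or piped wikilinks inside it.

```python
import re


def pipes_have_leading_space(template_body: str) -> bool:
    """Return True if at least one pipe in the template is preceded by a space."""
    return re.search(r" \|", template_body) is not None


def format_new_parameter(template: str, name: str, value: str) -> str:
    """Build the text of a new parameter so that it matches the pipe spacing
    already used in the template (hypothetical helper, for illustration only)."""
    body = template.strip()[2:-2]  # drop the outer {{ and }}
    has_params = "|" in body
    # With no parameters yet, fall back to the documented default:
    # space, pipe, parameter name, equals sign, parameter value.
    if not has_params or pipes_have_leading_space(body):
        return f" |{name}={value}"
    return f"|{name}={value}"
```

For a spaced template such as `{{cite web |url=x |title=y}}` the helper returns ` |date=2021`; for a crammed one such as `{{cite web|url=x|title=y}}` it returns `|date=2021`.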

While I agree, this is actually going to be hard to implement. I will need to think about it. AManWithNoPlan (talk) 18:12, 8 August 2021 (UTC)[reply]

Still thinking about how to do this. It will have to deal with figuring out what the last parameter is before adding a parameter to the very end, but not the middle. AManWithNoPlan (talk) 00:51, 4 September 2021 (UTC)[reply]

I ran into this same problem with my bot, I solved it by never adding a new parameter in the last position. It requires a function to determine what the second-to-last parameter is and assumes a library that supports placement of parameters. -- GreenC 18:25, 24 October 2021 (UTC)[reply]
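The "never add in the last position" approach described above might look something like this. It is a sketch in Python under stated assumptions, not GreenC's bot code: `insert_before_last` is a hypothetical name, the naive split on `|` assumes no nested templates or piped links, and the fallback for templates with fewer than two parameters is an assumption of this sketch.

```python
def insert_before_last(template: str, name: str, value: str) -> str:
    """Insert a new parameter just before the last existing one, copying the
    trailing whitespace of the second-to-last parameter."""
    inner = template[2:-2]          # text between {{ and }}
    parts = inner.split("|")        # parts[0] is the template name
    if len(parts) < 3:
        # Fewer than two parameters: there is no second-to-last parameter to
        # copy from, so append using the default "space, pipe" format.
        return "{{" + inner + f" |{name}={value}" + "}}"
    model = parts[-2]               # the second-to-last parameter
    trailing = model[len(model.rstrip()):]
    parts.insert(len(parts) - 1, f"{name}={value}{trailing}")
    return "{{" + "|".join(parts) + "}}"
```

Because the new parameter is never placed last, the existing final parameter and whatever precedes the closing braces are left untouched.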

Adds URL instead of Project MUSE parameter

Status: feature request
Reported by: — Chris Capoccia 💬 17:41, 20 November 2021 (UTC)[reply]

What happens: expanding doi 10.3751/69.3.12 adds URL, but seems like better choice would be to use Project MUSE template with id parameter, Project MUSE 586504
Relevant diffs/links: diff
We can't proceed until: Feedback from maintainers

URL is better unless the identifier auto-links. Nemo 22:10, 21 November 2021 (UTC)[reply]

what do you mean by "auto-links"? — Chris Capoccia 💬 15:22, 27 November 2021 (UTC)[reply]

Some identifiers can automatically add themselves to the title, when |muse-access=free is present with the |muse=12345678. AManWithNoPlan (talk) 23:02, 8 December 2021 (UTC)[reply]

Billboard refs

Is there a particular reason why the bot is changing references using the cite web templates for articles on Billboard's website to cite magazine? The print magazine is not being cited. Even when I corrected the change on an article, the bot came back and changed it to cite magazine again. -- Carlobunnie (talk) 01:00, 30 November 2021 (UTC)[reply]

Online magazines are still magazines. Headbomb {t · c · p · b} 01:07, 30 November 2021 (UTC)[reply]

Sure yes, but the cite web template is also still correct/applicable so where is the need to change it? Why doesn't the bot change all Time or Variety refs to the cite mag template also? My thing is that it's weird and inconsistent and unnecessary. -- Carlobunnie (talk) 00:06, 1 December 2021 (UTC)[reply]

italics

Why does citation bot add italicization to Associated Press and Reuters as seen here? Our own articles about those news agencies don't italicize them. — Fourthords | =Λ= | 02:58, 24 December 2021 (UTC)[reply]

Hello, fourthords,

You might try asking the bot operator. Being a bot, it won't be replying to inquiries here. Liz ^{Read! Talk!} 03:43, 24 December 2021 (UTC)[reply]

Since my inquiry was about this bot's edits, this seemed the most appropriate place to ask. Apparently plenty of editors (and possibly the bot's programmer, somewhere) are watching this page. — Fourthords | =Λ= | 17:22, 24 December 2021 (UTC)[reply]

Because |agency= is to be used when the work of Reuters or AP (and other agencies) is republished in another publisher's work (typically a newspaper). When Reuters or AP (and other agencies) is cited directly, then the source is the 'work'. We cite the work not the corporate entity. The en.wiki articles are not italicized because the articles are about the corporate entities. In both of these cases, the corporate entities have eponymous websites that are the sources so those names go in |work= when citing their articles directly.

—Trappist the monk (talk) 03:58, 24 December 2021 (UTC)[reply]

Should, then, this script not be performing its edits in contravention of this bot? — Fourthords | =Λ= | 17:22, 24 December 2021 (UTC)[reply]

The Associated Press is an organization, not a collection of documents, and should therefore be listed under |via= or |publisher=, not |work=. Organization names are not italicized; periodicals, edited volumes, websites, or other collections of documents are. The current name of the collection of documents that the Associated Press publishes appears to be AP News. If "Associated Press" is being used in the work parameter, it is being used incorrectly there. If the bot is moving "Associated Press" to the work parameter without changing it to "AP News" or some similar name for the work rather than the organization, it is doing the wrong thing and should stop. —David Eppstein (talk) 18:59, 24 December 2021 (UTC)[reply]

Ah, that seems to be in contravention of what Trappist the monk said, [AP and Reuters] have eponymous websites that are the sources so those names go in |work= when citing their articles directly. Is there an explicit MOS or guideline that says one way or another, then? — Fourthords | =Λ= | 19:19, 24 December 2021 (UTC)[reply]

I presume that Editor David Eppstein did not intend to write: Organization names are not capitalized (emphasis added)

Editor David Eppstein and I rarely agree on anything but in this case, for the most part, I think that we agree. The Associated Press is an organization, my term was 'corporate entity'. We don't cite organizations or corporate entities, we cite their work. The Associated Press has an online presence at AP News (I hadn't bothered to look – Reuters has an eponymous online presence). That name for the collection of documents is italicized when one of the documents that it holds is cited. AP News is sufficiently similar to the corporate name that it is not necessary to write |publisher=The Associated Press (|via=The Associated Press should not be used for work distributed from AP News because AP News is the publisher's outlet).

I do not know of any MOS or guideline covering this, though the topic is surprisingly volatile with entrenched camps on both sides of the italic/no-italic divide. There is some reasonably stable text at Help:Citation Style 1 § Work and publisher.

—Trappist the monk (talk) 20:19, 24 December 2021 (UTC)[reply]

Typo fixed; I meant "italicized" not "capitalized". —David Eppstein (talk) 20:23, 24 December 2021 (UTC)[reply]

And failure is the usual option again

I really don't see the point of encouraging editors to install the gadget when most uses of it end in failure, which has been my experience every time I've tried to use it this week. At the same time, I have no problems running refill, so it is not toolforge. Can we have a completely separate process instance for gadget users please, and let the batch runners fight it out between themselves? --John Maynard Friedman (talk) 12:57, 24 December 2021 (UTC)[reply]

"it is not toolforge": that is incorrect. When Refill is being used by several people, it fails also. That tool has way too many bugs to run as a bot, which is why it is used so much less. AManWithNoPlan (talk) 14:47, 24 December 2021 (UTC)[reply]

a second instance would be very nice just for the gadget. Someone with access to toolforge would have to do it. Once spawned, I could modify the code to refuse non-gadget runs on that interface. AManWithNoPlan (talk) 14:51, 24 December 2021 (UTC)[reply]

and a third instance would be very nice for single pages too. AManWithNoPlan (talk) 15:26, 24 December 2021 (UTC)[reply]

If you were to create another bot, for argument let's call it CitationBatchBot, code-identical to the current bot, would that de facto create another instance without needing to hack toolforge? The very few batch users could be 'persuaded' to use that one, leaving the original for gadget and command-line (sic?) editors. True? --John Maynard Friedman (talk) 20:06, 24 December 2021 (UTC)[reply]

Someone with access would need to do this. I do not have access and my toolforge account is in some weird limbo state and unusable. AManWithNoPlan (talk) 20:10, 24 December 2021 (UTC)[reply]

Is there a page where I can request other editors to run the bot for articles I edit? I think it will work if I can put my (our) request in the to-do list of batch runners. --SilverMatsu (talk) 02:56, 7 January 2022 (UTC)[reply]

Feature: usurped title

Hi, Example diff (second change). When a domain has been usurped it is replaced with |url-status=usurped and sometimes the |title= also contains usurped content. My bot is able to detect keywords that indicate a usurped title, but unable to determine a correct replacement title, so it adds the placeholder |title=usurped title. This placeholder was also discussed at CS1|2 help. It would be great if Citation bot was able to recognize the placeholder and fill in a better title. It would require using |archive-url= as the source since the |url= is usurped. There are not a huge number (197) but they are growing indefinitely due to the WP:JUDI case. -- GreenC 19:31, 27 December 2021 (UTC)[reply]

Good point, @GreenC.

I had been thinking about a similar issue: InternetArchiveBot's addition of |title=Archived copy when it archives a URL which lacks a title. That usage is categorised in Category:CS1 maint: archived copy as title, which currently contains over 160,000 articles.

In both cases, a remedy will require analysis of the archived copy of the page. Whether the generic title is |title=Archived copy or |title=usurped title, the same remedy is required ... so the two should be treated as one task.

I would much prefer that this was done by a new standalone bot, rather than incorporated into Citation bot. Citation bot is way overloaded even with its current task set. Adding in the huge backlog of generic titles would swamp Citation bot.

And in any case, I don't see any overlap between this task and Citation bot's other functions. This generic titles task does not need to add a cite template or change its type; all it needs to do is to change the value of |title=. The lookup of the archived title is not part of Citation bot's current capabilities.

So there is no benefit to including this in Citation bot, only downsides. This needs a new bot. I suggest a request at WP:BOTREQ. BrownHairedGirl (talk) • (contribs) 05:02, 28 December 2021 (UTC)[reply]
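Treating the two generic titles as one task could be sketched like this. It is an illustrative Python fragment only: the function name and the dict-of-parameters representation are assumptions, and a real bot would first need to parse the citation template out of the wikitext.

```python
from typing import Dict, Optional

# The two placeholder titles discussed in this thread, compared case-insensitively.
PLACEHOLDER_TITLES = {"archived copy", "usurped title"}


def title_lookup_url(params: Dict[str, str]) -> Optional[str]:
    """Return the archive URL to read a real title from, or None if the
    citation does not carry a placeholder title (hypothetical helper)."""
    title = params.get("title", "").strip().lower()
    if title not in PLACEHOLDER_TITLES:
        return None
    # Always read from the archived copy: for usurped domains the live |url=
    # no longer serves the original page.
    return params.get("archive-url") or None
```

A citation with a real title, or a placeholder title but no |archive-url=, yields `None` and would simply be skipped.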

Dealing with the backlog could have its own dedicated bot, but there's no reason not to have this covered by Citation bot. Headbomb {t · c · p · b} 05:46, 28 December 2021 (UTC)[reply]

@Headbomb:: as I explained above, there are two good reasons not to have this covered by Citation bot:

Citation bot is already way overloaded. Adding another task will make that worse.
Direct lookup of title has never been part of Citation bot's function. It does lookup only indirectly, through the Zotero servers. BrownHairedGirl (talk) • (contribs) 06:58, 28 December 2021 (UTC)[reply]

If Citation bot wants to do "Archived copy" as well, great. If the concern is someone will submit a 160k job and swamp the system, blacklist the tracking category as input and let the bot do it incidentally while doing other jobs. While it whittles away. People have talked about a dedicated title bot forever. Citation bot already does titles (I think?), getting it from Wayback Machine is the same: <title>Page Title</title>. -- GreenC 06:19, 28 December 2021 (UTC)[reply]

@GreenC: your suggested backgrounding of this task would involve a lot of extra programming to Citation bot, which may not be compatible with its existing queue structure. And even if backgrounded, it would still be adding load to an already overloaded tool. In the last six months, there have been many discussions here about how to reduce that overload; adding to the load only makes the problem worse, and your idea of blacklisting the tracking category would be a truly terrible idea: it would lock Citation bot out of any work needed on ~3% of all en.wp articles.

As above, Citation bot gets titles from the Zotero servers. Direct lookup from the Wayback Machine would be new functionality ... and the Zoteros are also very overloaded, so even if they could be pointed at the Wayback Machine, that would just exacerbate the overload.

There is zero advantage to bolting this function onto Citation bot, because it would all be new functionality; and there are huge downsides. This needs a new standalone bot ... and I think I may know the person who can do it. BrownHairedGirl (talk) • (contribs) 07:16, 28 December 2021 (UTC)[reply]

Blacklisting the category means that no one could request that the bot runs on 160K articles at once. This is already done. As for "would involve a lot of extra programming to Citation bot", let's let AManWithNoPlan decide on how feasible this is, since he's the coder. Headbomb {t · c · p · b} 08:46, 28 December 2021 (UTC)[reply]

@Headbomb: As above, blacklisting the tracking category would lock Citation bot out of any work needed on ~3% of all en.wp articles. That would be highly disruptive.

In the last 5 months, there have been repeated discussions about how overloaded Citation bot is. The latest such thread (see above: #And failure is the usual option again) was started less than 4 days ago by @John Maynard Friedman, who is understandably miffed at the lack of spare capacity in Citation bot. John wants clones of Citation bot to spread the load; that's a great idea in theory, but the bot maintainer has explained yet again why that is very unlikely to happen. So, dumping a backlog of 180k pages onto Citation bot is a recipe for perma-bottleneck. Even if the backlog was throttled to 500 pages per day, that's 360 days of increased overload.

So why on earth not just give this job to a separate bot? BrownHairedGirl (talk) • (contribs) 09:09, 28 December 2021 (UTC)[reply]

"As above, blacklisting the tracking category would lock Citation bot out of any work needed on ~3% of all en.wp articles." No it would not. You do not seem to understand the concept of a category blacklist here: The forbidding of doing a dedicated run on Category:Foobar. Headbomb {t · c · p · b} 09:13, 28 December 2021 (UTC)[reply]

@Headbomb: Thanks for finally explaining the narrow meaning which you placed on the concept of "category blacklist" in this context.

Any such crude "category blacklist" would be pointless if implemented as a ban on any dedicated run on Category:Foobar, because:

It is superfluous. The maximum allowed size for category jobs is 550, and tracking categories for these two generic titles already exceed that size.
A ban on simply throwing the category name at Citation bot would be easily circumvented simply by listing the pages in the webform, or by using the "linked pages" feature on the webform.

The bottom line here is very simple. Citation bot is massively overloaded, and adding a big extra task will exacerbate that overload ... so give the job to another bot. BrownHairedGirl (talk) • (contribs) 09:23, 28 December 2021 (UTC)[reply]

"adding a big extra task will exacerbate that overload" Tasks are given manually, subject to the usual limits. There's zero reasons why this task should be any lower priority than any other, save for your personal preference that other work be done instead. Things are not zero sum games, if another bot wants to tackle this, great, but there's zero reason to kneecap Citation Bot's usefulness on account of another bot. Headbomb {t · c · p · b} 09:26, 28 December 2021 (UTC)[reply]

@Headbomb: this is not complicated, and the relevant facts are not a matter of "personal preference":

Citation bot's capacity is already roughly fixed, and we are at the limit.
The other tasks which Citation bot is doing are almost entirely tasks for which Citation bot is the only available bot.

So this is a sort of zero sum game. If we add an extra task to an overloaded bot, some of the existing tasks will suffer.

That is why I argue for a separate bot for the extra task: because it allows all the tasks to be completed faster, by increasing the sum of throughput. It is quite bizarre that you choose to dismiss as "personal preference" my call for this extra task to be handled in a way that does not exacerbate a well-documented overload problem. BrownHairedGirl (talk) • (contribs) 09:45, 28 December 2021 (UTC)[reply]

Anyone is free to code an additional bot. Which is not an argument to kneecap this one's usefulness because you don't personally want it to do certain types of edits. Headbomb {t · c · p · b} 09:47, 28 December 2021 (UTC)[reply]

Sigh. @Headbomb, please drop the aggressive hyperbole and the violent imagery. It is uncivil and disruptive. You have recently poisoned another discussion elsewhere with similar tactics; please refrain from repeating those hyperbolic falsehoods here.

Nobody, least of all me, is proposing to "kneecap" Citation bot. No reduction in its functionality is being proposed by anyone, let alone me. Kneecapping is a form of violent maiming which causes a severe reduction in capability, so labelling my objections as "kneecapping" is nonsense.

I have no objection in principle to adding extra functionality to Citation bot. My strong objection is purely pragmatic: that it would add an extra huge task to an already massively-overloaded bot. That would mean either glacially slow progress on the new task, or a worse bottleneck on the existing task. That is why I prefer to address the new task by creating new capacity.

I am surprised that you seem so determined to ignore the fact that Citation bot is already overloaded, or why you express that denialism by using such unpleasant imagery to misrepresent me ... but please stop. BrownHairedGirl (talk) • (contribs) 10:01, 28 December 2021 (UTC)[reply]

"violent imagery" You really need to have a major WP:AGF/reality check re-calibration if you think any of what I said is related to the literal meaning, and not, and that should be patently obvious, its metaphorical meaning. Headbomb {t · c · p · b} 10:04, 28 December 2021 (UTC)[reply]

On the contrary, @Headbomb: you need to have a dictionary check.

Try e.g. Merriam-Webster, Collins, The Free Dictionary, Dictionary.com, or Lexico: all define the word "kneecapping" as an act of violence against the person.

And it is you who needs a major WP:AGF/reality check re-calibration. You have falsely accused me of seeking to create grave injury to Citation bot. That is patently false: there is no way in which anything I have written in this thread could be reasonably or plausibly interpreted in that way.

Your use of violent imagery to misrepresent me is bad enough. But the fact you then falsely accuse me of needing a reality check is a form of gaslighting. Please stop your vicious bullying tactics. BrownHairedGirl (talk) • (contribs) 10:28, 28 December 2021 (UTC)[reply]

PS your claim of metaphorical usage alters nothing. It is a violent metaphor, which is most unpleasant ... and even in its mildest meaning it is in no way a fair description of what I propose.

Please drop the hyperbole and the violent imagery. BrownHairedGirl (talk) • (contribs) 10:32, 28 December 2021 (UTC)[reply]

Without getting into the dubious ethics of using inappropriate choice of language (or choice of inappropriate language), the reality right now is that CitationBot is frequently, almost usually, unusable by single-use requestors. I have argued and continue to argue that the batch runs need to be restrained in some way at least until we can get separate instances of the bot. There may be some resolution in sight, see Wikipedia:Village pump (technical)#Is there a ToolForge doctor in the house? CitationBot could use some help. Headbomb may have an argument that a batch run is a batch run is a batch run, so what makes their batch run any less deserving than anyone else's (apart from being a machine-gun to kill grass-hoppers). Right now, it is simply and wildly unrealistic to add another batch run to the load: it won't achieve its own objectives in any useful timescale; it will mean that the same happens to the other batch runs; and it guarantees that individual article requests will invariably fail rather than just usually. There is a WP article about that attitude: WP:DISRUPTIVE. Maybe it is not fair to be labelled as disruptive for just being the one who loaded the last straw, but tough. Headbomb, you need to find another bot that will do what you need, this is not an argument worth winning. --John Maynard Friedman (talk) 14:00, 28 December 2021 (UTC)[reply]

Everyone wins when Citation bots get more useful. People choose what batch run they submit. If you don't want your batch run to be used for 'archived title', simply don't submit one. Headbomb {t · c · p · b} 17:35, 28 December 2021 (UTC)[reply]

What if you can't run a batch job at all, or can't get an individual page processed, because someone else has swamped Citation bot with this extra task? What do you do about that? BrownHairedGirl (talk) • (contribs) 17:42, 28 December 2021 (UTC)[reply]

You wait a bit, and your task gets processed. Unless you want to get greedy and submit multiple batch runs, then you have to wait till your first batch run is processed. No different than the current situation. There's nothing special about this 'extra task'. Headbomb {t · c · p · b} 17:51, 28 December 2021 (UTC)[reply]

Not so. The special thing about this extra task is that unlike Citation bot's existing tasks, it doesn't have to be done using Citation bot.

It is interesting to see that you describe my submission of multiple batches of high-return bare URL cleanup jobs as "greedy". BrownHairedGirl (talk) • (contribs) 18:03, 28 December 2021 (UTC)[reply]

(edit conflict) No, Headbomb, that does not happen. You click the gadget, wait for five minutes and you get a message to say that it has failed. So you resubmit, wait, same result. And again. And again. So you stop bothering. It seems that batch runs don't fail, they just smell that way. Batch runs are disruptive to ordinary editors right now and the more of them that run, the more disruptive they are - and just get in each other's way too but of course that doesn't matter when you can fire and forget. Meanwhile in the real world... --John Maynard Friedman (talk) 18:08, 28 December 2021 (UTC)[reply]

And it eventually gets processed. See for example this run. I requested it at 5:45am or so, then got the timeout page. And then at 6am it started being processed. Sometimes it takes hours, sometimes it's minutes, but batch runs do get processed. Submitting one over and over and over serves no purpose. Headbomb {t · c · p · b} 18:55, 28 December 2021 (UTC)[reply]

Calling the caped crusader

@Rlink2: if we give you a snazzy cape and your own special car, please can you help out here? Your mission is to make a new bot which takes any CS1/CS2 template with |title=Archived copy or |title=usurped title, looks up the linked archived copy, and extracts a meaningful title from the contents of <title>Page Title</title>.

Please come to our rescue! Your reward in gold will be in the usual place --BrownHairedGirl (talk) • (contribs) 07:28, 28 December 2021 (UTC)[reply]
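The title-extraction step requested above could be sketched as follows. This is a hedged illustration using only Python's standard library, not the script that was eventually written; fetching the archived page is left out, and `extract_title` is a hypothetical name that takes the already-downloaded HTML as a string.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self) -> str:
        # Collapse runs of whitespace, including line breaks, to single spaces.
        return " ".join("".join(self._chunks).split())


def extract_title(html_text: str) -> Optional[str]:
    """Return the page title, or None if the page has no usable <title>."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title or None
```

A page without a `<title>` element (a PDF served as-is, for instance) yields `None`, so the placeholder would be left in place rather than replaced with an empty value.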

@BrownHairedGirl: @GreenC: Sorry for the delay, and thanks for the humor. I just created a script that can handle this. It will be able to extract the URL titles from web.archive.org and ghostarchive.org. Webcite I can do when the site is back up. There are some edge cases I have yet to code in, but it mostly works. See diffs: Special:Diff/1062447127, Special:Diff/1062447385, Special:Diff/1062447728 and Special:Diff/1062447750.

Note that it would not work for PDF files, such as the one in 2G spectrum case. It will not work with archive.today URLs either (This is not my fault), so GreenC when replacing links always try to use web.archive.org if placing "archived title". Rlink2 (talk) 14:35, 28 December 2021 (UTC)[reply]

Holy moley, @Batman, that was fast!

I checked all 4 diffs, and in each case the result looks good. It is not perfect, because some of the websites abuse the title field to advertise the site, like <title>Storms lash Hightown {{!}} ZYX News, the leading local news service for YOUR town!</title> ... which causes the cite template to have |title=Storms lash Hightown | ZYX News, the leading local news service for YOUR town! rather than just |title=Storms lash Hightown.

But coding to strip this sort of junk is a huge job, and in my view it's much better to have such a verbose title than to just have a generic placeholder. Editors can manually trim such fluff if they have the time.
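The one piece of clean-up that cannot be skipped is escaping the pipe character, because a bare `|` inside a parameter value would be read as the start of a new parameter. A minimal Python sketch, with a hypothetical function name, that keeps the verbose title but makes it safe to place in |title=:

```python
def escape_title_for_template(title: str) -> str:
    """Replace literal pipes with the {{!}} magic word so that the title can be
    used as a template parameter value; the wording itself is left untouched."""
    return title.replace("|", "{{!}}")
```

No attempt is made here to trim the site's advertising text, in line with the view above that a verbose title is better than a generic placeholder.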

One enhancement would be useful if it's not too much work: add a |website= parameter, unless it (or |work=/|magazine=/|newspaper=) is already present. It seems to me that this should be a relatively easily-coded enhancement, but please ignore this request if it's too much hassle.

Thanks again for the very prompt response. BrownHairedGirl (talk) • (contribs) 14:59, 28 December 2021 (UTC)[reply]

@BrownHairedGirl: I can add the website parameter, but I do not know much about it. Does the name of the website go there? Documentation regarding this would be helpful. Rlink2 (talk) 15:06, 28 December 2021 (UTC)[reply]

@Rlink2: see Template:Cite_web#Website. Basically, a simple and widely-used way of filling it is just to use the domain name, i.e. the text between the 2nd and 3rd slashes in the URL, e.g. in |, just use "www.example.com": |website=www.example.com.

Editors or other bots may later replace that with something more informative (e.g. Citation bot will replace |website=www.washingtonpost.com with |newspaper=[[The Washington Post]]) ... but |website=www.washingtonpost.com is way more useful than no website field. BrownHairedGirl (talk) • (contribs) 15:21, 28 December 2021 (UTC)[reply]
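Taking "the text between the 2nd and 3rd slashes" can be sketched with Python's standard URL parser. This only illustrates the mechanics; `website_from_url` is a hypothetical name, and whether a bare domain belongs in |website= at all is disputed in the replies that follow.

```python
from typing import Optional
from urllib.parse import urlsplit


def website_from_url(url: str) -> Optional[str]:
    """Return the host part of a URL (lower-cased, without any port), or None
    if the string has no recognisable host."""
    return urlsplit(url).hostname
```

For `https://www.washingtonpost.com/politics/story.html` this gives `www.washingtonpost.com`.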

Do not use the domain name in the |website= field. I have been admonished by multiple people not to do this. If you are unsure ask at Help talk:Citation Style 1 first. -- GreenC 16:12, 28 December 2021 (UTC)[reply]

@GreenC: I have used the domain name in thousands of refs. WP:Reflinks does it automatically. It's better than having nothing to identify the website. BrownHairedGirl (talk) • (contribs) 17:26, 28 December 2021 (UTC)[reply]

Actually I think it was on this talk page that someone told me not to do it because I made a feature recommendation that Citation bot could do it. Is reflinks still well maintained? Old tools do things that no longer have good support, and no one is actively fixing them. Update: here it is: User_talk:Citation_bot/Archive_28#Adding_website_field -- GreenC 18:06, 28 December 2021 (UTC)[reply]

Not adding the domain name and thereby leaving the website field as blank or missing is letting the best be the enemy of the good. BrownHairedGirl (talk) • (contribs) 18:14, 28 December 2021 (UTC)[reply]

PS your update crossed with my post. So, one editor objected, with no claim of any consensus for their view, let alone evidence. BrownHairedGirl (talk) • (contribs) 18:17, 28 December 2021 (UTC)[reply]

No doubt, but there is established controversy so clarification at Help talk:Citation Style 1 would be advisable before making mass edits to avoid the blow back it might cause. Particularly without bot approval. -- GreenC 18:33, 28 December 2021 (UTC)[reply]

@GreenC: I understand your desire to avoid drama. But, as a general principle, it is horribly bureaucratic to be pushed to debate every step of incremental progress against those who prefer no progress to an incomplete improvement. BrownHairedGirl (talk) • (contribs) 19:15, 28 December 2021 (UTC)[reply]

I agree with @BrownHairedGirl:'s approach to Wikipedia 100 percent. I also believe in the ideas of incremental improvement even if the solution isn't perfect all the time. Rlink2 (talk) 19:26, 28 December 2021 (UTC)[reply]

A bot operator would be wise to check out why this admin said don't do it, before proceeding at a mass scale. There might be a consensus discussion somewhere we don't know about. I would not assume the admin was being bureaucratic to avoid progress. -- GreenC 19:33, 28 December 2021 (UTC)[reply]

I agree with BHG's analysis and conclusion. Furthermore, the documentation for {{cite web}} places no such constraint on what may be given in website=, though none of the examples use the fully qualified domain. (What they do do is give an example argument that matches more to my conception of work=, giving website=Encyclopedia of Things, which is surely a work.) Take a contrarian example: Amazon Inc has multiple websites, which differ (simplistically) by language but also in product offering. So it is certainly useful, probably important, to know that the website is amazon.de, amazon.es, or amazon.co.uk. I don't know who decided that website=<domain> is deprecated but it is not policy and if it's a rule, it is certainly one to be ignored when the circumstances suggest otherwise, as they do in this case. --John Maynard Friedman (talk) 19:36, 28 December 2021 (UTC)[reply]

Rlink2, given this particular example, the script should catch | and swap it to {{!}}. Izno (talk) 00:21, 29 December 2021 (UTC)[reply]

Thanks for the heads up Izno. I believe the script already does this, but I will double check to make sure. Rlink2 (talk) 00:30, 29 December 2021 (UTC)[reply]
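For illustration, the pipe swap mentioned above could look like the following Python sketch. The function name `escape_title_pipes` is hypothetical and this is not the script's actual code; it only shows the idea of replacing a literal | with {{!}} so that the title cannot be read as the start of a new template parameter.

```python
def escape_title_pipes(title: str) -> str:
    """Replace each literal pipe with {{!}} (hypothetical helper).

    The replacement text contains no pipe, so running the function
    twice on the same title gives the same result as running it once.
    """
    return title.replace("|", "{{!}}")
```

With the example title from earlier in this thread, `escape_title_pipes("Storms lash Hightown | ZYX News")` gives `Storms lash Hightown {{!}} ZYX News`.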

It is technically trivial to extract the title field from a page. And extremely difficult to make sure that title string is appropriate for use on Wikipedia. That's why no one does it. Go slowly. Build up rules on what kind of material to keep and keep out based on experience. -- GreenC 16:11, 28 December 2021 (UTC)[reply]

I will, thanks for the tip. I always go slow at first with every new thing I do. This is no different. Rlink2 (talk) 16:24, 28 December 2021 (UTC)[reply]

|website=domain is problematic for a huge variety of reasons. For example, if [1] were archived, we wouldn't want |website=mdpi.com, but rather |journal=Religions. Or if this were archived, we wouldn't want |website=books.google.ca. Headbomb {t · c · p · b} 20:05, 28 December 2021 (UTC)[reply]

Hard cases make bad law. url=books.google.abc is an obvious exception that goes in the exclusion list. Ditto Archive.org, archive.is, archive.today etc. Academic publishers like mdpi, Wiley etc - well, yes, what is so terrible about the first pass giving these as the website=, after all that is indeed the relevant website – does the journal Religions have any other (or rather one more relevant)? A 'first pass, low impact' bot can deal with the 95% that are straight-forward and tag the others for CitationBot to give the deep-cleanse treatment, finding the doi etc etc. --John Maynard Friedman (talk) 20:46, 28 December 2021 (UTC)[reply]
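A rough Python sketch of the 'first pass plus exclusion list' idea in the comment above. The exclusion entries are the hosts named there (books.google.*, archive.org, archive.is, archive.today); the function name `website_or_flag` and the "needs-review" label are hypothetical, and a real list would presumably be longer and built up from experience.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion list: hosts whose domain is a poor |website= value.
EXCLUDED_SUFFIXES = ("archive.org", "archive.is", "archive.today")
EXCLUDED_PREFIXES = ("books.google.",)


def website_or_flag(url: str) -> tuple:
    """Return ("website", host) for straightforward URLs, or
    ("needs-review", host) for excluded or unparseable ones, so that a
    later, deeper pass can deal with them."""
    host = urlsplit(url).hostname or ""
    # Compare without a leading "www." so both spellings are matched.
    bare = host[4:] if host.startswith("www.") else host
    excluded = (
        any(bare == s or bare.endswith("." + s) for s in EXCLUDED_SUFFIXES)
        or any(bare.startswith(p) for p in EXCLUDED_PREFIXES)
    )
    if excluded or not host:
        return ("needs-review", host)
    return ("website", host)
```

Flagging rather than silently skipping the excluded hosts matches the suggestion that the remaining cases be tagged for a more thorough tool to handle.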

The point is those 'hard cases' are extremely common. As are cases like

J. I. Friedman. "The Road to the Nobel Prize". Huế University. Archived from the original on 2008-12-25. Retrieved 2008-09-29.

which should not get |website=hueuni.edu.vn. Headbomb {t · c · p · b} 21:31, 28 December 2021 (UTC)[reply]

The above should not be a {{cite web}} but rather a different cite template. "Garbage in, garbage out" should not be a determining factor on what to try and not try. Jonatan Svensson Glad (talk) 22:13, 28 December 2021 (UTC)[reply]

Cite web is exactly the template appropriate for this. Headbomb {t · c · p · b} 22:49, 28 December 2021 (UTC)[reply]

Hmm, didn't actually open the link (that was my bad), I thought it would have been a journal article. However, I don't see the issue with adding |website=hueuni.edu.vn as long as no |publisher= would have existed. Jonatan Svensson Glad (talk) 00:07, 29 December 2021 (UTC)[reply]

@Headbomb: On what basis could you say that |website=hueuni.edu.vn is wrong? That is exactly where the story was first posted. [And if we already have that much info in the citation, what is there left for a bot to do?]

So the result would be

J. I. Friedman. "The Road to the Nobel Prize". hueuni.edu.vn. Huế University. Archived from the original on 2008-12-25. Retrieved 2008-09-29.

Apart from the redundant detail, what is so terrible about it? (Personally I have yet to see any value added by the website= option and actually am more worried that it is so open to domain spoofing, where the actual url= says myscambank.com and the website= says myfriendlylocalbank.com and we walk an unsuspecting visitor into a trap and get sued, but I guess that is an argument for another place.) --John Maynard Friedman (talk) 01:24, 29 December 2021 (UTC)[reply]

TBH, Headbomb, your example strikes me as a straw man. What problem are we trying to solve here? Let's suppose we have a citation like

{{cite web |title=Foundation of Smallville |url=https://www.smallvillehistory.ky.us |website=Smallville History Society }}

which yields

"Foundation of Smallville". Smallville History Society.

but the web page fell into disuse, the domain registration was not renewed and a gambling site reregistered it to redirect to their site. So what we really want to do is mark it as dead and get the last good archived copy (can that be done automatically?) so that visitors only see the archived version and not the redirect site. Yes, we definitely don't want to introduce a website=smallvillehistory.ky.us because that leads to the gambling site, but who says we need to supply one if it is not already present? In fact we need to remove anything like a website=<domain name> because we know it is invalid. --John Maynard Friedman (talk) ~~01:54, 29 December 2021 (UTC)~~ revised 02:13, 29 December 2021 (UTC)[reply]

|website=Smallville History Society is dead wrong. The Smallville History Society is not a website. It's the publisher. Headbomb {t · c · p · b} 02:19, 29 December 2021 (UTC)[reply]

That is a bizarre statement.

|website=Smallville History Society does not assert that "Smallville History Society" is a website. It asserts that the URL is on the website of the Smallville History Society, which is branded as "Smallville History Society". BrownHairedGirl (talk) • (contribs) 02:31, 29 December 2021 (UTC)[reply]

That's very literally what it asserts. Headbomb {t · c · p · b} 10:29, 29 December 2021 (UTC)[reply]

I would have used |work=News & events for the Hue University example citation: J. I. Friedman. "The Road to the Nobel Prize". News & events. Huế University. Archived from the original on 2008-12-25. Retrieved 2008-09-29. It's not a particularly helpful part of the citation, but it's not as bad as putting the url as the work and it preempts other editors from doing the wrong thing. —David Eppstein (talk) 02:05, 29 December 2021 (UTC)[reply]

Except that News & events is not the work. There is no publication, nor work, called that. It's a section of the main university website. Headbomb {t · c · p · b} 02:18, 29 December 2021 (UTC)[reply]

It is the "News & events" section of the main university website. That's why it says "News & events" on the page, right above the title of the individual page, and why it has "News & events" listed as one of the main sections of the university website in the link bar at the top of the page. It is the highest-level point of organization of web content that is in any way useful to distinguish from the organization that published the content. And as I tried to say in my earlier comment that I replied to but you seem to have missed, the point is less to name the whole website and more to find something plausible to use for that slot so that bad editors do not fill it with bad content like the url hostname. —David Eppstein (talk) 07:22, 29 December 2021 (UTC)[reply]

"more to find something plausible" which is exactly what it shouldn't do. If you click on [ Structure of Hue university ] instead, the work/website doesn't all of a sudden become Structure of Hue university. There is no larger work here, and we should not shoehorn one simply because a parameter exists in a template. Headbomb {t · c · p · b} 10:28, 29 December 2021 (UTC)[reply]

Bot changing work= to newspaper=

Is there a convincing reason for the bot to make this change? template:cite news does not give it as preferred to work=. Whether some modern source is or is not a newspaper can be arguable, so why ask questions when you don't know the answer? Do you keep a massive table of which news sources are [physical] newspapers and which only have a web presence? Or which stories only ever appeared on the website but never made it into print? It seems to me that it is not broken so doesn't need fixing. --John Maynard Friedman (talk) 11:05, 7 January 2022 (UTC)[reply]

example diff. --John Maynard Friedman (talk) 20:08, 8 January 2022 (UTC)[reply]

@John Maynard Friedman: Who needs a massive table? We already have an entire database called Wikidata. It can easily be determined that The Washington Post is a daily newspaper via d:Special:EntityPage/Q166032#P31 (that said I think you have a valid argument as to whether such edits should be done by a bot). —Uzume (talk) 04:15, 9 January 2022 (UTC)[reply]

There actually is a (not massive) table. Wikidata would be too much work I think. Izno (talk) 04:44, 9 January 2022 (UTC)[reply]

All of which is no doubt interesting but irrelevant. It doesn't explain the value or purpose of the change. And now I see another one: diff where 'work' has been changed to 'magazine' (which is true, but so what?) but it still says {{cite news}} – not {{cite magazine}} which I suppose just might be more useful metadata sometime. So a potentially useful change is ignored but a fatuous one taken. What is the point of this change? This is the cosmetic edit equivalent of changing eyeshadow. --John Maynard Friedman (talk) 10:08, 9 January 2022 (UTC)[reply]

That diff is a bug. Izno (talk) 18:41, 9 January 2022 (UTC)[reply]

ISBN in Cite web

Status: new bug
Reported by: Johannes Schade (talk) 06:30, 10 January 2022 (UTC)[reply]

We can't proceed until: Feedback from maintainers

The bot changed "{{Cite web|last=Coolahan |first=Marie-Louise |date=9 May 2019 |title=Dowdall [née Southwell], Elizabeth |website=[[Oxford Dictionary of National Biography]] |doi=10.1093/odnb/9780198614128.013.112775 |url=https://www.oxforddnb.com/view/10.1093/ref:odnb/9780198614128.001.0001/odnb-9780198614128-e-112775 |access-date=14 March 2021 |url-access=subscription}} – Online edition" -> "{{Cite web|last=Coolahan |first=Marie-Louise |date=9 May 2019 |title=Dowdall [née Southwell], Elizabeth |website=[[Oxford Dictionary of National Biography]] |doi=10.1093/odnb/9780198614128.013.112775 |isbn=978-0-19-861412-8 |url=https://www.oxforddnb.com/view/10.1093/ref:odnb/9780198614128.001.0001/odnb-9780198614128-e-112775 |access-date=14 March 2021 |url-access=subscription}} – Online edition". I doubt the bot checks the book against the website. The website could differ from what was published in the book with that ISBN. I do not think an ISBN should be added under these circumstances. With thanks and best regards, Johannes Schade (talk) 06:30, 10 January 2022 (UTC)[reply]

A diff would be more useful than the above. Headbomb {t · c · p · b} 10:26, 10 January 2022 (UTC)[reply]

To cite the ODNB, use {{cite ODNB}}:

{{Cite ODNB |last=Coolahan |first=Marie-Louise |date=9 May 2019 |title=Dowdall [née Southwell], Elizabeth |doi=10.1093/odnb/9780198614128.013.112775}}

Coolahan, Marie-Louise (9 May 2019). "Dowdall [née Southwell], Elizabeth". Oxford Dictionary of National Biography (online ed.). Oxford University Press. doi:10.1093/odnb/9780198614128.013.112775. (Subscription or UK public library membership required.)

—Trappist the monk (talk) 13:12, 10 January 2022 (UTC)[reply]