Article · Wikipedia archive · Last revised Jun 29, 2026

Wikipedia talk:Categorization

Small categories

I saw a question from Naraht at the help desk on whether there is a minimum number of entries in a Category and whether there are exceptions here. I know that WP:SMALLCAT is a former guideline; is there any current guidance? TSventon (talk) 15:42, 24 March 2026 (UTC)

TSventon Thank you for bumping it over here. I've dealt with two somewhat similar situations in the last 3 months.
  1. Category:Historically segregated African-American schools in the United States. I've only created state level categories in Category:Historically segregated African-American schools in the United States by state or territory if the state has three or more, and left them in the parent cat if there were one or two.
  2. Category:RuPaul's Drag Race episodes I had only been creating the season by season subcats if there were more than three entries that weren't just redirects. Another editor created season cats for the seasons that only had one and I wasn't sure enough of WP:SMALLCAT to undo it.
So advice please. Naraht (talk) 16:59, 24 March 2026 (UTC)
WP:NARROW. It's really just a question of whether it enhances navigation, or is merely overcategorisation. Other questions to ask yourself: Is this a topic that a well-referenced article could be written about? Is the topic broad enough for such a grouping to have value, or so narrow as to hinder navigation? Also, do any lists or navigation boxes already exist for this?
I hope this helps. - jc37 13:06, 28 March 2026 (UTC)[reply]
@Naraht:. TSventon (talk) 13:12, 28 March 2026 (UTC)[reply]
Jc37, TSventon For the Drag Race episodes, there is an article (multiple pages) for each of the seasons so far, so based on that, I'm good with the addition of categories even if only one article has been created out of the 8-14 episodes. They are folded into an extensive navbox as well; I don't know if that helps or hurts. Unfortunately, the test of whether a well-referenced article could be written about the topic not only makes splitting Category:Historically segregated African-American schools in the United States into states look like a worse idea, it also makes a number of more established categories look less appropriate, for example a category of the historically black colleges in a state. If it doesn't make sense to have a single article covering both the private historically black colleges/universities in Georgia and the state-created historically black colleges/universities, does that make them less appropriate to have as a category? Naraht (talk) 20:58, 29 March 2026 (UTC)
@Naraht: When it comes to diffusing large categories, it's more about navigation if the subcategories fit into an existing scheme, such as separating American categories by state or territory. The problem there becomes when users do that for all categories even when there's not many articles to diffuse, or when they create a category like "X by Y" with only one subcategory in it. There's no hard and fast rule; it's just a judgement call. If a categorisation scheme exists and seems useful, I don't think it matters whether there could be an article written about every category in the tree. Mclay1 (talk) 00:20, 30 April 2026 (UTC)[reply]
Mclay1 Thanks. For the Historically segregated AA schools in the US, it is a situation where there are 200 and let's say Georgia has 17, Mississippi has 15, etc., but Massachusetts and Connecticut only have one or two. I know this is probably a situation where this could be discussed if anyone brought up a CFD, but I thought I'd ask here in advance. Naraht (talk) 14:42, 30 April 2026 (UTC)
@Naraht: In a situation like that where a category can be fully diffused but some subcategories will be underpopulated, opinions will be divided on whether those small categories are useful and should exist or not. Personally, I wouldn't create a category for just one article. If it were brought to CfD, the majority consensus would likely be to upmerge them. Mclay1 (talk) 14:54, 30 April 2026 (UTC)[reply]
Fair enough. Naraht (talk) 15:11, 30 April 2026 (UTC)

Should singing synthesis sample libraries be considered "software"

Hello, why are Wikipedia's articles on sample libraries for performance-sampling-based singing synthesizers described as "software"? Some examples are in these lists:

List of Vocaloid products

Category:Singing software synthesizers

Most singing synthesis systems are based on performance-sampling techniques. Basically, they use a library of recordings of singing and process those recordings to produce their synthesis. An overview of this and a practical example is shown in this paper by one of the pre-eminent figures in the field: https://repositori.upf.edu/items/e5fd582d-da51-4b24-a376-44289967d116?locale=en (unfortunately this seems to be down right now, so I have uploaded a copy here: https://smallpdf.com/file#s=d72ae8f3-2328-4b4d-8c5f-183f04c5c54d).

I don't think these libraries differ significantly from other types of audio sampling libraries, and I think they are comparable to things like brush packs for drawing programs. Usually, software is considered to be executable code, or at least something that controls a given system in an expressive enough way within a given domain.

Wikipedia's own definition of software (from Software): "Software consists of computer programs that instruct the execution of a computer. Software also includes design documents and specifications."

I feel that these articles should instead describe them as something along the lines of "singing synthesis sample libraries" or "vocal synthesis libraries". ~2026-23457-68 (talk) 18:29, 15 April 2026 (UTC)[reply]

Computer program, the sequence of instructions, is what you're thinking of here. It is a subset of software. The lead at Software indicates that it also includes design documents and specifications and I'd argue that it also includes certain data used by the computer program; certainly things like error messages, the icon and other GUI elements. In summary, it's arguable and drawing the line will not be trivial. ~Kvng (talk) 15:52, 24 April 2026 (UTC)[reply]

"All other punctuation marks (and punctuation-like diacritics) should be removed."

I'm confused about the part of the guideline about removal of "punctuation-like diacritics". It was added in Special:Diff/1311854344 by User:Mclay1 on 17 September 2025, but I don't see anything resembling this guideline in the prior version: Special:Permalink/1311292815. The relevant part of the guideline in the prior version has no mention of diacritics:

Hyphens, apostrophes and periods/full stops are the only punctuation marks that should be kept in sort values. The only exception is the apostrophe in names beginning with O', which should be removed. For example, Eugene O'Neill is sorted {{DEFAULTSORT:Oneill, Eugene}}. All other punctuation marks should be removed. (Commas can be added when re-ordering words, as in the previous example.)

The only relevant discussion at the time that I see in the talk page archives is Wikipedia talk:Categorization/Archive 19#"O'" rule for first names about the last name "O'Donel", for which the guideline is clear both before and after Mclay1's edit. Side note: it is also mentioned in Wikipedia:Categorization/Sorting names#Other exceptions.

The text of the guidelines (as of Special:Permalink/1350613034) implies that all diacritics are handled automatically by MediaWiki in its collation algorithm. The relevant passage reads: "In English Wikipedia, sort order merges (ignores) case and diacritics. For example, "Baé", "Båf", "BaG" would be sorted in that order." Its footnote adds: "In 2016, English Wikipedia's category collation was changed to "uca-default", which is based on the Unicode collation algorithm (UCA). The most noticeable difference is that UCA groups characters with diacritics with their non-diacritic versions."

Part of my confusion is related to letters of other Latin-based alphabets which don't have "punctuation-like diacritics". As a concrete example, the section Wikipedia:Categorization/Sorting names § Sort by surname which says (as of Special:Permalink/1335883955):

[...] on English Wikipedia, the DEFAULTSORT value is Western order, overridden for Icelandic categories, where the sort key is as the name is written. Arnaldur Indriðason is sorted {{DEFAULTSORT:Indridason, Arnaldur}}, while the Icelandic category of photographers is done, [[Category:Icelandic photographers|Arnaldur Indridason]]. For the listas= parameter in project templates on article talk pages use the DEFAULTSORT value (since it mainly categorises in non-Icelandic categories)

but the whole guideline page doesn't mention the replacement of any alphabet with English. In the Icelandic example the only replacement is "ð" → "d", but the common practice seems to be to replace all letters specific to the Icelandic alphabet with their English alphabet analogues. For example, Þórhildur Sunna Ævarsdóttir has Aevarsdottir, Thorhildur Sunna and Thorhildur Sunna Aevarsdottir as category sort keys. The discussion from the footnote doesn't seem to mention it either. —⁠andrybak (talk) 16:11, 25 April 2026 (UTC)[reply]
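
As a rough illustration of the two rules quoted in this comment (keep only hyphens, apostrophes and periods/full stops in sort values, and rely on the collation to ignore case and diacritics), here is a minimal Python sketch. The helper names strip_punctuation, name_sort_key and approximate_collation_key are hypothetical, and folding diacritics through Unicode NFD decomposition is only an approximation of the idea behind "uca-default", not MediaWiki's actual collation code.

```python
import unicodedata

# Punctuation the quoted guideline says to keep in sort values.
KEPT_PUNCTUATION = {"-", "'", "."}


def strip_punctuation(text: str) -> str:
    """Drop every punctuation character except hyphen, apostrophe and period."""
    return "".join(
        ch
        for ch in text
        if ch in KEPT_PUNCTUATION
        or not unicodedata.category(ch).startswith("P")
    )


def name_sort_key(given: str, surname: str) -> str:
    """Build a 'Surname, Given' sort key (hypothetical helper).

    The comma is added here when re-ordering, as the guideline allows.
    """
    surname = strip_punctuation(surname)
    # Exception from the guideline: remove the apostrophe in names beginning
    # with O'. Lowercasing the next letter mirrors the "Oneill" example.
    if surname.startswith("O'") and len(surname) > 2:
        surname = "O" + surname[2].lower() + surname[3:]
    return f"{surname}, {strip_punctuation(given)}"


def approximate_collation_key(text: str) -> str:
    """Approximate 'ignore case and diacritics' (an assumption, not real UCA)."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


if __name__ == "__main__":
    print(name_sort_key("Eugene", "O'Neill"))
    print(sorted(["BaG", "Båf", "Baé"], key=approximate_collation_key))
```

Note that unicodedata classifies ʻ (U+02BB) as a modifier letter rather than punctuation, so strip_punctuation leaves it in place; that is one way to see why "punctuation-like" characters need separate handling.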

@Andrybak: I added the clarification about punctuation-like diacritics to make clear they are included in the guideline for the removal of punctuation as opposed to the guideline explaining that diacritics on letters will be ignored. I was referring to characters such as ʻ in Hawaiʻi (island). I don't know if there's a better term for those characters? Mclay1 (talk) 16:24, 25 April 2026 (UTC)[reply]
Supposedly that specific character is a letter, but I think it's fair to say most English-speakers would expect Hawaii and Hawaiʻi to be sorted together. Mclay1 (talk) 16:27, 25 April 2026 (UTC)[reply]
I agree with this example of modifier letter turned comma – this is indeed what the readers would expect.
However, as a reader of the guideline page, I assumed that "punctuation-like diacritics [...] should be removed" meant that ü (U with diaeresis) should be converted into u (letter) via removal of the diaeresis and that Ç (C with cedilla) should be converted into C (letter). Diaeresis and cedilla are examples of diacritics that resemble punctuation. This is what my confusion is about.
Because names are an easy example of other alphabets being used in article titles, I tried checking what is written in WP:NAMESORT, but it doesn't mention letter/alphabet/diacritics conversion/replacement at all. —⁠andrybak (talk) 16:45, 25 April 2026 (UTC)[reply]
I'll try to clarify that then. Although, since diacritics attached to letters are ignored anyway, removing them in sort keys is fine and has historically been the general practice anyway. In terms of replacing non-English letters, that's an interesting point that we should investigate. Mclay1 (talk) 17:32, 25 April 2026 (UTC)[reply]
In the Icelandic example the only replacement is "ð" → "d" - this is wrong. The Icelandic letter concerned is called eth, and it's pronounced just like the hard "th" in English words like "this", "that" and "the other". This is in contrast to þ, or thorn, which is the soft "th" in English words like, well, "thorn", "thank" or "think". --Redrose64 🌹 (talk) 20:49, 25 April 2026 (UTC)[reply]
While eth is the phonetic equivalent of th, it is commonly transliterated as D because it was derived from D. For example, Sigurd is the Anglicisation of Sigurðr. That's standard, not just on Wikipedia. Mclay1 (talk) 05:18, 26 April 2026 (UTC)[reply]
I've tested the sorting of some characters. Æ automatically sorts as AE, and ð automatically sorts as D, so replacing those characters in sort keys is not necessary (but is also fine to do for clarity). However Þ sorts after Z, so it probably should be replaced by Th. ʻ sorts in between H and I for some reason, so I think it should be removed to avoid confusion. Mclay1 (talk) 05:34, 26 April 2026 (UTC)[reply]
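
The test results reported above could be turned into a small pre-processing step for sort keys. The following Python sketch is only an illustration: the REPLACEMENTS table and the function transliterate_for_sort are hypothetical names, the entries for Þ and ʻ follow the recommendation in the comment above, the entries for Æ and ð are the optional "for clarity" replacements, and the counterparts æ, Ð and þ are added by assumption for symmetry. Diacritics attached to letters are removed with Unicode NFD decomposition, which is harmless because the collation ignores them anyway.

```python
import unicodedata

# Hypothetical replacement table based on the test results in this thread.
REPLACEMENTS = {
    "\u00de": "Th",  # Þ (thorn): reported to sort after Z, so replace it
    "\u00fe": "th",  # þ: lowercase counterpart (assumption)
    "\u02bb": "",    # ʻ (modifier letter turned comma): remove
    "\u00c6": "Ae",  # Æ: already sorts as AE, replaced only for clarity
    "\u00e6": "ae",  # æ: lowercase counterpart (assumption)
    "\u00d0": "D",   # Ð: uppercase counterpart of ð (assumption)
    "\u00f0": "d",   # ð (eth): already sorts as D, replaced only for clarity
}


def transliterate_for_sort(text: str) -> str:
    """Replace the letters listed above, then strip combining diacritics."""
    replaced = "".join(REPLACEMENTS.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(transliterate_for_sort("Ævarsdóttir, Þórhildur Sunna"))
    print(transliterate_for_sort("Indriðason, Arnaldur"))
    print(transliterate_for_sort("Hawaiʻi"))
```

With the names mentioned earlier in this thread, the sketch yields the sort keys already in use on those articles ("Aevarsdottir, Thorhildur Sunna" and "Indridason, Arnaldur"), and Hawaiʻi becomes "Hawaii".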

Discrepancy between categories and lists of Czech films (discussion at WikiProject Film)

I've started a discussion at WikiProject Film about the fact that categories and lists of Czech films are defining "Czech" differently, and what to do about it. I'd love some input from people more familiar with categorization.

Wikipedia_talk:WikiProject Film#Discrepancy between categories and lists of Czech films dylansan (talk) 12:52, 8 May 2026 (UTC)[reply]

Discussion about history of instructions at WP:CFD/S

Please join the discussion at Wikipedia talk:Categories for discussion#Reconstructing history of instructions of Wikipedia:Categories for discussion/Speedy. —⁠andrybak (talk) 16:30, 16 May 2026 (UTC)[reply]

Categories created just so the category creator can be in them

Pinging @User:Jp1008

This strikes me as being an abuse of the Categorization system, but I wanted to ask before I did anything about it.

I can think of at least 20 cats I could add to my page, such as Category:Wikipedians interested in Cockcroft–Walton generators... --Guy Macon (talk) 04:57, 15 June 2026 (UTC)[reply]

Typically those are, or at least used to be, deleted at CFD; see e.g. Wikipedians interested in Paul McCartney.
You can certainly make a solid OC/U case for deletion at CFD. Doesn't guarantee they will be deleted, and some people will probably see them as harmless, but no one should be bothered by the mere fact of a nomination. ~2026-34922-84 (talk) 05:15, 15 June 2026 (UTC)[reply]
I think they're harmless. Have moved them into a subcat and added sort keys. PamD 05:33, 15 June 2026 (UTC)[reply]
And after looking at WP:OC/U (please link cryptic abbreviations!) I still think these are justified because many of each artist's works are individually notable and could have articles created. The Paul McCartney discussion cited there seems irrelevant here as that was about fandom. PamD 05:43, 15 June 2026 (UTC)
My bad, I suppose I was thinking in the CFD context it was one of those well-known ones.
As with so many other things, when rubber meets road over-narrowness of category is assessed on a case-by-case basis in context. No idea what the outcome of a CFD would be here, but I wouldn't fault anyone for starting one to see where community consensus is. ~2026-34922-84 (talk) 06:01, 15 June 2026 (UTC)[reply]

Hi Guy, thanks for the heads up. I carefully read WP:OVERCAT and researched art Categories for a while before I decided to add these two, as it was the first time I created a User Cat and I’m a careful new editor.
As a counterexample, you can see Cats for individual paintings that do seem too narrow. But I would not presume they were created due to an egotistical motivation, but by someone desperately trying to find some company (Wikipedia does not make this easy at all for new editors).
I am a level above individual paintings, and Turner and Constable are not “smallish” artists, but the most important painters of the English Romantic period. Turner even influenced the early Impressionists. Last but not least, I don't have any other User Cats on my User page that would indicate this was an attempt to do what you are implying, and I have logged many improvements to both artists' pages and their paintings over the last months.
I hope that is sufficient to address your concerns. Jp1008 (talk) 18:42, 15 June 2026 (UTC)[reply]

I have been researching WP policy on this and I have to backtrack. From a Gemini query:

Because Wikipedia's policy on user categorization (WP:OC/U) strictly limits the creation of hyper-specific interest groups, there are almost no official, long-standing user categories dedicated to single, individual European painters. [So even] if you search for Category:Wikipedians interested in Leonardo da Vinci, Category:Wikipedians interested in Michelangelo, or Category:Wikipedians interested in Raphael, you will find that none of those pages exist.

I will now have to figure out how to remove them ... Jp1008 (talk) 22:19, 15 June 2026 (UTC)[reply]
Keeping in mind that I have not said that I think they should be deleted (unlike "some people" I don't ask questions when I have already decided what the answer is), if you decide to delete them I believe that adding {{db-g7}} to the page as explained at WP:SPEEDY will do the trick. --Guy Macon (talk) 23:15, 15 June 2026 (UTC)