Talk:ヌ

From Wiktionary, the free dictionary
Latest comment: 2 years ago by Fish bowl in topic RFV discussion: June 2018–January 2022

RFV discussion: June 2018–January 2022

The following discussion has been moved from Wiktionary:Requests for verification (permalink).

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Vietnamese

I find it odd that Vietnamese writers would make use of a specifically Japanese phonetic glyph, ヌ (read nu), as the typographic equivalent of the ditto mark (").

I suspect that the intended glyph is not the Japanese katakana character ヌ (nu, Unicode hex value 30CC), but rather the graphically similar Chinese (and thus Vietnamese chữ Nôm) character 又 ("again, as well", Unicode hex value 53C8). In fact, the Japanese katakana ヌ originally derived from a shorthand form of 奴 (used phonetically to represent nu), which includes 又 as its right-hand portion.

Our entry at ヌ#Vietnamese cites a website that appears to be a volunteer-built database of uncertain provenance. Meanwhile, the Vietnamese Nôm Preservation Foundation's online lookup tool has no entry for ヌ (U+30CC), but it does have an entry for 又 (U+53C8). Could someone check other sources and confirm?

‑‑ Eiríkr Útlendi │Tala við mig 21:53, 22 June 2018 (UTC)

The website in question says ヌ has a pronunciation of lại, and you can find several instances of 又 pronounced lại on the same site. It is very likely that the two were confused because of their similar shapes. — TAKASUGI Shinji (talk) 04:18, 23 June 2018 (UTC)
Thank you for the additional information. The chunom.org website is the one cited at the ヌ#Vietnamese entry, and the data there is of unclear provenance. I cannot tell if this is a reliable and trustworthy source, or instead something that might be error-prone in a manner similar to Jisho.org. (That might be what you were suggesting, that chunom.org is error-prone?)
If, ultimately, the U+30CC glyph is actually in use in electronic Vietnamese chữ Nôm texts, then we should probably have an entry. If instead electronic texts only use U+53C8, ヌ#Vietnamese should probably go away.
Are there any other electronic Vietnamese sources, or ideally published works, that use the glyph ヌ (U+30CC) interchangeably with 又 (U+53C8)? ‑‑ Eiríkr Útlendi │Tala við mig 18:16, 27 June 2018 (UTC)
It is a reduced form of ("again"), used as an iteration mark in Vietnamese chữ Nôm, e.g. 喑ヌ (ầm ầm), 猪ヌ (chưa chưa), 赤ヌ (xích xích), 紅ヌ (hồng hồng). Lại means “again”. Listing it at ヌ probably uses the wrong codepoint, but then I'm not sure where this should belong. Wyang (talk) 22:34, 27 June 2018 (UTC)
They seem to use U+30CC and U+31F4 interchangeably, which suggests there is no officially assigned code point. I prefer moving the information to with a soft redirect at , until the official code point is given in Unicode. — TAKASUGI Shinji (talk) 02:27, 28 July 2018 (UTC)
Even Chunom.org's main entry is the U+31F4 one (ㇴ), while their U+30CC entry is pretty minimal.
In the absence of any Vietnamese editor input, I second Shinji's suggestion. ‑‑ Eiríkr Útlendi │Tala við mig 21:42, 30 July 2018 (UTC)
@Eirikr, TAKASUGI Shinji: I checked the links provided by Wyang, and the character is indeed attested in Vietnamese texts published from 1909 to 1940. The only problem is that it shouldn't be using the same codepoint that is meant for katakana. I don't think this character is unifiable with 又 (the glyph forms are different), so I checked the proposed charts for CJK Extensions G and H as well as CJK Extensions B, C, D, E, and F, but this character is not there. I propose moving the entry to ⿻㇇丶 (see Category:Terms containing unencoded characters for other terms that are not yet encoded). KevinUp (talk) 14:36, 10 January 2020 (UTC)
I think that using the katakana codepoint is less troublesome, in the same vein as how Cyrillic codepoints are used for some tones in the old Zhuang Latin script. —Suzukaze-c 20:12, 10 January 2020 (UTC)
  • @KevinUp, any chance that's a scanno kind of problem? I highly doubt that the original texts from 1909–1940 were using any codepoints at all. :) Thinking through how such texts became digitized, scanning + OCR comes to mind as a likely approach. If the OCR engine weren't configured quite right, that might be how ヌ (U+30CC) crept in where some graphical variant of 又 (U+53C8) might have been the glyph actually used in the dead-tree texts.
An idea, anyway. ‑‑ Eiríkr Útlendi │Tala við mig 19:19, 14 January 2020 (UTC)
Eirikr: I don't think Vietnamese texts can be digitized using OCR, because many Nôm characters are still unencoded in Unicode. I think the katakana character was chosen because no other character is available to represent that glyph (the links contain actual images of the text). For now, we could just keep the entry under ヌ#Vietnamese until the character is encoded in Unicode. KevinUp (talk) 10:58, 16 January 2020 (UTC)
KevinUp: interesting re: digitizing.
For completeness' sake, I see that there is also ㄡ (U+3121), visually identical to Japanese ヌ (U+30CC) in some fonts, and more explicitly derived from 又 (yòu) (U+53C8). However, the Nôm lookup tool doesn't have U+3121 either, only U+53C8. ‑‑ Eiríkr Útlendi │Tala við mig 19:01, 16 January 2020 (UTC)
Eirikr, Suzukaze-c: I checked the images at chunom.org and noticed that there is another variant of this character in which the dot does not extend beyond the bottom stroke of . Since this character is neither a katakana nor a Zhuyin letter, it shouldn't be using either of these two codepoints. I think it would be better to move this entry to ⿻㇇丶, which can also represent the second variant of this glyph. KevinUp (talk) 14:24, 17 January 2020 (UTC)
  • @KevinUp, thank you for the additional research. I wonder how much of this variation is due to differences in scribal handwriting? On the page for the ngày ngày example, for instance, I note several irregularities in other characters as well.
Agreed that our Vietnamese entry for this should probably be moved. One concern, however, is how users would find ⿻㇇丶 when searching. ‑‑ Eiríkr Útlendi │Tala við mig 17:01, 17 January 2020 (UTC)
They won't. If the Nom Foundation is the primary body working on digital Nom texts, we should follow their usage. —Suzukaze-c (talk) 03:54, 2 February 2021 (UTC)
Closed: no action taken. —Fish bowl (talk) 04:22, 12 January 2022 (UTC)