Gavin Grover's GROOVY Wikiblog


4 May 2014

2048 volumes of GroovyCode

When extending Unicode's upper codepoint limit from U+10_FFFF back up to U+7FFF_FFFF, we need to introduce a new term: volume. A volume is 16 planes, so whereas presently there's only 1 volume plus 1 extra plane (17 planes in all), GroovyCode will have 2048 volumes, from 0x0 to 0x7FF. We'll also introduce new notation V+xxx to reference volumes and P+xxxx to reference planes, both in hexadecimal. We've previously seen how we can represent the CJK Unihan ideographs using the same formulaic method already used to represent Korean Hangul since Unicode 2.0, but doing it recursively, to generate 2 billion possible ideographs. Let's look at other possible uses of the newly available 2047 volumes in GroovyCode...
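To make the V+xxx and P+xxxx notation concrete, here is a minimal Python sketch. It assumes only what is stated above: a plane is 2^16 codepoints and a volume is 16 planes, so the volume and plane numbers are just the high bits of the codepoint. The function names are made up for illustration.

```python
PLANE_BITS = 16              # a plane holds 2**16 codepoints
VOLUME_BITS = 20             # a volume is 16 planes, i.e. 2**20 codepoints
MAX_CODEPOINT = 0x7FFF_FFFF  # proposed GroovyCode upper limit


def plane_of(cp: int) -> int:
    """Return the plane number (the xxxx in P+xxxx) of a codepoint."""
    if not 0 <= cp <= MAX_CODEPOINT:
        raise ValueError(f"codepoint out of range: {cp:#x}")
    return cp >> PLANE_BITS


def volume_of(cp: int) -> int:
    """Return the volume number (the xxx in V+xxx) of a codepoint."""
    return plane_of(cp) >> (VOLUME_BITS - PLANE_BITS)


def describe(cp: int) -> str:
    """Format a codepoint with its volume and plane in the V+/P+ notation."""
    return f"U+{cp:X} is in V+{volume_of(cp):03X}, P+{plane_of(cp):04X}"


if __name__ == "__main__":
    print(describe(0x10FFFF))      # today's Unicode limit
    print(describe(MAX_CODEPOINT)) # the extended limit
```

The present Unicode limit U+10FFFF lands in V+001, P+0010, which is the "1 extra plane" beyond the first volume.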

Volume V+000 (i.e. U+0x_xxxx) is the Unicode-controlled volume (UCV), consisting of the 16 planes from P+0000 to P+000F. The top half of P+000F will change from its present Private Use to High Ultra-surrogates. The Unicode Consortium will control use of this volume as at present. Already they've given names to 6 of its planes: 0 is BMP, 1 is SMP, 2 is SIP, 3 is TIP, 0xE is SSP, and 0xF is PUP. Each V+000 plane will continue to have its final 2 codepoints be nonchars as they are presently; however, subsequent planes (P+0010 and higher) won't, so it'll be easier to formulaically generate characters into blocks spanning more than one plane.

Volume V+001 (i.e. U+1x_xxxx) will be the Private-use volume (PUV), consisting of the further 16 planes representable by 4 bytes in UTF-8-extended. P+0010 will have a dual use as both Low Ultra-surrogates, as well as Private use as at present. Developers would normally use P+0010 (rather than P+000F or the BMP block) for their initial private-use characters, so they can defer deciding whether to encode using the 2048-volume GroovyCode or the presently crippled 1.0625-volume Unicode UTF-8, until they know how many private-use characters they need.

Volumes V+002 to V+03F (doubling 5 more times!) give the 62 volumes representable by 5 bytes in UTF-8-extended. V+002 could be the Japanese emoji volume (JEV). Simply give it to a consortium of Japanese telecom businesses (NTT, etc.) to manage: it might even sway them into switching en masse from Shift-JIS to Unicode/GroovyCode.

V+003 could be the Basic syllabic volume (BSV). Korean Hangul gets over 10,000 codepoints in the UCV/BMP (15%) to represent its syllables, and Chinese Hanzi and Japanese Kanji, which are joint syllabic/ideograph, together get 26,000 in the BMP (40%), plus 50,000 more in the UCV/SIP, but alphabetic scripts such as Latin must represent each sound separately. To even matters up, we can represent syllables in Latin-encoded languages such as English with a single codepoint generated formulaically.

Presently in Korean there are 19 mandatory leading consonants, 21 mandatory vowels, and 27 optional trailing consonants, giving a total of 19 * 21 * 28 == 11,172 possible syllable blocks, generated by formula into range U+AC00 to U+D7A3. By using the same calculation in English, we see there are 42 optional leading consonant clusters (e.g. str), 20 mandatory vowels for British English (e.g. e), and 143 optional trailing consonant clusters (e.g. ngth) in a syllable, giving 43 * 20 * 144 == 123,840 possible syllables that can be generated by formula. Many of those will be unused in English but the codepoint needs to be reserved so the glyph can easily be generated formulaically by the font if desired. We can also thus delegate compatibility decomposition issues such as fi ligatures to the font.
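The Hangul formula and its English analogue can be sketched in Python as below. The Hangul part follows the standard Unicode composition arithmetic. The English part is only an illustration: the slot counts 43, 20 and 144 come from the paragraph above, but the base codepoint (the start of V+003) and the convention that slot 0 means "no cluster" are assumptions.

```python
def syllable_index(lead: int, vowel: int, trail: int,
                   n_vowels: int, n_trails: int) -> int:
    """Mixed-radix index of a (lead, vowel, trail) triple."""
    return (lead * n_vowels + vowel) * n_trails + trail


HANGUL_BASE = 0xAC00
HANGUL_LEADS, HANGUL_VOWELS, HANGUL_TRAILS = 19, 21, 28  # 28 = 27 trails + "none"


def hangul_syllable(lead: int, vowel: int, trail: int = 0) -> str:
    """Compose a precomposed Hangul syllable from its three indices."""
    if not (0 <= lead < HANGUL_LEADS and 0 <= vowel < HANGUL_VOWELS
            and 0 <= trail < HANGUL_TRAILS):
        raise ValueError("index out of range")
    return chr(HANGUL_BASE + syllable_index(lead, vowel, trail,
                                            HANGUL_VOWELS, HANGUL_TRAILS))


# Hypothetical layout for syllabic English: counts from the text, base assumed.
ENGLISH_BASE = 0x30_0000  # assumed: first codepoint of V+003
ENGLISH_LEADS, ENGLISH_VOWELS, ENGLISH_TRAILS = 43, 20, 144


def english_syllable_codepoint(lead: int, vowel: int, trail: int = 0) -> int:
    """Codepoint for an English syllable; index 0 of lead/trail means 'none'."""
    if not (0 <= lead < ENGLISH_LEADS and 0 <= vowel < ENGLISH_VOWELS
            and 0 <= trail < ENGLISH_TRAILS):
        raise ValueError("index out of range")
    return ENGLISH_BASE + syllable_index(lead, vowel, trail,
                                         ENGLISH_VOWELS, ENGLISH_TRAILS)


if __name__ == "__main__":
    print(hangul_syllable(18, 0, 4))  # lead 18, vowel 0, trail 4
    print(ENGLISH_LEADS * ENGLISH_VOWELS * ENGLISH_TRAILS)
```

Because the index is a plain mixed-radix number, a font can recover the three parts with two divisions, which is what lets the glyph be generated by formula.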

Syllabic English in Latin thus would take up 12% of a volume. By separately encoding syllables in all languages for all alphabetic scripts, including vowel marks in abjads and abugidas, we would use many more of the 5-byte UTF-8-extended volumes from V+004 to V+03F. We would reserve the rest for future use.

Volumes V+040 to V+7FE (doubling another 5 times!) give the codepoints representable by 6 bytes in UTF-8-extended, except the last volume. Volume V+7FF would have nothing but nonchars in its last two planes (P+7FFE and P+7FFF) to cater for the high ultra-surrogates having nonchars as its last 2 codepoints. V+7FF would best be reserved as a Non-character volume (NCV), the other 14 planes having a sort of semi-nonchar status for now.
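The volume ranges per byte length can be checked with a short Python sketch. It assumes that "UTF-8-extended" uses the same length boundaries as the original 6-byte UTF-8 scheme (the one in force before the 2003 restriction); the function names are illustrative only.

```python
# Highest codepoint representable at each encoded length,
# assuming the original 6-byte UTF-8 length boundaries.
LIMITS = [
    (1, 0x7F),
    (2, 0x7FF),
    (3, 0xFFFF),
    (4, 0x1F_FFFF),
    (5, 0x3FF_FFFF),
    (6, 0x7FFF_FFFF),
]


def encoded_length(cp: int) -> int:
    """Number of bytes needed to encode a codepoint."""
    if cp < 0:
        raise ValueError("negative codepoint")
    for n_bytes, limit in LIMITS:
        if cp <= limit:
            return n_bytes
    raise ValueError(f"codepoint out of range: {cp:#x}")


def last_volume(n_bytes: int) -> int:
    """Highest volume number reachable with the given number of bytes."""
    return dict(LIMITS)[n_bytes] >> 20


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(f"{n} bytes reach up to V+{last_volume(n):03X}")
```

Each extra byte adds 5 bits of payload, which is where the two "doubling 5 times" steps come from: 2 volumes at 4 bytes, 64 at 5 bytes, 2048 at 6 bytes.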

We would use the 2 billion codepoints in V+040 to V+7FE to generate Unihan characters by formula. If we had 210 basic components each with inbuilt combining behavior, we could formulaically generate all CJK characters of up to 4 components each: 210 ^ 4 ~= 2,000,000,000. By integrating some heuristics into our formula, we could generate many more complex characters. But what if even that isn't enough?...
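As a rough capacity check, the Python sketch below compares 210 ^ 4 with the number of codepoints in V+040 to V+7FE. The mapping function is an assumption made only for illustration: it treats every character as exactly four component slots and numbers them in base 210 starting at the first codepoint of V+040. The actual formula, including how shorter characters and combining behavior are handled, isn't specified here.

```python
COMPONENTS = 210          # basic components, from the text
SLOTS = 4                 # components per character, from the text
VOLUME_SIZE = 1 << 20     # codepoints per volume
FIRST_VOLUME, LAST_VOLUME = 0x040, 0x7FE

available = (LAST_VOLUME - FIRST_VOLUME + 1) * VOLUME_SIZE
needed = COMPONENTS ** SLOTS


def ideograph_codepoint(components: list[int]) -> int:
    """Hypothetical mapping: four component indices to one codepoint."""
    if len(components) != SLOTS:
        raise ValueError("this sketch expects exactly four components")
    index = 0
    for c in components:
        if not 0 <= c < COMPONENTS:
            raise ValueError("component index out of range")
        index = index * COMPONENTS + c  # base-210 positional number
    return FIRST_VOLUME * VOLUME_SIZE + index


if __name__ == "__main__":
    print(f"needed    {needed:,}")
    print(f"available {available:,}")
    print(f"last codepoint U+{ideograph_codepoint([209] * 4):X}")
```

Under these assumptions all four-slot combinations fit with room to spare, leaving some of the upper volumes for the heuristics mentioned above.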

The 6-bit GroovyCode language we embed in the continuation bytes following each 11111110 byte would, among other tasks, specify the private usage of the 11111111 byte. Users could use the 11111111 byte to head up 8 continuation bytes, encoding points up to U+FFFF_FFFF_FFFF, which is volumes from V+000 to V+FFF_FFFF, giving about 281 trillion codepoints if needed. Unihan component recursion depth is no longer a problem!
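The arithmetic behind those limits is simple enough to check in a few lines of Python. The only assumption is the one stated above: each continuation byte carries 6 payload bits, and the 11111111 lead byte carries none.

```python
BITS_PER_CONTINUATION = 6  # payload bits in each 10xxxxxx byte
VOLUME_BITS = 20           # a volume is 2**20 codepoints


def capacity(continuation_bytes: int) -> int:
    """Number of codepoints addressable by the given continuation bytes."""
    return 1 << (BITS_PER_CONTINUATION * continuation_bytes)


total = capacity(8)                       # 8 continuation bytes = 48 bits
max_codepoint = total - 1
max_volume = max_codepoint >> VOLUME_BITS

if __name__ == "__main__":
    print(f"codepoints: {total:,}")
    print(f"highest codepoint: U+{max_codepoint:X}")
    print(f"highest volume: V+{max_volume:X}")
```

So 48 bits give 2^48 codepoints, roughly 2.8 × 10^14, spread over 2^28 volumes.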

31 May 2014

GroovyCode's embedded 6-bit language: Ultracode

GroovyCode reinstates the 2.1 billion codepoint repertoire originally proposed for Unicode's UTF-8 encoding, but taken away in 2003. GroovyCode also brings "Ultracode", a 6-bit language embedded in the 10xxxxxx bytes following a 11111110 byte, giving 64 possible tokens.

You may ask why GroovyCode/Unicode needs a language embedded differently to the characters, instead of just assigning 64 default ignorable codepoints, perhaps from a private use area. (We will in fact be assigning 64 such points for interim implementations, but that's only a temporary measure.) Our embedded 6-bit language will use nested scopes, which makes it different to most of the other control-like codepoints in Unicode. Only a few codepoints provide nested scopes, and most of them have been deprecated or superseded:
  • U+206A to U+206F indicate three nested scopes which have all since been deprecated by later codepoints and properties: which script's digit shapes to use, whether to join Arabic letters together, and whether symmetric reversal should be used for certain symbols
  • U+FFF9 to U+FFFB enable embedding annotations like Japanese rubies within text, although the Unicode Consortium recommends a higher-level protocol be used
  • U+202A to U+202E are the bidirectional embedding and override codes, which were superseded by the simplified isolate codes in Unicode 6.3
  • U+2066 to U+2069 are the bidi isolate codes, the only "Unicode recommended" codes that use nested scoping
Unicode has slowly moved away from nested scopes so the codepoint repertoire is almost as self-synchronizing as its UTF-8 encoding. Bidirectionality is the only Unicode process that can't be represented any other way. But there's still an important difference between Unicode's nested scopes such as bidi isolates, and GroovyCode's embedded 6-bit language: processing for all Unicode's nested scopes is reset by the paragraph break token U+2029, whereas the paragraph break is just another contained token within the 6-bit "Ultracode" language.

Ultracode's tokenset

Whereas Unicode's predecessor ASCII provided the 2nd and 3rd quadrants as a useful subset when a 6-bit code was required, Ultracode will use the 2nd and 4th quadrants...
ASCII for Ultracode.png
  • in the 1960s the Latin uppercase letters and underscore were considered more fundamental than the lowercase ones, but nowadays abc-defg is more accepted for variable names than ABC_DEFG
  • the 4th quadrant also provides a Unicode-undefined control character del (U+007F)
  • it's easier to recode ASCII's bit pattern 0x1yyyyy to the continuation byte 10xyyyyy (whereby the 0x1 mutates to 10x, with x being 0 for the 2nd quadrant and 1 for the 4th, and yyyyy stays the same)
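The recoding in the last bullet can be sketched in Python as a pair of bit manipulations. This follows only the bit patterns given above (0x1yyyyy to 10xyyyyy and back); the function names are made up for illustration.

```python
def ascii_to_ultracode(ch: str) -> int:
    """Map an ASCII character from the 2nd or 4th quadrant to a 10xyyyyy byte."""
    code = ord(ch)
    # Both usable quadrants have bit 5 set: 001yyyyy (2nd) and 011yyyyy (4th).
    if code > 0x7F or (code & 0x20) == 0:
        raise ValueError(f"{ch!r} is not in the 2nd or 4th ASCII quadrant")
    x = (code >> 6) & 1           # 0 for the 2nd quadrant, 1 for the 4th
    return 0x80 | (x << 5) | (code & 0x1F)


def ultracode_to_ascii(byte: int) -> str:
    """Map a 10xyyyyy continuation byte back to its ASCII character."""
    if byte & 0xC0 != 0x80:
        raise ValueError(f"{byte:#04x} is not a continuation byte")
    x = (byte >> 5) & 1
    return chr((x << 6) | 0x20 | (byte & 0x1F))


if __name__ == "__main__":
    for ch in "a0 {":
        print(repr(ch), f"{ascii_to_ultracode(ch):08b}")
```

The low five bits pass through untouched, so the translation needs no lookup table in either direction.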

Ultracode's syntax

We can now work from our restrictions and outline a rough grammar for Ultracode...

The tokens available are the 26 lowercase letters a to z, 10 digits 0 to 9, space, delete, and 26 punctuation and symbol characters -.(){},;!&|#+*/=<>"':?$%~`. Keyboard tokens not available, i.e. the uppercase letters A to Z and symbols []\@^_, can be used as meta-syntax, translated to the 64 encodable characters. We will use the syntactic style used by CSS, JavaScript, and Groovy/Kotlin-style builders.
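The token inventory can be cross-checked against the ASCII quadrants with a short Python sketch. Nothing here goes beyond the counts in the paragraph above; the variable names are illustrative.

```python
import string

# 2nd quadrant is 0x20-0x3F, 4th is 0x60-0x7F, 3rd (unavailable) is 0x40-0x5F.
second_quadrant = [chr(c) for c in range(0x20, 0x40)]
third_quadrant = [chr(c) for c in range(0x40, 0x60)]
fourth_quadrant = [chr(c) for c in range(0x60, 0x80)]

tokens = second_quadrant + fourth_quadrant
letters = [t for t in tokens if t in string.ascii_lowercase]
digits = [t for t in tokens if t in string.digits]
punctuation = [t for t in tokens if t in string.punctuation]
others = [t for t in tokens if t in (" ", "\x7f")]  # space and delete

# The 26 punctuation and symbol characters as listed in the text.
LISTED_PUNCTUATION = "-.(){},;!&|#+*/=<>\"':?$%~`"

# Meta-tokens: printable keyboard characters that are NOT encodable.
meta_symbols = [t for t in third_quadrant if t in string.punctuation]

if __name__ == "__main__":
    print(len(tokens), len(letters), len(digits), len(punctuation), len(others))
    print("".join(meta_symbols))
```

The counts come out as 26 + 10 + 26 + 2 = 64, and the six leftover symbols are the ones reserved for meta-syntax.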

So a to z and 0 to 9 will be valid in names and numbers, with letter first for names and digit first for numbers. A hyphen - and dot . can also be used with both names and numbers. When an uppercase (A to Z) appears in some markup text, it will be immediately translated to hyphen - followed by the lowercase equivalent. The CSS/JS-syntax will use (){} as delimiters and ,:; as separators. Quotes "'` will be shortcut symbols for commonly used calls such as bolding or italics. The del codepoint will be used for newline. Although such newline and space are available for encoding, they'll both be superfluous in the syntax, as will # to indicate line comments. !&| could be used for boolean logic, which leaves ?$%~+*/=<> for some other use.
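The uppercase rule is easy to picture with a one-function Python sketch. It covers only the translation described above (an uppercase letter becomes a hyphen followed by its lowercase equivalent); the function name and the sample inputs are made up, and the other meta-tokens are left untouched.

```python
def expand_uppercase(markup: str) -> str:
    """Translate each uppercase A-Z into '-' plus its lowercase equivalent."""
    return "".join(
        "-" + ch.lower() if "A" <= ch <= "Z" else ch
        for ch in markup
    )


if __name__ == "__main__":
    print(expand_uppercase("fontSize"))
    print(expand_uppercase("backgroundColor"))
```

This is the same convention by which camelCase property names in script correspond to hyphenated names in CSS, so typed markup stays within the 64 encodable tokens.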

Meta-tokens mean we don't need an escape character in Ultracode markup text. We'd use meta-token ] to end the markup text, meaning markup text would generally be indicated in some other context by a string of tokens ending in [. We'd also use nested [ and ] to embed other information in our markup, such as arbitrary private-use information embedded in the 10xxxxxx bytes following a 11111111 byte. We could use meta-token \ to precede a single Unicode character in our markup text, and @ to precede the alias for a non-graphic one. @ is also a good meta-character to indicate other embeddings, such as @" ... " for strings of Unicode characters, and @{ ... } for embedded code in some language for inline execution. This leaves ^_ for some other meta-token use.

see for originals of deleted copies

30 May 2014; links moved to contents page on 11 Nov 2016

Groovy's Cyberbully

The coterie's been slinging the name cyberbully my way, the latest in a long list like stalker, kamikaze, terrorist, even sociopath. Putting cyber in front of a word makes it sound more impressive, but what really makes bullying be cyber? When I visit a coffee shop in Melbourne and 10 minutes after sitting down the DJ on the background radio says something about "Grover", then that's also cyber: some day everything ever recorded will have been run through voice recognition software and searchable like Google Books.

I had thought the cold shouldering and runarounds I was getting from the Groovy developers when I tried to respectfully work my way into the ecosystem during 2006 and 2007 was just part of the everyday routines and vagaries of distributed collaborative projects, with perhaps some Rubyists on the outside stirring things up a bit, but didn't know anything for sure so didn't speak up.

Things heated up a bit just after G2One,Inc was announced in October 2007, including some foreigners directly involved in placing surveillance in my apartment, though I didn't mention anything at the time, instead tried to nudge Groovy forward with the occasional hint on my blog. In My Internet Presences on 21 December 2007, I responded to a veiled threat I'd received to publish impersonations of and slanders against me online faster than I could get them removed so I'd find it difficult finding work, if I "messed with the Groovy brand".

After a year of not saying much more, in Groovy 1.6 Released on 23 February 2009, I said that although the Groovy developers, to their credit, had by then completed all the items in creator James Strachan's original manifesto, the JSR hadn't moved an inch in the previous 5 years and Groovy 1.7 should address that issue.

In Groovy futures on 8 August 2009, I accused them of keeping up the appearance of developing the Groovy Language, while continuing to collect consulting fees, to get a high valuation in their talks with JBoss or whoever it was. Two days later SpringSource announced their acquisition by VMware. In A Groovy Undertaking on 21 August 2009, I suggested someone was trying to stand in for language creator Strachan in an upcoming Groovy Language interview in Australian Computerworld. I added I was getting a little tired of it all, and never really knew when I first got involved what a truly dirty business open source software development was. A month later, Guillaume Laforge got himself interviewed by Australian Computerworld.

In Groovy 2.0 status report on 4 December 2009, I suggested the Codehaus Groovy developers hadn't wanted me around, accused Laforge of putting his own name to my feature request for a groupBy method, and said it was only one of many similar happenings. In Try Groovy, or is it try{Groovy}catch(Exception e){} ??? on 15 December 2009, I accused Laforge of changing the try statement in Groovy so I'd need to rewrite a lot of my documentation.

In Bust Groovy open. Set it free! on 1 January 2010, I accused the Groovy developers of using the Groovy MOP to tie programmers into needing to use the Groovy syntax and DGM so they could sell us the book and charge us consulting fees down the line, and suggested the individual components of Groovy should be separable. In Groovy Ceasefire on 3 March 2010, I suggested the Groovy 1.x and Groovy++ project managers were at odds with each other, each jockeying to protect and promote their own positions in the Groovy ecosystem hierarchy, and that this behavior wasn't good for the Groovy Language. I called on Laforge and Alex Tkachman to think of the future of the Groovy Language as being more important than any petty quarrels over financial placing in the ecosystem.

I had ended up disillusioned with Groovy by then and had almost given up. Later that year, I found out the silent treatment and runarounds had all been part of a sustained deliberate effort by Graeme Rocher to get others to build Groovy and Grails at no cost, play the products for as much profit and gains as possible, and squash all opposition along the way.

Groovy was originally begun by Strachan in 2003 by taking the rough outline of Beanshell and adding closures, no easy task, along with collections and property syntax. Laforge joined up a few months later and translated the closure-based methods from Ruby. The second significant contribution was from Jeremy Rayner who created the Antlr-based grammar, and the third from Jochen Theodorou who created the meta-object protocol. All three of these Groovy heroes were subsequently knifed, though Theodorou doesn't know it yet.

Grails was begun by Rocher in 2005, then an "IT Curriculum Consultant", as a thin wrapper around other software products from various companies (e.g. Spring from SpringSource, Hibernate from JBoss) so he could use it to muscle in on the training and consultancy markets for those products. In 2006 Grails was renamed and promoted as Groovy on Grails to sound like Ruby on Rails. He started G2One,Inc in 2007 and quickly shopped it around among the various companies whose products he bundled, successfully fooling SpringSource into buying it 12 months later.

I returned to the Groovy ecosystem in 2011, inspired by Tkachman's work on Groovy++, and tried out a different tactic for dealing with Rocher and Laforge: publicly exposing their thuggery and fraud as soon as it happens, even when I can't prove it beyond doubt under the culprit's rules of discovery and choice of jurisdiction, but do it all under my own name.

All my "attacks" have been the stuff of standard investigative journalism and the only target has been the deception they shrouded Groovy in. If told I'm still bitter over events that happened many years previously, my reply is if it takes me all of 5 years to know for sure that Rocher and his cronies did things in secret against me, that's not an excuse to ignore it. If told I should use private emails to sort out problems with people on the mailing lists, I'd reply how one of Laforge's many political tricks is to suddenly go public unannounced, as he did with John Rose in late 2005.

By mid-2013, it had become obvious that exposing Rocher and Laforge's misdeeds on my private blog wasn't having the effect of nudging Groovy back on track, and other parties with their own agendas had joined in the fray. So perhaps my blogs weren't necessary. But most of all, by then I'd realized that Rocher/Laforge were just one corner of an interconnected puzzle. I'm now more concerned about my fight against the NSA to reimagine Groovy, the Real one here at Codeplex, than petty squabbles over the fake one over at Codehaus as it keeps toppling. But that's not going to stop me exposing their fabrications whenever I see them.

8 May 2014 (edited 10 May)

Groovy's Gross Exaggerations

groovy maven 2014-03.png

My previous entry about Groovy's lies and statistics exposes how a search engine in China has been used to game TIOBE's rankings, and how Pivotal's software release schedule has been inflating Groovy's apparent popularity. There I also speculated that Maven was being gamed, and the above picture shows I was correct. Groovy's despots wrote it off as a robot or DoS attack, and I thought it was an April Fools' one-finger salute from someone in another time zone, but clicking country showed all the downloads came from China. That mimics the TIOBE exploit from last October, not only using the same country but also both falling exactly 2 months before a Groovy/Grails conference (Grails Xchange London in Dec 2013, and Gr8te Conf Denmark in June 2014), obviously intended to fabricate an online deception to sell more conference seats. But I can't prove that last link in the chain of logic because conceivably it could all just be a big frame-up.

A month later shows more activity. The narrative being concocted seems to be that there's so much interest in Groovy that programmers can't wait to download each new sub-release, 567,000 in the last month. But click on country and look at the map, 530,000 come from a proxy in China and only 12,000 from the US, 2000 from Germany, and 700 from France over the entire last month.

groovy maven 2014-04.jpg

Someone's fabricated over HALF A MILLION downloads of Groovy over the past month. No wonder virtually no-one takes the language or the ecosystem seriously anymore. The big loser from all this is Gradle because it's a fairly decent piece of software that made a bad decision early on to use Groovy for its scripting. Hopefully they'll see sense before their version 2.0 and provide other JVM languages for an API, like vert.x does.

added 10 May 2014

Another motive for this deception came to light a day later as Rocher (via a human proxy) "deprecated" the Grails user mailing list under cover of centralizing technical questions in one place, but later leaked the real reason: "needing" to move Grails totally away from its open source origins at Codehaus, without explaining why the "need". Because the Groovy 2.3 release 2 days earlier was the first release of Groovy ever that Laforge didn't announce on the Groovy mailing list, only on his personal blog, we can deduce they're trying to get Groovy off Codehaus also. Of all the repositories on Codehaus, Groovy is unique in having a Codehaus rep among its despots, presumably to vote against such stuff. Laforge has been sending out his blog as a weekly Groovy newsletter since last Christmas Eve, initiated under cover of a family holiday, to build up an alternative email contact list for Groovy separate from Codehaus, soliciting for new subscribers every week on the Grails mailing list. The fabricated half million Groovy downloads from Maven may be an excuse to make Maven, not Codehaus, look like the primary distribution channel for Groovy, and thus a reason not to distribute Groovy on Codehaus anymore. There seems to be no end to Rocher and Laforge's deceit.

added 30 May 2014

The Bintray Maven download count for Groovy is now 888,000 downloads, 837,000 of them from a Chinese proxy. Though someone's computer seems to have crashed on 12 May and 23 May...

groovy maven 2014-05.jpg


Last edited Apr 16 at 3:17 AM by gavingrover, version 26

