Gavin Grover's GROOVY Wikiblog


2 February 2017

ASF's devious plans for Apache Groovy

Last October (2016), I pointed out irregularities in Apache Groovy's rise in the TIOBE index, and suggested the ASF chairperson for the Groovy PMC knew about them, but conveniently ignored them when reporting to the ASF directors. TIOBE's commentary for that month had speculated that Go and Groovy were the only 2 languages with an increase of more than 1% compared to the same period the previous year, and one of them would become TIOBE's 2016 Language of the Year. It further said that Groovy's sudden jump from 0.328% to 1.182% in December 2015 would not be considered in the calculation because it happened the previous year.

So what do you think happened next in TIOBE's index for December 2016? Yep, after 12 months in the top 20, Groovy suddenly dropped from 1.7% to 0.875%. I predict Groovy will bounce back up in TIOBE's index now that the new year has passed, and thus become a contender for TIOBE's 2017 Language of the Year! It looks like Apache Groovy's backers in the ASF aren't just conveniently ignoring the distortions in Groovy's TIOBE ranking, but actively gaming it. Even more tragic is that they consider the TIOBE Language of the Year title to be something worth planning 12 months ahead for.

But how did they game the ranking? Most of the TIOBE ranking is calculated from several of Google's national search engines around the world. Does someone in the ASF project management for Groovy have the influence to move Groovy up and down in Google's search results? Was a deal made for Go to be 2016 Language of the Year in return for Groovy in 2017? It all looks very, very suspicious.

Expanding Unicode, 2017 F.A.Q

Why do the utf-88 trailing surrogates double up as standalone characters?
answered on 26 Jan 2017:

In utf-16, both the leading surrogates (U+d800 to U+dbff) and trailing surrogates (U+dc00 to U+dfff) have no other use, which provides maximal self-synchronization. Some variations to Unicode let some of the 1024 trailing surrogates double as other characters used standalone (i.e. when not following a leading surrogate), but this isn't official Unicode policy. The encoded sequence is still self-synchronizing, but requires a single character lookback. (Of course, the leading surrogates could never be used that way.)
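As a minimal sketch of the standard utf-16 arithmetic behind those two ranges (the function names here are only illustrative), a codepoint above U+ffff is split into a 10-bit high part and a 10-bit low part. Because the two ranges don't overlap with each other or with any other character, a single code unit can be classified without looking at its neighbours.

```python
LEAD_START, LEAD_END = 0xD800, 0xDBFF    # 1024 leading surrogates
TRAIL_START, TRAIL_END = 0xDC00, 0xDFFF  # 1024 trailing surrogates


def to_surrogates(cp):
    """Split a supplementary codepoint into a (leading, trailing) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000 to U+10ffff need a surrogate pair")
    offset = cp - 0x10000                # 20 bits remain
    return LEAD_START + (offset >> 10), TRAIL_START + (offset & 0x3FF)


def from_surrogates(lead, trail):
    """Recombine a (leading, trailing) pair into one codepoint."""
    if not LEAD_START <= lead <= LEAD_END:
        raise ValueError("not a leading surrogate")
    if not TRAIL_START <= trail <= TRAIL_END:
        raise ValueError("not a trailing surrogate")
    return 0x10000 + ((lead - LEAD_START) << 10) + (trail - TRAIL_START)


def classify(unit):
    """Classify one 16-bit code unit with no lookback at all."""
    if LEAD_START <= unit <= LEAD_END:
        return "lead"
    if TRAIL_START <= unit <= TRAIL_END:
        return "trail"
    return "standalone"
```

If the trailing range were allowed to double as standalone characters, classify() would also need the previous code unit, which is the single-character lookback mentioned above.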

The reason utf-88 allows its trailing surrogates (U+100000 to U+10ffff) to double up as standalone characters is that utf-88 isn't intended to be a permanent encoding. It's only an interim encoding until the Unicode Consortium see sense and revert to the pre-2003 upper limit for utf-8 and utf-32 of just over 2 billion codepoints. utf-88 is a surrogation scheme over utf-8 that uses half of private use plane U+fxxxx (i.e. U+f8000 to U+fffff) as leading surrogates and all of plane U+10xxxx as trailing surrogates, but enables plane U+10xxxx to continue its use as a private use plane. It does that because that's how it will continue to be used after the Unicode Consortium introduces its own second-tier surrogation scheme over utf-16 to accompany the reversion of utf-8 and utf-32 back up to 2 billion characters.

utf-88 is an interim surrogation scheme that can be used now, and can be converted to an expanded utf-8 easily when the Consortium finally makes it official.
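The surrogate ranges above leave the exact bit layout unstated, so what follows is only a sketch under assumptions: the top 15 bits of a 31-bit codepoint select the leading surrogate, the low 16 bits select the trailing surrogate, and only codepoints above U+10ffff are paired. The names encode and decode are hypothetical, not the utf-88 package's API.

```python
LEAD_BASE, LEAD_END = 0xF8000, 0xFFFFF      # 32,768 leading surrogates
TRAIL_BASE, TRAIL_END = 0x100000, 0x10FFFF  # 65,536 trailing surrogates
MAX_CP = 0x7FFFFFFF                         # pre-2003 utf-8 upper limit


def encode(cp):
    """Return the ordinary Unicode codepoints that stand for cp.

    Assumed layout (not taken from the utf-88 source): top 15 bits
    pick the leading surrogate, low 16 bits pick the trailing one.
    """
    if not 0 <= cp <= MAX_CP:
        raise ValueError("codepoint out of range")
    if LEAD_BASE <= cp <= LEAD_END:
        raise ValueError("leading surrogates cannot be used standalone")
    if cp <= 0x10FFFF:
        return [cp]                         # ordinary or private use as-is
    return [LEAD_BASE + (cp >> 16), TRAIL_BASE + (cp & 0xFFFF)]


def decode(units):
    """Decode a list of codepoints, pairing surrogates where present."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if LEAD_BASE <= u <= LEAD_END:
            # A leading surrogate must be followed by a trailing one.
            if i + 1 >= len(units) or not TRAIL_BASE <= units[i + 1] <= TRAIL_END:
                raise ValueError("leading surrogate without trailing surrogate")
            out.append(((u - LEAD_BASE) << 16) | (units[i + 1] - TRAIL_BASE))
            i += 2
        else:
            # A plane U+10xxxx codepoint not preceded by a leading
            # surrogate is a standalone private use character.
            out.append(u)
            i += 1
    return out
```

For example, decode([0x100005, 0xF8011, 0x100000]) treats the first U+10xxxx codepoint as standalone and the second as a trailing surrogate, which is the lookback behaviour described in the previous answer. Each surrogate takes 4 bytes in utf-8, so an encoded pair takes 8.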

Why does utf-88 provide 1 million private use codepoints?
answered on 26 Jan 2017:

There are not enough private use codepoints in Unicode, only about 137,000, in 3 blocks (U+e000 to U+f8ff, U+f0000 to U+ffffd, and U+100000 to U+10fffd). If people want to design square-structured scripts like Unihan and Korean Hangul, they'll need many more codepoints. utf-88 extends the third private use block to over 1 million codepoints, by redefining U+10fffe and U+10ffff from Nonchar to Private Use, and adding planes U+11xxxx to U+1fxxxx. It uses these codepoints not only because they're contiguous but also because they're the remaining ones that can be encoded with 4 bytes under the original pre-2003 utf-8 encoding scheme as proposed by Rob Pike and Ken Thompson, thus showing respect for users of Unicode by providing this "prime real estate" for their private use needs.
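The counts quoted above can be checked with a little arithmetic. This is just a worked calculation over the ranges given in the text; the variable names are illustrative.

```python
# The 3 private use blocks in current Unicode, as (first, last) codepoints.
PRIVATE_USE_BLOCKS = [
    (0xE000, 0xF8FF),       # Basic Multilingual Plane block
    (0xF0000, 0xFFFFD),     # plane U+fxxxx, minus its 2 Nonchars
    (0x100000, 0x10FFFD),   # plane U+10xxxx, minus its 2 Nonchars
]


def block_size(first, last):
    """Number of codepoints in an inclusive range."""
    return last - first + 1


current_total = sum(block_size(f, l) for f, l in PRIVATE_USE_BLOCKS)

# Third block as extended by utf-88: U+100000 to U+1fffff inclusive.
extended_third_block = block_size(0x100000, 0x1FFFFF)

# A 4-byte sequence in the original utf-8 scheme carries 3 + 6 + 6 + 6
# payload bits, so its highest codepoint is 2**21 - 1.
max_four_byte = 2 ** (3 + 6 + 6 + 6) - 1

print(current_total)          # 137468
print(extended_third_block)   # 1048576
print(hex(max_four_byte))     # 0x1fffff
```

So the extended block ends exactly where 4-byte sequences run out under the original scheme.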

This third private use block contiguity also suggests how the Unicode Consortium might eventually define a second-tier surrogation scheme for utf-16. Because planes U+0xxxx to U+fxxxx have Nonchars for their last 2 codepoints, using such a plane in its entirety for trailing surrogates (as utf-88 does) wouldn't work to encode the planes above this, which won't have any Nonchars. The most likely scheme would use a single plane in its entirety for the leading surrogates (where it doesn't matter if the last two codepoints aren't used), and half of another plane for trailing surrogates. If the special-purpose plane, U+Exxxx, is to be used, then the middle half of that plane (U+e4000 to U+ebfff) would be suitable for the trailing surrogates, along with perhaps all of plane U+dxxxx for the leading surrogates.
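A rough capacity check for that speculative scheme, using only the ranges named above. The helper is_nonchar follows the standard definition of Unicode noncharacters; everything else is a hypothetical illustration, since no such second-tier scheme exists yet.

```python
def is_nonchar(cp):
    """True for Unicode noncharacters: U+fdd0 to U+fdef, and the
    last 2 codepoints of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Hypothetical ranges taken from the paragraph above.
LEADS = range(0xD0000, 0xE0000)    # all of plane U+dxxxx
TRAILS = range(0xE4000, 0xEC000)   # middle half of plane U+exxxx

usable_leads = sum(1 for cp in LEADS if not is_nonchar(cp))
usable_trails = sum(1 for cp in TRAILS if not is_nonchar(cp))
capacity = usable_leads * usable_trails

# Codepoints above U+10ffff, up to the pre-2003 limit of U+7fffffff.
needed = 0x7FFFFFFF - 0x10FFFF

print(usable_leads, usable_trails)  # 65534 32768
print(capacity >= needed)           # True
```

Losing the 2 Nonchars from the leading plane costs 65,536 pairs, which still leaves room for every codepoint above U+10ffff. Losing 2 trailing surrogates instead would punch 2 holes into every 65,536-codepoint block, which is why a whole plane can't serve as the trailing range.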

Why release an unfinished version of the CJK decomposition data? Why make the licensing so permissive?
duplicate of answer on 18 Mar 2011:
I'm hoping someone in China takes an interest in creating a standard set of decompositions for every CJK character in Unicode, and eventually submitting it to the Unicode Consortium for inclusion in their property files. I want to encourage this to happen with as permissive a licence set as possible. Eventually, I hope the Unicode Consortium includes all intermediate decompositions as standalone characters in Unicode.

26 November 2016; updated 26 Sep 2017

The Road to Groovy's Manifesto

Warning: Some of the links are to old Groovy mailing list entries at Nabble, which nowadays redirect via ASF-controlled

Let's look at the events leading up to my proclamation of the Groovist Manifesto in March 2015. Groovy's creator James Strachan announced Groovy on 29 August 2003, but I myself didn't discover it until late 2004. I started using Codehaus Groovy but didn't join the mailing lists. Strachan left the project because of creative differences at GroovyOne on 28 Nov 2005 and in his very last ever posting to the Groovy mailing lists a week later. Soon after, new self-styled "Project Manager" Guillaume Laforge instituted an openness policy when he went public with emails between himself and John Rose about a spec for Groovy, saying "Groovy is an OSS project: that means discussions should be held in PUBLIC. It's out of question that discussions are held between a small set of persons. I can't stand that, and I won't stand it furthermore." I then cautiously started posting on their mailing lists.

The openness policy Laforge had talked about had been just a trick. By then he'd neglected his duty to make a spec for Groovy by closing down the JSR process and generating a "spec" from their Codehaus implementation's codebase. John Wilson and Alex Tkachman were just the two most vocal of those who'd been given the runaround with their contributions. I had long since left the mailing lists to blog openly about Groovy's management problems, but that only worsened their deception. They accused me of acting inconsistently and out of bitterness, spread their smears widely, and even threatened legal or even criminal action against me. I then branched off on my own to rebuild Groovy, eventually landing on the Go platform, no longer caring about the JVM side of the Groovy ecosystem.

In January 2015 VMware retrenched the 6 Groovy and Grails committers they'd employed, and only the Grails ones found another business to support their work. A month after that, when Laforge started bleating about moving Codehaus Groovy to a foundation, Cedric Champeau suggested that Groovy branded nothing more than their org.codehaus.groovy implementation when he published Who is Groovy? They had obviously conspired to restrict the Groovy Language brand to whatever software they directly controlled, just as they had restricted the Groovy Language semantics 3 years earlier when they duplicated the functionality of Alex Tkachman's Groovy++ plugin. By then I knew I had to step up again to protect Groovy's branding, so anyone who builds any implementation on any platform could use it. So in March, I clearly articulated the 3 articles of the Groovist Manifesto: that Groovy implementations should be led by their technical people, that applications using Groovy should not dictate its direction, and that it should be standardized so many various implementations can be built.

Over the last few months, I've pruned my blog here at Codeplex and my older one at of abandoned technical directions specifically to free up the Vy brand, which I'm now reusing for the upcoming IME for foreigner-friendly pictorial input of Unihan. Over the next six months or so, I'm going through my old blog entries again, purging them of any claim or suggestion that doesn't show a direct and consistent link to one of the 3 articles of Groovy's manifesto. Of course, the original text will still be available from the wiki history, but the current content published in the Real Groovy wikiblog will be that which promotes a standardized Groovy Language with many implementations, each led by technical people who are independent from the applications which use them. Anything I publish will flow logically from the aims of that manifesto. Real Groovy will be as much an implementation of the Groovy Language as Apache Groovy, eventually becoming the new reference implementation!

27 October 2016; updated 5 November 2016

Groovist Manifesto: Oct/Nov 2016 update

Early last year, I issued the Groovist Manifesto, a charter of 3 articles that deal with the root causes of Groovy's problems, so we don't waste time dealing with the symptoms. Time for an update on their progress...

1. Groovy implementations should be led by their technical people

I recently wrote about how the 4 people in the Groovy PMC who actually do the programming or testing on Groovy are swamped by another 5 who don't do any, including the chairperson, i.e. Guillaume Laforge, Jim Jagielski, Andrew Bayer, Konstantin Boudnik, and Roman Shaposhnik. The share of active code committers in Groovy's top leadership has decreased from 60% in Codehaus days to 44% (4 of 9) today. Although I've called on anyone who hasn't contributed a github commit or even a jira comment to Groovy in the past year to leave the Groovy PMC, managers like these always cling to their power, so I don't expect article 1 to be fulfilled.

2. Applications using Groovy should not dictate its direction

Graeme Rocher is building a Grails consulting business at OCI, while continuing to personally own Grails property like the domain name. He's muscling in on Groovy as well, signing up Groovy PMC member Paul King as an OCI consultant. This looks like a move by Rocher to fork Apache Groovy in the same way LibreOffice was forked from OpenOffice. After moving as many of Groovy's techies as he can to OCI, he's going to fork and rename Groovy as something like the Grails Language. But unofficially, he'll command his minions to refer to it as Grails Groovy, in the same way he told people to talk about Groovy on Grails after being forced to rename Groovy on Rails to Grails. And if Apache abandons the OpenOffice brand when they abandon OpenOffice, Rocher will claim it's a precedent in a grab for the Groovy brand as well as its codebase and committers.

But any Groovy committer considering working for Rocher at OCI had better realize any promise they can work on Groovy will become a broken promise. They'll be pimped out full-time to paying clients.

3. Groovy should be standardized for its various implementations

After being picked on as part of Rocher's grab for profits from G2One,Inc in 2007-08, then again when Laforge squashed Groovy++ and my own GRegexes in 2011-13, I decided to rebuild a better implementation of Groovy from scratch, without trying to coordinate with the Apache Groovy committers on a spec. After settling on Go as the best implementation language, I discovered that a plugin facility is lower level than dynamic typing syntax. I'm therefore now working towards a 1.0 release of Gro, using Unihan for statements, special identifiers, package names, and commands. Gro 1.1 will use Unihan for macros also, and a dynamically-typed language plugin called Groo will rigorously test out the interface to the plugin.

Later on, an IME (input method editor) called Vy will provide foreigner-friendly pictorial input of Unihan. I've now finished purging my old blog entries of redundant uses of the name Vy. Groo and Vy packaged together will be called Real Groovy, replacing Apache Groovy as the reference implementation of the Groovy Language.


20 October 2016

Grolang 0.5 Release Notes

A minor release of Grolang is out (0.5), with some minor updates such as changing the Unihan character for switch to .

Gro's github home page gives an overview of how to install and run Gro.

An intro page at describes features of Gro's syntax. All the features described on that page have been implemented, and will only change if absolutely necessary.

Implemented features have been motivated by:
  • writing programs with no whitespace. I believe whitespace in programming syntax is essentially like the colors IDEs use to mark up the syntax. It should be added by the display software, not embedded in the raw code.
  • using Unihan in the syntax to make programs shorter. The Simplified Chinese subset of Unihan is used because many programmers know it, and it can be easily entered via IMEs.
  • allowing many packages and source units to be stored in a single file. This makes browsing programs easy when the source code is short.
  • inferring boilerplate code. package main, func main(), and missing common imports are inferred. This turns conceptual one-liners into actual one-liners.
  • seamlessly exposing the underlying platform. Go code can be embedded within Gro code without any special markup syntax. The only restriction is that the Go code can't use Unihan in its identifier names.

Rudimentary features are motivated by:
  • providing a command-line facility. This includes both a Lisp-like REPL and a J-like tutorial mode. The response time of the current REPL is slow because the program needs to be recompiled for every statement entered.
  • enabling plugins. Gro plugins can define macros for new statements, expressions, and types, as well as enforce a blacklist on use of existing Gro statements, expressions, and types. The response time is slow for now because the program needs to be recompiled when a plugin is called.
  • building dynamic typing directly into the syntax. This is achieved via a plugin.

Gro brings brevity to the Go language via the Unihan IME, bringing Go back to its command-line roots.

25 August 2016

Go 1.7, Gro 0.4.0, & Groo

Last week, Go 1.7 was released. We celebrate by releasing the next version of Gro. It's:
  • rewritten as a fork of the faster Go-based recursive descent parser that was first used in Go 1.6, formerly called Qu
  • split into two projects: Gro for the Unihan-based vocabulary and macro system, and Groo for the dynamically-typed plugin
  • moved from my personal github account to its own at, so the former utf88, kern, thomp, gro, and qu are deprecated

Here's some doco on the language.

I split grolang into two (i.e. Gro and Groo) because I discovered that dynamic typing in Go could be modelled as macro-based addons to a core language. Hence:

Go -> Gro -> Groo

Groo(lang) is a member of the Groo family of languages, which now includes Apache Groovy, GrooScript, and Grooid.

If there was an edition of Go without garbage collection, and heap memory (pointers, slices, maps, interfaces, functions, and channels) could be deleted using something like unsafe.Delete(interface{})bool, then perhaps that language would be called "G":

G -> Go -> Gro -> Groo

Whatever would come next in the hierarchy?


Last edited Sep 25 at 7:02 PM by gavingrover, version 15

