Gavin Grover's GROOVY Wikiblog
2 February 2017
Last October (2016), I pointed out irregularities in Apache Groovy's rise in the TIOBE index, and suggested the ASF chairperson for the Groovy PMC knew about them, but conveniently ignored them when reporting to the ASF directors. TIOBE's commentary for that month had noted that Go and Groovy were the only 2 languages with an increase of more than 1% compared to the same period the previous year, and speculated that one of them would become TIOBE's 2016 Language of the Year. It further said that Groovy's sudden jump from 0.328% to 1.182% in December 2015 would not be considered in the calculation because it happened the previous year.
So what do you think then happened in TIOBE's index for December 2016? Yep, after 12 months in the top 20, Groovy suddenly dropped from 1.7% to 0.875%. I predict Groovy will bounce back up in TIOBE's index now the new year has passed, and thus become a contender for TIOBE's 2017 Language of the Year! It looks like Apache Groovy's backers in the ASF aren't just conveniently ignoring the distortions in Groovy's TIOBE ranking, but actively gaming it. Even more tragic is that they consider the TIOBE Language of the Year title to be something worth planning 12 months ahead for.
But how did they game the ranking? Most of the TIOBE ranking is calculated from Google's various national search engines from around the world. Does someone in the ASF project management for Groovy have the influence to move Groovy up and down in Google's search results? Was a deal made for Go to be 2016 Language of the Year in return for Groovy in 2017? It all looks very, very suspicious.
Why do the utf-88 trailing surrogates double up as standalone characters?
answered on 26 Jan 2017:
In utf-16, both the leading surrogates (U+d800 to U+dbff) and trailing surrogates (U+dc00 to U+dfff) have no other use, which provides maximal self-synchronization. Some variations to Unicode let some of the 1024 trailing surrogates double as other characters used standalone (i.e. when not following a leading surrogate), but this isn't official Unicode policy. The encoded sequence is still self-synchronizing, but requires a single character lookback. (Of course, the leading surrogates could never be used that way.)
The reason utf-88 allows its trailing surrogates (U+100000 to U+10ffff) to double up as standalone characters is because utf-88 isn't intended to be a permanent encoding. It's only an interim encoding until the Unicode Consortium see sense and revert
to the pre-2003 upper limit for utf-8 and utf-32 of just over 2 billion codepoints. utf-88 is a surrogation scheme over utf-8 that uses half of private use plane U+fxxxx (i.e. U+f8000 to U+fffff) as leading surrogates and all of plane U+10xxxx as trailing
surrogates, but enables plane U+10xxxx to continue its use as a private use plane. It does that because that's how it will continue to be used after the Unicode Consortium introduces its own second-tier surrogation scheme over utf-16 to accompany the reversion
of utf-8 and utf-32 back up to 2 billion characters.
utf-88 is an interim surrogation scheme that can be used now, and can be converted to an expanded utf-8 easily when the Consortium finally makes it official.
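The single-character lookback mentioned above can be sketched in Go. This is only an illustration using the leading and trailing ranges quoted in this answer; the function names (`isLead`, `isTrail`, `role`, `pairCapacity`) are hypothetical, and nothing here is taken from an actual utf-88 implementation. It classifies each codepoint in a sequence and counts how many lead/trail combinations the two ranges allow.

```go
package main

import "fmt"

// Ranges as quoted in the answer above (assumed, not taken from a utf-88 codebase).
const (
	leadLo  = 0xF8000  // first leading surrogate
	leadHi  = 0xFFFFF  // last leading surrogate
	trailLo = 0x100000 // first trailing surrogate
	trailHi = 0x10FFFF // last trailing surrogate
)

func isLead(r rune) bool  { return r >= leadLo && r <= leadHi }
func isTrail(r rune) bool { return r >= trailLo && r <= trailHi }

// role classifies the codepoint at index i of seq.
// A codepoint in the trailing range belongs to a pair only when the
// codepoint immediately before it is a leading surrogate; otherwise it
// stands alone as a private use character. One step of lookback is enough
// because leading surrogates are never used standalone.
func role(seq []rune, i int) string {
	switch {
	case isLead(seq[i]):
		return "lead"
	case isTrail(seq[i]) && i > 0 && isLead(seq[i-1]):
		return "pair-trail"
	case isTrail(seq[i]):
		return "standalone-private-use"
	default:
		return "ordinary"
	}
}

// pairCapacity is the number of distinct lead/trail combinations.
func pairCapacity() int64 {
	return int64(leadHi-leadLo+1) * int64(trailHi-trailLo+1)
}

func main() {
	// 'a', a standalone plane-16 codepoint, then a lead/trail pair, then 'b'.
	seq := []rune{'a', 0x100001, 0xF8000, 0x100001, 'b'}
	for i := range seq {
		fmt.Printf("%d U+%X %s\n", i, seq[i], role(seq, i))
	}
	fmt.Println("pair combinations:", pairCapacity())
}
```

The same codepoint U+100001 is reported as standalone at index 1 and as the second half of a pair at index 3, which is the doubling-up the question asks about. The pair count comes to 2^31, in line with the "just over 2 billion" figure.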
Why does utf-88 provide 1 million private use codepoints?
answered on 26 Jan 2017:
There are not enough private use codepoints in Unicode, only about 137,000, in 3 blocks (U+e000 to U+f8ff, U+f0000 to U+ffffd, and U+100000 to U+10fffd). If people want to design square-structured scripts like Unihan and Korean Hangul, they'll need many
more codepoints. utf-88 extends the third private use block to over 1 million codepoints, by redefining U+10fffe and U+10ffff from Nonchar to Private Use, and adding planes U+11xxxx to U+1fxxxx. It uses these codepoints not only because they're contiguous
but also because they're the remaining ones that can be encoded with 4 bytes under the original pre-2003 utf-8 encoding scheme as proposed by Rob Pike and Ken Thompson, thus showing respect for users of Unicode by providing this "prime real estate"
for their private use needs.
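The counts quoted above can be checked with a little arithmetic. The Go sketch below is illustrative only: the `block` type and the helper names are hypothetical, and the ranges are copied from this answer. The last helper works out the largest value that fits in 4 bytes under the original utf-8 layout, where the lead byte carries 3 payload bits and each of the 3 continuation bytes carries 6.

```go
package main

import "fmt"

// block is an inclusive range of codepoints (a hypothetical helper type).
type block struct {
	name   string
	lo, hi int
}

func (b block) size() int { return b.hi - b.lo + 1 }

// officialPrivateUse lists the three private use blocks quoted in the text.
var officialPrivateUse = []block{
	{"BMP private use area", 0xE000, 0xF8FF},
	{"plane 15", 0xF0000, 0xFFFFD},
	{"plane 16", 0x100000, 0x10FFFD},
}

// extendedThird is the third block as the text says utf-88 extends it:
// U+10FFFE and U+10FFFF become private use, and planes U+11xxxx to
// U+1Fxxxx are added.
var extendedThird = block{"extended third block", 0x100000, 0x1FFFFF}

// total sums the sizes of the given blocks.
func total(bs []block) int {
	n := 0
	for _, b := range bs {
		n += b.size()
	}
	return n
}

// maxFourByte is the largest codepoint a 4-byte sequence can carry:
// 3 payload bits in the lead byte plus 6 in each of 3 continuation bytes.
func maxFourByte() int { return 1<<(3+3*6) - 1 }

func main() {
	for _, b := range officialPrivateUse {
		fmt.Printf("%-22s %7d\n", b.name, b.size())
	}
	fmt.Println("official total:", total(officialPrivateUse))
	fmt.Println("extended third block:", extendedThird.size())
	fmt.Printf("largest 4-byte codepoint: U+%X\n", maxFourByte())
}
```

The three official blocks add up to 137,468 codepoints, the extended third block alone holds 1,048,576, and its last codepoint U+1FFFFF is exactly the 4-byte ceiling.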
This third private use block contiguity also suggests how the Unicode Consortium might eventually define a second-tier surrogation scheme for utf-16. Because planes U+0xxxx to U+fxxxx have Nonchars for their last 2 codepoints, using such a plane in its entirety
for trailing surrogates (as utf-88 does) wouldn't work to encode the planes above this which won't have any Nonchars. The most likely scheme would use a single plane in its entirety for the leading surrogates (where it doesn't matter if the last
two codepoints aren't used), and the first half of another plane for trailing surrogates. If the special purpose plane, U+Exxxx, is to be used, then the middle half of that plane (U+e4000 to U+ebfff) would be suitable for the trailing surrogates, along with perhaps
all of plane U+dxxxx for the leading surrogates.
Why release an unfinished version of the CJK decomposition data? Why make the licensing so permissive?
duplicate of answer on 18 Mar 2011:
I'm hoping someone in China takes an interest in creating a standard set of decompositions for every CJK character in Unicode, and eventually submitting it to the Unicode consortium for inclusion in their property files. I want to encourage this to happen
with as permissive a licence set as possible. Eventually, I hope the Unicode consortium includes all intermediate decompositions as standalone characters.
26 November 2016
Warning: Some of the links are to old Groovy mailing list entries at Nabble, which nowadays redirect via ASF-controlled groovy-lang.org.
Let's look at the events leading up to my proclamation of the Groovist Manifesto in March 2015. Groovy's creator James Strachan announced Groovy on 29 August 2003, but I myself didn't discover it until late 2004. I started using Codehaus Groovy but didn't join the mailing lists. Strachan left the project, because of creative differences at GroovyOne on 28 Nov 2005, in his very last ever posting to the Groovy mailing lists a week later. Soon after, new self-styled "Project Manager" Guillaume Laforge instituted an openness policy when he went public with emails between himself and John Rose about a spec for Groovy, saying "Groovy is an OSS project: that means discussions should be held in PUBLIC. It's out of question that discussions are held between a small set of persons. I can't stand that, and I won't stand it furthermore." I then cautiously started posting on their mailing lists.
My recent wikiblog post Groovy's Cyberbully explains much that happened from then to mid-2014. The openness politician Laforge had talked about had been just a trick. By then he'd abandoned his duty to make a spec for Groovy by closing down the JSR process and generating a "spec" from their Codehaus implementation's codebase. John Wilson and Alex Tkachman were just the most vocal two who'd been given the runaround with their contributions. I had long since left the mailing lists to blog openly about Groovy's management problems, but that only worsened their deception. They accused me of acting inconsistently and out of bitterness, spread their smears widely, and even threatened legal or even criminal action against me. I then branched off on my own to rebuild Groovy, eventually landing on the Go platform, no longer caring about the JVM side of the Groovy ecosystem.
In January 2015 VMware retrenched the 6 Groovy and Grails committers they'd employed, and only the Grails ones found another business to support their work. A month after that, when Laforge started bleating about moving Codehaus Groovy to a foundation, Cedric Champeau suggested that Groovy branded nothing more than their implementation when he published Who is Groovy? They had obviously conspired to restrict the Groovy Language brand to whatever software they directly controlled, just like when they had 3 years earlier similarly restricted the Groovy Language semantics when they duplicated the functionality of Alex Tkachman's Groovy++ plugin. By then I knew I had to step up again to protect Groovy's branding, so anyone who builds any implementation on any platform could use it. So in March, I clearly articulated the 3 articles of the Groovist Manifesto: that Groovy implementations should be led by their technical people, applications using Groovy should not dictate its direction, and it should be standardized so that many different implementations can be built.
Over the last few months, I've pruned my blog here at Codeplex and my older one at
of abandoned technical directions specifically to free up the Vy brand, which I'm now reusing for the upcoming IME for foreigner-friendly pictorial input of Unihan. Over the next six months or so, I'm going through my
old blog entries again, purging them of any claim or suggestion that doesn't show a direct and consistent link to one of the 3 articles of Groovy's manifesto. Of course, the original text will still be available from the wiki history, but the current content published in the Real Groovy wikiblog will be that which promotes a standardized Groovy Language with many implementations, each led by technical people who are independent from the applications which use them. Anything I publish will flow logically from the aims of that manifesto. Real Groovy will be as much an implementation of the Groovy Language as Apache Groovy, eventually becoming the new reference implementation!
27 October 2016; updated 5 November 2016
Early last year, I issued the Groovist Manifesto, a charter of 3 articles that deal with the root causes of Groovy's problems, so we don't waste time dealing with the symptoms. Time for an update on their progress...
1. Groovy implementations should be led by their technical people
recently wrote how
the 4 people in the Groovy PMC who actually do the programming or testing on Groovy are swamped by another 5 who don't do any, including the chairperson, i.e. Guillaume Laforge, Jim Jagielski, Andrew Bayer, Konstantin Boudnik, and Roman Shaposhnik. The proportion of active code committers in Groovy's top leadership has decreased from 60% in Codehaus days to 44% (4 of 9) today. Although I've called on anyone who hasn't contributed a github commit or even a jira comment to Groovy in the past year to leave the Groovy PMC, managers like these always cling to their power, so I don't expect article 1 to be fulfilled.
2. Applications using Groovy should not dictate its direction
Graeme Rocher is building a Grails consulting business at OCI, while continuing to personally own Grails property like the grails.org domain name. He's muscling in on Groovy as well, signing up Groovy PMC member Paul King as an OCI consultant. This looks like a move by Rocher to fork Apache Groovy in the same way LibreOffice was forked from OpenOffice. After moving as many of Groovy's techies as he can to OCI, he's going to fork and rename Groovy as something like the
. But unofficially, he'll command his minions to refer to it as
, in the same way he told people to talk about Groovy on Grails after being forced to rename Groovy on Rails. And if Apache abandons the OpenOffice brand when they abandon OpenOffice, Rocher will claim it's a precedent in a grab for the Groovy brand as well as its codebase and committers.
But any Groovy committer considering working for Rocher at OCI had better realize any promise they can work on Groovy will become a broken promise. They'll be pimped out full-time to paying clients.
3. Groovy should be standardized for its various implementations
After being picked on as part of Rocher's grab for profits from G2One,Inc in 2007-08, then again when Laforge squashed
and my own GRegexes
in 2011-13, I decided to rebuild a better implementation of Groovy from scratch, without trying to coordinate with the Apache Groovy committers on a spec. After settling on Go as the best implementation language,
I discovered that a plugin facility is lower level than dynamic typing syntax. I'm therefore now working towards a 1.0 release of
, using Unihan for statements, special identifiers, package names, and commands.
will use Unihan for macros also, and a dynamically-typed language plugin called
will rigorously test out the interface to the plugin.
Later on, an IME (input method editor) called Vy will provide foreigner-friendly pictorial input of Unihan. I've now finished purging my old blog entries of redundant uses of the name
. Groo and Vy packaged together will be called Real Groovy as the reference implementation of the Groovy Language.
20 October 2016
A minor release of Grolang is out (0.5), with some minor updates such as changing the Unihan character for
Gro's github home page gives an overview of how to install and run Gro.
An intro page at godoc.org describes features of Gro's syntax. All the features described on that page have been implemented, and will only change if absolutely necessary.
Implemented features have been motivated by:
- writing programs with no whitespace. I believe whitespace in programming syntax is essentially like colors used by IDEs to mark up the syntax. It should be added by the display software, not embedded in the raw code.
- using Unihan in the syntax to make programs shorter. The Simplified Chinese subset of Unihan is used because many programmers know it, and it can be easily entered via IMEs.
- allowing many packages and source units to be stored in a single file. This makes browsing programs easy when the source code is short.
- inferring boilerplate code. package main, func main(), and missing common imports are inferred. This lets conceptual one-liners be actual one-liners.
- seamlessly exposing the underlying platform. Go code can be embedded within Gro code without any special markup syntax. The only restriction is that the Go code can't use Unihan in its identifier names.
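To make the boilerplate point concrete, here is what a conceptual one-liner costs in plain Go. This is ordinary Go, not Gro; the Gro form isn't shown here because its syntax is covered on the intro page. The `greeting` helper is hypothetical, and treating `fmt` as one of the "common imports" is an assumption.

```go
package main // the package clause that Gro is described as inferring

import "fmt" // assumed to count as a "missing common import"

// greeting is a hypothetical helper holding the only part of the
// program that carries any intent.
func greeting() string { return "hello, world" }

func main() { // the entry point that Gro is described as inferring
	fmt.Println(greeting())
}
```

Only the `fmt.Println` call expresses what the program does; the package clause, the import, and the `func main()` wrapper are the parts described above as inferred.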
Rudimentary features are motivated by:
- providing a command line facility. This includes both a Lisp-like REPL and a J-like tutorial mode. The response time of the current REPL is slow because the program needs to be recompiled for every statement entered.
- enabling plugins. Gro plugins can define macros for new statements, expressions, and types, as well as enforce a blacklist on use of existing Gro statements, expressions, and types. The response time is slow for now because the program needs to be recompiled when a plugin is called.
- building dynamic typing directly into the syntax. This is achieved via a plugin.
Gro brings brevity to the Go language via the Unihan IME, bringing Go back to its command-line roots.
25 August 2016
Last week, Go 1.7 was released. We celebrate by releasing the next version of Gro. It's:
- rewritten as a fork of the faster Go-based recursive descent parser that was first used in Go 1.6, formerly called Qu
- split into two projects: Gro for the Unihan-based vocabulary and macro system, and Groo for the dynamically-typed plugin
- moved from my personal github account to its own at github.com/grolang, so the former utf88, kern, thomp, gro, and qu are deprecated
Here's some doco on the language
I split grolang into two (i.e. Gro and Groo) because I discovered that dynamic typing in Go could be modelled as macro-based addons to a core language. Hence:
Go -> Gro -> Groo
Groo(lang) is a member of the Groo family of languages, which now includes Apache Groovy, GrooScript, and Grooid.
If there was an edition of Go without garbage collection, and heap memory (pointers, slices, maps, interfaces, functions, and channels)
could be deleted using something like
, then perhaps that language would be called "G":
G -> Go -> Gro -> Groo
Whatever would come next in the hierarchy?
see gavingroovygrover.tumblr.com for originals of deleted copies