Gavin Grover's GROOVY Wikiblog


16 January 2017; updated 26 Sep 2017

Designing Gro

My previous post on Go keywords from 17 December 2016 shows how all keywords could be removed from a future version of Go. The keywords that take first position in a statement, such as if and for, could keep their roles without actually being keywords if bare statements weren't allowed. This would need something like the word do being required before all assignments, definitions, and function calls. The downside is that simple statements, such as short variable declarations, become a little wordier. For example:

func dream () {
	do a := 123 //instead of: a := 123
	do callMe(a) //instead of: callMe(a)
	do package := a //instead of a syntax error
}


Gro faces the same dilemma if we want it to allow any lowercase-initial name as an identifier while also using extra keywords, such as assert, to begin statements. It's difficult to modify the syntax so that, say, assert can be used as a keyword yet still be allowed as an identifier.

At first sight, we don't have this dilemma for new top-level keywords because only 6 keywords, import, func, type, const, var and package, are allowed to begin top-level declarations. But we do have it here because Gro intends to extend Go in the same way Apache Groovy extends Java, which means we want to allow all statements to be placed at the top level, so they'll be automatically wrapped inside a main () or init () function. If we want to allow new special identifiers, which are "keywords" at the expression level, or new type aliases, it's almost impossible unless we make multiple passes through the parser or fiddle with the AST afterwards.
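A minimal sketch of the wrapping step, assuming a naive line-based approach in which every declaration or statement fits on one line: lines that begin with one of the six top-level keywords are kept as declarations, and everything else is moved into a generated main function. The function name wrapTopLevel is hypothetical, and a real Gro front end would work on tokens or an AST rather than lines.

```go
package main

import (
	"fmt"
	"strings"
)

// topLevelKeywords are the six Go keywords that may begin a top-level declaration.
var topLevelKeywords = map[string]bool{
	"package": true, "import": true, "func": true,
	"type": true, "const": true, "var": true,
}

// wrapTopLevel is a hypothetical, line-based sketch: a line whose first word
// is a top-level keyword is kept as a declaration, and every other non-blank
// line is treated as a statement and moved into a generated main function.
// It assumes each declaration or statement fits on a single line.
func wrapTopLevel(src string) string {
	var decls, stmts []string
	for _, line := range strings.Split(src, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" {
			continue
		}
		if topLevelKeywords[strings.Fields(trimmed)[0]] {
			decls = append(decls, trimmed)
		} else {
			stmts = append(stmts, "\t"+trimmed)
		}
	}
	var b strings.Builder
	for _, d := range decls {
		b.WriteString(d + "\n")
	}
	b.WriteString("func main() {\n")
	for _, s := range stmts {
		b.WriteString(s + "\n")
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	src := "package main\nimport \"fmt\"\nx := 42\nfmt.Println(x)"
	fmt.Print(wrapTopLevel(src))
}
```

The generated function could just as well be init; main is assumed here only to keep the sketch short.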

Of course we could extend the Go syntax differently by introducing new symbols and punctuation, but we also want Gro to be future-proof against possible future expansions to Go's version 1.x syntax. Go 1.9 expands the Go syntax by introducing an = in type declarations, and Go 1.10 to Go 1.99 could expand Go's syntax in many unpredictable ways with new symbols and punctuation. It seems impossible to have a syntax for Gro that both allows existing and plausible future Go 1.x code to be embedded seamlessly in it and enables many new keywords to begin declarations, statements, built-in functions, and types, let alone one that allows any lowercase-initial name to be used as an identifier.

Gro solves this dilemma by having its grammar prohibit Unihan in identifier names. So myName is valid in both Go and Gro, but my名 and 性名 are invalid in Gro. Virtually no-one uses Unihan in identifiers anyway, only inside strings and comments, so in practice this shouldn't be a problem for anyone wanting to program in Gro. (When Unihan is required in an identifier name, use 引, e.g. 引"X世界" to represent X世界.) We thus free up all of the more than 75,000 Unihan in Unicode for all sorts of extra uses in the Gro grammar, and enable Go 1.x code, both present and plausible future code, to be embedded in Gro code.
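A minimal sketch of the identifier rule, assuming Go's standard unicode.Han range table is an acceptable stand-in for "Unihan" (the exact set Gro reserves may differ). The function name isGroIdent is hypothetical, and the 引 escape is not handled.

```go
package main

import (
	"fmt"
	"unicode"
)

// isGroIdent reports whether name is a valid Go identifier (a letter or
// underscore followed by letters, digits or underscores) that also
// contains no Han characters, which Gro reserves for its own syntax.
func isGroIdent(name string) bool {
	if name == "" {
		return false
	}
	for i, r := range name {
		if unicode.Is(unicode.Han, r) {
			return false // Unihan may not appear in identifiers
		}
		// Digits are allowed anywhere except the first position.
		if r != '_' && !unicode.IsLetter(r) && (i == 0 || !unicode.IsDigit(r)) {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"myName", "my名", "性名", "_x1", "1x"} {
		fmt.Println(name, isGroIdent(name))
	}
}
```

Go itself accepts my名 and 性名, since Han characters are Unicode letters; the extra unicode.Han check is what carves them out for Gro.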

One extra use for the Unihan is to enable all possible Gro code to be written without any whitespace at all. Another use is to have a more intuitive mapping between lexical tokens and semantic use. So all lowercase-initials are local variables, all uppercase-initials are exported names, and all Unihan are syntactic control tokens.
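To show how whitespace-free code can still be split unambiguously, here is a hypothetical lexer sketch that also applies the three-way classification above. The Token type, the class names, and the one-rune treatment of everything else (numbers, strings, operators such as :=) are illustrative assumptions, not Gro's actual lexer.

```go
package main

import (
	"fmt"
	"unicode"
)

// Token is a lexical token with a coarse class.
type Token struct {
	Text  string
	Class string // "control", "exported", "local", or "other"
}

// isNameRune reports whether r can continue a name; Han runes always end one.
func isNameRune(r rune) bool {
	return !unicode.Is(unicode.Han, r) &&
		(r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r))
}

// lex splits src without needing whitespace: every Han rune is its own
// control token, a run of name runes is an exported or local name depending
// on its initial, and any other rune is a one-rune "other" token.
func lex(src string) []Token {
	runes := []rune(src)
	var toks []Token
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.Is(unicode.Han, r):
			toks = append(toks, Token{string(r), "control"})
			i++
		case r == '_' || unicode.IsLetter(r):
			j := i + 1
			for j < len(runes) && isNameRune(runes[j]) {
				j++
			}
			class := "local"
			if unicode.IsUpper(r) {
				class = "exported"
			}
			toks = append(toks, Token{string(runes[i:j]), class})
			i = j
		default:
			toks = append(toks, Token{string(r), "other"})
			i++
		}
	}
	return toks
}

func main() {
	for _, t := range lex("功dream(){做package:=a}") {
		fmt.Printf("%q %s\n", t.Text, t.Class)
	}
}
```

Because 功 and 做 can never be part of a name, dream and package are delimited without any spaces, and package comes out as an ordinary local name.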

The .gro code for the code above is:

功dream(){
	a:=123
	callMe(a)
	做package:=a //做 required here because Go keyword used as identifier
}


which could be stored in the .gro file as:

功dream(){a:=123;callMe(a);做package:=a}


and only formatted with whitespace when it's displayed, at the same time as it's colorized.
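Display-time formatting of such a one-line file could look roughly like this sketch. The format function is hypothetical: it breaks lines after { and ;, outdents at }, and indents with tabs, but it is naive about braces or semicolons inside strings, comments, and composite literals such as T{1,2}.

```go
package main

import (
	"fmt"
	"strings"
)

// format lays out whitespace-free Gro source for display.
func format(src string) string {
	var lines []string
	var cur strings.Builder
	depth := 0
	// flush ends the current line (if non-empty), indented to the current depth.
	flush := func() {
		if cur.Len() > 0 {
			lines = append(lines, strings.Repeat("\t", depth)+cur.String())
			cur.Reset()
		}
	}
	for _, r := range src {
		switch r {
		case '{':
			cur.WriteRune(r)
			flush()
			depth++
		case ';':
			flush() // a semicolon becomes a line break
		case '}':
			flush()
			depth--
			cur.WriteRune(r) // "}" starts a new line at the outer depth
		default:
			cur.WriteRune(r)
		}
	}
	flush()
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(format("功dream(){a:=123;callMe(a);做package:=a}"))
}
```

Applied to the stored one-line form above, this reproduces the indented listing of dream shown earlier.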


26 September 2016; updated 17 December 2016

Go keywords

The Syntax section of Golang's first committed draft of its spec says:

The syntax of Go borrows from the C tradition with respect to statements and from the Pascal tradition with respect to declarations. Go programs are written using a lean notation with a small set of keywords, without filler keywords (such as 'of', 'to', etc.) or other gratuitous syntax, and with a slight preference for expressive keywords (e.g. 'function') over operators or other syntactic mechanisms. Generally, "light" language features (variables, simple control flow, etc.) are expressed using a light-weight notation (short keywords, little syntax), while "heavy" language features use a more heavy-weight notation (longer keywords, more syntax). (Emphasis added)

Golang eliminated many of the keywords found in other languages such as Java and C# by making them predeclared identifiers instead, which can be shadowed by a new declaration:

true := false

so the Go lexer and parser need only 25 keywords. It's good that programmers who don't use clunky IDEs need to remember only 25 reserved names rather than 52 or even 78. But the ideal is no keywords. Because the first draft of the spec says Go has a "slight preference for expressive keywords over operators or other syntactic mechanisms", one wonders if someone intends for the Go 2.x syntax to eliminate all keywords. There are 6 symbols, @ # $ ~ ? \, not used by Go 1.x, so perhaps they're reserved for replacing keywords in Go 2.x.

Gro would be easier to implement if there were no keywords in Go because it wouldn't need to add an underscore to the front of names when generating Go code.
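A sketch of that renaming step, assuming (as a simplification) that only names colliding with a Go keyword get the underscore prefix. The function goName is hypothetical, and it does not guard against clashing with a user's own identifier that already starts with an underscore.

```go
package main

import (
	"fmt"
	"go/token"
)

// goName maps a Gro identifier to one usable in generated Go code by
// prefixing an underscore when the name is a Go keyword.
func goName(groName string) string {
	if token.IsKeyword(groName) {
		return "_" + groName
	}
	return groName
}

func main() {
	for _, n := range []string{"package", "a", "range", "callMe"} {
		fmt.Println(n, "->", goName(n))
	}
}
```

With no keywords in Go, goName would collapse to the identity function, which is the simplification the paragraph above is pointing at.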

Some of Go's 25 keywords are already superfluous, e.g. if map were absent, the correct semantics could still be deduced by the Go parser. Other keywords could be replaced with new operators composed of existing symbols, e.g. <-chan could be replaced by <-, chan<- by ->, and chan by <->. Yet others could be replaced by the new symbols, e.g. struct could be replaced by # and interface by @.

Because the 6 top-level keywords, import, func, type, const, var and package, are the only identifiers that can take first position in a top-level declaration, they could keep their roles without actually being keywords if their other uses, type in a switch statement and func in a function literal, were removed. .(type) is superfluous because it can be inferred if types are used as cases. func in a function literal could be replaced by $. The package keyword could even be made into a directive comment, like the //+build directive comment, because it's not really part of the language semantics. Identifiers could be named, say, const and there'd be no confusion with the use of const in first position of a top-level declaration.

The 11 keywords that take first position in a statement, switch, if, for, select, go, defer, break, continue, return, fallthrough and goto, could similarly keep their roles without actually being keywords if bare statements weren't allowed. Perhaps the word do could be required before all assignments, definitions, and function calls. First position keywords, case and default, could also keep their roles. The mid-statement keywords, if's else and for's range, are superfluous. The if and switch keywords could even be merged into one.
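A hypothetical sketch of how a parser could decide a statement's role from its first word alone under the do proposal, so that none of those words needs to be reserved: after do, even if is just a name. The functions firstWord and classify and the role strings are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// controlWords are the 11 statement-initial words plus case and default;
// under this scheme they are only recognised in first position.
var controlWords = map[string]bool{
	"switch": true, "if": true, "for": true, "select": true, "go": true,
	"defer": true, "break": true, "continue": true, "return": true,
	"fallthrough": true, "goto": true, "case": true, "default": true,
}

// firstWord returns the leading run of letters, digits and underscores.
func firstWord(stmt string) string {
	s := strings.TrimSpace(stmt)
	end := strings.IndexFunc(s, func(r rune) bool {
		return r != '_' && !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	if end < 0 {
		return s
	}
	return s[:end]
}

// classify decides a statement's role from its first word: "do" introduces
// a simple statement (assignment, definition or call), a control word
// introduces a control statement, and anything else is a rejected bare statement.
func classify(stmt string) (string, error) {
	w := firstWord(stmt)
	switch {
	case w == "do":
		return "simple", nil
	case controlWords[w]:
		return "control:" + w, nil
	default:
		return "", fmt.Errorf("bare statement not allowed: %q", stmt)
	}
}

func main() {
	for _, s := range []string{"if x > 0 {", "do if := 3", "do callMe(a)", "x := 3"} {
		role, err := classify(s)
		fmt.Println(s, "=>", role, err)
	}
}
```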

I've only offered a few rough suggestions to show that it's feasible to eliminate all the keywords from Go so identifiers could use any lowercase-initial name. I suspect the core Go developers already have their own ideas for the Go 2.x replacements.


10 October 2016

Groovy's (and Go's?) TIOBE Fraud

Apache Groovy PMC chair Guillaume Laforge boasts in the Health section of his May 2016 report to the quarterly ASF Directors Meeting: "Since the beginning of the year, Groovy has been in the TIOBE programming language index in the top 20 most popular languages. This month of May, Groovy is ranked 17th most popular language."

The TIOBE Index for October 2016 opens "Who are the candidates for the title of programming language of 2016? There are only 2 languages with an increase of more than 1% if compared to the same period last year, i.e. Go and Groovy. Note that Groovy ended 2015 with a bang, so its annual growth will be much less around January 2017. Google's Go language seems to be unrivalled."

The bang Groovy ended 2015 with was a 2-month jump from 0.328% to 1.182%. Go jumped from 0.160% to 1.625% in a recent 2-month period. Are these languages really jumping in popularity?

I've become concerned someone will boast of Go's position in the TIOBE rankings, and later embarrass the Go developers, so I've finally taken 15 minutes to see what's going on by using a subset of search engines and languages in TIOBE's calculation instructions...

search term | bing.com | baidu.com | wikipedia.org "content" | yahoo.com
Groovy | 41,500 | 596,000 | 18 | 7,200
GPATH | 1 | 5,690,000 | 2 | 0
GSQL | 0 | 2,300 | 2 | 151,000
Groovy++ | 6 | 895,000 | 18 | 2
Groovy TOTAL | 41,507 | 7,183,300 | 40 | 158,202
Go | 458,000 | 18,400,000 | 81 | 53,500
Golang | 35,600 | 340,000 | 2 | 3,230
Go TOTAL | 493,600 | 18,740,000 | 83 | 56,730
Python | 1,930,000 | 5,930,000 | 351 | 358,000
Ruby | 549,000 | 5,320,000 | 162 | 73,300
Java | 3,750,000 | 7,550,000 | 1,026 | 649,000
Scala | 133,000 | 1,630,000 | 41 | 133,000
Clojure | 58,500 | 536,000 | 12 | 6,090
Kotlin | 16,200 | 69,500 | 9 | 2,540


What we see here are some extreme irregularities in TIOBE's calculations for Go and Groovy:
  • +"GPATH programming" giving exorbitantly high figures for Groovy in Baidu
  • +"Groovy programming" in Baidu counted twice (also as +"Groovy++ programming")
  • +"GSQL programming" giving exorbitantly high figures for Groovy in Yahoo
  • +"Go programming" giving exorbitantly high figures for Go in Baidu

The most accurate rankings seem to be in Bing, which puts Groovy a little lower than Clojure. For Laforge to boast of Groovy's Top 20 ranking in TIOBE to the Apache Directors when it's probably below #50-ranked language Clojure is a major distortion. It's unlikely he didn't know about the erroneous calculation, which would make his distorted report a deception. Let's hope the Go developers don't bring a similar shame on themselves by quoting TIOBE.


2 October 2016

Purify Groovy's PMC

When James Strachan founded the Groovy language at Codehaus in 2003, technical people made up 100% of its top leadership. After programmer Jochen Theodorou and manager Guillaume Laforge brought the despotry total to three in 2005, that proportion dropped to 67%. In 2010 when the number of despots increased to five, consecutive non-participators Codehaus admin Ben Walding and Grails rep Graeme Rocher brought the proportion down to 60%.

When Groovy switched to being managed by the Apache Software Foundation (ASF) a year ago (Nov 2015), the number of project managers increased to nine, and only four of them (Theodorou, Paul King, Cedric Champeau, and Pascal Schumacher) have participated in its technical development since then. None of the other five Project Management Committee (PMC) members have any history of participation in the commits or notifications mailing lists (for Github commits or Jira changes) since Groovy moved to Apache. The proportion of technical people in Groovy's top leadership has thus dropped to an all-time low of 44%.

When VMware retrenched its Groovy and Grails developers in March last year (2015), I could sense this travesty coming. The only reason Laforge survived an overthrow by Rocher is because he personally owned the groovy-lang.org domain name at the time. To protect himself against those who were actually doing the technical grunt work building Groovy, Laforge found four like-minded "business" people at the ASF (i.e. Jim Jagielski, Andrew Bayer, Konstantin Boudnik, Roman Shaposhnik) to sponsor Groovy into the Apache incubator. When Groovy became a top-level project, the 5 politicians outnumbered the 4 real contributors in Groovy's PMC, and Laforge got voted in as chairperson.

It was because of that very scenario I issued article 1 of the Groovist Manifesto in March last year, that Groovy should be managed by its technical people. I hereby call on everyone who hasn't contributed a github commit, or even a jira comment, in the past year to unilaterally leave the Groovy PMC now! You have no business being there. Let the next election for PMC chair be voted on by its technical contributors only, and with candidates who actually do the hard work improving Groovy's codebase.

[Image: Repository timeline]


written: May 2016; published: August 2016

Apache Groovy's continuing fabrications

One year ago, Apache Groovy changed its website-based downloads from Codehaus to Apache (which redirects to groovy-lang.org, addressing an IP currently hosted in Germany). Their website says "all downloads (except the source) are hosted in Bintray's Groovy repository"...

[Image: DL Fakery 1, groovy-lang.org download]

which means clicking on download at groovy-lang.org forwards the download request to Bintray. One of Groovy's backers claimed anonymously in a Reddit comment that this was the reason for the sudden spike in downloads in May 2015 on Bintray...

[Image: DL Fakery 2, Bintray downloads, March 2016]

..., over 80% of which come from an IP address hosted in Germany...

[Image: DL Fakery 3, Bintray downloads to July 2015]

Of course, we don't really know how many of the Bintray downloads are genuine user requests on groovy-lang.org, or if they're generating ten downloads for every one real user, or even fifty for every one, but the Apache Groovy backers deserve the benefit of the doubt. Until, that is, groovy-lang.org went down for 2 days one recent weekend...

[Image: DL Fakery 4, mailing list: Groovy down]

Looking at the Bintray download activity for Groovy on those days makes an interesting picture...

[Image: DL Fakery 5, Bintray downloads, May to June 2016 (one month)]

Not a single dent in the download numbers for that weekend, despite the groovy-lang.org domain name being down for 2 whole days. Wherever did those requests come from? The most likely explanation is that the German-based downloads are being generated by a timer script running on the machine being pointed to by groovy-lang.org, and genuine download requests are few.

Also interesting in that picture is the overall trend for the non-German downloads -- they start trending downwards a month ago, just about when JetBrains and Gradleware announced their partnership to make Kotlin instead of Groovy the preferred language for build scripts and plugins in Gradle 3.

Groovy's "popularity" is a fabrication.


22 August 2016

Groovy Dilemma

(this content originally published in January 2010, but recovered here from gavingrover.blogspot.com)

In chapter 7 of his 1994 book The Language Instinct, Steven Pinker gives an example of a perfect right-branching sentence:

Remarkable is the rapidity of the motion of the wing of the hummingbird.


This is parsed in the human brain as shown by the parentheses:

(Remarkable (is (the (rapidity (of (the (motion (of (the (wing (of (the (hummingbird))))))))))))).


remarkable is the subject; the remainder is the predicate. is is the main verb; the remainder is its object (here called the complement). the is the article; the remainder is its referent. rapidity is a phrasal head; the remainder is a prepositional phrase as its tail. of is a preposition; the remainder is its tail in the phrase. And so on.

Pinker gives another example easy for the brain to parse, one that includes relative and subordinate clauses:

(He gave (the candy (to the girl (that (he met (in New York) while (visiting his parents (for ten days (around Christmas and New Year's)))))))).


He rearranges it so it's far harder for our minds to parse:

(He gave (the girl (that (he met (in New York) while (visiting his parents (for ten days (around Christmas and New Year's)))))) the candy).


The direct object the candy after the many closing parentheses forces our short-term memories to keep track of dangling phrases that need particular words to complete them. It seems our brains, unlike computers, can only remember a few dangled branches when parsing sentences.

Perhaps that's why the Lisp code that's easiest for humans to read ends with many closing parens, such as this recursive (though not tail-recursive, since the result of the recursive call is passed to 1+) sample from chapter 2 of Paul Graham's On Lisp:

(defun our-length (lst)
  (if (null lst)
      0
      (1+ (our-length (cdr lst)))))


Left-branching sentences are also easy for humans to parse. Pinker gives another example with two arrangements, one harder for humans to parse:

((The rapidity (of the motion (of the wing (of the hummingbird)))) is remarkable).


and the other, a perfect left-branching sentence, easy:

(((((The hummingbird)'s wing)'s motion)'s rapidity) is remarkable).


English has just a few left-branching structures, but some languages, such as Japanese, are primarily based on them.

One of the universals in Universal Grammar theory, which both Pinker and Noam Chomsky support, is that if a language has verbs before objects, as English does, then it uses prepositions, while if a language has objects before verbs, as Japanese does, it uses postpositions. Pinker suggests this universal may hold so that a language can enforce a consistent branching direction, either left-branching or right-branching, which our brains can parse easily.

Some grammatical English sentences are impossible for our brains to parse simply because there are too many dangling branches. The first of these examples parses OK in our brains, but the other two simply don't:

(The rapidity (that the motion has) is remarkable).
(The rapidity (that the motion (that the wing has) has) is remarkable).
(The rapidity (that the motion (that the wing (that the hummingbird has) has) has) is remarkable).


They do parse in computer languages, though. When I discovered closures in Groovy, I started using this type of unreadable embedding, but I now realize I should be making my code either all left-branching or all right-branching to make it more readable.


21 August 2016; last updated 26 Sep 2017

Technical Blogs 2007-2016

Lately I deleted some blog entries from gavingrover.blogspot.com, my blog from 2007 to 2016. They were technical entries describing earlier attempts at building Real Groovy, but which cluttered up the longer lasting entries, hence the deletion. Here's a quick summary of what they said...

On 6 April 2007, I announced Grerl, a preprocessor for the Groovy Language, being "the spirit of Groovy with the clothes of Perl". It was to enable programmers to code in their favorite syntactic shortcuts, natural language, and formatting, but also to convert their code to a standard Groovy source listing when giving it to others to read. It would've provided a context-sensitive lexical macro system. I was using JSE (Java Syntactic Extender) as a starting point to implement such a macro system. Grerl would've also enabled us to use Groovy's MOP to alias names in programs into other natural languages, and even map foreign words and CJK characters to each English word in an identifier, rather than directly to the whole identifier name.

On 9 August, I switched to writing a GroovyASTBuilder to make it easier to generate AST nodes while parsing the program source of any syntax. The end result, being written in dynamically typed Groovy, wouldn't have been nearly as fast as the tools written in static Java, though in future compiling Groovy in Groovy may have become a good benchmark. For the lexer and parser, I switched to JParsec, a context-sensitive combinator parser to allow both more readable code and syntactic mutability, and on 9 November, branded it Vy. On 16 December, I started designing Grerl-Vy's syntax to be aimed not at Groovy programmers but at others, even non-Java programmers.

By then (late 2007), I had realized a cohort made up of parties from the Groovy/Grails community, from Australia/New Zealand, and from mainland China were already surveilling me in my apartment as I worked on Grerl-Vy. My subsequent blog entries changed in tone to reflect this situation. On 19 January 2008, I defined Grerl-Vy to be all the software that fits completely between a graphical editor and the Groovy AST, and so replacing both clunky IDEs and IMEs for CJK characters. On 14 February, I started building Grerl-Vy again from scratch, but using my own combinator parsing library in Groovy, using Ken Barclay's GParsec as a starting point, instead of JParsec. And on 25 March, I renamed GrerlVy to GroovyScript.

On 19 May, I started experimenting with using Scheme to create a Groovy-like syntax, called Ghreme. On 28 September, I started looking at the feasibility of porting Groovy to .NET, using the then-nascent DLR as the target. By mid-2008, although I was being encouraged publicly for my contributions to the Groovy ecosystem including documentation, I was getting a different message from somewhere through backchannels, telling me to "stop messing with the brand". I didn't know for sure where the antagonism was coming from, nor could I prove it.

In May 2009, I switched back to the JVM, using Scala for its type-checking and greater functional paradigm, and renamed what I was building Groovier. On 9 July, I switched from the Groovy AST to the Scala parse tree as the target. On 9 September, I called this GroovyScala, and on 27 September, Groovy 2.0. And on 31 October, I announced Strach, an IME (input method editor) to help programmers enter all Unicode characters for Groovy.

On 4 November, I switched back to targeting the Groovy AST, using the new Groovy 1.7 ASTBuilder, but still writing it in Scala. On 1 January 2010, I decided to bust open Groovy by stripping out the added cruft and splitting up core components such as the MOP and DGM (default Groovy methods) so programmers could use Groovy's functionality from other JVM-based languages. On 3 March 2010, I decided to switch to the newly-announced Groovy++ for writing the lexer/parser code targeting Groovy's ASTBuilder. This is how I finished logging my technical progress on gavingrover.blogspot.com at the end of 2010.

I ended 2010 targeting the GroovyASTBuilder with a grammar parsed by combinator parsers written in Groovy++, an independent plugin to Groovy. By early 2011, I'd started work on an annotation-based plugin to Groovy, written in the speedier Groovy++, so those combinator parsers could be specified using strings and operators in Groovy's grammar, similar to Groovy's GStrings. I named them GRegexes, and branded the combined distro of Groovy, Groovy++, and GRegexes as Real Groovy. By mid-2011, the Codehaus Groovy despots had responded with two products that cloned Groovy++ and GRegex's functionalities: Grumpy and GParsec. Grumpy later became the static typing facility in Groovy.

In early 2012, I tried rebooting Groovy again by writing it in Clojure, and simply calling it the Groovy Language. With this name, I was challenging Guillaume Laforge's claim that Groovy names the codebase of the Codehaus implementation rather than the language specification begun by James Strachan. By mid-2012, my Clojure-based rewrite of Groovy was testing successfully on both the JVM and ClojureCLR. Then in early 2013, I changed direction by going off the JVM and attempting to reboot Groovy in Haskell.

On 28 April 2013, I'd returned to Clojure and had named it Grojure. On 17 November, I released Real Groovy 0.10 which bundled both Grojure for dynamic typing and Kotlin for static typing. By 2014, I'd abandoned Grojure, and switched from the JVM to Go as the implementation platform for the Groovy reboot. In May, I announced GroovyCode to be an extension to Unicode's UTF-8 encoding, and Ultracode to be a 6-bit language embedded in the 10xxxxxx bytes that follow a 11111110 byte.

In late 2014, I renamed GroovyCode to UTF-88, and released a fully functioning Go package implementing it. By then I'd decided to reboot Groovy module by module, using Go as the platform. In 2015 I released Gro 0.1, incorporating utf-88. I released package kern for combinator parsing in October 2015, and thomp for dynamic typing in February 2016. In April 2016, I released Qu, a recursive descent parser based on the one in Go 1.6, enabling Unihan for keywords, special identifiers, and package names. I then rebuilt Gro using Qu as a base, and bundled it all as grolang on its own github.com account.


Last edited Sep 25 at 7:22 PM by gavingrover, version 24
