Gavin Grover's GROOVY Wikiblog

6 May 2017

Groovy shazam!

Late last year I explained how the Groovy Language rebuild consists of 3 primary components, Grolang, Groo(lang), and Vy, and the logo for Grolang was chosen 2 years ago. The logos for Groo and Vy have now been finalized.


Groo(lang) will be a dynamically typed module running atop the Gro language engine, which converts a .gro file into directories of .go files. Groo uses macros to hook into Unihan-based annotations in the Gro parser.


The Vy IME will enable programmers to enter all Unihan characters visually. The underlying data is a fork of my earlier UnihanDecomp decompositions of all 80,000-odd Unihan codepoints in Unicode, but with a new license.

Two years ago, I received legal advice that "having the guts to use my own name when exposing anonymous cowards" can expose me legally and give them more ammo for spinning their nasty stories, and that everything I publish online must link directly and consistently to some clearly articulated aims. I therefore proclaimed the 3 articles of the Groovist Manifesto, because I see what they address as the root causes of Apache Groovy's problems.

Because Codeplex is closing down in a few months, rather than rushing to vet and fix up my older wikiblog entries to show such links, I've found it easier to remove many of them and republish them, updated, somewhere else at a more leisurely pace. So everything you've seen disappear since those outdated technical ones will reappear, linking directly to the articles of the Groovist Manifesto.

17 February 2017

Go Symbology

I listed all the symbols used by Java and Apache Groovy 9 years ago when I was using them for programming. Now that I'm using Go, I'll do the same for Go here.

Go, like many programming languages and their toolchains, is made up of many smaller languages mashed together. The Go language spec describes the containments that can occur between the top-level structure (i.e. the sequence of package, import, type, var, const, and func keywords) and the lexical syntax (i.e. strings, numerics, identifiers, symbols, and comments). Go is almost cleanly divided into the 3 distinct syntactic sub-languages of statements, expressions, and types, with clearly defined boundaries between them. Statements can contain both expressions and types, as well as recursively contain other statements. Expressions, too, can contain statements (inside function literals), other expressions, and types. And types can contain expressions (such as array lengths) and other types. Doc comments have their own syntax, another little sub-language. Commonly used packages in the standard library also have their own distinct syntaxes, such as regexps, fmt format strings, and templates. And the commands in the Go toolchain have yet another.

Let's look at all the symbols in each sub-language to see the real complexity of the Go language. Like last time, I've ignored alphanumerics used as symbols, such as in \t, 0xFF, or \P{Greek}, or there'd be too many to list.

Lexical items

/* */ comment
// comment until end of line
` ` raw string (no escapes)
" " interpreted string (with escapes)
' ' character
\ escape in string and character
_ in identifier names
. in floats and complex numbers
+ in float exponents
- in float exponents
; optional statement separator
spaces, tabs, and newlines for whitespace

Perhaps ``` ``` will be added in a future release of Go.
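
The lexical items above can be seen together in a handful of literals. This is only a sketch with made-up variable names; legacy_name uses an underscore purely to show that it's legal, not because it's idiomatic.

```go
package main

import "fmt"

var (
	raw         = `C:\temp\n` // backquotes: raw string, backslashes stay literal
	interp      = "tab:\tend" // double quotes: \t becomes a real tab
	letter      = 'λ'         // single quotes: a rune (character) literal
	avogadro    = 6.02e+23    // . and + in a float literal
	milli       = 1.5e-3      // - in a float exponent
	im          = 2.5i        // . in an imaginary (complex) literal
	legacy_name = 1           // _ inside an identifier (legal; Go style prefers camelCase)
)

func main() {
	fmt.Println(len(raw), len(interp), letter, avogadro, milli, im, legacy_name)
}
```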

Top-level structure

( ) in import, type, const, and var specs
. in imports
_ in imports
= in const and var declarations
, in const, var, and type declarations
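
A minimal sketch of these top-level symbols follows. It assumes strings and image/png as arbitrary packages to dot-import and blank-import; shout, greeting, lo, hi, x, and y are hypothetical names.

```go
package main

import (
	"fmt"
	_ "image/png" // _ import: only for the package's init side effects
	. "strings"   // . import: exported names are used unqualified
)

// ( ) groups several specs; , lists several names; = supplies the values.
const (
	lo, hi = 1, 10
)

var (
	greeting = "hello"
	x, y     = 2, 3
)

// shout calls ToUpper from the dot-imported strings package.
func shout(s string) string {
	return ToUpper(s) + "!"
}

func main() {
	fmt.Println(shout(greeting), lo, hi, x+y) // HELLO! 1 10 5
}
```

Dot imports are generally discouraged outside tests, since they hide where a name comes from.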

Expressions

( ) in parenthesized expressions, type assertions, and calls
. to qualify identifiers
[ ] for indexing
... in variadic calls and array literal lengths
{ } in literals
, in literals
: in keyed struct, map, and array literals, and in slice expressions
|| for short-circuit or
&& for short-circuit and
== for equals
!= for not equals
< for less than
<= for less or equal
> for greater than
>= for greater or equal
+ for add or unary positive
- for minus or unary negative
& for addressing and bitwise and
* for pointer contents and multiplication
| for bitwise or
^ for bitwise xor and bitwise complement
/ for division
% for mod
<< for left shift
>> for right shift
&^ for bitwise clear
! for boolean not
<- for channel reads

Statements

{ } in blocks
:= in short variable declarations
, in short var decls and case lists
: in labels and switch/select statement clauses
<- for channel sends
++ and -- for increments and decrements
= for assignments
+= for add assignments
-= for minus assignments
&= for bitwise and assignments
*= for multiplication assignments
|= for bitwise or assignments
^= for bitwise xor assignments
/= for division assignments
%= for mod assignments
<<= for left shift assignments
>>= for right shift assignments
&^= for bitwise clear assignments
; in for, if, and switch statement headers

Types

[ ] in array, slice, and map types
* in pointers
{ } in structs and interfaces
( ) in function types and interface method signatures
. for referencing other types
, in structs and function parameters
... in function parameters
<- in channels

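Note that * has the opposite meaning in a type from its meaning in an expression: in a type, *T is a pointer to T, whereas in an expression, *p is the value that the pointer p points to.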

regexp package

. match any character
[ ] character class
^ negated character class; match beginning of line/input
- character class range; unset flags
\ various escapes; quote following character
[: :] ASCII character class
{ } Unicode character class, match exact number of times
| alternation
* greedy repetition (zero or more)
+ greedy repetition (one or more)
? greedy option (zero or one); also makes a repetition reluctant
{ , } match in range of times
( ) capturing group
(?: ) non-capturing group, and setting flags
(?P<name> ) named capturing group
(? ) setting flags
$ match end of line/input
\x{ } hex character code

fmt package

% to indicate insertion points
# alternative format
. for formatting floats
+ sign for numerics
- pad with spaces on the right (left-justify)
[ ] for explicit argument indexes, e.g. to repeat a parameter
* to take the width or precision from an argument
space to leave a blank for an elided sign in numerics

template package

{{ }} action delimiters
. current value (dot) and field reference
{{- trim preceding whitespace
-}} trim trailing whitespace
{{/* */}} comment
" " named template
$ variable name
( ) parenthesized actions
:= variable capture
| piped actions
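
Here's a small sketch using text/template, with an invented Item type and render helper, that puts most of these symbols into one template: a named template defined and then invoked by its quoted name, a comment, range with $ variables and :=, a pipe into printf, parentheses around len, and {{- trimming of the newlines between actions.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Item is a hypothetical record type used only for this illustration.
type Item struct {
	Name string
}

// src defines a named template "item", then lists the names comma-separated.
// The {{- markers trim the newlines that precede the range and printf actions.
const src = `{{define "item"}}{{.Name | printf "%q"}}{{end}}{{/* names, comma separated */}}
{{- range $i, $it := .}}{{if $i}}, {{end}}{{template "item" $it}}{{end}}
{{- printf " (%d items)" (len .)}}`

// render parses src and executes it against items.
func render(items []Item) (string, error) {
	t, err := template.New("list").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, items); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render([]Item{{"apple"}, {"pear"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // "apple", "pear" (2 items)
}
```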

Doc comments

+ in build directives
- in build directives
: in go generate
= in go generate
indentation and blank lines in doc comments

Go toolchain commands

- in flags
-- in various flags
= in various flags
/ in directory paths
, to separate args
. in directory paths and go doc arguments
.. in directory paths
... as a wildcard in package paths, e.g. ./...

artifact names

_ in filenames

The number of these punctuation marks and other symbols gives an idea of how complicated Go's grammar really is, before we even consider the role of alphanumerics. But even with all this syntactic complexity, Go is still only about half as syntactically complex as Java or Groovy.

Last edited May 6 at 6:12 AM by gavingrover, version 3