Gavin Grover's GROOVY Wikiblog

16 October 2017

Gro 0.7

Following on the heels of Go 1.9.1's release comes Gro 0.7, the exciting new scripting language for Go. It has been rewritten on top of Go's cmd/compile/internal/syntax package, which is much nicer to work with than the go/... code in the Go standard library. It has also been redesigned, with many core features removed, such as the Unihan keywords. A similar pruning happened in last year's 0.5 release, when the dynamic typing features were forked into Groo.

The core of Gro is now restricted to a small handful of features, in the same spirit as the minimal style of Go. Further development will focus on adding macros, which will allow Go developers to add their own top-level keywords to Gro. I'll also use those macros to re-add dynamic typing (groo/dyn) and Unihan keywords (groo/han) to Gro, so developers will have the option to use or not use them.

Groo aims ultimately to supplement Go's functionality similar to how the original Groovy supplemented Java, but in a way that makes sense for Go. Gro, however, will be a more minimal language that developers can grow if they want to.

Groovist Manifesto updated

The Groovist Manifesto article 2 is expanded to reflect a two-way street of responsibility:
  • 1. Groovy implementations should be led by their technical people
  • 2. Applications using Groovy should not dictate its direction, and Groovy should not dictate the direction of its host platforms
  • 3. Groovy should be standardized for its various implementations
For Gro/Groo, this means we'll show respect for the Go language, letting those who did the work building it decide how to evolve it. Gro and Groo will simply enable Go to grow, whatever Go decides to be.

groovy.codeplex.org closing

Codeplex is going read-only in November 2017 and lightweight soon after that, so this may be the last wikiblog entry here. I will continue this wikiblog at gavingroovygrover.tumblr.com, which I previously used to talk about Gro/Groo for two years until May 2016.


23 August 2017

Groovy Japandarin

Over the past few years I've become convinced that the Groovy Language Unihan-based reboot must utilize Japanese Kanji as well as Simplified Chinese Characters, treating them both equally. After reading up on Kanji, I visited Japan to get a sense of how they're used in everyday life, looking at signage in the streets and such stuff. What I didn't expect, though, were the many Kanji-only signs and notices around, particularly in the tourist areas of Tokyo, and that I could easily understand their meaning, having already learnt to read intermediate-level Mandarin.

Before going back for a second, more recent trip to further sniff out the Kanji, I learnt the sounds of the 100-odd Kana -- though not any meanings of the words, except for some English-sourced katakana words and common hiragana particles. So being able to pronounce the Kanji in Mandarin but not in Japanese, and able to pronounce the Kana in Japanese, I experienced an odd sensation when walking the streets of Tokyo last month. Whenever I saw a sign, I would vocalize it in a combination of Mandarin and Japanese. So I would verbalize this sentence (from Wikipedia) "ラドクリフ、マラソン五輪代表に 1万m出場にも含み" as "Radokurifu, Marason wulun daibiao ni, yiwanmi chuchang ni mo hanmi" without thinking too hard about it. I could also guess some of the meaning from the Kanji characters. I would have thought this was a common occurrence among Chinese people visiting Japan, but perhaps not -- maybe Chinese people either learn Japanese well or learn none of it at all. I'm calling this speech Japandarin, or 华和语言.

I've also discovered that some of the Japanese simplifications are better than the mainland Chinese ones. Most Japanese ones have made their way back into Chinese script, like 仏 for 佛, but others haven't, like the Japanese simplification 仮 for 假, meaning false. Because the Unihan in Groolang won't be specific to any particular script, we'll use the Japanese 仮 as the preferred character for false because it's easier for programmers of every country to learn -- and of course 真 will still mean true. Most of the preferred Han characters in Groolang will be simplified Chinese ones, though, simply because they're the most simplified. The more complex alternatives will still be accepted, so for the keyword for, the preferred character will be 为, but both 為 and 爲 will still compile.
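
One way to picture "preferred but not exclusive" characters is a simple alias table. The Go sketch below is purely illustrative and is not Groolang's actual implementation: the names hanKeywords and keywordFor are hypothetical, and accepting 假 alongside 仮 is an assumption drawn from the "more complex alternatives will still be accepted" rule.

```go
package main

import "fmt"

// hanKeywords is a hypothetical alias table, not Groolang's real one.
// Each accepted Han character maps to the Go spelling it stands for.
var hanKeywords = map[rune]string{
	'真': "true",
	'仮': "false", // preferred: the Japanese simplification
	'假': "false", // assumed to remain accepted
	'为': "for", // preferred: the simplified Chinese form
	'為': "for", // still accepted
	'爲': "for", // still accepted
}

// keywordFor returns the Go spelling for a Han character, if it has one.
func keywordFor(r rune) (string, bool) {
	kw, ok := hanKeywords[r]
	return kw, ok
}

func main() {
	for _, r := range "为為爲仮真" {
		kw, _ := keywordFor(r)
		fmt.Printf("%c -> %s\n", r, kw)
	}
}
```

All three spellings of for resolve to the same keyword, so a programmer can type whichever variant their input method produces.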

So programmers will be able to vocalize Groolang with Japandarin!


6 May 2017

Groovy shazam!

Late last year I explained how the Groovy Language rebuild consists of 3 primary components, i.e. Grolang, Groo(lang), and Vy, with the logo for Grolang chosen 2 years ago. The logos for Groo and Vy have now been finalized...

 
groo.jpg

Groo(lang) will be a dynamically-typed module running atop the Gro language engine, which converts a .gro file into directories of .go files. Groo hooks into Unihan-based annotations in the Gro parser using macros.

 
vy.png

The Vy IME will enable programmers to enter all Unihan characters visually. The underlying data is a fork of my earlier UnihanDecomp decompositions of all 80,000-odd Unihan codepoints in Unicode, but with a new license.

Two years ago, I received legal advice that "having the guts to use my own name when exposing anonymous cowards" can expose me legally and gives them more ammo for spinning their nasty stories, and that everything I publish online must link directly and consistently to some clearly articulated aims. I thus proclaimed the 3 articles of the Groovist Manifesto because I see what they address as the root causes of Apache Groovy's problems.

Because Codeplex is closing down in a few months, rather than rushing to vet and fix up my older wikiblog entries to show such links, it's easier to remove many of them and republish them updated somewhere else at a more leisurely pace. So everything you've seen disappear since those outdated technical ones will reappear, linking directly to the articles of Groovy's Manifesto.


17 February 2017; updated 26 Sep 2017

Go Symbology

Go, like many programming languages and toolchains, is made up of many smaller languages mashed together. The Go language spec describes the containments that can occur between the top-level structure (i.e. the sequence of package, import, type, var, const, and func keywords) and the lexical syntax (i.e. strings, numerics, identifiers, symbols, and comments). It's almost cleanly divided into the 3 distinct syntactic sub-languages of statements, expressions, and types, with clearly defined boundaries between them. Statements can contain both expressions and types, as well as recursively contain other statements. Expressions, too, can contain statements, other expressions, and types. And types can contain expressions and other types. The doc syntax within comments has its own syntax, another little sub-language. Commonly used packages in the standard library also have their own distinct syntaxes, such as regexps, fmt formats, and templates. And the commands in the Go toolchain have yet another.
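
To make the nesting concrete, here is a small Go sketch (the names grid and sumTo are made up for illustration) in which a type contains an expression, an expression contains statements, and a statement contains a type.

```go
package main

import "fmt"

// A type containing an expression (the array length 2 + 2) and another type.
type grid [2 + 2][]int

// sumTo calls a function literal: an expression that contains a block
// of statements, which in turn contain more expressions.
func sumTo(n int) int {
	return func(limit int) int {
		s := 0
		for i := 1; i <= limit; i++ { // statement containing expressions
			s += i
		}
		return s
	}(n)
}

func main() {
	var g grid                    // statement containing a type
	fmt.Println(len(g), sumTo(4)) // prints: 4 10
}
```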

Let's look at all the symbols in each sub-language to see the real complexity of the Go language. Like last time, I've ignored alphanumerics used as symbols, such as in \t, 0xFF or \P{Greek}, or there'd be too many to list.

Lexical items

/* */ comment
// comment until end of line
` ` quoted string
" " quoted string with escapes
' ' character
\ escape in string and character
_ in identifier names
. in floats and complex numbers
+ in float exponents
- in float exponents
; optional statement separator
spaces, tabs, and newlines for whitespace

Perhaps ``` ``` will be added in a future release of Go.

Top-levels

( ) in import, type, const, and var specs
. in imports
_ in imports
= in const and var declarations
, in const, var, and type declarations

Expressions

( ) in parenthesized expressions, type assertions, and calls
. to qualify identifiers
[ ] for indexing
... in variadic calls and array literal lengths
{ } in literals
, in literals
: in struct and map literals, and array indexes
|| for short-circuit or
&& for short-circuit and
== for equals
!= for not equals
< for less than
<= for less or equal
> for greater than
>= for greater or equal
+ for add or unary positive
- for minus or unary negative
& for addressing and bitwise and
* for pointer contents and multiplication
| for bitwise or
^ for bitwise xor and bitwise complement
/ for division
% for mod
<< for left shift
>> for right shift
&^ for bitwise clear
! for boolean not
<- for channel reads
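
A few of the less familiar operators above can be tried out in a short Go sketch. The function names bitOps and firstValue are hypothetical, chosen only for this illustration.

```go
package main

import "fmt"

// bitOps returns x&^y (bit clear), x^y (binary ^ is xor),
// and ^x (unary ^ is bitwise complement) for 8-bit values.
func bitOps(x, y uint8) (clear, xor, not uint8) {
	return x &^ y, x ^ y, ^x
}

// firstValue sends v on a buffered channel and reads it back.
func firstValue(v int) int {
	ch := make(chan int, 1)
	ch <- v     // <- in a send statement
	return <-ch // <- as a receive expression
}

func main() {
	c, x, n := bitOps(0xC, 0xA)
	fmt.Printf("%04b %04b %08b\n", c, x, n) // prints: 0100 0110 11110011
	fmt.Println(firstValue(7))              // prints: 7
}
```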

Statements

{ } in blocks
:= in short variable declarations
, in short var decls and case lists
: in labels and switch/select statement clauses
<- for channel sends
++ and -- for increments
= for assignments
+= for add assignments
-= for minus assignments
&= for bitwise and assignments
*= for multiplication assignments
|= for bitwise or assignments
^= for bitwise xor assignments
/= for division assignments
%= for mod assignments
<<= for left shift assignments
>>= for right shift assignments
&^= for bitwise clear assignments
; in for and if statement headers

Types

[ ] in array, slice, and map types
* in pointers
{ } in structs
( ) in interfaces and functions
. for referencing other types
, in structs and function parameters
... in function parameters
<- in channels

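Note that * means the opposite in a type from what it means in an expression: in a type it builds a pointer to something, while in an expression it follows a pointer back to the thing it points at.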
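
A minimal Go sketch of the two readings of *, using a made-up increment function:

```go
package main

import "fmt"

// increment takes a pointer type and dereferences it in its body.
func increment(p *int) { // in a type, * builds "pointer to int"
	*p = *p + 1 // in an expression, * follows the pointer back to the int
}

func main() {
	n := 41
	increment(&n)  // & takes the address, producing a *int
	fmt.Println(n) // prints: 42
}
```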

regexp package

. match any character
[ ] character class
^ character class not; match beginning of line/input
- character class range; unset flags
\ various escapes; quote following character
[: :] ASCII character class
{ } Unicode character class, match exact number of times
| alternation
* greedy repetition (zero or more)
+ greedy repetition (one or more)
? greedy option (zero or one); also the reluctant modifier after another repetition
{ , } match in range of times
( ) capturing group
(?: ) non-capturing group, and setting flags
(?P<name> ) named capturing group
(? ) setting flags
$ match end of line/input
\x{ } hex character code
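
Several of these regexp symbols can be seen working together in the Go sketch below. The pattern and the yearMonth helper are invented for illustration; only the regexp package calls are real.

```go
package main

import (
	"fmt"
	"regexp"
)

// datePattern uses (?i) to set a flag, (?P<name> ) for named capturing
// groups, (?: ) for a non-capturing group, {4} and {1,2} for repetition
// counts, \x{20} for a space by hex code, and ^ $ as anchors.
var datePattern = regexp.MustCompile(
	`(?i)^(?P<year>[0-9]{4})-(?P<month>[0-9]{1,2})(?:-[0-9]{1,2})?\x{20}utc$`)

// yearMonth returns the named groups of a match, or ok=false if none.
func yearMonth(s string) (year, month string, ok bool) {
	m := datePattern.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	for i, name := range datePattern.SubexpNames() {
		switch name {
		case "year":
			year = m[i]
		case "month":
			month = m[i]
		}
	}
	return year, month, true
}

func main() {
	fmt.Println(yearMonth("2017-10-16 UTC")) // prints: 2017 10 true
}
```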

fmt package

% to indicate insertion points
# alternative format
. for formatting floats
+ sign for numerics
- pad with spaces on the right instead of the left
[ ] for explicit argument indexes
* for width or precision insertion
space for elided sign in numeric
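
The fmt symbols are easiest to read next to their output. This Go sketch collects one example per symbol; the demo function is a hypothetical helper, while the format verbs and flags are standard fmt behaviour.

```go
package main

import "fmt"

// demo formats values using each of the fmt symbols listed above.
func demo() []string {
	return []string{
		fmt.Sprintf("%+d", 42),         // "+42": always show the sign
		fmt.Sprintf("% d", 42),         // " 42": space where the sign would go
		fmt.Sprintf("%-5d|", 42),       // "42   |": pad with spaces on the right
		fmt.Sprintf("%#x", 42),         // "0x2a": alternative format
		fmt.Sprintf("%.2f", 3.14159),   // "3.14": precision after the dot
		fmt.Sprintf("%*d", 5, 42),      // "   42": width taken from an argument
		fmt.Sprintf("%[1]d %[1]x", 42), // "42 2a": argument 1 used twice
	}
}

func main() {
	for _, s := range demo() {
		fmt.Printf("%q\n", s)
	}
}
```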

template package

{{ }} code escape
. field reference
{{- trim preceding whitespace
-}} trim trailing whitespace
{{/* */}} comment
" " named template
$ variable name
( ) parenthesized actions
:= variable capture
| piped actions
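
As a rough illustration of the template symbols, the Go sketch below renders a short list with text/template. The template text and the render helper are invented for this example; it uses a comment, both trim markers, a variable capture, a pipe, a field reference, and a parenthesized action.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// src is a hypothetical template exercising the symbols listed above.
const src = `{{/* list the names */}}Names:
{{- range $i, $name := .Names}}
  {{$i}}. {{$name | printf "%q"}}
{{- end}}
Total: {{printf "%d" (len .Names) -}}
`

// render executes src against a struct with a Names field.
func render(names []string) (string, error) {
	t, err := template.New("names").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, struct{ Names []string }{names})
	return sb.String(), err
}

func main() {
	out, err := render([]string{"Gro", "Groo"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The {{- and -}} markers remove the newlines that would otherwise surround each action, so the output has one line per name and no trailing newline.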

Doc comments

+ in build directives
- in build directives
: in go generate
= in go generate
indentation and blank lines in doc comments

Commands

- in flags
-- in various flags
= in various flags
/ in directory paths
, to separate args
. in directory paths and go doc arguments
.. in directory paths
... in directory paths

Artifact names

_ in filenames


The quantity of these punctuation and other symbols gives an idea of how complicated the grammar of Go really is, before we even consider the role of alphanumerics. But even with all this syntactic complexity, it's still only about half that of many languages such as Java.


Last edited Oct 16 at 4:13 AM by gavingrover, version 6
