Clean up the literals! [AEK 01628-0]
One of the most disgraceful perversions in the IT industry is the C-style escape sequence: all these awful \"'s, \x23's and \\'s. URL-style %20's and SGML-style &amp;'s are no better. They are both distracting to the eye and incomprehensible to a newcomer, without even the excuse of being especially handy for experienced users.
There is a much better way: just abolish escape sequences in favor of a suitable interpolation syntax. I'd propose square brackets, for they are visually appealing and seldom used in natural text: [expression] should interpolate the value of the expression into the literal, in particular:
[Bra] ➡ [
[Ket] ➡ ]
[Tab] ➡ Tab character
[Break] ➡ Line break
[Unicode 12345] ➡ Unicode character 12345.
[# lorem ipsum] ➡ Comment, ignored. Might be displayed as bubbles in an IDE.
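To make the list above concrete, here is a minimal sketch of an expander for these bracket expressions. It is only an illustration under a few assumptions of mine: the function name expand and the NAMED table are hypothetical, [Unicode N] is read as a decimal code point, and any other expression is rejected rather than evaluated.

```python
# Hypothetical table of named characters, taken from the list above.
NAMED = {"Bra": "[", "Ket": "]", "Tab": "\t", "Break": "\n"}


def expand(literal: str) -> str:
    """Expand [Bra], [Ket], [Tab], [Break], [Unicode N]; drop [# ...] comments."""
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "]":
            raise ValueError(f"unmatched ']' at position {i}")
        if ch != "[":
            out.append(ch)
            i += 1
            continue
        # Find the matching closing bracket, honoring nested brackets.
        depth = 1
        j = i + 1
        while j < len(literal) and depth > 0:
            if literal[j] == "[":
                depth += 1
            elif literal[j] == "]":
                depth -= 1
            j += 1
        if depth > 0:
            raise ValueError(f"unmatched '[' at position {i}")
        body = literal[i + 1 : j - 1].strip()
        if body.startswith("#"):
            pass  # comment: contributes nothing to the value
        elif body in NAMED:
            out.append(NAMED[body])
        elif body.startswith("Unicode "):
            # Assumption: the number is a decimal code point.
            out.append(chr(int(body[len("Unicode "):])))
        else:
            raise ValueError(f"unsupported expression: {body!r}")
        i = j
    return "".join(out)


print(expand("x[Bra]1[Ket][# index into x]"))
```

A real implementation would evaluate arbitrary expressions in the last branch; the sketch stops at the predefined names.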
I would also propose a special syntax for formatted interpolation in all text literals: [e|f], where e is any expression of a Formattable type and f is a formatter literal of the corresponding type. Date/time formatters such as [yyyy-MM-dd] and printf-style number formatters are very handy and should be kept, and also extended by linguistic add-ons: numbers are often used to construct plurals or ordinals [i|ordinal], and phrases often need conjugation before being inserted into others [verb|conj: 1st pers. singular].
Example:
Successfully downloaded [v|1.1d liters of milk] and [r|w formulae],
now downloading the [n|ordinal] bottle.
➡
Successfully downloaded 1.0 liter of milk and twenty one formula,
now downloading the second bottle.
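A rough sketch of how [e|f] could be rendered, reduced to the simplest case. Everything here is my assumption for illustration: e is restricted to a plain variable name, the formatter registry FORMATTERS and the functions render and ordinal are hypothetical, the word list only reaches "fifth", and an unknown formatter is treated as a printf-style pattern. The plural-aware formatters from the example are not attempted.

```python
import re

ORDINAL_WORDS = ["zeroth", "first", "second", "third", "fourth", "fifth"]


def ordinal(n: int) -> str:
    """Spell small ordinals as words; fall back to digits plus suffix."""
    if 0 <= n < len(ORDINAL_WORDS):
        return ORDINAL_WORDS[n]
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Hypothetical registry of named formatters.
FORMATTERS = {"ordinal": ordinal}

# [name] or [name|formatter]; the formatter may not contain brackets here.
SLOT = re.compile(r"\[([A-Za-z_]\w*)(?:\|([^\[\]]*))?\]")


def render(template: str, **values) -> str:
    def replace(match: re.Match) -> str:
        value = values[match.group(1)]
        spec = match.group(2)
        if spec is None:
            return str(value)
        spec = spec.strip()
        if spec in FORMATTERS:
            return FORMATTERS[spec](value)
        return spec % (value,)  # assumption: printf-style pattern

    return SLOT.sub(replace, template)


print(render("now downloading the [n|ordinal] bottle.", n=2))
```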
Literals, including the expressions and comments inside them, are required to contain only matched square brackets, which leads to a
Killer feature: composability
Since our syntax allows only matched square brackets, any literal can be embedded into a host language when enclosed by square brackets: the parser of the host language can easily find out where the literal ends by finding the matching closing bracket for the opening one.
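The bracket matching a host parser would have to do is small enough to sketch in a few lines. The function name literal_end and the sample host expression are made up for illustration.

```python
def literal_end(source: str, start: int) -> int:
    """Return the index of the ']' that closes the '[' at source[start]."""
    if source[start] != "[":
        raise ValueError("a literal must start with '['")
    depth = 0
    for i in range(start, len(source)):
        if source[i] == "[":
            depth += 1
        elif source[i] == "]":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unterminated literal")


# Hypothetical host-language snippet containing one embedded literal.
src = "greet([Hello [name], press [Bra]Enter[Ket]]) + 1"
start = src.index("[")
end = literal_end(src, start)
print(src[start + 1 : end])
```

The host parser never needs to understand what is inside the literal: counting brackets is enough, because [Bra] and [Ket] are themselves balanced.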
Of course, square brackets are superfluous for numeric literals and some other literals that by their nature cannot contain whitespace. These literals can be recognized by a special start sequence that is forbidden in normal identifiers: e.g. numeric literals begin with a digit or a minus sign, and paths always start with
./ (relative paths or path fragments),
~/ (local paths),
/ (absolute paths) or
// (global paths).
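Recognizing a bare literal by its start sequence could look like the sketch below. The function classify and its result labels are hypothetical; the only rules used are the prefixes listed above, and anything else is assumed to be an identifier.

```python
def classify(token: str) -> str:
    """Classify a whitespace-free token by its start sequence."""
    if not token:
        raise ValueError("empty token")
    # Test '//' before '/' so global paths are not mistaken for absolute ones.
    if token.startswith("//"):
        return "global path"
    if token.startswith("/"):
        return "absolute path"
    if token.startswith("~/"):
        return "local path"
    if token.startswith("./"):
        return "relative path"
    if token[0] in "0123456789-":
        return "number"
    return "identifier"


for sample in ["//server/share", "/etc/hosts", "~/notes.txt", "./src", "-42", "name"]:
    print(sample, "->", classify(sample))
```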
There can be an unlimited multitude of domain-specific literals besides strings and numbers, formatters and paths: dates (2012-02-03) and URLs, colors and fonts. It's impossible to define everything beforehand; the best we can do is to allow custom literals built on the common syntax, on an equal footing with the predefined ones. A common syntax is very handy: consider a date literal with interpolation (2012-[month]-03), which can then be used for filling or matching, a feature hardly found in any existing programming language.
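Filling and matching with one and the same literal can be sketched as follows. The helper names fill and match are hypothetical, holes are limited to simple names, and each hole is assumed to stand for a run of digits, which is enough for the date example.

```python
import re

# A hole is a simple name in square brackets, e.g. [month].
HOLE = re.compile(r"\[([A-Za-z_]\w*)\]")


def fill(template: str, **values) -> str:
    """Replace every hole with the given value."""
    return HOLE.sub(lambda m: str(values[m.group(1)]), template)


def match(template: str, text: str):
    """Return the hole values if text fits the template, else None."""
    pattern = ""
    pos = 0
    for m in HOLE.finditer(template):
        pattern += re.escape(template[pos : m.start()])
        # Assumption: every hole stands for one or more digits.
        pattern += "(?P<" + m.group(1) + r">\d+)"
        pos = m.end()
    pattern += re.escape(template[pos:])
    found = re.fullmatch(pattern, text)
    return found.groupdict() if found else None


print(fill("2012-[month]-03", month="02"))
print(match("2012-[month]-03", "2012-07-03"))
```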
Syntax for custom literals is usually quite awful. For instance, in Scala it's the type"literal" (e.g. s"bla"). I would propose using just [literal] if the type can be derived from context and Type[literal] in other cases, and banish generic "string" literals in favor of language-aware literal types:
Html[Hello, <em>world</em>!]
Text{lang: en_US}[Hello, world]
It's very handy to know both the natural language and the markup language (if any) used in a literal. At compile time and editing time this enables highlighting, autocompletion and validation/spellcheck; at runtime it enables proper handling by linguistic tools.