172 lines
4.5 KiB
Plaintext
### Characters

[
  Type characters Alias chars
  Datatype unicode-characters Alias unicode-chars
  Type unicode-points
  Funcon unicode-character Alias unicode-char
  Funcon unicode-point Alias unicode
  Type basic-multilingual-plane-characters Alias bmp-chars
  Type basic-multilingual-plane-points
  Type iso-latin-1-characters Alias latin-1-chars
  Type iso-latin-1-points
  Type ascii-characters Alias ascii-chars
  Type ascii-points
  Funcon ascii-character Alias ascii-char
  Funcon utf-8
  Funcon utf-16
  Funcon utf-32
  Funcon backspace
  Funcon horizontal-tab
  Funcon line-feed
  Funcon form-feed
  Funcon carriage-return
  Funcon double-quote
  Funcon single-quote
  Funcon backslash
]


Built-in Type
  characters <: values
/*
  Literal characters can be written `'C'` where `C` is any visible character
  other than a `single-quote` or `backslash` character, which need to be
  escaped as `'\''` and `'\\'`.
*/
Alias
  chars = characters


#### Unicode character set
/*
  The set of Unicode characters and allocated points is open to extension.
  See https://en.wikipedia.org/wiki/Plane_(Unicode)
*/

Built-in Datatype
  unicode-characters <: characters
Alias
  unicode-chars = unicode-characters
Built-in Type
  unicode-points <: bounded-integers(0, unsigned-bit-vector-maximum(21))
Built-in Funcon
  unicode-character(_:unicode-points) : unicode-characters
Alias
  unicode-char = unicode-character
/*
  The values in `unicode-characters` are the values of
  `unicode-character(UP:unicode-points)`.
*/
Funcon
  unicode-point(_:unicode-characters) : =>unicode-points
Alias
  unicode = unicode-point
Rule
  unicode-point(unicode-character(UP:unicode-points)) ~> UP
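/*
  The inverse relationship between `unicode-character` and `unicode-point`
  can be modelled outside CBS. The following is a minimal Python sketch, not
  part of the funcon library: the names `unicode_character`, `unicode_point`,
  and `UNICODE_POINT_MAX` are hypothetical, and it assumes that
  `unsigned-bit-vector-maximum(N)` denotes `2^N - 1`. Python's `chr` accepts
  only points up to `0x10FFFF`, which is compatible with `unicode-points`
  being a subtype of the 21-bit range rather than the whole range.

```python
# Hypothetical Python model of unicode-points, unicode-character and unicode-point.
# Assumption: unsigned-bit-vector-maximum(21) = 2**21 - 1.
UNICODE_POINT_MAX = (1 << 21) - 1


def unicode_character(up: int) -> str:
    """Model of unicode-character: map a point to the corresponding character."""
    if not 0 <= up <= UNICODE_POINT_MAX:
        raise ValueError(f"{up} is outside the range 0..{UNICODE_POINT_MAX}")
    # chr() raises ValueError for points above 0x10FFFF, a subset of the 21-bit range.
    return chr(up)


def unicode_point(c: str) -> int:
    """Model of unicode-point: recover the point of a single character."""
    return ord(c)
```

  In this model, `unicode_point(unicode_character(0x3BB))` gives back `0x3BB`,
  mirroring the rewrite of `unicode-point(unicode-character(UP))` to `UP`.
*/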


#### Unicode basic multilingual plane
/*
  The set of Unicode BMP characters and allocated points is open to extension.
*/

Built-in Datatype
  basic-multilingual-plane-characters <: unicode-characters
Alias
  bmp-chars = basic-multilingual-plane-characters
Built-in Type
  basic-multilingual-plane-points <:
    bounded-integers(0, unsigned-bit-vector-maximum(17))
/*
  The values in `basic-multilingual-plane-characters` are the values of
  `unicode-character(BMPP:basic-multilingual-plane-points)`.
*/


#### ISO Latin-1 character set

Built-in Datatype
  iso-latin-1-characters <: basic-multilingual-plane-characters
Alias
  latin-1-chars = iso-latin-1-characters
Type
  iso-latin-1-points ~> bounded-integers(0, unsigned-bit-vector-maximum(8))
/*
  The values in `iso-latin-1-characters` are the values of
  `unicode-character(ILP:iso-latin-1-points)`.
*/


#### ASCII character set

Built-in Type
  ascii-characters <: iso-latin-1-characters
Alias
  ascii-chars = ascii-characters
Type
  ascii-points ~> bounded-integers(0, unsigned-bit-vector-maximum(7))
/*
  The values in `ascii-characters` are the values of
  `unicode-character(AP:ascii-points)`.
*/
Funcon
  ascii-character(_:strings) : =>ascii-characters?
Alias
  ascii-char = ascii-character
/*
  `ascii-character"C"` takes a string. When the string consists of a single
  ASCII character `C`, it gives that character; otherwise it gives `( )`.
*/
Rule
  ascii-character[C:ascii-characters] ~> C
Rule
  C : ~ ascii-characters
  ------------------------------------
  ascii-character[C:characters] ~> ( )
Rule
  length(C*) =/= 1
  --------------------------------------
  ascii-character[C*:characters*] ~> ( )
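/*
  The three rules for `ascii-character` amount to a case analysis on the
  argument string. A minimal Python sketch of that analysis follows. The
  function name `ascii_character` and the use of `None` to stand in for the
  empty sequence `( )` are assumptions made for illustration, as is reading
  `unsigned-bit-vector-maximum(7)` as `127`.

```python
from typing import Optional

# Assumption: unsigned-bit-vector-maximum(7) = 2**7 - 1 = 127.
ASCII_POINT_MAX = (1 << 7) - 1


def ascii_character(s: str) -> Optional[str]:
    """Model of ascii-character; None stands in for the empty sequence ( )."""
    if len(s) != 1:
        # Third rule: the string does not have length 1.
        return None
    if ord(s) > ASCII_POINT_MAX:
        # Second rule: the single character is not an ASCII character.
        return None
    # First rule: a single ASCII character is returned unchanged.
    return s
```

  In this model `ascii_character("a")` gives `"a"`, whereas the empty string,
  `"ab"`, and `"é"` all give `None`.
*/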


#### Character point encodings
/*
  See https://en.wikipedia.org/wiki/Character_encoding
*/

Built-in Funcon
  utf-8(_:unicode-points) : =>(bytes, (bytes, (bytes, bytes?)? )? )
Built-in Funcon
  utf-16(_:unicode-points) : =>(bit-vectors(16), (bit-vectors(16))? )
Built-in Funcon
  utf-32(_:unicode-points) : =>bit-vectors(32)
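/*
  The encoding funcons are built-in, so no rules are given for them here. The
  result types suggest one to four bytes for `utf-8`, one or two 16-bit units
  for `utf-16`, and a single 32-bit unit for `utf-32`. The Python sketch below
  illustrates those shapes using the standard codecs. The function names are
  hypothetical, and the flat tuples are a simplification of the nested
  optional tuples in the signatures. Surrogate points (`0xD800` to `0xDFFF`)
  are rejected by Python's codecs and are not handled here.

```python
from typing import Tuple


def utf_8(up: int) -> Tuple[int, ...]:
    """Return the 1 to 4 UTF-8 bytes of a point as integers."""
    return tuple(chr(up).encode("utf-8"))


def utf_16(up: int) -> Tuple[int, ...]:
    """Return the 1 or 2 UTF-16 code units of a point as 16-bit integers."""
    data = chr(up).encode("utf-16-be")
    # Split the big-endian byte string into 2-byte code units.
    return tuple(
        int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)
    )


def utf_32(up: int) -> int:
    """Return the single UTF-32 code unit of a point as a 32-bit integer."""
    return int.from_bytes(chr(up).encode("utf-32-be"), "big")
```

  For the point `0x1F600`, this model gives four bytes for `utf_8`, the
  surrogate pair `(0xD83D, 0xDE00)` for `utf_16`, and `0x1F600` for `utf_32`.
*/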


#### Control characters

Funcon
  backspace : =>ascii-characters
    ~> unicode-character(hexadecimal-natural"0008")
Funcon
  horizontal-tab : =>ascii-characters
    ~> unicode-character(hexadecimal-natural"0009")
Funcon
  line-feed : =>ascii-characters
    ~> unicode-character(hexadecimal-natural"000a")
Funcon
  form-feed : =>ascii-characters
    ~> unicode-character(hexadecimal-natural"000c")
Funcon
  carriage-return : =>ascii-characters
    ~> unicode-character(hexadecimal-natural"000d")
Funcon
  double-quote : =>ascii-characters
    ~> unicode-character(hexadecimal-natural"0022")
Funcon
  single-quote : =>ascii-characters
    ~> unicode-character(hexadecimal-natural"0027")
Funcon
  backslash : =>ascii-characters
    ~> unicode-character(hexadecimal-natural"005c")
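/*
  Each funcon in this section rewrites to `unicode-character` applied to a
  point written in hexadecimal. As a rough cross-check, the Python sketch
  below builds the same eight characters from the same hexadecimal strings.
  The helper `hexadecimal_natural` and the table `CONTROL_CHARACTERS` are
  hypothetical names introduced only for this illustration.

```python
def hexadecimal_natural(digits: str) -> int:
    """Model of hexadecimal-natural: parse a string of hexadecimal digits."""
    return int(digits, 16)


# Funcon name mapped to the character it denotes, using the points given above.
CONTROL_CHARACTERS = {
    "backspace": chr(hexadecimal_natural("0008")),
    "horizontal-tab": chr(hexadecimal_natural("0009")),
    "line-feed": chr(hexadecimal_natural("000a")),
    "form-feed": chr(hexadecimal_natural("000c")),
    "carriage-return": chr(hexadecimal_natural("000d")),
    "double-quote": chr(hexadecimal_natural("0022")),
    "single-quote": chr(hexadecimal_natural("0027")),
    "backslash": chr(hexadecimal_natural("005c")),
}
```

  All eight points are below 128, which is consistent with the declared
  result type `ascii-characters`.
*/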