Title

SRFI 169: Underscores in Numbers

Author

Lassi Kortela

Status

This SRFI is currently in draft status. To provide input on this SRFI, please send email to srfi-169@nospamsrfi.schemers.org. Previous messages are available in the mailing list archive.

Abstract

Many people find that large numbers are easier to read when the digits are broken into small groups. For example, the number 1582439 might be easier to read if written as 1 582 439. This applies to source code as it does to other writing. We propose an extension of Scheme syntax to allow the underscore as a digit separator in numerical constants.

Rationale

How many digits per group

Western cultures tend to divide digits into groups of three. This convention is not universal. For example, in India people write numbers like 3 14 15 926 (read three crore fourteen lakh fifteen thousand nine hundred and twenty-six in Indian English).

For simplicity and universality, we propose that digit groups of all sizes may be mixed freely when writing a number. It is permissible to have just one digit in a group, and groups in a number don’t need to be ordered by increasing or decreasing digit count.
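As an informal illustration of the point above, the following sketch (written in Python purely for demonstration; the helper name strip_separators is hypothetical and not part of this proposal) shows that differently grouped spellings of the same digits all denote one number once the separators are removed.

```python
def strip_separators(literal: str) -> str:
    """Remove underscore digit separators from a numeric literal."""
    return literal.replace("_", "")


# The same digits grouped in several ways: no grouping, Western groups
# of three, the Indian grouping from the text, groups of four, and
# one digit per group.
groupings = [
    "31415926",
    "31_415_926",
    "3_14_15_926",
    "3141_5926",
    "3_1_4_1_5_9_2_6",
]

# Every spelling collapses to the same integer value.
values = {int(strip_separators(g)) for g in groupings}
```

The set values ends up with a single element, which is the sense in which group sizes carry no meaning under this proposal.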

What separator character to use

Human cultures and programming languages differ in what separator to use between groups.

In light of the above, we consider the underscore to be the clear winner. It is the most widely compatible and least ambiguous choice, in both human and machine terms.

Potential ambiguity between numbers and identifiers

Languages in the Lisp family traditionally allow a larger set of characters in identifiers than do most other languages. For example, 1+ and 3*/! would parse as symbols in Common Lisp. Scheme is slightly more restrictive: R5RS, R6RS and R7RS do not recognize identifiers that begin with a decimal digit. Some implementations are more relaxed. For example, MIT Scheme comes with 1+ and -1+ procedures to increment and decrement numbers. Several implementations presently parse tokens consisting entirely of digits and underscores as symbols.

Countless languages outside the Lisp family use the underscore as a word separator in multi-word identifiers; for example, Scheme’s open-input-file would be spelled open_input_file. In these languages, it’s common to use a leading underscore to mark private (as opposed to public or exported) identifiers. This leads to potential ambiguity with tokens such as _123 that start with an underscore and contain only underscores and digits. Those tokens often parse as identifiers. If we made them parse as numbers in Scheme instead, it could confuse users.

Scheme supports a rich numeric tower of integers, ratios, real and complex numbers. These come in exact and inexact variants. For real numbers, we have decimal-point and exponent notation. Particular implementations add quaternions and units of measure to the mix. Common Lisp’s potential numbers offer a glimpse of how far numerical syntax can go. These intricate extensions, some of which we cannot even anticipate yet, make it even trickier for us to specify a digit-separation scheme devoid of ambiguity.

We attempt to solve these problems with a conservative rule that allows underscores only between digits. After considering everything in the above paragraphs, we could not come up with any concrete example of a present or future task that would be impeded by this restricted version of the syntax extension.

Specification

We stipulate that conforming implementations must allow one underscore between any two digits, in any part of a number.

The rule includes:

The rule excludes:

Conforming implementations may be more lenient in what they allow (to maintain compatibility with existing code). In this document, numbers written according to the above rule are called conforming. Other numbers (which may or may not be valid depending on the implementation) are called non-conforming.
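The rule can be sketched as a small checker. This is a minimal illustration in Python, not a reference implementation: the names is_conforming, RADIX_DIGITS and PREFIX are hypothetical, and the sketch assumes that "digit" means a digit of the radix selected by a #b, #o, #d or #x prefix (decimal by default). It checks only where underscores are placed, not whether the token is otherwise a valid number.

```python
import re

# Digits that count as "digits" for each radix prefix letter.
RADIX_DIGITS = {
    "b": "01",
    "o": "01234567",
    "d": "0123456789",
    "x": "0123456789abcdefABCDEF",
}

# A single radix (#b #o #d #x) or exactness (#e #i) prefix.
PREFIX = re.compile(r"#([bodxeiBODXEI])")


def is_conforming(token: str) -> bool:
    """Return True if every underscore sits directly between two digits."""
    radix = "d"
    pos = 0
    # Consume leading prefixes, remembering the last radix seen.
    match = PREFIX.match(token, pos)
    while match:
        letter = match.group(1).lower()
        if letter in RADIX_DIGITS:
            radix = letter
        pos = match.end()
        match = PREFIX.match(token, pos)

    digits = RADIX_DIGITS[radix]
    body = token[pos:]  # everything after the prefixes
    for i, ch in enumerate(body):
        if ch != "_":
            continue
        before = body[i - 1:i]  # empty string at the start of the body
        after = body[i + 1:i + 2]  # empty string at the end of the body
        if not (before and after and before in digits and after in digits):
            return False
    return True
```

For instance, is_conforming("1_2e1_2") is true, while is_conforming("12_e12") and is_conforming("01__23") are false. Because an adjacent underscore is never a digit, the same neighbour check rejects consecutive underscores without a separate case.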

Examples

Integers

0123             ; conforming
0_1_2_3          ; conforming
0_123            ; conforming
01_23            ; conforming
012_3            ; conforming
+0123            ; conforming
+0_123           ; conforming
-0123            ; conforming
-0_123           ; conforming

_0123            ; non-conforming
0123_            ; non-conforming
0123__           ; non-conforming
01__23           ; non-conforming
0_1__2___3       ; non-conforming
+_0123           ; non-conforming
+0123_           ; non-conforming
-_0123           ; non-conforming
-0123_           ; non-conforming

Rational numbers

1_2_3/4_5_6_7    ; conforming
12_34/5_678      ; conforming

1_2_3/_4_5_6_7   ; non-conforming
_12_34/5_678     ; non-conforming

Real numbers

0_1_23.4_5_6     ; conforming
1_2_3.5e6        ; conforming
1_2e1_2          ; conforming

_0123.456        ; non-conforming
0123_.456        ; non-conforming
0123._456        ; non-conforming
0123.456_        ; non-conforming
123_.5e6         ; non-conforming
123._5e6         ; non-conforming
123.5_e6         ; non-conforming
123.5e_6         ; non-conforming
123.5e6_         ; non-conforming
12_e12           ; non-conforming
12e_12           ; non-conforming
12e12_           ; non-conforming

Complex numbers

-12_3.0_00_00-12_34.56_78i   ; conforming
-12_3.0_00_00@-12_34.56_78   ; conforming

-12_3.0_00_00-12_34.56_78_i   ; non-conforming
-12_3.0_00_00-12_34.56_78i_   ; non-conforming
-12_3.0_00_00_@-12_34.56_78   ; non-conforming
-12_3.0_00_00@_-12_34.56_78   ; non-conforming

Hypercomplex numbers

Kawa supports quaternions using the following syntax:

1+2i-3j+4k

By applying the rule, a syntax like that can be extended as follows:

1_0+2_0i-3_0j+4_0k   ; conforming

1_0_+2_0i-3_0j+4_0k  ; non-conforming
1_0+2_0_i-3_0j+4_0k  ; non-conforming
1_0+2_0i-3_0j_+4_0k  ; non-conforming
1_0+2_0i-3_0j+4_0k_  ; non-conforming

Units of measure

Kawa supports units of measure using the following syntax:

123456cm^2

By applying the rule, a syntax like that can be extended as follows:

123_456cm^2          ; conforming

123_456_cm^2         ; non-conforming
123_456.78_cm^2      ; non-conforming

Numbers with radix or exactness prefixes

#b10_10_10           ; conforming
#o23_45_67           ; conforming
#d45_67_89           ; conforming
#xAB_CD_EF           ; conforming
#x-2_0               ; conforming
#o+2_345_6           ; conforming

#x-_2                ; non-conforming
_#x-_2               ; non-conforming
#d_45_67_89          ; non-conforming
#e_45/67_89          ; non-conforming
#i#o_1234            ; non-conforming
#i_#o_1234           ; non-conforming
#e#x1234_            ; non-conforming
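To make the integer examples above concrete, here is a hedged sketch of how a reader might turn a conforming integer token with an optional radix prefix into a value. It is written in Python for illustration only; the names read_integer, PREFIX_RADIX and DIGIT_CLASS are hypothetical. It covers integers only and deliberately ignores exactness prefixes, rationals, reals and complex numbers. It expresses the rule as "one or more digit groups separated by single underscores".

```python
import re

# Radix selected by each prefix, and the regex character class for its digits.
PREFIX_RADIX = {"#b": 2, "#o": 8, "#d": 10, "#x": 16}
DIGIT_CLASS = {2: "[01]", 8: "[0-7]", 10: "[0-9]", 16: "[0-9a-fA-F]"}


def read_integer(token: str) -> int:
    """Parse an integer token, allowing one underscore between digits."""
    radix = PREFIX_RADIX.get(token[:2].lower())
    if radix is None:
        radix, body = 10, token  # no radix prefix: decimal
    else:
        body = token[2:]

    d = DIGIT_CLASS[radix]
    # Optional sign, then digit groups joined by single underscores.
    if not re.fullmatch(rf"[+-]?{d}+(?:_{d}+)*", body):
        raise ValueError(f"non-conforming integer token: {token!r}")

    # The separators carry no meaning, so drop them before converting.
    return int(body.replace("_", ""), radix)
```

Under these assumptions, read_integer("#b10_10_10") yields 42 and read_integer("#x-2_0") yields -32, while read_integer("#x-_2") raises ValueError.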

Implementation

TODO

Acknowledgements

TODO

Copyright

Copyright (C) TODO 2019

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.