The String module provides functionality for encoding, escaping and formatting strings. It may also apply a number of fixes to the built-in String object.

Static properties


Adds entries to UsefulJS.featureSupport:

Formatting strings


Formats a string with placeholder fields replaced with argument values.

UsefulJS.String.sprintf(fmt[, field1[, field2 ...]])

Returns: String. The formatted string

Throws: TypeError: unrecognized format code; unexpected argument type for the field.

var company = "Holding Holdings, Inc", year = (new Date()).getFullYear(),
    copyright = UsefulJS.String.sprintf("Copyright (C) %04d %s. All rights reserved.", year, company);

Format field syntax

The general syntax of a sprintf format field is:

%[n$][flags][width][.precision]type


Items in square brackets are optional and for some type values they may be ignored.

What, no length field? If you have a C or Perl background you may have used, for example, the %ld sequence for a long int value. Let me put it like this: JAVASCRIPT IS NOT C! C is a strongly typed language and the compiler cares very much about the distinction between an int and long int. JavaScript is a weakly typed language; not only am I unable to make the distinction but I don't even care to.

Field types

Field type | Purpose | Notes
%d | Signed integer | Range is ±(2^53 - 1). Locale-aware
%i | 32-bit signed integer | Range is -2^31 to 2^31 - 1. Not locale-aware
%u | 32-bit unsigned integer | Range is 0 to 2^32 - 1
%x / %X | Unsigned integer in hexadecimal notation | Range is the same as %u. %X uses uppercase characters and '0X' with the # flag
%o | Unsigned integer in octal notation | Range is the same as %u
%b / %B | Unsigned integer in binary notation | Range is the same as %u. %B uses uppercase '0B' with the # flag
%f | Floating point value in decimal format | Locale-aware
%e / %E | Floating point value in exponential format | %E uses an uppercase 'E' for the exponent part; exponent value is a minimum of two digits
%c | A single character or surrogate pair | Codepoint in the range 0 to 0x10ffff
%s | A character string | -
%% | A literal '%' character | Note that x% is not a valid percentage format in certain locales

A number of fields familiar to C programmers are unimplemented. There is no way that I could implement the %p and %n fields and I wouldn't even if I could, since they're features in search of a use case. In twenty-five years I've never once used %g, which behaves like %f or %e depending on the magnitude and has a funky interaction with the precision field. So that one's out too.


Generally, each field in the format string consumes the next argument. You can change this behaviour by specifying the argument number in the first slot of the format field. The syntax is n$, where n is the argument number. It must be at least 1, since argument 0 is the format string itself. To illustrate:

UsefulJS.String.sprintf("%1$3u %1$#04x %1$#010b", 23); // " 23 0x17 0b00010111"
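
To make the argument-numbering rule concrete, here is a minimal sketch of how a formatter could resolve n$ against its argument list. The function miniFormat is hypothetical and handles only %s and %%; it is not the library's implementation, just an illustration of "argument 0 is the format string".

```javascript
// Hypothetical helper, for illustration only: supports %s, %n$s and %%.
function miniFormat(fmt) {
    var args = arguments, // args[0] is the format string itself
        next = 1;         // next argument for fields without an explicit n$
    return fmt.replace(/%(?:(\d+)\$)?s|%%/g, function (match, n) {
        if (match === "%%") {
            return "%";
        }
        // An explicit n$ selects that argument; otherwise consume the next one
        var index = n ? parseInt(n, 10) : next++;
        return String(args[index]);
    });
}

miniFormat("%2$s, %1$s!", "world", "Hello"); // "Hello, world!"
miniFormat("%s and %s", "a", "b");           // "a and b"
```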


Flags control the appearance of the formatted value. You can use as many of them as you like, though some may be ignored.

Flag | Purpose | Notes
- | Left-align padded values | Fixed width fields are right-aligned unless this flag is set; left-aligned fields ignore the '0' flag
+ | Prefix positive numbers with '+' | Only used when the field indicates a signed value: %d, %i, %e, %f
<SPACE> | Prefix positive numbers with 'NBSP' (non-breaking space) | Only used when the field indicates a signed value
0 | Pad numeric fields with '0' characters; otherwise NBSP characters are used | -
# | Prefixes binary, octal and hex values with a base identifier | Hex values are prefixed with '0x', octal values with '0' and binary values with '0b'. Affects the behaviour of the width field
, | Group the digits in the output value (e.g. "1,000" rather than "1000") | Only used when the field is %d or %f


The width field follows the flags and is a number that specifies the minimum width of the formatted field. Formatted values wider than this are not truncated. Field width is applied after all other formatting is complete, so precision and prefixes are taken into account when calculating how much padding is required:

UsefulJS.String.sprintf("%+7.4f", 1.0);       //  "+1.0000"
UsefulJS.String.sprintf("%+8.4f", 1.0);       // " +1.0000"
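
The ordering described above (precision and sign first, padding last) can be sketched as follows. formatFixedSketch is a hypothetical stand-in for a %f field with the '+' flag; it assumes NBSP padding as stated in the flags table and ignores locale handling entirely.

```javascript
// Hypothetical sketch of "%+W.Pf": format first, then pad to the minimum width.
function formatFixedSketch(value, width, precision, showPlus) {
    var sign = value < 0 ? "-" : (showPlus ? "+" : "");
    var text = sign + Math.abs(value).toFixed(precision);
    // Width is a minimum: pad on the left with NBSP, never truncate
    while (text.length < width) {
        text = "\u00a0" + text;
    }
    return text;
}

formatFixedSketch(1.0, 7, 4, true); // "+1.0000" (already 7 wide, no padding)
formatFixedSketch(1.0, 8, 4, true); // one NBSP followed by "+1.0000"
```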

A dynamic width can be specified with the '*' character. This consumes an argument. To illustrate:

UsefulJS.String.sprintf("%*d", 2, 1);         // " 1"

The width field also controls the output width with the %s field type:

UsefulJS.String.sprintf("%8s", "expand");  // "  expand"
UsefulJS.String.sprintf("%-8s", "expand"); // "expand  "


The precision field follows the width and is a number preceded by a '.' character that specifies how many digits after the decimal point are to be displayed when used with the %f and %e field types. The default value is 6. Trailing zeroes in the output are not suppressed:

UsefulJS.String.sprintf("%f", 1);             // "1.000000"
UsefulJS.String.sprintf("%.2f", 1);           // "1.00"

As with the width, you can specify precision dynamically with the '*' character:

UsefulJS.String.sprintf("%.*f", 2, 1);        // "1.00"

When used with the %s field type, the precision value controls the output width, truncating if required:

UsefulJS.String.sprintf("%.8s", "truncated"); // "truncate"


The %d and %f fields are locale aware (that is, assuming that the UsefulJS.Number module is available). This means that the decimal separator and digits for the current locale are observed:

UsefulJS.Locale.current = "fr";
UsefulJS.String.sprintf("%.*f", 2, 1);        // "1,00"
UsefulJS.Locale.current = "hi";
UsefulJS.String.sprintf("%.*f", 2, 1);        // "१.००"

You can enable grouped output with the ',' (comma) flag to improve readability:

UsefulJS.Locale.current = "en-IN";
UsefulJS.String.sprintf("%,d", 100000);       // "1,00,000"

Beyond this, the formats used for numbers are fixed. If you have more complex formatting requirements (e.g. currency or suppressing trailing zeroes), you should format the numbers as a separate step and use %s fields.


Here is a simple but functional internationalization framework:

var I18n = {
    strings : {
        de : {
            question : "Was ist das Ergebnis der Multiplikation %1$.1f von %2$.1f?"
        },
        en : {
            question : "What do you get when you multiply %1$.1f by %2$.1f?"
        }
    },
    resolve : function(key/*, arg1, arg2, ... */) {
        // Look up the format string for the current locale
        var fmt = I18n.strings[UsefulJS.Locale.current][key];
        if (!fmt) {
            return key;
        }
        // Copy the arguments: [key, arg1, arg2, ...]
        var args = Array.from(arguments);
        // Replace the key with the format string
        args[0] = fmt;
        return UsefulJS.String.sprintf.apply(null, args);
    }
};

UsefulJS.Locale.current = "de";
I18n.resolve("question", 6, 9);  // "Was ist das Ergebnis der Multiplikation 6,0 von 9,0?"

Note the use of positional parameters %1$.1f and %2$.1f. This is particularly important for %s fields. When your stringtable entries are translated, the word order can change very radically and there is no way of distinguishing one %s from another when the argument order is fixed. Specifying which arguments to use in the format string means that substitution won't produce gobbledegook.

String encoding

The functions in the UsefulJS.String.encode namespace are used for interchange with backend processes.


Returns its input, UTF-8 encoded.


Returns: String


UTF-8 is a character encoding that uses a variable number of 8-bit bytes to encode a single character. The bit pattern in the first byte of the sequence says how many bytes are in an encoded sequence. UTF-8 has a number of intrinsic advantages over other character encodings:

No more than four bytes are required to represent any defined character. Here are some examples:

UsefulJS.String.escape.js(UsefulJS.String.encode.toUtf8("$"));  // "$"; unchanged
UsefulJS.String.escape.js(UsefulJS.String.encode.toUtf8("£"));  // "\xc2\xa3"
UsefulJS.String.escape.js(UsefulJS.String.encode.toUtf8("€"));  // "\xe2\x82\xac"
UsefulJS.String.escape.js(UsefulJS.String.encode.toUtf8("💩")); // "\xf0\x9f\x92\xa9"

Following the Noncharacter FAQ, noncharacter codepoints are encoded like any other. However, codepoints that represent isolated halves of a surrogate pair are encoded as "\xef\xbf\xbd", the UTF-8 encoding of U+FFFD, the replacement character. Note that no "BOM" (byte-order mark) is emitted - this is absolutely not needed for UTF-8. If, for some reason, the receiving program expects one, you can prefix the encoded string with "\xef\xbb\xbf".
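
For readers who want to see the byte layout, here is a minimal sketch of a UTF-8 encoder that produces a "byte string" (one character per octet) and replaces isolated surrogate halves with U+FFFD, as described above. toUtf8Sketch is a hypothetical name and the code is an illustration of the encoding rules, not the module's actual codec.

```javascript
// Hypothetical sketch: encode a JavaScript string as UTF-8, one char per byte.
function toUtf8Sketch(s) {
    var out = "";
    for (var i = 0; i < s.length; i++) {
        var cp = s.charCodeAt(i);
        // Combine a valid surrogate pair into a single codepoint
        if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
            var low = s.charCodeAt(i + 1);
            if (low >= 0xdc00 && low <= 0xdfff) {
                cp = 0x10000 + ((cp - 0xd800) << 10) + (low - 0xdc00);
                i++;
            }
        }
        // Anything still in the surrogate range is an isolated half
        if (cp >= 0xd800 && cp <= 0xdfff) {
            cp = 0xfffd;
        }
        if (cp < 0x80) {
            out += String.fromCharCode(cp);
        } else if (cp < 0x800) {
            out += String.fromCharCode(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
        } else if (cp < 0x10000) {
            out += String.fromCharCode(0xe0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
        } else {
            out += String.fromCharCode(0xf0 | (cp >> 18),
                0x80 | ((cp >> 12) & 0x3f), 0x80 | ((cp >> 6) & 0x3f),
                0x80 | (cp & 0x3f));
        }
    }
    return out;
}

toUtf8Sketch("€") === "\xe2\x82\xac";      // true
toUtf8Sketch("\ud800") === "\xef\xbf\xbd"; // true: isolated surrogate half
```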


UTF-8 decodes its input, returning a regular String.


Returns: String


Valid UTF-8 sequences in the input are decoded to the corresponding codepoints. This includes sequences that decode to noncharacters. Invalid sequences are decoded to U+FFFD, the replacement character. Invalid sequences are:


Canonicalizes line-endings as "\n".


Returns: String


Strips carriage returns out of its argument and returns the result.


Canonicalizes line-endings as "\r\n".


Returns: String


Replaces all newlines in its argument with carriage return / newline pairs. If called twice on the same string, carriage returns will not be doubled-up.
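
The two canonicalizations can be sketched with a pair of regular expressions. The names toUnixSketch and toDosSketch are hypothetical, and the code only mirrors the behaviour described above (strip carriage returns; expand newlines without doubling existing pairs).

```javascript
// Hypothetical sketches of the two line-ending canonicalizations.
function toUnixSketch(s) {
    // Strip every carriage return
    return s.replace(/\r/g, "");
}

function toDosSketch(s) {
    // Match "\n" with or without a preceding "\r" so existing pairs are not doubled
    return s.replace(/\r?\n/g, "\r\n");
}

toUnixSketch("a\r\nb");             // "a\nb"
toDosSketch(toDosSketch("a\nb"));   // "a\r\nb"
```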

Implementation notes

UTF-8 encoding and decoding may trivially be done using the deprecated escape and unescape functions:

var s = "...",
    encoded = unescape(encodeURIComponent(s)),
    decoded = decodeURIComponent(escape(encoded));

This works because URLs must be UTF-8 encoded with individual octets %-escaped. escape/unescape are completely unaware of this and treat the UTF-8 byte values as individual characters.

I chose to implement my own codec for two reasons. The primary reason is the dependence on deprecated functions which may stop working at any time. The other is control over error handling; decodeURIComponent throws on invalid input (with, given the context, a slightly weird "Malformed URI" error) while I prefer to replace bad sequences with a replacement character and carry on.
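
The difference in error handling is easy to demonstrate. The sketch below feeds a truncated two-byte sequence to the escape/decodeURIComponent trick and, for comparison, to the standard TextDecoder API, which substitutes U+FFFD in its default (non-fatal) mode. TextDecoder is used here purely as an illustration of the "replace and carry on" policy; it is an assumption that it is available in your environment, and it is not what the module uses.

```javascript
// "\xc3" starts a two-byte sequence, but "(" is not a continuation byte
var bad = "\xc3(";

// The escape/decodeURIComponent trick throws a URIError on this input
var threw = false;
try {
    decodeURIComponent(escape(bad));
} catch (e) {
    threw = e instanceof URIError;
}

// A replacing decoder emits U+FFFD for the bad byte and keeps going
var bytes = Uint8Array.from([0xc3, 0x28]);
var replaced = new TextDecoder("utf-8").decode(bytes); // "\ufffd("
```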

String escaping

Functions in the UsefulJS.String.escape namespace are used to escape potentially problematic characters in strings before they're used in various contexts.


Escapes a String for use in HTML


Returns: String


This is a basic escape function, only escaping &<>'"/. It's intended to prevent you from shooting yourself in the foot when using innerHTML. The better approach is to use document.createTextNode or element.setAttribute, which turn strings into plain old data that will never be interpreted. The function does not emit character entities like &reg; since these have not been required for years - simply specify a UTF-8 charset in a <meta> element.
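
A basic escape of this kind can be sketched in a few lines. escapeHtmlSketch is a hypothetical function, and the particular entity forms chosen for the quote and slash characters are an assumption; the text above only says which six characters are escaped.

```javascript
// Hypothetical sketch: escape the six characters & < > ' " /
var HTML_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "'": "&#39;",   // assumed entity form
    '"': "&quot;",
    "/": "&#x2F;"   // assumed entity form
};

function escapeHtmlSketch(s) {
    return String(s).replace(/[&<>'"\/]/g, function (ch) {
        return HTML_ESCAPES[ch];
    });
}

escapeHtmlSketch("Tom & Jerry"); // "Tom &amp; Jerry"
```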


Escapes a String for use in JavaScript


Returns: String


This uses a fairly paranoid escape algorithm: only ASCII alphanumeric characters (A-Z, a-z and 0-9) are not escaped. Otherwise, codepoints below U+0100 are escaped using the form \xHH where 'H' is a hexadecimal digit. The escaping of other codepoints depends on the capabilities of the browser: if the \u{H...H} escaping style is supported, this form will be used; if not the escape sequence will use the \uHHHH style.
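
The algorithm as described can be sketched like this. escapeJsSketch is a hypothetical function that assumes the \u{H...H} style is supported and therefore uses it for every codepoint from U+0100 upwards; the feature detection and the \uHHHH fallback mentioned above are left out.

```javascript
// Hypothetical sketch of the "paranoid" escape described above.
function escapeJsSketch(s) {
    var out = "";
    // for...of iterates by codepoint, so surrogate pairs stay together
    for (var ch of s) {
        var cp = ch.codePointAt(0);
        if (/^[A-Za-z0-9]$/.test(ch)) {
            out += ch;                           // ASCII alphanumerics pass through
        } else if (cp < 0x100) {
            var hex = cp.toString(16);
            out += "\\x" + (hex.length < 2 ? "0" + hex : hex);
        } else {
            out += "\\u{" + cp.toString(16) + "}"; // assumes \u{...} support
        }
    }
    return out;
}

escapeJsSketch("a£€"); // the text a\xa3\u{20ac}
```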


Escapes a String for use in regular expressions


Returns: String


String values passed to the RegExp constructor may need to be escaped if you don't want certain characters to be interpreted as part of the regular expression syntax. This function backslash-escapes the following characters: \.-[]{}()^$|+?*. Note that '/' does not need to be escaped unless you're in the habit of constructing dynamic regexes with eval.
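
As an illustration, the character list above translates directly into a single character class. escapeRegexSketch is a hypothetical name; the sketch backslash-escapes exactly the characters listed and nothing else.

```javascript
// Hypothetical sketch: backslash-escape \ . - [ ] { } ( ) ^ $ | + ? *
function escapeRegexSketch(s) {
    return String(s).replace(/[\\.\-\[\]{}()^$|+?*]/g, "\\$&");
}

// Build a regex that matches user-supplied text literally
var literal = new RegExp("^" + escapeRegexSketch("1+1=2?") + "$");
literal.test("1+1=2?"); // true
literal.test("11=2");   // false: '+' and '?' are no longer quantifiers
```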


The fixes for the String module implement a number of ES5/6 methods and are defined in the _string namespace of the fix options. Fixes are applied automatically apart from the padLeft and padRight options which, as library extensions, must be explicitly enabled.

String prototype methods are implemented generically, so you can apply them to any value, for example by invoking a method through call with the array [1,2,3,4] as the this value and ",3," as the argument.

As per spec, however, they will throw a TypeError if the this value is null or undefined.
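
The generic behaviour is the same as that of the native methods, so it can be demonstrated without the library. The choice of includes here is only an example; any of the String.prototype methods listed below should behave the same way with respect to the this value.

```javascript
// The array is converted to the string "1,2,3,4" before searching
var found = String.prototype.includes.call([1, 2, 3, 4], ",3,"); // true

// A null or undefined this value is rejected
var threwTypeError = false;
try {
    String.prototype.includes.call(null, "x");
} catch (e) {
    threwTypeError = e instanceof TypeError;
}
```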


Adds a startsWith method to String.prototype if it is not implemented natively.


See the MDN documentation for details.


Adds an endsWith method to String.prototype if it is not implemented natively.


See the MDN documentation for details.


Adds an includes method to String.prototype if it is not implemented natively.

string.includes(t[, pos])

See the MDN documentation for details.


Adds a trim method to String.prototype if it is not implemented natively.


See the MDN documentation for details. The pattern used for whitespace is Unicode-aware.


Adds a trimLeft method to String.prototype if it is not implemented natively.


See the MDN documentation for details.


Adds a trimRight method to String.prototype if it is not implemented natively.


See the MDN documentation for details.


Adds a repeat method to String.prototype if it is not implemented natively.


See the MDN documentation for details.


Adds a padLeft method to String.prototype.

string.padLeft(padTo, padWith)

Adds 0 or more copies of the character in padWith to the start of string so that it is at least padTo characters long and returns the result. If string is already long enough, no padding is applied.


Adds a padRight method to String.prototype.

string.padRight(padTo, padWith)

Adds 0 or more copies of the character in padWith to the end of string so that it is at least padTo characters long and returns the result. If string is already long enough, no padding is applied.
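
The padding behaviour described for padLeft and padRight can be sketched as standalone functions. padLeftSketch and padRightSketch are hypothetical names used to avoid touching String.prototype, and they assume padWith is a single character, as the descriptions above do.

```javascript
// Hypothetical standalone sketches of the padLeft / padRight behaviour.
function padLeftSketch(s, padTo, padWith) {
    s = String(s);
    while (s.length < padTo) {
        s = padWith + s;   // prepend until the minimum length is reached
    }
    return s;
}

function padRightSketch(s, padTo, padWith) {
    s = String(s);
    while (s.length < padTo) {
        s = s + padWith;   // append until the minimum length is reached
    }
    return s;
}

padLeftSketch("7", 3, "0");    // "007"
padRightSketch("ab", 4, ".");  // "ab.."
padLeftSketch("long", 2, "0"); // "long" (already long enough)
```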


Adds a fromCodePoint factory method to String if it is not implemented natively.

String.fromCodePoint(codePoint1[, codePoint2, ...])

See the MDN documentation for details.


Adds a codePointAt method to String.prototype if it is not implemented natively.


See the MDN documentation for details.
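
As a quick illustration of why fromCodePoint and codePointAt matter, the following uses the standard methods (native or fixed) on a codepoint outside the Basic Multilingual Plane, which is stored as a surrogate pair.

```javascript
var poo = String.fromCodePoint(0x1f4a9);   // "💩"

poo.length;          // 2: stored as a surrogate pair
poo.charCodeAt(0);   // 0xd83d: only the high surrogate
poo.codePointAt(0);  // 0x1f4a9: the whole codepoint
poo.codePointAt(1);  // 0xdca9: the low surrogate on its own
```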