.. Revision 80aa77986ec84067f4696439f10fc77c93703bc7 authored by Tony Kelman on 26 October 2015, 12:41:37 UTC, committed by Tony Kelman on 26 October 2015, 12:41:37 UTC

Strings
=======

.. function:: length(s)

   The number of characters in string ``s``.

.. function:: sizeof(s::String)

   The number of bytes in string ``s``.
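
   As a small illustration of how this differs from ``length`` (assuming the string literal is stored as UTF-8, where each Greek letter below occupies two bytes):

   .. doctest::

      julia> length("αβγ")
      3

      julia> sizeof("αβγ")
      6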

.. function:: *(s, t)

   Concatenate strings. The ``*`` operator is an alias for this function.

   .. doctest::

      julia> "Hello " * "world"
      "Hello world"

.. function:: ^(s, n)

   Repeat the string ``s`` ``n`` times. The ``^`` operator is an alias for this function.

   .. doctest::

      julia> "Test "^3
      "Test Test Test "

.. function:: string(xs...)

   Create a string from any values using the ``print`` function.
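
   A minimal sketch of mixing strings and numbers; the values are arbitrary and chosen only for illustration:

   .. doctest::

      julia> string("x = ", 1, ", y = ", 2.5)
      "x = 1, y = 2.5"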

.. function:: repr(x)

   Create a string from any value using the ``showall`` function.

.. function:: bytestring(::Ptr{Uint8}, [length])

   Create a string from the address of a C (0-terminated) string encoded in ASCII or UTF-8. A copy is made; the pointer can be safely freed. If ``length`` is specified, the string does not have to be 0-terminated.

.. function:: bytestring(s)

   Convert a string to a contiguous byte array representation appropriate for passing it to C functions. The string will be encoded as either ASCII or UTF-8.

.. function:: ascii(::Array{Uint8,1})

   Create an ASCII string from a byte array.

.. function:: ascii(s)

   Convert a string to a contiguous ASCII string (all characters must be valid ASCII characters).

.. function:: utf8(::Array{Uint8,1})

   Create a UTF-8 string from a byte array.

.. function:: utf8(s)

   Convert a string to a contiguous UTF-8 string (all characters must be valid UTF-8 characters).

.. function:: normalize_string(s, normalform::Symbol)

   Normalize the string ``s`` according to one of the four "normal
   forms" of the Unicode standard: ``normalform`` can be ``:NFC``,
   ``:NFD``, ``:NFKC``, or ``:NFKD``.  Normal forms C (canonical
   composition) and D (canonical decomposition) convert different
   visually identical representations of the same abstract string into
   a single canonical form, with form C being more compact.  Normal
   forms KC and KD additionally canonicalize "compatibility
   equivalents": they convert characters that are abstractly similar
   but visually distinct into a single canonical choice (e.g. they expand
   ligatures into the individual characters), with form KC being more compact.

   Alternatively, finer control and additional transformations may be
   obtained by calling ``normalize_string(s; keywords...)``, where
   any number of the following boolean keyword options (which all default
   to ``false`` except for ``compose``) are specified:

   * ``compose=false``: do not perform canonical composition
   * ``decompose=true``: do canonical decomposition instead of canonical composition (``compose=true`` is ignored if present)
   * ``compat=true``: compatibility equivalents are canonicalized
   * ``casefold=true``: perform Unicode case folding, e.g. for case-insensitive string comparison
   * ``newline2lf=true``, ``newline2ls=true``, or ``newline2ps=true``: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectively
   * ``stripmark=true``: strip diacritical marks (e.g. accents)
   * ``stripignore=true``: strip Unicode's "default ignorable" characters (e.g. the soft hyphen or the left-to-right marker)
   * ``stripcc=true``: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specified
   * ``rejectna=true``: throw an error if unassigned code points are found
   * ``stable=true``: enforce Unicode Versioning Stability

   For example, NFKC corresponds to the options ``compose=true, compat=true, stable=true``.

.. function:: is_valid_ascii(s) -> Bool

   Returns true if the string or byte vector is valid ASCII, false otherwise.

.. function:: is_valid_utf8(s) -> Bool

   Returns true if the string or byte vector is valid UTF-8, false otherwise.

.. function:: is_valid_char(c) -> Bool

   Returns true if the given char or integer is a valid Unicode code point.

.. function:: is_assigned_char(c) -> Bool

   Returns true if the given char or integer is an assigned Unicode code point.

.. function:: ismatch(r::Regex, s::String) -> Bool

   Test whether a string contains a match of the given regular expression.

.. function:: match(r::Regex, s::String[, idx::Integer[, addopts]])

   Search for the first match of the regular expression ``r`` in ``s`` and return a ``RegexMatch`` object containing the match, or ``nothing`` if the match failed. The matching substring can be retrieved by accessing ``m.match`` and the captured sequences can be retrieved by accessing ``m.captures``. The optional ``idx`` argument specifies an index at which to start the search.
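
   The following sketch shows the fields described above on a made-up input string. The trailing semicolon only suppresses the REPL display of the match object, and the last call starts the search at index 3:

   .. doctest::

      julia> m = match(r"(\d+)-(\d+)", "range 10-20");

      julia> m.match
      "10-20"

      julia> m.captures[2]
      "20"

      julia> match(r"\d+", "no digits") == nothing
      true

      julia> match(r"\d+", "a1 b22", 3).match
      "22"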

.. function:: eachmatch(r::Regex, s::String[, overlap::Bool=false])

   Search for all matches of the regular expression ``r`` in ``s`` and return an iterator over the matches. If ``overlap`` is true, the matching sequences are allowed to overlap indices in the original string; otherwise they must be from distinct character ranges.

.. function:: matchall(r::Regex, s::String[, overlap::Bool=false]) -> Vector{String}

   Return a vector of the matching substrings from eachmatch.

.. function:: lpad(string, n, p)

   Make a string at least ``n`` characters long by padding on the left with copies of ``p``.

.. function:: rpad(string, n, p)

   Make a string at least ``n`` characters long by padding on the right with copies of ``p``.
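
   As a brief illustration of both padding functions (here ``p`` is passed as a one-character string, and a string that is already long enough is returned unchanged):

   .. doctest::

      julia> lpad("7", 3, "0")
      "007"

      julia> rpad("ab", 5, ".")
      "ab..."

      julia> lpad("12345", 3, "0")
      "12345"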

.. function:: search(string, chars, [start])

   Search for the first occurrence of the given characters within the given string. The second argument may be a single character, a vector or a set of characters, a string, or a regular expression (though regular expressions are only allowed on contiguous strings, such as ASCII or UTF-8 strings). The third argument optionally specifies a starting index. The return value is a range of indexes where the matching sequence is found, such that ``s[search(s,x)] == x``:

   ``search(string, "substring")`` = ``start:end`` such that ``string[start:end] == "substring"``, or ``0:-1`` if unmatched.

   ``search(string, 'c')``         = ``index`` such that ``string[index] == 'c'``, or ``0`` if unmatched.
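
   The return conventions above can be sketched as follows, assuming the ``search`` signature documented here (the sample string is arbitrary):

   .. doctest::

      julia> search("Hello world", "world")
      7:11

      julia> search("Hello world", 'o')
      5

      julia> search("Hello world", 'o', 6)
      8

      julia> search("Hello world", "xyz")
      0:-1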

.. function:: rsearch(string, chars, [start])

   Similar to ``search``, but returning the last occurrence of the given characters within the given string, searching in reverse from ``start``.

.. function:: searchindex(string, substring, [start])

   Similar to ``search``, but return only the start index at which the substring is found, or 0 if it is not.

.. function:: rsearchindex(string, substring, [start])

   Similar to ``rsearch``, but return only the start index at which the substring is found, or 0 if it is not.

.. function:: contains(haystack, needle)

   Determine whether the second argument is a substring of the first.

.. function:: replace(string, pat, r[, n])

   Search for the given pattern ``pat``, and replace each occurrence with ``r``. If ``n`` is provided, replace at most ``n`` occurrences.  As with search, the second argument may be a single character, a vector or a set of characters, a string, or a regular expression. If ``r`` is a function, each occurrence is replaced with ``r(s)`` where ``s`` is the matched substring.
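
   A minimal sketch of the positional form documented here, first replacing every occurrence and then limiting the count with ``n``:

   .. doctest::

      julia> replace("banana", "a", "o")
      "bonono"

      julia> replace("banana", "a", "o", 1)
      "bonana"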

.. function:: split(string, [chars, [limit,] [include_empty]])

   Return an array of substrings by splitting the given string on occurrences of the given character delimiters, which may be specified in any of the formats allowed by ``search``'s second argument (i.e. a single character, collection of characters, string, or regular expression). If ``chars`` is omitted, it defaults to the set of all space characters, and ``include_empty`` is taken to be false. The last two arguments are also optional: they are a maximum size for the result and a flag determining whether empty fields should be included in the result.
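
   The sketch below compares results against plain arrays rather than printing them, since the displayed element type depends on the Julia version. It assumes the positional ``limit`` argument documented here and that empty fields are kept when ``chars`` is given explicitly:

   .. doctest::

      julia> split("a,b,,c", ',') == ["a", "b", "", "c"]
      true

      julia> split("a,b,,c", ',', 2) == ["a", "b,,c"]
      true

      julia> split(" one  two ") == ["one", "two"]
      true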

.. function:: rsplit(string, [chars, [limit,] [include_empty]])

   Similar to ``split``, but starting from the end of the string.

.. function:: strip(string, [chars])

   Return ``string`` with any leading and trailing whitespace removed. If ``chars`` (a character, or vector or set of characters) is provided, instead remove characters contained in it.

.. function:: lstrip(string, [chars])

   Return ``string`` with any leading whitespace removed. If ``chars`` (a character, or vector or set of characters) is provided, instead remove characters contained in it.

.. function:: rstrip(string, [chars])

   Return ``string`` with any trailing whitespace removed. If ``chars`` (a character, or vector or set of characters) is provided, instead remove characters contained in it.
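
   As a short illustration of the three stripping functions, with whitespace removed by default and a single character passed as ``chars``:

   .. doctest::

      julia> strip("  hello \n")
      "hello"

      julia> lstrip("xxhixx", 'x')
      "hixx"

      julia> rstrip("xxhixx", 'x')
      "xxhi"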

.. function:: beginswith(string, prefix | chars)

   Returns ``true`` if ``string`` starts with ``prefix``. If the second argument is a vector or set of characters, tests whether the first character of ``string`` belongs to that set.

.. function:: endswith(string, suffix | chars)

   Returns ``true`` if ``string`` ends with ``suffix``. If the second argument is a vector or set of characters, tests whether the last character of ``string`` belongs to that set.

.. function:: uppercase(string)

   Returns ``string`` with all characters converted to uppercase.

.. function:: lowercase(string)

   Returns ``string`` with all characters converted to lowercase.

.. function:: ucfirst(string)

   Returns ``string`` with the first character converted to uppercase.

.. function:: lcfirst(string)

   Returns ``string`` with the first character converted to lowercase.

.. function:: join(strings, delim, [last])

   Join an array of ``strings`` into a single string, inserting the given delimiter between adjacent strings.
   If ``last`` is given, it will be used instead of ``delim`` between the last two strings.
   For example, ``join(["apples", "bananas", "pineapples"], ", ", " and ") == "apples, bananas and pineapples"``.

   ``strings`` can be any iterable over elements ``x`` which are convertible to strings via ``print(io::IOBuffer, x)``.

.. function:: chop(string)

   Remove the last character from a string.

.. function:: chomp(string)

   Remove a trailing newline from a string.
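
   A small sketch contrasting ``chomp`` with ``chop``: ``chop`` always drops the last character, while ``chomp`` leaves a string without a trailing newline unchanged:

   .. doctest::

      julia> chop("hello")
      "hell"

      julia> chomp("line\n")
      "line"

      julia> chomp("line")
      "line"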

.. function:: ind2chr(string, i)

   Convert a byte index to a character index.

.. function:: chr2ind(string, i)

   Convert a character index to a byte index.

.. function:: isvalid(str, i)

   Tells whether index ``i`` is valid for the given string.

.. function:: nextind(str, i)

   Get the next valid string index after ``i``. Returns a value greater than ``endof(str)``
   at or after the end of the string.

.. function:: prevind(str, i)

   Get the previous valid string index before ``i``. Returns a value less than ``1`` at
   the beginning of the string.
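
   The index functions above can be illustrated on a UTF-8 string in which ``α`` and ``β`` each occupy two bytes, so the valid indices are 1, 3, and 5. This is a sketch assuming the ``ind2chr`` and ``chr2ind`` functions documented on this page:

   .. doctest::

      julia> isvalid("αβc", 2)
      false

      julia> nextind("αβc", 1)
      3

      julia> prevind("αβc", 5)
      3

      julia> chr2ind("αβc", 3)
      5

      julia> ind2chr("αβc", 5)
      3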

.. function:: randstring(len)

   Create a random ASCII string of length ``len``, consisting of upper- and lower-case letters and the digits 0-9.

.. function:: charwidth(c)

   Gives the number of columns needed to print a character.

.. function:: strwidth(s)

   Gives the number of columns needed to print a string.

.. function:: isalnum(c::Union(Char,String)) -> Bool

   Tests whether a character is alphanumeric, or whether this
   is true for all elements of a string.

.. function:: isalpha(c::Union(Char,String)) -> Bool

   Tests whether a character is alphabetic, or whether this
   is true for all elements of a string.

.. function:: isascii(c::Union(Char,String)) -> Bool

   Tests whether a character belongs to the ASCII character set, or whether this
   is true for all elements of a string.

.. function:: isblank(c::Union(Char,String)) -> Bool

   Tests whether a character is a tab or space, or whether this
   is true for all elements of a string.

.. function:: iscntrl(c::Union(Char,String)) -> Bool

   Tests whether a character is a control character, or whether this
   is true for all elements of a string.

.. function:: isdigit(c::Union(Char,String)) -> Bool

   Tests whether a character is a numeric digit (0-9), or whether this
   is true for all elements of a string.

.. function:: isgraph(c::Union(Char,String)) -> Bool

   Tests whether a character is printable, and not a space, or whether this
   is true for all elements of a string.

.. function:: islower(c::Union(Char,String)) -> Bool

   Tests whether a character is a lowercase letter, or whether this
   is true for all elements of a string.

.. function:: isprint(c::Union(Char,String)) -> Bool

   Tests whether a character is printable, including space, or whether this
   is true for all elements of a string.

.. function:: ispunct(c::Union(Char,String)) -> Bool

   Tests whether a character is printable, and not a space or
   alphanumeric, or whether this is true for all elements of a string.

.. function:: isspace(c::Union(Char,String)) -> Bool

   Tests whether a character is any whitespace character, or whether this
   is true for all elements of a string.

.. function:: isupper(c::Union(Char,String)) -> Bool

   Tests whether a character is an uppercase letter, or whether this
   is true for all elements of a string.

.. function:: isxdigit(c::Union(Char,String)) -> Bool

   Tests whether a character is a valid hexadecimal digit, or whether this
   is true for all elements of a string.

.. function:: symbol(str) -> Symbol

   Convert a string to a ``Symbol``.

.. function:: escape_string(str::String) -> String

   General escaping of traditional C and Unicode escape sequences. See :func:`print_escaped` for more general escaping.

.. function:: unescape_string(s::String) -> String

   General unescaping of traditional C and Unicode escape sequences. Reverse of :func:`escape_string`. See also :func:`print_unescaped`.

.. function:: utf16(s)

   Create a UTF-16 string from a byte array, array of ``Uint16``, or
   any other string type.  (Data must be valid UTF-16.  Conversions of
   byte arrays check for a byte-order marker in the first two bytes,
   and do not include it in the resulting string.)

   Note that the resulting ``UTF16String`` data is terminated by the NUL
   codepoint (16-bit zero), which is not treated as a character in the
   string (so that it is mostly invisible in Julia); this allows the
   string to be passed directly to external functions requiring
   NUL-terminated data.  This NUL is appended automatically by the
   ``utf16(s)`` conversion function.  If you have a ``Uint16`` array
   ``A`` that is already NUL-terminated valid UTF-16 data, then you
   can instead use ``UTF16String(A)`` to construct the string without
   making a copy of the data; the NUL is then treated as a terminator
   rather than as part of the string.

.. function:: utf16(::Union(Ptr{Uint16},Ptr{Int16}) [, length])

   Create a string from the address of a NUL-terminated UTF-16 string. A copy is made; the pointer can be safely freed. If ``length`` is specified, the string does not have to be NUL-terminated.

.. function:: is_valid_utf16(s) -> Bool

   Returns true if the string or ``Uint16`` array is valid UTF-16.

.. function:: utf32(s)

   Create a UTF-32 string from a byte array, array of ``Uint32``, or
   any other string type.  (Conversions of byte arrays check for a
   byte-order marker in the first four bytes, and do not include it in
   the resulting string.)

   Note that the resulting ``UTF32String`` data is terminated by the NUL
   codepoint (32-bit zero), which is not treated as a character in the
   string (so that it is mostly invisible in Julia); this allows the
   string to be passed directly to external functions requiring
   NUL-terminated data.  This NUL is appended automatically by the
   ``utf32(s)`` conversion function.  If you have a ``Uint32`` array
   ``A`` that is already NUL-terminated UTF-32 data, then you
   can instead use ``UTF32String(A)`` to construct the string without
   making a copy of the data; the NUL is then treated as a terminator
   rather than as part of the string.

.. function:: utf32(::Union(Ptr{Char},Ptr{Uint32},Ptr{Int32}) [, length])

   Create a string from the address of a NUL-terminated UTF-32 string. A copy is made; the pointer can be safely freed. If ``length`` is specified, the string does not have to be NUL-terminated.

.. function:: wstring(s)

   This is a synonym for either ``utf32(s)`` or ``utf16(s)``,
   depending on whether ``Cwchar_t`` is 32 or 16 bits, respectively.
   The synonym ``WString`` for ``UTF32String`` or ``UTF16String``
   is also provided.