String Class

In Ruby, strings are sequences of characters, but they can also store binary data as sequences of bytes. A string can be created from a sequence of characters:

a=String.new("characters")

Ruby has a rich set of methods for working with strings. Strings are ordinary objects: the String class inherits from Object and includes the Comparable module, which provides the comparison operators <, >, <=, >=, == and the method between?.

String comparison follows the order of characters in the ASCII sequence: digits come first, then uppercase letters, then lowercase letters. Strings are compared character by character, and the first differing character decides the result. If one string is a prefix of the other, the longer string is considered greater:

'0'<'A' => true
'A'<'a' => true
'a'<'b' => true

'azzz'<'baaa' => true # the first differing character decides
'aab'<'abb'   => true
'abc'<'abc0'  => true # the second string is longer
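
As a small sketch of how the Comparable methods behave in practice, the following lines use between? and sort on a few sample strings. The <=> operator, on which Comparable builds the other comparisons, is not mentioned above and is shown only for illustration; the sample words are arbitrary.

```ruby
# <=> returns -1, 0 or 1; Comparable derives <, >, between? etc. from it
order = "apple" <=> "banana"
p order                          # -1: "apple" sorts before "banana"

# between? checks that a string lies within two bounds (inclusive)
p "b".between?("a", "c")         # true

# sort uses the same ordering: digits, then uppercase, then lowercase
words  = ["banana", "Cherry", "apple", "10"]
sorted = words.sort
p sorted                         # ["10", "Cherry", "apple", "banana"]
```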

String Encoding

In Ruby each string has its own encoding, which the encoding method returns. The default encoding of string literals is UTF-8 from Ruby 2.0 onward and was US-ASCII in Ruby 1.9; in Ruby 1.8 strings were plain byte sequences with no encoding attached.

Encodings are described by the Encoding class. The class method Encoding.list returns an array of all the available encodings.
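
A short sketch of how the Encoding class can be inspected. Encoding.find and the name method are not described above and are used here only to illustrate; the number of encodings printed depends on the Ruby build.

```ruby
# Encoding.list returns an array of Encoding objects
encodings = Encoding.list
puts encodings.size                        # number of available encodings
puts encodings.include?(Encoding::UTF_8)   # true

# An encoding can be looked up by name (case-insensitive)
utf8 = Encoding.find("utf-8")
puts utf8.name                             # UTF-8
```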

In a source file the encoding of the file can be specified in the first lines:

#!/usr/bin/ruby
# coding: utf-8

Some operations between strings fail if their encodings are not compatible. The encode method converts a string to another encoding; it has options to deal with undefined or invalid characters in the new encoding and to convert newline characters. There is also a force_encoding method, which sets the encoding attribute of a string without changing its bytes:

'a'.encoding => #<Encoding:UTF-8>

b='a'.encode("ISO-8859-1")

b.encoding   => #<Encoding:ISO-8859-1>

b.encode!("UTF-8")             # encode! changes the string in place

b.encoding   => #<Encoding:UTF-8>

"abcd".encode("UTF-8", undef:   :replace, replace: "X") # "X" replaces undefined characters
"abcd".encode("UTF-8", invalid: :replace, replace: "X") # "X" replaces invalid characters

b='a'.encode("ISO-8859-1")
b.force_encoding("UTF-8")             # tells Ruby that b is a UTF-8 string,
b.encoding    => #<Encoding:UTF-8>    # but the bytes of b are not changed
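
The difference between encode and force_encoding is easier to see on a non-ASCII character. The sketch below uses "é" (written as "\u00e9", an arbitrary choice) and also shows the kind of error raised when two strings with incompatible encodings are combined.

```ruby
utf8   = "\u00e9"                     # "é", two bytes in UTF-8
latin1 = utf8.encode("ISO-8859-1")    # transcoded: one byte in ISO-8859-1
p utf8.bytes                          # [195, 169]
p latin1.bytes                        # [233]

# force_encoding keeps the bytes and only changes the encoding label
relabeled = latin1.dup.force_encoding("UTF-8")
p relabeled.bytes                     # [233]
p relabeled.valid_encoding?           # false: 233 alone is not valid UTF-8

# Concatenating non-ASCII strings with different encodings fails
error_class = nil
begin
  utf8 + latin1
rescue Encoding::CompatibilityError => e
  error_class = e.class
end
p error_class                         # Encoding::CompatibilityError
```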

Double-quoted String

Strings can be created from characters between double quotes.

E.g.: a="string"

An alternative syntax is: %Q( string ), where the string inside the parentheses can contain double quotes, but not unbalanced parentheses, since these are used as the string delimiter. Any non-alphanumeric character can be used as the delimiter instead of parentheses; if you use brackets, the opening and closing delimiters must be a matching pair: {} [] () <>

a="string\n"
a=%Q(abcd\n)       # () are used as delimiters
a=%Q| 12 ef(), |   # '|' is used as a delimiter

Double-quoted strings can extend over many lines, preserving the newline characters. A backslash at the end of a line escapes the newline, effectively joining the two lines.

Between double quotes all the usual backslash substitutions are performed:

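\n	end of line
\b	backspace
\e	escape
\s	space
\t \v	horizontal and vertical tabs
\f \r	form feed, carriage return
\nnn	character with the octal code nnn
\xhh	character with the hexadecimal code hh
\uxxxx	Unicode character with the code point xxxx
\C-x	control-x sequence
\M-x	meta-x sequence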
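
A minimal sketch of how double-quoted strings treat line breaks and backslash sequences; the sample texts are arbitrary.

```ruby
# A double-quoted string spanning two lines keeps the newline
two_lines = "first
second"
p two_lines                # "first\nsecond"

# A backslash at the end of the line joins the two lines
joined = "first \
second"
p joined                   # "first second"

# Octal, hexadecimal and Unicode escapes can name the same character
p "\101" == "A"            # true
p "\x41" == "A"            # true
p "\u0041" == "A"          # true

# Each escape sequence produces a single character
p "a\tb".length            # 3
p "\s" == " "              # true
```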

Single-quoted Strings

Strings can also be created from characters between single quotes. In single-quoted strings only two backslash substitutions are performed: \\ (a backslash) and \' (a single quote).

E.g.: a='string'

An alternative syntax is: %q( string ), where the string inside the parentheses can contain single quotes, but not unbalanced parentheses. Any non-alphanumeric character can be used as the delimiter instead of parentheses; if you use brackets, the opening and closing delimiters must be a matching pair: {} [] () <>

a='string'
a=%q(string)             # () are used as delimiters
a=%q| xyx string aad |   # '|' is used as a delimiter

A string between single quotes can extend over many lines; each end of line is kept in the string as a newline character ("\n").
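
The following sketch contrasts single and double quotes on the same text, to show which backslash sequences are substituted; the sample strings are arbitrary.

```ruby
single = 'a\nb'            # backslash and n are kept as two characters
double = "a\nb"            # \n becomes one newline character
p single.length            # 4
p double.length            # 3

# The only two substitutions inside single quotes
p 'it\'s'                  # "it's"
p 'back\\slash'.length     # 10: a single backslash is stored

# %q allows single quotes and balanced parentheses inside
quoted = %q(He said 'hi' (twice))
p quoted                   # "He said 'hi' (twice)"

# A single-quoted string spanning two lines contains a newline
multi = 'line one
line two'
p multi.include?("\n")     # true
```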

String Operators

The following table lists some of the methods of the String class.

+	concatenates strings: "a"+"b" => "ab"
*	repeats strings: "abc" * 2 => "abcabc"
<<	appends to the string in place: "a"<<"b" => "ab"
ascii_only?	true if the string contains only ASCII characters
empty?	true if the string is empty
end_with?("string")	true if the string ends with the given string
include?("substring")	tests if the substring is included: 'abc'.include?('b') => true
index("substring")	index of the first occurrence of the substring: 'abc'.index('b') => 1
rindex("substring")	index of the last occurrence of the substring
insert(index,string)	substring insertion: "abc".insert(1,"xx")=>"axxbc"
split(pattern)	splits into an array; the default pattern is whitespace
capitalize ; capitalize!	makes the first character uppercase and the rest lowercase
upcase ; upcase!	to uppercase; upcase! changes the string in place
downcase ; downcase!	to lowercase
swapcase ; swapcase!	uppercase to lowercase and lowercase to uppercase
sub(pattern,replacement)	first occurrence substring replacement
gsub(pattern,replacement)	all occurrence substring replacement
tr('old chars','new chars') ; tr!	translates characters, like the Unix "tr" command
center(n," ")	centers in n characters, with an optional padding character
ljust(n," ")	left-justifies in n characters, padded with spaces by default
rjust(n," ")	right-justifies in n characters, padded with spaces by default
lstrip ; lstrip!	strips leading whitespace
rstrip ; rstrip!	strips trailing whitespace
strip ; strip!	strips leading and trailing whitespace
squeeze(characters) ; squeeze!	collapses runs of the given characters (of any character if no argument is given)
reverse ; reverse!	reverse the string
clear	empties the string
replace(newstring)	replaces the string with a new one
chomp ; chomp!	strips the final end of line, if present
encoding	returns the string encoding
valid_encoding?	true if the bytes are valid in the string's encoding
encode("iso-8859-1") ; encode!	re-encodes the string in the given encoding
force_encoding("utf-8")	sets the encoding label without changing the bytes
to_i ; to_f	conversion to numbers
length ; bytesize	length in characters ; length in bytes
getbyte(index)	gets the byte at the given position
setbyte(index,byte)	sets the byte at the given position
bytes.to_a	byte contents: "ab".bytes.to_a => [97, 98]
count("chars")	counts the characters that belong to the given set
count("a-c")	counts the characters in the given range
delete("chars") ; delete!("chars")	deletes the given characters
crypt(salt)	hashes the string with the operating system crypt function
sum	computes a simple checksum of the string
next ; succ	successor string: "a".next => "b"
ord	numeric code of the first character: "ab".ord => 97
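
A few of the methods in the table, applied to arbitrary sample strings, as a quick sketch of their results:

```ruby
p "a b  c".split                 # ["a", "b", "c"]
p "a,b,c".split(",")             # ["a", "b", "c"]
p "abc".center(7, "*")           # "**abc**"
p "abc".ljust(5, ".")            # "abc.."
p "abc".rjust(5, ".")            # "..abc"
p "aaabbb".squeeze               # "ab"
p "aaabbb".squeeze("a")          # "abbb"
p "hello".tr("el", "ip")         # "hippo"
p "hello world".capitalize       # "Hello world"
p "Hello".swapcase               # "hELLO"
p "  x  ".strip                  # "x"
p "line\n".chomp                 # "line"
p "az".next                      # "ba"
p "12abc".to_i                   # 12
```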

If the argument of the "<<" operator is a number, it is interpreted as the numeric code of a character in the encoding of the string, and the corresponding character is appended:

"a"<<"b" => "ab"
"a"<<98  => "ab"
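
A small sketch of the difference between "+" and "<<": the first builds a new string, the second changes its receiver. String.new is used so that the strings are certainly modifiable; the variable names are arbitrary.

```ruby
s = String.new("a")
t = s + "b"        # + returns a new string, s is unchanged
p s                # "a"
p t                # "ab"

s << "c"           # << appends to s itself
s << 100           # 100 is the code of "d"
p s                # "acd"

# In a UTF-8 string the number is a Unicode code point
price = String.new("5 ")
price << 8364      # 8364 is the code point of the euro sign
p price == "5 \u20ac"   # true
```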

The function "chars"

It splits a string into an array of its characters. Since Ruby 2.0 chars returns an Array directly; in Ruby 1.9 it returned an Enumerator, hence the call to to_a, which is still harmless:

"string".chars.to_a => ["s", "t", "r", "i", "n", "g"]

count and delete

These methods are very versatile: they can count or delete ranges of characters, characters outside a range, and so on:

"abccd".count("a-c")  => 4
"abccd".count("^a-c") => 1

"abcdeff".delete("a-cf") => "de"
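
Note that count works on sets of characters, not on substrings. The sketch below shows this on an arbitrary sample text; scan, which is not described above, is used only as one possible way to count substring occurrences.

```ruby
text = "hello world"

# count("lo") counts every "l" and every "o", not the substring "lo"
p text.count("lo")         # 5
p text.scan("lo").size     # 1: "lo" occurs once as a substring

p text.count("a-h")        # 3: h, e, d
p text.count("^a-z")       # 1: the space

# delete returns a new string; text itself is unchanged
p text.delete("l")         # "heo word"
p text.delete("a-m")       # "o wor"
p text                     # "hello world"
```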

sub and gsub

These methods accept a regular expression or a string as the pattern argument; with a regular expression, the groups captured by the match can be used in the replacement:
```
"abcdcde".sub("cd","xy") => "abxycde"

"abcdcde".gsub("cd","xy") => "abxyxye"
```
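
With a regular expression as the pattern, the captured groups can be referenced in the replacement as \1, \2 and so on. A minimal sketch, with an arbitrary date string and the block form of gsub added for illustration:

```ruby
date = "2024-01-31"

# \1, \2, \3 refer to the three captured groups
swapped = date.sub(/(\d+)-(\d+)-(\d+)/, '\3/\2/\1')
p swapped                            # "31/01/2024"

# gsub replaces every match of the pattern
p "hello world".gsub(/o/, "0")       # "hell0 w0rld"

# gsub also accepts a block that computes each replacement
titled = "hello world".gsub(/\b\w/) { |c| c.upcase }
p titled                             # "Hello World"
```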

slice

It extracts substrings, like the [] operator; the slice! version removes the extracted part from the string in place:
```
"abcde".slice(2..4)  =>  "cde"
"abcde".slice(1,3)   =>  "bcd"
"abcde".slice("bcd") =>  "bcd"
```
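
A short sketch of slice! and of the equivalent [] forms; String.new is used so that the string is certainly modifiable.

```ruby
s = String.new("abcde")
removed = s.slice!(1, 3)     # removes 3 characters starting at index 1
p removed                    # "bcd"
p s                          # "ae"

# [] accepts the same arguments as slice
p "abcde"[2..4]              # "cde"
p "abcde"[-1]                # "e"

# A substring that is not found gives nil
p "abcde".slice("xyz")       # nil
```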

The "?" operator

This operator creates a one-character string from the character that follows it:

?a => "a" ; ?\C-d => "\u0004" # this is the Unicode code point of Ctrl-D

The [] operator

This operator extracts characters and substrings from strings; it is described in the section about Arrays.

String interpolation

A Ruby expression can be inserted into a double-quoted string; it is evaluated and its result is placed in the string:

"string #{ruby expression} ... "
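
A minimal sketch of interpolation, with arbitrary variable names; the last line shows that single-quoted strings do not interpolate.

```ruby
name  = "Ruby"
count = 3

# Any expression can appear inside #{ }
msg = "#{name} has #{name.length} letters, #{count + 1} of them counted"
p msg                      # "Ruby has 4 letters, 4 of them counted"

# Inside single quotes the text is kept literally
literal = 'Hello #{name}'
p literal.include?("#")    # true
```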

The format operator "%"

This operator formats the values on its right according to the format string on its left, as the printf function of the C language does:

" string with %s %d " % ['abc',123] => " string with abc 123 "
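
A few more format directives, as a sketch with arbitrary values: field width, zero padding, precision and hexadecimal output follow the C conventions.

```ruby
p "%05.1f" % 3.14159       # "003.1": width 5, zero padded, 1 decimal
p "%-6s|" % "ab"           # "ab    |": left-justified in 6 characters
p "%x" % 255               # "ff": hexadecimal

# Several values are passed as an array; %% prints a percent sign
line = "%s scored %d points (%.1f%%)" % ["Ann", 42, 87.456]
p line                     # "Ann scored 42 points (87.5%)"
```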

The plus operator "+"

This operator concatenates strings:

"abc"+"def" => "abcdef"

Adjacent string literals are concatenated automatically:

a='asd' 'asd' => "asdasd"

The operator "*"

This operator repeats strings:

"a"*3 => "aaa"

Here documents

A very long string can be written in the following way:
```
string_name = <<HERE
    here a long text

HERE
```
HERE is an arbitrary word used as a delimiter; the final delimiter must be alone on its own line, starting in the first column. No space is allowed between "<<" and the first delimiter, or after the last delimiter.
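
A minimal sketch of a here document, with arbitrary content. With a plain delimiter the text behaves like a double-quoted string, so interpolation and backslash substitutions are performed, and the line breaks and indentation of the text are kept.

```ruby
name = "world"
text = <<HERE
Hello, #{name}!
  an indented line
HERE

p text                 # "Hello, world!\n  an indented line\n"
p text.lines.size      # 2
```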
