In Ruby, strings are sequences of characters, but they can also be used to store binary data as sequences of bytes. A string can be created from a sequence of characters:
a=String.new("characters")
Ruby has a complete set of methods to deal with strings. Strings are basic objects in Ruby: the String class inherits from Object and includes the Comparable module, which, on top of the <=> operator defined by String, provides the comparison operators <, >, <=, >= and the between? method; equality is tested with ==.
String comparison is done byte by byte, so for ASCII text it follows the order of the ASCII codes, where digits come first, then uppercase letters, then lowercase ones. The first different character decides the result; if one string is a prefix of the other, the longer string is considered greater:
'0'<'A' => true
'A'<'a' => true
'a'<'b' => true
'azzz'<'baaa' => true   # the first different character matters!
'aab'<'abb' => true
'abc'<'abc0' => true    # the second string is longer
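The rules above can be checked with a short script. This sketch sorts a few sample words (chosen only for illustration) with the default ordering, which relies on String#<=>, and shows between?, which String gets from Comparable; casecmp is included as one way to compare while ignoring case.

```ruby
# Default ordering: byte by byte, so digits < uppercase < lowercase.
words  = ["banana", "Apple", "cherry", "42", "apple"]
sorted = words.sort                      # Array#sort uses String#<=>
p sorted                                 # ["42", "Apple", "apple", "banana", "cherry"]

# <=> returns -1, 0 or 1; the Comparable operators are built on it.
p("abc" <=> "abd")                       # -1: first difference is 'c' < 'd'
p("abc0" <=> "abc")                      # 1: same prefix, the longer string is greater
p "b".between?("a", "c")                 # true (method provided by Comparable)

# casecmp ignores case differences and returns 0 for equal strings.
p "Apple".casecmp("apple")               # 0
```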
In Ruby each string has its own encoding, which can be obtained with the encoding method. String literals take the encoding of the source file, which by default is UTF-8 from Ruby 2.0 on and US-ASCII in Ruby 1.9; in Ruby 1.8 strings had no encoding and were plain sequences of bytes.
Encodings are represented by the Encoding class. The class method Encoding.list returns an array with all the available encodings (as Encoding objects).
In a source file, the encoding of the file can be specified with a magic comment on the first line, or on the second line if the first one is a shebang:
#!/usr/bin/ruby
# coding: utf-8
Some operations between strings can't be done if their encodings are not compatible. The encode method converts a string to a different encoding; it has options to deal with undefined or invalid characters in the new encoding and to convert newline characters. There is also a force_encoding method, which only changes the encoding property of a string, leaving its bytes unchanged:
'a'.encoding => #<Encoding:UTF-8>
b='a'.encode("ISO-8859-1")
b.encoding => #<Encoding:ISO-8859-1>
b.encode!("UTF-8")      # encode! changes the string in place
b.encoding => #<Encoding:UTF-8>
"abcd".encode("UTF-8", undef: :replace, replace: "X")     # "X" replaces undefined characters
"abcd".encode("UTF-8", invalid: :replace, replace: "X")   # "X" replaces invalid characters
b='a'.encode("ISO-8859-1")
b.force_encoding("UTF-8")   # tells Ruby that b is a UTF-8 string,
b.encoding => #<Encoding:UTF-8>   # but the bytes of b are not changed
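The following sketch, which assumes Ruby 2.1 or later (String#scrub appeared in 2.1), contrasts transcoding with encode, which rewrites the bytes, and force_encoding, which only relabels them. The word "caffè" is an arbitrary sample, written with a \u escape so the script itself stays ASCII.

```ruby
s = "caff\u00E8"                        # "caffè" as a UTF-8 string
p s.encoding                            # #<Encoding:UTF-8>
p s.bytesize                            # 6: "è" takes two bytes in UTF-8

latin = s.encode("ISO-8859-1")          # transcoding rewrites the bytes
p latin.bytesize                        # 5: "è" is a single byte in Latin-1

# US-ASCII has no "è": without undef: :replace this call would raise an error.
ascii = s.encode("US-ASCII", undef: :replace, replace: "?")
p ascii                                 # "caff?"

# Joining non-ASCII strings with incompatible encodings raises an error.
mixed_ok = begin
  latin + s
  true
rescue Encoding::CompatibilityError => e
  puts "incompatible: #{e.message}"
  false
end

# force_encoding only relabels the bytes; it does not convert them.
raw  = s.b                              # binary (ASCII-8BIT) copy of the same bytes
back = raw.force_encoding("UTF-8")
p back == s                             # true

# A byte that is invalid in UTF-8 can be detected and replaced.
bad = "caff\xE8"                        # a Latin-1 byte inside a UTF-8 string
p bad.valid_encoding?                   # false
p bad.scrub("?")                        # "caff?"
```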
Strings can be created from characters between double quotes.
Example: a="stringa"
An alternative syntax is %Q( string ), where the string inside the parentheses can contain double quotes without escaping them; parentheses inside it must be balanced or escaped, since they are used as delimiters. Other non-alphanumeric characters, such as | or !, can be used as delimiters instead of parentheses; with bracket characters the opening and closing delimiters must be the matching pair: {} [] () <>
a="stringa\n"
a=%Q(abcd\n)        # () are used as delimiters
a=%Q| 12 ef(), |    # '|' is used as a delimiter
Double-quoted strings can extend over many lines, preserving the newline characters. A backslash at the end of a line escapes the newline, effectively joining the two lines.
Between double quotes all the usual backslash substitutions are performed:
\n        end of line (newline)
\b        backspace
\e        escape
\s        space
\t  \v    horizontal tab, vertical tab
\f  \r    form feed, carriage return
\nnn      octal character code
\xhh      hexadecimal character code
\uxxxx    Unicode character
\C-x      control-x sequence
\M-x      meta-x sequence
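A few of these substitutions, together with %Q and the backslash-newline join, can be seen in this small sketch; the variable names and sample texts are only illustrative.

```ruby
name = "Ruby"

tabbed = "col1\tcol2\n"            # \t and \n become a tab and a newline
quoted = %Q(say "hi" to #{name})   # no need to escape the double quotes
nested = %Q(a (nested) pair)       # balanced parentheses are allowed inside
joined = "line one \
line two"                          # backslash-newline joins the two lines
codes  = "\x41\101\u00E8"          # hexadecimal, octal and Unicode escapes

p tabbed    # "col1\tcol2\n"
p quoted    # "say \"hi\" to Ruby"
p nested    # "a (nested) pair"
p joined    # "line one line two"
p codes     # "AAè"
p "\e".ord  # 27, the ASCII ESC character
```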
Strings can also be created from characters between single quotes; in that case only two backslash substitutions are performed: \\ (a backslash) and \' (a single quote).
Example: a='stringa'
An alternative syntax is %q( string ); the string inside the parentheses can contain single quotes without escaping them; parentheses inside it must be balanced or escaped. As with %Q, other non-alphanumeric characters can be used as delimiters instead of parentheses; with bracket characters the opening and closing delimiters must be the matching pair: {} [] () <>
a='stringa'
a=%q(stringa)              # () are used as delimiters
a=%q! xyx string aad !     # '!' is used as a delimiter
A string between single quotes can extend over many lines; the end of line cannot be escaped with a backslash and is inserted into the string as "\n".
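The difference from double quotes shows up in a sketch like the following, where only \\ and \' are processed and everything else, including a backslash before a newline, is kept literally. The sample strings are illustrative.

```ruby
apostrophe = 'it\'s'        # \' becomes a single quote
path       = 'C:\\temp'     # \\ becomes one backslash
literal    = 'no\nnewline'  # \n is NOT a newline here: backslash followed by "n"
no_interp  = %q(#{1 + 1} stays as text)   # no interpolation with %q
two_lines  = 'first
second'                     # the newline is kept in the string
kept       = 'a\
b'                          # the backslash before the newline is kept too

p apostrophe      # "it's"
p path            # "C:\\temp" (the string contains a single backslash)
p literal.length  # 11
p no_interp       # "\#{1 + 1} stays as text"
p two_lines       # "first\nsecond"
p kept            # "a\\\nb"
```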
The following table lists some methods of the String class; the versions ending in ! change the string in place.
+ | concatenates strings: "a"+"b" => "ab" |
* | repeats strings: "abc" * 2 => "abcabc" |
<< | appends to the string, modifying it in place: "a"<<"b" => "ab" |
ascii_only? | true if the string contains only ASCII characters |
empty? | true if empty |
end_with?("string") | true if the string ends with the given string |
include?("substring") | true if the substring is included: 'abc'.include?('b') => true |
index("substring") | index of a given substring: 'abc'.index('b') => 1 |
rindex("substring") | index of the last occurrence of the substring: 'abcb'.rindex('b') => 3 |
insert(index,string) | substring insertion: "abc".insert(1,"xx")=>"axxbc" |
split(pattern) | splits into an array; by default it splits on whitespace |
capitalize ; capitalize! | makes the first character uppercase and the rest lowercase |
upcase ; upcase! | to uppercase; upcase! changes the string in place |
downcase ; downcase! | to lowercase |
swapcase ; swapcase! | uppercase to lowercase and lowercase to uppercase |
sub(pattern,replacement) | replaces the first occurrence of a substring |
gsub(pattern,replacement) | replaces all occurrences of a substring |
tr('old_chars','new_chars') ; tr! | replaces characters, like the Unix "tr" command |
center(n," ") | centers in n characters, padding with the given character |
ljust(n," ") | left-justifies in n characters, padding with the given character (space by default) |
rjust(n," ") | right-justifies in n characters |
lstrip ; lstrip! | strips leading whitespace |
rstrip ; rstrip! | strips trailing whitespace |
strip ; strip! | strips leading and trailing whitespace |
squeeze(chars) ; squeeze! | collapses runs of the same character into one; with an argument, only for the given characters |
reverse ; reverse! | reverse the string |
clear | empties the string |
replace(newstring) | replaces the string with a new one |
chomp ; chomp! | strips the final end of line, if present |
encoding | returns the string encoding |
valid_encoding? | true if the bytes are valid in the string's encoding |
encode("iso-8859-1") ; encode! | re-encodes the string in the given encoding |
force_encoding("utf-8") | sets the encoding property without converting the bytes |
to_i ; to_f | conversion to integer or float |
length ; bytesize | length in characters or bytes |
getbyte(num) | get a single byte at a given position |
setbyte(index,byte) | sets a single byte at a given position |
bytes.to_a | byte contents: "ab".bytes.to_a =>[97, 98] |
count("chars") | counts the characters belonging to the given set: "hello".count("lo") => 3 |
count("a-c") | counts the characters in a range |
delete("chars") ; delete!("chars") | deletes the given characters |
crypt(salt) | one-way hash of the string using the operating system crypt() function |
sum | computes a simple checksum of the string |
next ; succ | successor string: "a".next => "b", "az".next => "ba" |
ord | integer code of the first character: "ab".ord => 97 |
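As a quick tour, this sketch runs a handful of the methods from the table on sample strings chosen for illustration; String.new is used for insert because insert modifies its receiver.

```ruby
padded   = "  Hello, World  "
stripped = padded.strip                # "Hello, World"
shout    = stripped.upcase             # "HELLO, WORLD"
cap      = "hELLO".capitalize          # "Hello": the rest is lowercased
fields   = "a,b,,c".split(",")         # ["a", "b", "", "c"]
words    = " a  b ".split              # ["a", "b"]: default splits on whitespace
hippo    = "hello".tr("el", "ip")      # "hippo"
centered = "abc".center(9, "*")        # "***abc***"
left     = "abc".ljust(6, ".")         # "abc..."
right    = "abc".rjust(6, ".")         # "...abc"
squeezed = "aaabbbccc".squeeze         # "abc"
only_a   = "aaabbbccc".squeeze("a")    # "abbbccc"
chomped  = "line\n".chomp              # "line"
number   = "12abc".to_i                # 12: conversion stops at the first non-digit
weight   = "3.5kg".to_f                # 3.5
accented = "caff\u00E8"                # "caffè"
sizes    = [accented.length, accented.bytesize]   # [5, 6]
nexts    = ["a".succ, "az".succ, "zz".succ]       # ["b", "ba", "aaa"]
last_l   = "hello".rindex("l")         # 3
inserted = String.new("abc").insert(1, "xx")      # "axxbc"

p stripped, shout, cap, fields, words, hippo, centered, left, right
p squeezed, only_a, chomped, number, weight, sizes, nexts, last_l, inserted
```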
If the argument of the << operator is an integer, it is interpreted as the numeric code of a character in the encoding of the string, and the corresponding character is appended to the string:
"a"<<"b" => "ab"
"a"<<98 => "ab"
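The sketch below contrasts the in-place << with +, which builds a new string. String.new is used to obtain a modifiable string, since literals are frozen when a file uses the frozen_string_literal magic comment; the sample values are illustrative.

```ruby
s = String.new("a")              # String.new gives a fresh, modifiable string
same_object = s.object_id
s << "b" << 99                   # << returns s, so calls can be chained; 99 is "c"
p s                              # "abc"
p s.object_id == same_object     # true: << modified s itself

t = s + "d"                      # + builds a new string and leaves s unchanged
p [s, t]                         # ["abc", "abcd"]

# In a UTF-8 string the integer is a Unicode code point: 0xE8 is "è".
word = String.new("caff") << 0xE8
p word                           # "caffè"
```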
The chars method splits a string into an array of characters (in Ruby 1.9 it returned an Enumerator, hence the to_a):
"string".chars.to_a => ["s", "t", "r", "i", "n", "g"]
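For multibyte text, chars and bytes give different views of the same string, as this sketch shows (assuming Ruby 2.0 or later, where chars returns an Array). The character-frequency loop is just one illustrative use of each_char.

```ruby
word    = "caff\u00E8"           # "caffè"
letters = word.chars             # an Array in Ruby 2.0 and later
p letters                        # ["c", "a", "f", "f", "è"]
p word.bytes                     # [99, 97, 102, 102, 195, 168]: "è" is two bytes

# Counting how often each character occurs.
freq = Hash.new(0)
word.each_char { |c| freq[c] += 1 }
p freq                           # "f" is counted twice, the others once
```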
The count and delete methods are very versatile: their arguments describe sets of characters, so they can count or delete ranges of characters, characters outside a range (with ^), etc.:
"abccd".count("a-c") => 4
"abccd".count("^a-c") => 1
"abcdeff".delete("a-cf") => "de"
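A few more character-set arguments are shown in this sketch on an illustrative text; note that with several arguments count works on the intersection of the sets.

```ruby
text = "hello world"
p text.count("lo")              # 5: every "l" and every "o"
p text.count("a-z")             # 10: lowercase letters (the space is not counted)
p text.count("^a-z")            # 1: characters outside the range
p text.count("a-z", "^l")       # 7: intersection of the sets, letters except "l"
p text.delete("aeiou")          # "hll wrld"
p text.delete("^a-z")           # "helloworld"
p "2024-01-05".delete("-")      # "20240105": a lone "-" is taken literally
```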
The sub and gsub methods accept either a string or a regular expression as the pattern argument; with a regular expression, the groups captured by the match can be used in the replacement as \1, \2, etc.:
"abcdcde".sub("cd","xy") => "abxycde"
"abcdcde".gsub("cd","xy") => "abxyxye"
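Captured groups, a block, and a hash can all drive the replacement, as in this sketch with illustrative inputs. The replacement '\3/\2/\1' is single-quoted so that the backslashes reach sub unchanged.

```ruby
date = "2024-01-05"
# \1, \2, \3 refer to the groups captured by the regular expression.
reordered = date.sub(/(\d+)-(\d+)-(\d+)/, '\3/\2/\1')          # "05/01/2024"

# With a block, the block's value replaces each match.
lengths = "a1b22c333".gsub(/\d+/) { |m| m.length.to_s }         # "a1b2c3"

# With a hash, each match is looked up as a key.
swapped = "cat hat".gsub(/[ch]at/, { "cat" => "dog", "hat" => "cap" })  # "dog cap"

first = "x-y-z".sub("-", "+")                                    # "x+y-z"
every = "x-y-z".gsub("-", "+")                                   # "x+y+z"

p reordered, lengths, swapped, first, every
```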
The slice method extracts substrings, like the [] operator; slice! removes the extracted part from the string in place and returns it:
"abcde".slice(2..4) => "cde"
"abcde".slice(1,3) => "bcd"
"abcde".slice("bcd") => "bcd"
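This sketch shows negative indexes, an out-of-range index, and the destructive slice!; String.new provides a modifiable copy of the illustrative text.

```ruby
s = String.new("hello world")      # modifiable copy, since slice! changes it
p s.slice(0, 5)                    # "hello": start index and length
p s.slice(-5..-1)                  # "world": negative indexes count from the end
p s[0]                             # "h": [] accepts the same arguments as slice
p s.slice(20)                      # nil: index out of range

removed = s.slice!(5..-1)          # removes " world" from s and returns it
p removed                          # " world"
p s                                # "hello"
```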
The ? prefix is a character literal: it gives a string containing a single character:
?a => "a" ; ?\C-d => "\u0004"   # this is the Unicode code for Ctrl-D
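A short sketch confirming that a character literal is an ordinary one-character String and that backslash escapes work in it as between double quotes:

```ruby
ch   = ?a                    # "a": the same as 'a'
ctrl = ?\C-d                 # "\u0004": Ctrl-D
nl   = ?\n                   # "\n": escapes work as between double quotes
p [ch, ch.class, ch.ord]     # ["a", String, 97]
p [ctrl.ord, nl == "\n"]     # [4, true]
```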
The [] operator can be used to extract characters from strings; it is described in the section about Arrays.
A Ruby expression can be inserted into a double-quoted string with #{...}: the expression is evaluated and its result, converted to a string, is inserted into the string:
"stringa #{ruby expression} ... "
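In this sketch (names and values are illustrative) any expression works inside #{...}, its result is converted with to_s, and single quotes disable interpolation.

```ruby
name  = "Ada"
items = 3
msg   = "#{name} bought #{items} item#{items == 1 ? '' : 's'}"
total = "Total: #{items * 2.5}"     # any expression; the result is converted with to_s
empty = "[#{nil}]"                  # nil.to_s is "", so this gives "[]"
plain = 'no #{name} here'           # single quotes: no interpolation

p msg     # "Ada bought 3 items"
p total   # "Total: 7.5"
p empty   # "[]"
p plain   # "no \#{name} here"
```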
The % operator works like the printf function of the C language: the string is the format, and the argument (or array of arguments) fills its format specifiers:
" string with %s %d " % ['abc',123] => " string with abc 123 "
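Some common format specifiers are shown in this sketch; the values are illustrative. Kernel#format accepts the same format strings in method form.

```ruby
basic   = "%s has %d items" % ["Ada", 3]   # "Ada has 3 items"
padded  = "%05.2f" % 3.14159               # "03.14": width 5, zero padded, 2 decimals
aligned = "%-6s|" % "ab"                   # "ab    |": left-aligned in 6 columns
bases   = "%x %o %b" % [255, 8, 5]         # "ff 10 101": hexadecimal, octal, binary
named   = "%{a}-%{b}" % { a: 1, b: 2 }     # "1-2": named references need a hash
sci     = format("%.3e", 12345.678)        # "1.235e+04"

p basic, padded, aligned, bases, named, sci
```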
The + operator concatenates strings, returning a new string:
"abc"+"def" => "abcdef"
Adjacent string literals are concatenated automatically:
a='asd' 'asd' => "asdasd"
The * operator repeats a string:
"a"*3 => "aaa"
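The three ways of building strings above are combined in this sketch; it also shows that + raises a TypeError for a non-string argument, so numbers must be converted first (to_s or interpolation). Values are illustrative.

```ruby
greeting = "abc" + "def"      # "abcdef": + returns a new string
glued    = "ab" "cd"          # "abcd": adjacent literals are joined by the parser
rule     = "-" * 10           # "----------"
nothing  = "ab" * 0           # "": zero repetitions give an empty string

# + does not convert other objects to strings automatically.
type_error = begin
  "Total: " + 5
  false
rescue TypeError
  true
end
fixed = "Total: " + 5.to_s    # "Total: 5"; "Total: #{5}" works as well

p greeting, glued, rule, nothing, type_error, fixed
```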
A very long string can be inserted with a "here document":
nomestringa = <<HERE
here a long text
HERE
HERE is an arbitrary word used as a delimiter; the closing delimiter must stand alone on its own line (at the start of the line, unless the here document is opened with <<- or <<~). No space is allowed between "<<" and the opening delimiter, or after the closing delimiter.
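The sketch below compares a plain here document with the squiggly form <<~ (assumed available, Ruby 2.3 or later), which strips the common indentation, and with a quoted delimiter, which turns off interpolation. The texts are illustrative.

```ruby
name = "Ada"

plain = <<HERE
here a long text
HERE

# <<~ removes the common indentation; interpolation works as in double quotes.
letter = <<~TEXT
  Dear #{name},
    thanks!
TEXT

# Quoting the delimiter turns off interpolation, like single quotes.
raw = <<~'TEXT'
  no #{name} here
TEXT

p plain    # "here a long text\n"
p letter   # "Dear Ada,\n  thanks!\n"
p raw      # "no \#{name} here\n"
```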