Like Perl or Python, Ruby is a capable text processing language. This article briefly covers Ruby's facilities for processing textual data and shows how you can use them to handle different formats of textual data efficiently, whether CSV data or XML data.
Strings in Ruby are a powerful way to hold, compare, and manipulate textual data.
In Ruby, String is a class that you can instantiate by invoking String::new or by just assigning a literal value.
When you assign values to Strings, you can enclose the value in a pair of single quotes (') or a pair of double quotes (").
Single-quoted and double-quoted strings differ in a few ways. Double quotes allow escape sequences that use a leading backslash (\) and also evaluate expressions embedded in the string with the #{} interpolation syntax. Single-quoted strings are simple, straight literals.
Listing 1 is an example.
Listing 1. Working with Ruby strings: Defining strings
message = 'Heal the World…'
puts message
message1 = "Take home Rs #{100*3/2} "
puts message1
Output :
# ./string1.rb
# Heal the World…
# Take home Rs 150
Here, the first string is defined with a pair of single quotes. The second one uses a pair of double quotes. In the second example,
the expression within #{} is evaluated before display.
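The difference between the two quoting styles can be seen side by side in a short sketch. The variable names here are made up for illustration and are not part of Listing 1.

```ruby
name = "World"

single = 'Hello,\t#{name}'   # no interpolation; \t stays as a backslash followed by t
double = "Hello,\t#{name}"   # \t becomes a tab and #{name} is interpolated

puts single
puts double

# Single quotes recognize only two escapes: \' and \\
quote = 'It\'s a backslash: \\'
puts quote
```

Single quotes are a reasonable default when a string must be taken literally, such as a pattern that contains backslashes.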
Another useful way to define a string is the here document, which is generally used for multi-line string definitions.
From here on, I will use the interactive Ruby console, irb, for my explanations; its prompt appears as irb>> in the listings. It should have been installed along with your Ruby installation. If not, I suggest you get the irb Ruby gem and install it. It is a very useful tool for learning about Ruby and its modules. Once you install it, you can run it with the irb command.
Listing 2. Working with Ruby strings : Defining multiline strings
irb>> str = <<EOF
irb>> hello, world
irb>> how do you feel?
irb>> how r u?
irb>> EOF
"hello, world\nhow do you feel?\nhow r u?\n"
irb>> puts str
hello, world
how do you feel?
how r u?
In Listing 2, everything between <<EOF and EOF is considered a part of the string, including the \n (newline) characters.
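A few variations on the here document are worth knowing. The following is a minimal sketch; the variable names are illustrative, and the <<~ form assumes Ruby 2.3 or later.

```ruby
user = "Ruby"

# A plain here document behaves like a double-quoted string: #{} is interpolated.
greeting = <<EOF
hello, #{user}
how do you feel?
EOF

# Quoting the delimiter with single quotes turns interpolation off.
raw = <<'EOF'
hello, #{user}
EOF

# <<~ (Ruby 2.3 and later) strips the common leading indentation.
indented = <<~EOF
  hello, world
  how r u?
EOF

puts greeting
puts raw
puts indented
```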
The Ruby String class has a powerful set of methods to manipulate and process data stored in them. The examples in Listings 3, 4, and 5 illustrate a few of them.
Listing 3. Working with Ruby strings : Concatenating
irb>> str = "The world for a horse" # String initialized with a value
The world for a horse
irb>> str*2 # Multiplying by an integer returns a
# new string containing that many copies
# of the old string.
The world for a horseThe world for a horse
irb>> str + " Who said it ? " # Concatenation of strings using the '+' operator
The world for a horse Who said it ?
irb>> str<<" is it? " # Concatenation using the '<<' operator
The world for a horse is it?
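One detail that Listing 3 does not make obvious is that + builds a new string while << changes the existing one. The sketch below, with illustrative variable names, shows the effect when two variables refer to the same String object.

```ruby
a = String.new("The world")
b = a                        # b refers to the same String object as a

c = a + " for a horse"       # + builds a new string; a is unchanged
a << " is round"             # << appends in place, so b sees the change too

p a
p b
p c
p a.equal?(b)                # true: a and b are the same object
```

When many pieces are appended in a loop, << avoids creating a new string on every step.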
Extracting substrings and manipulating parts of the string
Listing 4. Working with Ruby Strings : Extracting and manipulating
irb>> str[0] # The '[]' operator can be used to extract substrings, just
# like accessing entries in an array.
# The index starts from 0.
T # In Ruby 1.9 and later, a single index returns the character
# at that position (Ruby 1.8 returned its ASCII value, 84)
irb>> str[0,5] # a range can be specified as a pair. The first is the starting
# index , second is the length of the substring from the
# starting index.
The w
irb>> str[16,5]="Ferrari" # The same '[]' operator can be used
# to replace substrings in a string
# by using the assignment like '[]='
irb>>str
The world for a Ferrari
irb>> str[10..22] # The range can also be specified using [x1..x2]
for a Ferrari
irb>> str[" Ferrari"]=" horse" # A substring can be specified to be replaced by a new
# string. Ruby strings are intelligent enough to adjust the
# size of the string to make up for the replacement string.
irb>> str
The world for a horse
irb>> str.split # split divides the string on the given delimiter
# (whitespace by default) and returns an array of strings.
["The", "world", "for", "a", "horse"]
irb>> str.each_line(' ') { |word| p word.chomp(' ') }
# each_line processes the string block by block,
# splitting it on a record separator (String#each in Ruby 1.8).
# Here, I use chomp() to cut off the trailing space
"The"
"world"
"for"
"a"
"horse"
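A few more forms of the [] operator can be sketched as follows, assuming Ruby 1.9 or later, where a single index returns a one-character string. The variable names are illustrative.

```ruby
str = "The world for a horse"

first_char = str[0]        # a one-character string on Ruby 1.9 and later
first_code = str[0].ord    # the character code that Ruby 1.8 returned from str[0]
last_word  = str[-5, 5]    # negative indexes count from the end of the string
middle     = str[4...9]    # three dots exclude the end index
missing    = str[100]      # an out-of-range index returns nil instead of raising

p first_char, first_code, last_word, middle, missing
```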
Many other utility methods are available with the Ruby String class, including methods to change case, get the length, remove record separators, scan through the string, apply a one-way hash with crypt, and so on. Another useful method is the freeze method, by which a string can be made immutable. After you invoke that method on the String str (str.freeze), str cannot be modified.
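The effect of freeze can be sketched as below. The rescue clause deliberately catches any StandardError, because the exact exception class depends on the Ruby version (FrozenError in recent releases, RuntimeError in older ones).

```ruby
str = "hello, world".dup   # dup gives a copy that is safe to modify
str.freeze

error_class = nil
begin
  str << " again"          # any in-place change to a frozen string raises an error
rescue => e
  error_class = e.class
  puts "Cannot modify: #{e.class}"
end

puts str.frozen?

copy = str.dup             # dup returns an unfrozen copy that can be changed
copy << " again"
puts copy
```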
Ruby also has what are called destructive methods. A method ending with an exclamation point (!) modifies the string in place. Normal methods (those without the exclamation point at the end) leave the original untouched and return a modified copy of the string they were invoked upon. The exclamation point methods modify the string which invokes the method.
Listing 5. Working with Ruby strings : Modifying a string permanently
irb>> str = "hello, world"
hello, world
irb>> str.upcase
HELLO, WORLD
irb>> str # str remains as is.
hello, world
irb>> str.upcase! # here, str gets modified by the '!' at the end of
# upcase.
HELLO, WORLD
irb>> str
HELLO, WORLD
In Listing 5, the upcase! method modifies the string in str itself, whereas the plain upcase method returns a copy of the string with the case changed. These ! methods are useful when you do not need to keep the original value.
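One point to watch with the ! methods is their return value: many of them, including upcase!, return nil when they make no change. A small sketch with illustrative variable names:

```ruby
str = "HELLO"

result_plain = str.upcase    # returns a new string, even when nothing changes
result_bang  = str.upcase!   # returns nil because str was already uppercase

p result_plain
p result_bang

# Because of the possible nil, call ! methods one statement at a time
# instead of chaining on their return values.
name = "  ruby  "
name.strip!
name.capitalize!
p name
```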
Ruby Strings are very powerful. Once you have your data captured in Strings, you are on your way to process them in a very easy and efficient manner using a plethora of methods at your disposal.
A CSV file is a very common way to represent tabular data, most commonly used as the format for data exported from a spreadsheet (such as a list of contacts with their contact details).
Ruby has a powerful library to handle and process such files. csv is the Ruby library that deals with CSV files. It has methods to create, read, and parse such files.
The example in Listing 6 shows how to create such a CSV file and then parse it using the Ruby csv module.
Listing 6. Handling CSV files : Create and parse a CSV file
require 'csv'
# Create the CSV file. Each array appended to the writer becomes one row.
writer = CSV.open('mycsvfile.csv', 'w')
begin
  print "Enter Contact Name: "
  name = STDIN.gets.chomp
  print "Enter Contact No: "
  num = STDIN.gets.chomp
  writer << [name, num]   # build the row directly so names with spaces survive
  print "Do you want to add more ? (y/n): "
  ans = STDIN.gets.chomp
end while ans != "n"
writer.close
# First way to parse: read the whole file as one string and pass it to CSV.parse.
parsed = CSV.parse(File.read('mycsvfile.csv'))
p parsed
puts ""
puts "Details of Contacts stored are as follows..."
puts ""
puts "-------------------------------"
puts "Contact Name | Contact No"
puts "-------------------------------"
puts ""
# Second way: CSV.foreach yields each row of the file as an array.
CSV.foreach('mycsvfile.csv') do |row|
  puts row[0] + " | " + row[1]
  puts ""
end
Listing 7 shows the output:
Listing 7. Handling CSV files : Create and parse a CSV file output
Enter Contact Name: Santhosh
Enter Contact No: 989898
Do you want to add more ? (y/n): y
Enter Contact Name: Sandy
Enter Contact No: 98988
Do you want to add more ? (y/n): n
[["Santhosh", "989898"], ["Sandy", "98988"]]
Details of Contacts stored are as follows...
-------------------------------
Contact Name | Contact No
-------------------------------
Santhosh | 989898
Sandy | 98988
Let's quickly review the example.
First, include the csv module (require 'csv').
To create a new CSV file named mycsvfile.csv, open it in write mode using the CSV.open() call. This returns a writer object.
This example creates a CSV file which holds a simple contact list, storing the name of each person along with a phone number. In the loop, the user is asked to enter the name of the contact and the phone number. The name and the phone number are placed in an array of two strings, and this array is passed to the writer object to be written into the CSV file. Thus, one pair of CSV values is stored as a single line in the file. Building the array directly, rather than joining the two values and splitting on whitespace, keeps a name that contains a space in a single field.
Once out of the loop, close the writer so that the data in the file is saved.
The next step is to parse the CSV file that is created.
One way to parse the file is to read its entire contents into a String with File.read and pass that string to the CSV.parse method, which parses the CSV data and returns the content as an array of arrays.
Next, you see another way to open and parse the file. The CSV.foreach call opens the file in read mode and passes each row to the block as an array of strings. Print each row with some formatting to display the contact details. Each row here is a line in the file.
As you can see, Ruby provides a powerful module for working with CSV files and data.
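The csv library can also work entirely in memory and can treat the first row as column headers. The following is a minimal sketch, not part of the article's sample code; the header names and the longer contact names are made up for illustration.

```ruby
require 'csv'

# Build CSV text in memory; CSV.generate quotes fields where needed.
csv_text = CSV.generate do |csv|
  csv << ["name", "number"]
  csv << ["Santhosh Kumar", "989898"]   # a name containing a space
  csv << ["Sandy, Jr.", "98988"]        # a field with a comma gets quoted
end
puts csv_text

# With headers: true, the first row supplies the keys for every later row.
contacts = CSV.parse(csv_text, headers: true)
contacts.each do |row|
  puts "#{row['name']} | #{row['number']}"
end
```

Accessing fields by header name rather than by position keeps the code readable when columns are added or reordered.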
For working with XML files, Ruby has a powerful library called REXML, which has traditionally shipped with Ruby (since Ruby 3.0 it is packaged as a bundled gem). This can be used to read and parse XML documents.
Look at this XML file and try to parse it using Ruby and REXML.
Below is a simple XML file listing the contents of a typical shopping cart in an online shopping mall. It has the following elements:
- cart: the root element
- user: the user who is shopping
- item: an item the user has added to the cart
- id, price, and quantity: sub-elements of item
Listing 8 shows the structure of the XML:
Listing 8. Working with XML Files : Sample XML File
<cart id="userid">
  <item code="item-id">
    <price> <price/unit> </price>
    <qty> <number-of-units> </qty>
  </item>
</cart>
Go to Download for the sample XML file. Now, load this XML file and parse through the tree using REXML.
Listing 9. Working with XML files : Parsing XML files
require 'rexml/document'
include REXML
file = File.new('shoppingcart.xml')
doc = Document.new(file)
root = doc.root
puts ""
puts "Hello, #{root.attributes['id']}, Find below the bill generated for your purchase..."
puts ""
sumtotal = 0
puts "-----------------------------------------------------------------------"
puts "Item\t\tQuantity\t\tPrice/unit\t\tTotal"
puts "-----------------------------------------------------------------------"
root.each_element('//item') { |item|
  code = item.attributes['code']
  qty = item.elements["qty"].text.strip       # strip the whitespace around the element text
  price = item.elements["price"].text.strip
  total = price.to_i * qty.to_i
  puts "#{code}\t\t #{qty}\t\t #{price}\t\t #{total}"
  puts ""
  sumtotal += total
}
puts "-----------------------------------------------------------------------"
puts "\t\t\t\t\t\t Sum total : " + sumtotal.to_s
puts "-----------------------------------------------------------------------"
Listing 10 shows the output.
Listing 10. Working with XML files : Parsing XML files output
Hello, santhosh, Find below the bill generated for your purchase...
-------------------------------------------------------------------------
Item Quantity Price/unit Total
-------------------------------------------------------------------------
CS001 2 100 200
CS002 5 200 1000
CS003 3 500 1500
CS004 5 150 750
-------------------------------------------------------------------------
Sum total : 3450
--------------------------------------------------------------------------
The example in Listing 9 parses the shopping cart XML file and generates a bill with the individual item totals and the sum total for the purchase (Listing 10).
Let's quickly go through it.
First, include the REXML module of Ruby. This has the methods to parse through the XML file.
Open the shoppingcart.xml file and create a Document object from it. This Document object is the one which contains the parsed XML file.
Assign the root of the document to the element object root.
This will now point to the cart tag in your XML.
Each element object has an attributes object, which is a hash with the element's attribute names as keys and their values as values. Here, root.attributes['id'] gives the value of the id attribute of the root element, which in this case is the user ID.
Next, initialize the sumtotal to 0 and print the headers.
Each element object also has an object called elements, with each and [] methods to access the sub-elements. The block runs through all the sub-elements of the root element with the name item, specified by the XPath expression //item. Each element object also has an attribute text that holds the textual value for that element.
Next, get the item element's code
attribute and the text value of the price and qty elements and calculate the total for the item. Print the details into the bill. Also, add the item total to the sumtotal.
Finally, print the sum total.
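The same walkthrough can be tried without a file by parsing XML held in a string. This is a minimal sketch that assumes REXML is available to require; the cart data is hypothetical and simply follows the structure in Listing 8.

```ruby
require 'rexml/document'

# Hypothetical cart data, following the structure in Listing 8.
xml = <<~XML
  <cart id="santhosh">
    <item code="CS001"><price> 100 </price><qty> 2 </qty></item>
    <item code="CS002"><price> 200 </price><qty> 5 </qty></item>
  </cart>
XML

doc  = REXML::Document.new(xml)
root = doc.root

totals = {}
root.each_element('item') do |item|                 # direct children named item
  price = item.elements['price'].text.strip.to_i    # the text includes padding spaces
  qty   = item.elements['qty'].text.strip.to_i
  totals[item.attributes['code']] = price * qty
end

puts "User: #{root.attributes['id']}"
totals.each { |code, total| puts "#{code}: #{total}" }
puts "Sum total: #{totals.values.sum}"
```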
This example shows how easy and simple it is to parse XML files with REXML and Ruby. It is as easy to generate XML files on the fly, and to add and delete elements and their attributes.
Listing 11. Working with XML files : Generate XML files
# Assumes require 'rexml/document' and include REXML, as in Listing 9.
doc = Document.new
doc.add_element("cart1", {"id" => "user2"})
cart = doc.root   # the cart1 element that was just added
item = Element.new("item")
item.add_element("price")
item.elements["price"].text = "100"
item.add_element("qty")
item.elements["qty"].text = "4"
cart.elements << item
The snippet in Listing 11 creates the XML structure by creating a cart element, and an item element and its sub-elements. It populates them with values and adds them to the Document root.
Similarly, to delete elements and attributes, use the delete_element and delete_attribute methods of the Element object.
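Building a small document and then deleting from it can be sketched as follows. This assumes REXML is available to require; the element and attribute values are illustrative.

```ruby
require 'rexml/document'

doc  = REXML::Document.new
cart = doc.add_element("cart", { "id" => "user2" })   # add_element returns the new element

item = cart.add_element("item", { "code" => "CS001" })
item.add_element("price").text = "100"
item.add_element("qty").text   = "4"

before = doc.to_s
puts before

item.delete_element("qty")       # remove a sub-element by name
item.delete_attribute("code")    # remove an attribute by name

after = doc.to_s
puts after
```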
The above is an example of what is called tree parsing. Yet another way of parsing XML documents is known as stream parsing. This is faster than tree parsing and can be used where speed is imperative. Stream parsing is event-based and works with listeners. When a tag is encountered, the listener is called and it does the processing.
Listing 12 shows an example.
Listing 12. Working with XML files : Stream parsing
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
class Listener
  include StreamListener
  def tag_start(name, attributes)
    puts "Start #{name}"
  end
  def tag_end(name)
    puts "End #{name}"
  end
end
listener = Listener.new
parser = Parsers::StreamParser.new(File.new("shoppingcart.xml"), listener)
parser.parse
Listing 13 shows the output.
Listing 13. Working with XML files : Stream parsing output
Start cart
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
Start item
Start price
End price
Start qty
End qty
End item
End cart
Thus, REXML and Ruby provide a powerful combination for you to work with and manipulate XML data in a very efficient and intuitive way.
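A stream listener can do more than print tag names. As a rough sketch, the hypothetical TotalListener class below adds up price times quantity for each item without building a document tree. It assumes REXML is available to require, and the inline cart data is made up for illustration.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that adds up price * qty without building a tree.
class TotalListener
  include REXML::StreamListener
  attr_reader :sumtotal

  def initialize
    @sumtotal = 0
    @current = nil      # name of the element whose text is being read
    @values = {}
  end

  def tag_start(name, attributes)
    @current = name
  end

  def text(data)
    # Ignore whitespace between tags; keep only price and qty text.
    return if data.strip.empty?
    @values[@current] = data.strip.to_i if %w[price qty].include?(@current)
  end

  def tag_end(name)
    @current = nil
    if name == "item"
      @sumtotal += @values.fetch("price", 0) * @values.fetch("qty", 0)
      @values.clear
    end
  end
end

xml = '<cart id="u"><item code="CS001"><price> 100 </price><qty> 2 </qty></item>' \
      '<item code="CS002"><price> 200 </price><qty> 5 </qty></item></cart>'

listener = TotalListener.new
REXML::Parsers::StreamParser.new(xml, listener).parse
puts listener.sumtotal
```

Because the listener keeps only the values for the item currently being read, memory use stays flat however many items the cart contains.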
Ruby has a great set of built-in and external libraries for quick, powerful, and efficient text processing. You can harness this capability to simplify and enhance a variety of textual data processing needs that you might encounter. This article just touches upon a few of the aspects of this ability of Ruby. You can achieve a lot more.
Ruby is definitely a great tool that you'll want in your toolbox.
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample code for the article | sample-code.zip | 2KB | HTTP |