Cultured Perl: Reading and writing Excel files with Perl
By Teodor Zlatanov - 2004-09-23

Linux example: parsing

This section applies to UNIX, and specifically Linux. It has not been tested under Windows.

It would be difficult to give a better example of parsing on Linux than the one provided in the documentation for the Spreadsheet::ParseExcel module, so I will show that example and then explain how it works.


Listing 3:

#!/usr/bin/perl -w

use strict;
use Spreadsheet::ParseExcel;

my $oExcel = new Spreadsheet::ParseExcel;

die "You must provide a filename to $0 to be parsed as an Excel file" unless @ARGV;

my $oBook = $oExcel->Parse($ARGV[0]);
my($iR, $iC, $oWkS, $oWkC);
print "FILE  :", $oBook->{File} , "\n";
print "COUNT :", $oBook->{SheetCount} , "\n";

print "AUTHOR:", $oBook->{Author} , "\n"
 if defined $oBook->{Author};

for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++)
{
 $oWkS = $oBook->{Worksheet}[$iSheet];
 print "--------- SHEET:", $oWkS->{Name}, "\n";
 for(my $iR = $oWkS->{MinRow} ;
     defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ;
     $iR++)
 {
  for(my $iC = $oWkS->{MinCol} ;
      defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ;
      $iC++)
  {
   $oWkC = $oWkS->{Cells}[$iR][$iC];
   print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC);
  }
 }
}

This example was tested with Excel 97. If it does not work, try converting to the Excel 97 format. The perldoc page for Spreadsheet::ParseExcel claims Excel 95 and 2000 compatibility as well.

The spreadsheet is parsed into a top-level object called $oBook. $oBook has properties to aid the program, such as "File," "SheetCount," and "Author." The properties are documented in the perldoc page for Spreadsheet::ParseExcel, in the workbook section.

The workbook contains one or more worksheets; iterate through them by using the workbook SheetCount property. Each worksheet has MinRow and MinCol properties and corresponding MaxRow and MaxCol properties, which together give the range of cells the worksheet actually uses. MaxRow and MaxCol may be undefined for an empty worksheet, which is why Listing 3 tests them with defined. The properties are documented in the perldoc page for Spreadsheet::ParseExcel, in the worksheet section.
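The range check can also be factored into a small helper. What follows is a minimal sketch, not something the module provides: cell_coordinates is a hypothetical name, and it assumes only the MinRow, MaxRow, MinCol, and MaxCol properties described above. Because it accepts any hash reference with those keys, it can be tried without an Excel file at hand.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Spreadsheet::ParseExcel): returns a list
# of [row, col] pairs covering the used range of a worksheet, or an empty
# list if the worksheet reports no usable range.
sub cell_coordinates {
    my ($sheet) = @_;

    # An empty worksheet may leave some of these properties undefined.
    for my $key (qw(MinRow MaxRow MinCol MaxCol)) {
        return () unless defined $sheet->{$key};
    }
    return () if $sheet->{MinRow} > $sheet->{MaxRow};
    return () if $sheet->{MinCol} > $sheet->{MaxCol};

    my @coords;
    for my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
        for my $col ($sheet->{MinCol} .. $sheet->{MaxCol}) {
            push @coords, [$row, $col];
        }
    }
    return @coords;
}

# A stand-in for a worksheet object, using only the property names above.
my %fake_sheet = (MinRow => 0, MaxRow => 1, MinCol => 2, MaxCol => 3);
print join(' ', map { "($_->[0],$_->[1])" } cell_coordinates(\%fake_sheet)), "\n";
```

With a real workbook, each pair could then be used to index $oWkS->{Cells}[$row][$col] as Listing 3 does.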

Cells can be obtained from a worksheet through the Cells property; that's how the $oWkC object is obtained in Listing 3. Cell properties are documented in the perldoc page for Spreadsheet::ParseExcel, in the Cell section. There does not seem to be a way, according to the documentation, to obtain the formula listed in a particular cell.

Linux example: writing

This section applies to UNIX, and specifically Linux. It has not been tested under Windows.

Spreadsheet::WriteExcel comes with a lot of example scripts in the Examples directory, usually found under /usr/lib/perl5/site_perl/5.6.0/Spreadsheet/WriteExcel/examples. It may have been installed elsewhere; consult with your local Perl administrator if you can't find that directory.
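If the path above does not exist on your system, Perl itself can report where a module was loaded from, which at least narrows down where to look. This is a minimal sketch; module_path is a hypothetical helper name, and the examples directory is not guaranteed to sit next to the installed module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the file a module was loaded from, or undef
# if the module cannot be loaded on this system.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module (qw(Spreadsheet::WriteExcel Spreadsheet::ParseExcel)) {
    my $path = module_path($module);
    print "$module: ", (defined $path ? $path : 'not installed'), "\n";
}
```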

The bad news is that Spreadsheet::WriteExcel cannot be used to write to an existing Excel file. You have to import data from an existing Excel file yourself, using Spreadsheet::ParseExcel. The good news is that Spreadsheet::WriteExcel is compatible with Excel 5 up to Excel 2000.

Here's a program that will demonstrate how data can be extracted from an Excel file, modified (all the numbers are multiplied by 2), and written to a new Excel file. Only the data is preserved, without formatting or any properties. Formulas are dropped.


Listing 4:

#!/usr/bin/perl -w

use strict;
use Spreadsheet::ParseExcel;
use Spreadsheet::WriteExcel;
use Data::Dumper;

# cobbled together from examples for the Spreadsheet::ParseExcel and
# Spreadsheet::WriteExcel modules

my $sourcename = shift @ARGV;
my $destname = shift @ARGV or die "invocation: $0 <source file> <destination file>";

my $source_excel = new Spreadsheet::ParseExcel;

my $source_book = $source_excel->Parse($sourcename)
 or die "Could not open source Excel file $sourcename: $!";

my $storage_book;

foreach my $source_sheet_number (0 .. $source_book->{SheetCount}-1)
{
 my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];

 print "--------- SHEET:", $source_sheet->{Name}, "\n";

 # sanity checking on the source file: rows and columns should be sensible
 next unless defined $source_sheet->{MaxRow};
 next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
 next unless defined $source_sheet->{MaxCol};
 next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};

 foreach my $row_index ($source_sheet->{MinRow} .. $source_sheet->{MaxRow})
 {
  foreach my $col_index ($source_sheet->{MinCol} .. $source_sheet->{MaxCol})
  {
   my $source_cell = $source_sheet->{Cells}[$row_index][$col_index];
   if ($source_cell)
   {
    print "( $row_index , $col_index ) =>", $source_cell->Value, "\n";

    # numbers are doubled; everything else is copied unchanged
    if ($source_cell->{Type} eq 'Numeric')
    {
     $storage_book->{$source_sheet->{Name}}->{$row_index}->{$col_index} = $source_cell->Value*2;
    }
    else
    {
     $storage_book->{$source_sheet->{Name}}->{$row_index}->{$col_index} = $source_cell->Value;
    } # end of if/else
   } # end of source_cell check
  } # foreach col_index
 } # foreach row_index
} # foreach source_sheet_number

print "Perl recognized the following data (sheet/row/column order):\n";
print Dumper $storage_book;

my $dest_book  = Spreadsheet::WriteExcel->new("$destname")
 or die "Could not create a new Excel file in $destname: $!";

print "\n\nSaving recognized data in $destname...";

foreach my $sheet (keys %$storage_book)
{
 my $dest_sheet = $dest_book->addworksheet($sheet);
 foreach my $row (keys %{$storage_book->{$sheet}})
 {
  foreach my $col (keys %{$storage_book->{$sheet}->{$row}})
  {
   $dest_sheet->write($row, $col, $storage_book->{$sheet}->{$row}->{$col});
  } # foreach column
 } # foreach row
} # foreach sheet


print "done!\n";

Note that the data extraction and storage parts of the program are deliberately separated. They could have been done in a single pass, but keeping them apart makes bug fixes and improvements easier: either half can be changed without touching the other.
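One way to take that separation a step further is to pull the modification out into its own pass over the stored data. The following is a minimal sketch under stated assumptions: transform_storage is a hypothetical helper, the sample data is made up, and because the cell Type is no longer available once values are in the hash, a regular expression stands in for the Numeric check that Listing 4 performs.

```perl
use strict;
use warnings;

# Hypothetical helper: applies $transform to every stored value and returns
# a new sheet/row/column structure, leaving the original untouched.
sub transform_storage {
    my ($storage, $transform) = @_;
    my %result;
    for my $sheet (keys %$storage) {
        for my $row (keys %{ $storage->{$sheet} }) {
            for my $col (keys %{ $storage->{$sheet}{$row} }) {
                $result{$sheet}{$row}{$col} =
                    $transform->($storage->{$sheet}{$row}{$col});
            }
        }
    }
    return \%result;
}

# Made-up sample data in the same shape as $storage_book in Listing 4.
my $storage_book = { Sheet1 => { 0 => { 0 => 'Total', 1 => 21 } } };

# Assumption: treat anything that looks like a plain number as numeric.
my $doubled = transform_storage(
    $storage_book,
    sub { my ($v) = @_; $v =~ /^-?\d+(?:\.\d+)?$/ ? $v * 2 : $v },
);

print $doubled->{Sheet1}{0}{0}, " ", $doubled->{Sheet1}{0}{1}, "\n";
```

With this arrangement the extraction loop would store raw values only, and a different modification could be swapped in by passing a different code reference.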

A much better solution to the problem above could be achieved with the XML::Excel CPAN module, but a special converter from XML back to Excel would have to be written. You can also use the DBI interface through the DBD::Excel module, if you want to import data that way. Finally, Spreadsheet::ParseExcel comes with the Spreadsheet::ParseExcel::SaveParser module, which claims to convert between two Excel files but comes with no documentation or examples. My Web site (see Resources) shows an example of using SaveParser. Be forewarned that this is experimental and highly combustible.


If you are using a Windows machine, stick with the Win32::OLE modules unless you don't have Excel at all on your machine. Win32::OLE is the easiest way to get Excel data right now, although the Spreadsheet::WriteExcel and Spreadsheet::ParseExcel modules are catching up.

On UNIX, especially Linux, go with the Spreadsheet::WriteExcel and Spreadsheet::ParseExcel modules for programmatic access to Excel data. But be forewarned that these are fairly young modules, and they may not be perfect for you if you need stability.

You may also consider packages like Gnumeric and StarOffice (see Resources), which are freely available and offer a full GUI and import/export capabilities for Excel files. These are useful if you don't need programmatic access to the Excel data. I have used both applications and find them wonderful for day-to-day tasks.


First published by IBM developerWorks

Copyright 2004-2021 All rights reserved.
Article copyright and all rights retained by the author.