How to Efficiently Process Large Excel Files Using Ruby


Last year, I worked on a project in which I needed to parse and handle large .xlsx files. Some of those files had more than 200K rows. I was looking for a gem that could efficiently do the job.

Process large excel files using Ruby

I wanted a simple solution that would just iterate over each row, parse the contents of the row and save data in database tables.

Popular gems in that time were Roo and Spreadsheet. I also found a simple gem called Simple xlsx reader that was easy to use and had everything I needed. Except one thing.

When I called, memory consumption would go to 2GB and it would stay that way until the garbage collector cleaned it.

What I needed was a library that would read Excel rows in streams. At that time, none of the mentioned gems had that ability. I found a gem called Creek that does only that - stream parsing large excel files. It is simple and works well. Memory consumption while parsing these files was now under 200 MB, which is acceptable. In the meantime, Roo was updated and now it has the same ability.

Also, to reduce the number of SQL queries, I implemented saving data in batches - 1000 rows in 1 query. So, my method reads a thousand rows, makes an array of ActiveRecord objects and then saves them at once. I used the gem called ActiveRecord Import for this purpose.

class ExcelDataParser
  def initialize(file_path)
    @file_path = file_path
    @records = []
    @counter = 1


  def call
    rows.each do |row|
      records << build_new_record(row)
      import_records if reached_batch_import_size? || reached_end_of_file?


  attr_reader :file_path, :records
  attr_accessor :counter

  def book
    @book ||=

  # in this example, we assume that the
  # content is in the first Excel sheet
  def rows
    @rows ||= book.sheets.first.rows

  def increment_counter
    self.counter += 1

  def row_count
    @row_count ||= rows.count

  def build_new_record(row)
    # only build a new record without saving it

  def import_records
    # save multiple records using activerecord-import gem

    # clear records array

  def reached_batch_import_size?
    (counter % BATCH_IMPORT_SIZE).zero?

  def reached_end_of_file?
    counter == row_count

Finally, this task is being executed asynchronously in a background job since it's too heavy to handle in the web process.