<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
	<channel>
		<title>How to Efficiently Process Large Excel Files Using Ruby | Infinum</title>
		<atom:link href="https://infinum.com/blog/how-to-efficiently-process-large-excel-files-using-ruby/feed/" rel="self" type="application/rss+xml" />
		<link>https://infinum.com/blog/how-to-efficiently-process-large-excel-files-using-ruby/</link>
		<description>Building digital products</description>
		<lastBuildDate>Tue, 07 Apr 2026 19:14:13 +0000</lastBuildDate>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>

					<item>
				<image>
					<url>https://infinum.com/uploads/2017/02/how-to-efficiently-process-large-excel-files-using-ruby-0.webp</url>
				</image>
				<title>How to Efficiently Process Large Excel Files Using Ruby</title>
				<link>https://infinum.com/blog/how-to-efficiently-process-large-excel-files-using-ruby/</link>
				<pubDate>Wed, 15 Jun 2016 12:30:00 +0000</pubDate>
				<dc:creator>Vladimir Rosancic</dc:creator>
				<guid isPermaLink="false">https://infinum.com/the-capsized-eight/how-to-efficiently-process-large-excel-files-using-ruby/</guid>
				<description>
					<![CDATA[<p>Last year, I worked on a project in which I needed to parse and handle large .xlsx files. Some of those files had more than 200K rows. </p>
<p>The post <a href="https://infinum.com/blog/how-to-efficiently-process-large-excel-files-using-ruby/">How to Efficiently Process Large Excel Files Using Ruby</a> appeared first on <a href="https://infinum.com">Infinum</a>.</p>
]]>
				</description>
				<content:encoded>
					<![CDATA[<div
	class="wrapper"
	data-id="es-120"
	 data-animation-target='inner-items'>
		
			<div class="wrapper__inner">
			<div class="block-blog-content js-block-blog-content">
	
<div class="block-blog-content-sidebar" data-id="es-92">
	</div>

<div class="block-blog-content-main">
	
<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-95"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-paragraph" data-id="es-93">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	data-id='es-94'
	>
	Last year, I worked on a project in which I needed to parse and handle large .xlsx files. Some of those files had more than 200,000 rows. I was looking for a gem that could do the job efficiently.</p></div>	</div>

<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-98"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-media">
	<div	class="media block-media__media media__border--none media__align--center-center"
	data-id="es-96"
	 data-media-type='image'>

	<figure class="image block-media__image-figure image--size-stretch" data-id="es-97">
	<picture class="image__picture block-media__image-picture">
												<img
					src="https://infinum.com/uploads/2017/02/how-to-efficiently-process-large-excel-files-using-ruby-1.webp"
					class="image__img block-media__image-img"
					alt=""
										height="391"
															width="1000"
										loading="lazy"
					 />
					</picture>

	</figure></div></div>	</div>

<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-101"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-paragraph" data-id="es-99">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	data-id='es-100'
	>
	I wanted a simple solution that would iterate over each row, parse its contents, and save the data to database tables.</p></div>	</div>

<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-104"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-paragraph" data-id="es-102">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	data-id='es-103'
	>
	Popular gems at the time were <a href="https://github.com/roo-rb/roo">Roo</a> and <a href="https://github.com/zdavatz/spreadsheet">Spreadsheet</a>. I also found a simple gem called <a href="https://github.com/woahdae/simple_xlsx_reader">SimpleXlsxReader</a> that was easy to use and had everything I needed. Except for one thing.</p></div>	</div>

<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-107"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-paragraph" data-id="es-105">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	data-id='es-106'
	>
	When I called <em>SimpleXlsxReader.open(file_path)</em>, memory consumption would climb to <strong>2 GB</strong> and stay there until the garbage collector reclaimed it.</p></div>	</div>

<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-110"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-paragraph" data-id="es-108">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	data-id='es-109'
	>
	What I needed was a library that would stream Excel rows instead of loading the whole file at once. At the time, none of the gems mentioned above could do that. I found a gem called <a href="https://github.com/pythonicrubyist/creek">Creek</a> that does exactly one thing: stream-parse large Excel files. It is simple and works well. Memory consumption while parsing these files dropped to under <strong>200 MB</strong>, which is acceptable. Roo has since been updated and now <a href="https://github.com/roo-rb/roo#excel-xlsx-and-xlsm-support">has the same ability</a>.</p></div>	</div>
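
<div
	class="wrapper wrapper__use-simple--true"
	 data-animation='slideFade' data-animation-target='inner-items'>
			<div class="block-paragraph">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	>
	To illustrate why streaming matters, here is a minimal sketch that uses only the Ruby standard library. The method <em>each_sheet_row</em> is a hypothetical stand-in for a spreadsheet reader, not part of Creek or Roo. The eager version materializes every row before doing any work, while the streaming version holds only one row at a time.</p></div>	</div>

```ruby
# Hypothetical stand-in for a spreadsheet reader: yields one row hash at a
# time, the way a streaming parser hands rows to the caller.
def each_sheet_row(total_rows)
  return enum_for(:each_sheet_row, total_rows) unless block_given?

  1.upto(total_rows) do |index|
    yield({ 'A' => "name-#{index}", 'B' => index })
  end
end

# Eager approach: every row is built and kept before any work starts,
# so memory use grows with the size of the sheet.
def sum_eagerly(total_rows)
  rows = each_sheet_row(total_rows).to_a
  rows.sum { |row| row['B'] }
end

# Streaming approach: a row is no longer referenced once its block call
# returns, so the garbage collector is free to reclaim it.
def sum_streaming(total_rows)
  total = 0
  each_sheet_row(total_rows) { |row| total += row['B'] }
  total
end
```

<div
	class="wrapper wrapper__use-simple--true"
	 data-animation='slideFade' data-animation-target='inner-items'>
			<div class="block-paragraph">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	>
	Both methods return the same result. The difference is in how many rows are alive at once, which is what drives peak memory on a file with hundreds of thousands of rows.</p></div>	</div>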

<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-113"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-paragraph" data-id="es-111">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	data-id='es-112'
	>
	To reduce the number of SQL queries, I also save data in batches of 1,000 rows per query. My method reads a thousand rows, builds an array of ActiveRecord objects, and then saves them all at once. I used the <a href="https://github.com/zdennis/activerecord-import">ActiveRecord Import</a> gem for this purpose.</p></div>	</div>

<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-115"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-code">
	<pre class="phiki language-ruby github-light" data-language="ruby" style="background-color: #fff;color: #24292e;"><code><span class="line"><span class="token" style="color: #d73a49;">class</span><span class="token"> </span><span class="token" style="color: #6f42c1;">ExcelDataParser</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">initialize</span><span class="token">(</span><span class="token">file_path</span><span class="token">)</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #24292e;">@</span><span class="token" style="color: #24292e;">file_path</span><span class="token"> </span><span class="token" style="color: #d73a49;">=</span><span class="token"> file_path
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #24292e;">@</span><span class="token" style="color: #24292e;">records</span><span class="token"> </span><span class="token" style="color: #d73a49;">=</span><span class="token"> </span><span class="token">[</span><span class="token">]</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #24292e;">@</span><span class="token" style="color: #24292e;">counter</span><span class="token"> </span><span class="token" style="color: #d73a49;">=</span><span class="token"> </span><span class="token" style="color: #005cc5;">0</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #005cc5;">BATCH_IMPORT_SIZE</span><span class="token"> </span><span class="token" style="color: #d73a49;">=</span><span class="token"> </span><span class="token" style="color: #005cc5;">1000</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">call</span><span class="token">
</span></span><span class="line"><span class="token">    rows</span><span class="token">.</span><span class="token" style="color: #6f42c1;">each</span><span class="token"> </span><span class="token" style="color: #d73a49;">do</span><span class="token"> </span><span class="token" style="color: #d73a49;">|</span><span class="token">row</span><span class="token" style="color: #d73a49;">|</span><span class="token">
</span></span><span class="line"><span class="token">      increment_counter
</span></span><span class="line"><span class="token">      records </span><span class="token" style="color: #d73a49;">&lt;&lt;</span><span class="token"> </span><span class="token" style="color: #6f42c1;">build_new_record</span><span class="token">(</span><span class="token">row</span><span class="token">)</span><span class="token">
</span></span><span class="line"><span class="token">      import_records </span><span class="token" style="color: #d73a49;">if</span><span class="token"> reached_batch_import_size? </span><span class="token" style="color: #d73a49;">||</span><span class="token"> reached_end_of_file?
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">private</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">attr_reader</span><span class="token"> </span><span class="token" style="color: #005cc5;">:</span><span class="token" style="color: #005cc5;">file_path</span><span class="token">,</span><span class="token"> </span><span class="token" style="color: #005cc5;">:</span><span class="token" style="color: #005cc5;">records</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">attr_accessor</span><span class="token"> </span><span class="token" style="color: #005cc5;">:</span><span class="token" style="color: #005cc5;">counter</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">book</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #24292e;">@</span><span class="token" style="color: #24292e;">book</span><span class="token"> </span><span class="token" style="color: #d73a49;">||=</span><span class="token"> </span><span class="token" style="color: #005cc5;">Creek</span><span class="token">::</span><span class="token" style="color: #005cc5;">Book</span><span class="token">.</span><span class="token" style="color: #d73a49;">new</span><span class="token">(</span><span class="token">file_path</span><span class="token">)</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #6a737d;">#</span><span class="token" style="color: #6a737d;"> in this example, we assume that the</span><span class="token" style="color: #6a737d;">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #6a737d;">#</span><span class="token" style="color: #6a737d;"> content is in the first Excel sheet</span><span class="token" style="color: #6a737d;">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">rows</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #24292e;">@</span><span class="token" style="color: #24292e;">rows</span><span class="token"> </span><span class="token" style="color: #d73a49;">||=</span><span class="token"> book</span><span class="token">.</span><span class="token" style="color: #6f42c1;">sheets</span><span class="token">.</span><span class="token" style="color: #6f42c1;">first</span><span class="token">.</span><span class="token" style="color: #6f42c1;">rows</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">increment_counter</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #005cc5;">self</span><span class="token">.</span><span class="token" style="color: #6f42c1;">counter</span><span class="token"> </span><span class="token" style="color: #d73a49;">+=</span><span class="token"> </span><span class="token" style="color: #005cc5;">1</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">row_count</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #24292e;">@</span><span class="token" style="color: #24292e;">row_count</span><span class="token"> </span><span class="token" style="color: #d73a49;">||=</span><span class="token"> rows</span><span class="token">.</span><span class="token" style="color: #6f42c1;">count</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">build_new_record</span><span class="token">(</span><span class="token">row</span><span class="token">)</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #6a737d;">#</span><span class="token" style="color: #6a737d;"> only build a new record without saving it</span><span class="token" style="color: #6a737d;">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #005cc5;">RecordModel</span><span class="token">.</span><span class="token" style="color: #d73a49;">new</span><span class="token">(</span><span class="token">.</span><span class="token">.</span><span class="token">.</span><span class="token">)</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">import_records</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #6a737d;">#</span><span class="token" style="color: #6a737d;"> save multiple records using activerecord-import gem</span><span class="token" style="color: #6a737d;">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #005cc5;">RecordModel</span><span class="token">.</span><span class="token" style="color: #6f42c1;">import</span><span class="token">(</span><span class="token">records</span><span class="token">)</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token" style="color: #6a737d;">#</span><span class="token" style="color: #6a737d;"> clear records array</span><span class="token" style="color: #6a737d;">
</span></span><span class="line"><span class="token">    records</span><span class="token">.</span><span class="token" style="color: #6f42c1;">clear</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">reached_batch_import_size?</span><span class="token">
</span></span><span class="line"><span class="token">    </span><span class="token">(</span><span class="token">counter </span><span class="token" style="color: #d73a49;">%</span><span class="token"> </span><span class="token" style="color: #005cc5;">BATCH_IMPORT_SIZE</span><span class="token">)</span><span class="token">.</span><span class="token" style="color: #6f42c1;">zero?</span><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">def</span><span class="token"> </span><span class="token" style="color: #6f42c1;">reached_end_of_file?</span><span class="token">
</span></span><span class="line"><span class="token">    counter </span><span class="token" style="color: #d73a49;">==</span><span class="token"> row_count
</span></span><span class="line"><span class="token">  </span><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token" style="color: #d73a49;">end</span><span class="token">
</span></span><span class="line"><span class="token">
</span></span></code></pre></div>	</div>
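
<div
	class="wrapper wrapper__use-simple--true"
	 data-animation='slideFade' data-animation-target='inner-items'>
			<div class="block-paragraph">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	>
	As a possible variation on the class above, the batching step can also be expressed with <em>each_slice</em>, which hands over the final, shorter batch by itself. The sketch below runs on the standard library alone: <em>FakeBulkWriter</em>, <em>build_record</em> and <em>import_in_batches</em> are hypothetical names, and the writer merely stands in for <em>RecordModel.import(records)</em>.</p></div>	</div>

```ruby
# Hypothetical bulk writer standing in for RecordModel.import(records);
# it only remembers how many records each call received.
class FakeBulkWriter
  attr_reader :batch_sizes

  def initialize
    @batch_sizes = []
  end

  def import(records)
    @batch_sizes.push(records.size)
  end
end

# Hypothetical mapping from a row to a record; a plain hash here,
# where the real code would build an unsaved model instance.
def build_record(row)
  { name: row[0], amount: row[1] }
end

# Groups any row enumerator into fixed-size batches. each_slice yields the
# last partial batch on its own, so no counter or end-of-file check is needed.
def import_in_batches(rows, writer, batch_size: 1000)
  rows.each_slice(batch_size) do |batch|
    records = batch.map { |row| build_record(row) }
    writer.import(records)
  end
end

# Usage: 2,500 generated rows end up in three bulk writes.
writer = FakeBulkWriter.new
rows = Enumerator.new do |yielder|
  1.upto(2500) { |i| yielder.yield(["item-#{i}", i]) }
end
import_in_batches(rows, writer)
writer.batch_sizes # [1000, 1000, 500]
```

<div
	class="wrapper wrapper__use-simple--true"
	 data-animation='slideFade' data-animation-target='inner-items'>
			<div class="block-paragraph">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	>
	One reason to consider this shape: it never needs the total row count, and counting the rows of a streaming enumerator likely means an extra pass over the file.</p></div>	</div>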

<div
	class="wrapper wrapper__use-simple--true"
	data-id="es-118"
	 data-animation='slideFade' data-animation-target='inner-items'>
		
			<div class="block-paragraph" data-id="es-116">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	data-id='es-117'
	>
	Finally, this task runs asynchronously in a background job, since it’s too heavy to handle in the web process.</p></div>	</div>
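
<div
	class="wrapper wrapper__use-simple--true"
	 data-animation='slideFade' data-animation-target='inner-items'>
			<div class="block-paragraph">
	<p	class='typography typography--size-16-text-roman js-typography block-paragraph__paragraph'
	>
	The article does not say which job backend was used, so the following is only a rough sketch of the idea with a thread and a queue from the standard library. <em>ImportQueue</em> is a hypothetical name. In a real application, a dedicated job framework would take its place, and the handler would call something like <em>ExcelDataParser.new(path).call</em>.</p></div>	</div>

```ruby
# Minimal in-process job queue: a stand-in for a real job backend.
class ImportQueue
  def initialize
    @jobs = Queue.new
    @worker = Thread.new { work_off_jobs }
  end

  # Called from the request path; returns right away without parsing.
  def enqueue(file_path, handler)
    @jobs.push([file_path, handler])
  end

  # Lets the worker finish the queued jobs, then waits for it to exit.
  def shutdown
    @jobs.close
    @worker.join
  end

  private

  def work_off_jobs
    # Queue#pop returns nil once the queue is closed and empty.
    while (job = @jobs.pop)
      file_path, handler = job
      handler.call(file_path)
    end
  end
end

# Usage: the handler here only records the path it was given.
processed = []
queue = ImportQueue.new
queue.enqueue('reports/large.xlsx', ->(path) { processed.push(path) })
queue.shutdown
processed # ["reports/large.xlsx"]
```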
</div>
</div>		</div>
	</div>
]]>
				</content:encoded>
			</item>
		
	</channel>
</rss>