Ruby's CSV class choking on large files - Solved!

badp · April 15, 2008

I've got a nice little Ruby script that takes a CSV file and converts it to a particular format. It works flawlessly until I feed it a ~50MB CSV file. It opens the file, but when it goes to parse it, Ruby throws and Visual Studio catches an exception:

"An unhandled win32 exception occurred in ruby.exe [3040]." Then it goes on with the generic debugging message.

The Ruby script is below. I adapted it from an open source CSV-to-XML script. As the comments indicate, the point of all this is to get a CSV file into a format that Splunk will process correctly (it processes the CSV values, but does not get the fields right and refuses to learn them properly).

The script:

#!/usr/bin/ruby
# CSV 2 splunk
# Converts a CSV file to a splunk-readable format

require 'csv'

print "CSV file to read: "
input_file = gets.chomp

print "File to write to: "
output_file = gets.chomp

puts "Opening CSV file..."
csvfile = File.open(input_file) {|f| f.read}
puts "CSV file opened."

puts "Parsing CSV file..."

csv = CSV::parse(csvfile)
fields = csv.shift

puts "Writing file..."

File.open(output_file, 'w') do |f|
  csv.each do |record|
    for i in 0..(fields.length - 1)
      # Write each field as name="value", followed by a comma and a space
      f.print "#{fields[i]}=\"#{record[i]}\", "
    end
    f.print "\n"
  end
end # End file block - close file
puts "Contents of #{input_file} written to #{output_file}."

The CSV::parse(csvfile) call is where it seems to choke. Any ideas?

badp · April 15, 2008

I installed the FasterCSV gem and used its FasterCSV class instead. For those who care, here's the new code that doesn't choke:

#!/usr/bin/ruby
# CSV 2 splunk
# Converts a CSV file to a splunk-readable format

require 'fastercsv'

print "CSV file to read: "
input_file = gets.chomp

print "File to write to: "
output_file = gets.chomp

puts "Opening CSV file..."
csvfile = File.open(input_file) {|f| f.read}
puts "CSV file opened."

puts "Parsing CSV file..."

csv = FasterCSV::parse(csvfile)
fields = csv.shift

puts "Writing file..."

File.open(output_file, 'w') do |f|
  csv.each do |record|
    for i in 0..(fields.length - 1)
      # Write each field as name="value", followed by a comma and a space
      f.print "#{fields[i]}=\"#{record[i]}\", "
    end
    f.print "\n"
  end
end # End file block - close file
puts "Contents of #{input_file} written to #{output_file}."
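
Both versions above still read the whole file into a string and parse it in one go, so memory use grows with the file size. If even larger files become a problem, one option is to process the file one row at a time instead. What follows is only a minimal sketch, not something tested against the 50MB file: it assumes Ruby 1.9 or later, where FasterCSV became the standard csv library (on 1.8 with the gem, FasterCSV.foreach should work the same way). The helper names splunk_line and convert_csv are made up for illustration, and unlike the script above it does not leave a trailing comma at the end of each line.

```ruby
require 'csv'

# Hypothetical helper: formats one record as name="value" pairs,
# joined with ", " (no trailing separator).
def splunk_line(fields, record)
  fields.zip(record).map { |name, value| "#{name}=\"#{value}\"" }.join(', ')
end

# Hypothetical helper: streams the CSV row by row, so only one
# record is held in memory at a time.
def convert_csv(input_path, output_path)
  fields = nil
  File.open(output_path, 'w') do |out|
    CSV.foreach(input_path) do |record|
      if fields.nil?
        fields = record          # the first row holds the field names
      else
        out.puts splunk_line(fields, record)
      end
    end
  end
end
```

Calling convert_csv(input_file, output_file) would then replace the read, parse, and write steps of the script.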
