Ruby's CSV class choking on large files - Solved!


Posted

I've got a nice little Ruby script designed to take a CSV file and convert it to a particular format. It works flawlessly until I feed it a ~50MB CSV file. It opens the file, but when it goes to parse it, Ruby crashes and the Visual Studio debugger catches an exception:

"An unhandled win32 exception occurred in ruby.exe [3040]."  Then it goes on with the generic debugging message.

The Ruby script is below. I adapted it from an open source CSV-to-XML script. Also, as the comments indicate, the point of all this is to get a CSV file into a format that Splunk will process correctly (it processes the CSV values, but does not get the fields right and refuses to learn them properly).

The script:

#!/usr/bin/ruby
# CSV 2 splunk
# Converts a CSV file to a splunk-readable format

require 'csv'

print "CSV file to read: "
input_file = gets.chomp

print "File to write to: "
output_file = gets.chomp

puts "Opening CSV file..."
csvfile = File.open(input_file) {|f| f.read}
puts "CSV file opened."

puts "Parsing CSV file..."

csv = CSV::parse(csvfile)
fields = csv.shift

puts "Writing file..."

File.open(output_file, 'w') do |f|
  csv.each do |record|
    for i in 0..(fields.length - 1)
      f.print "#{fields[i]}=\"#{record[i]}\", "
    end
    f.print "\n"
  end
end # End file block - close file
puts "Contents of #{input_file} written to #{output_file}."

CSV::parse(csvfile) is where it seems to choke. Any ideas?

Posted

I installed the FasterCSV gem and used its FasterCSV class instead. For those who care, here's the new code that doesn't choke:

#!/usr/bin/ruby
# CSV 2 splunk
# Converts a CSV file to a splunk-readable format

require 'fastercsv'

print "CSV file to read: "
input_file = gets.chomp

print "File to write to: "
output_file = gets.chomp

puts "Opening CSV file..."
csvfile = File.open(input_file) {|f| f.read}
puts "CSV file opened."

puts "Parsing CSV file..."

csv = FasterCSV::parse(csvfile)
fields = csv.shift

puts "Writing file..."

File.open(output_file, 'w') do |f|
  csv.each do |record|
    for i in 0..(fields.length - 1)
      f.print "#{fields[i]}=\"#{record[i]}\", "
    end
    f.print "\n"
  end
end # End file block - close file
puts "Contents of #{input_file} written to #{output_file}."
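
A note for anyone hitting this with even bigger files: both versions above still read the whole file into one string and then parse all of it into an array before writing anything. Below is a rough sketch of a streaming variant that handles one row at a time. It assumes Ruby 1.9 or later, where FasterCSV became the standard csv library, so a plain require 'csv' is enough. The names splunk_line and convert_csv are made up for this example, and unlike the script above it joins the pairs with ", " rather than leaving a trailing separator on each line.

```ruby
require 'csv'

# Build one line of key="value" pairs from a header row and a data row.
# Hypothetical helper, not part of the original script.
def splunk_line(fields, record)
  fields.each_with_index.map { |name, i| "#{name}=\"#{record[i]}\"" }.join(", ")
end

# Stream the CSV row by row instead of loading it all into memory.
# Hypothetical wrapper around the parse and write steps of the script above.
def convert_csv(input_path, output_path)
  fields = nil
  File.open(output_path, 'w') do |out|
    CSV.foreach(input_path) do |record|
      if fields.nil?
        fields = record # the first row holds the field names
      else
        out.puts splunk_line(fields, record)
      end
    end
  end
end
```

Calling convert_csv(input_file, output_file) would stand in for everything from "Opening CSV file..." through the end of the File.open block. I have not tried this on the 50MB file from the first post, so treat it as a starting point.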
