5 Replies - 3048 Views - Last Post: 06 December 2011 - 12:13 PM

#1 NotarySojac

  • D.I.C Regular

Reputation: 53
  • Posts: 428
  • Joined: 30-September 10

regex -- parse into a hash?

Posted 02 December 2011 - 04:24 PM

Hey, I'm working on a script that will convert this:

#hello{
   content: "hi there";
}
#goodbye {
   content: "fare well";
}



into something more manageable (it will eventually be exported to XML because of how much IE sucks at everything I make; there's a designer involved who I don't want getting flustered).

So it will probably get to this state at some point:

 myHash = { :hello => "hi there", :goodbye => "fare well" }



I'm trying to do this the cool way with regex, but I've been stumped for the past hour or so. At this point I'm up to:

http://rubular.com/r/WKyWomf4Nm

or rather, /#(\w+)\S*/m, but I'm having trouble working out the next part and exactly how it will be implemented in my code.


Here's the entirety of my code, btw, if anyone wants to see what's going on with this puzzle I wound up creating for myself. Basically, there's some AJAX on a webpage that, for IE < 9, requests the data.xml file this script is meant to generate, since those browsers can't simply pull the content from the CSS, which would be preferable.


# The designer is adding the paragraph content into the
# CSS (top of partners grid) and this works great for all good
# browsers, but not IE on Windows XP, so you need to run this task
# whenever the lovely paragraphs change so the changes are reflected
# in pathetic browsers
#
CSS_LOCATION = "#{Rails.root}/public/stylesheets/grid.css"
END_OF_LINE_STRING = "/* End of Grid"


namespace :db do
  desc "Converts the grid.css paragraph contents into xml for IE."
  task :convert => :environment do
    convert_paragraphs
  end
end

def convert_paragraphs
  css_file = get_css_file

  data = parse_to_data(css_file)

  puts css_file
end

def parse_to_data(file)
  # create a hash for each id
  my_data_hash = {}

  # find an occurrence of '#', then collect the string attached to it...
  # (not written yet)

  my_data_hash
end

def get_css_file
  # File.read opens, reads and closes the file in one go
  css_file = File.read(CSS_LOCATION)

  css_file = cut_css_file(css_file)
  css_file = remove_comments(css_file)
  css_file = remove_empty_lines(css_file)

  css_file
end

def remove_empty_lines(file)
  file.gsub(/^\s*$\n/, '')
end

def remove_comments(file)
  # non-greedy .*? stops at the first */ ; /m lets a comment span several lines
  file.gsub(/\/\*.*?\*\//m, '')
end

def cut_css_file(file)
  # keep everything before the "/* End of Grid" marker,
  # or the whole file if the marker is missing
  cut_point = file.index(END_OF_LINE_STRING)
  cut_point ? file[0...cut_point] : file
end
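For the data.xml end of this, here's a minimal sketch (not part of the original script) that turns a hash shaped like myHash above into XML. The `<paragraphs>`/`<paragraph id="...">` layout and the helper names are assumptions; match whatever the IE-side AJAX actually expects. The escaping is done by hand with a lookup table so it only needs core Ruby.

```ruby
# Hypothetical helpers: the <paragraphs>/<paragraph id="..."> layout is an
# assumption, not something the original script or page defines.
XML_ESCAPES = {
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;",
  '"' => "&quot;",
  "'" => "&apos;"
}.freeze

# Escape the characters that are special inside XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Build an XML document from a hash like { :hello => "hi there", ... }.
def paragraphs_to_xml(data)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<paragraphs>"]
  data.each do |id, text|
    lines << "  <paragraph id=\"#{xml_escape(id)}\">#{xml_escape(text)}</paragraph>"
  end
  lines << "</paragraphs>"
  lines.join("\n") + "\n"
end

my_hash = { :hello => "hi there", :goodbye => "fare well" }
puts paragraphs_to_xml(my_hash)
```

The rake task could then write that string out with something like `File.open(path, "w") { |f| f.write(xml) }`, where `path` is wherever data.xml is served from.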




Replies To: regex -- parse into a hash?

#2 NotarySojac

Re: regex -- parse into a hash?

Posted 03 December 2011 - 03:47 PM

Ok, looks like I'm getting there. This little problem hasn't been as fun to solve with ruby as other things I've done, but it wasn't all that grueling (although it would have gone a lot faster if ruby googled as well as C# does).

Here's some sample code that runs on 1.9.2 just fine. It's very, VERY wordy, so I have to assume that I've truly taken the longest route possible to accomplish my goal. If anyone has a link that explains a faster way, plz share it.

my_string = <<DOC
#hello {
   content: "hi there";
}
s
#goodbye {
   content: "fare well";
}
DOC



def get_all_the_blocks(file, next_start=0)
  ret_hash = {}

  loop do
    s = parse_data(file, next_start)  #=> [Array, end]   ...Array = [key, value]
    array_out = s[0]
    next_start = s[1]

    # Get out if we're all done parsing out data
    return ret_hash if next_start.nil?

    # Put the data into the master hash...
    ret_hash[array_out[0]] = array_out[1]
  end
end

def parse_data(file, start=0)
  # find the next '#'
  index = file.index("#", start)
  return [nil, nil] if index.nil?

  # find the next '}' after it; an unclosed block means we're done
  index_end = file.index("}", index)
  return [nil, nil] if index_end.nil?

  # the block runs from the '#' up to and including the '}'
  my_block = file[index..index_end]

  my_datas = get_data_from_block(my_block)    #=> [key, value]

  return [my_datas, index_end]
end

def get_data_from_block(block)
  # \w+ stops the id at the first non-word character, so "#hello{" works too
  block =~ /#(\w+).*content:\s*"(.*)"/m

  [$1, $2]
end


s = get_all_the_blocks(my_string)

puts s



#m = my_string =~ /#(\S*).*content:\s*"(.*)"/m


#puts $1
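The same walk-the-string approach can be written with StringScanner from Ruby's standard library (`require 'strscan'`), which tracks the current position itself so `next_start` doesn't have to be passed around. This is just a sketch, not the code above; `blocks_with_scanner` is a made-up name, and like `get_data_from_block` it only picks out the `content` value.

```ruby
require 'strscan'

# Hypothetical name; walks the text block by block like get_all_the_blocks,
# but lets StringScanner remember the current position.
def blocks_with_scanner(text)
  scanner = StringScanner.new(text)
  result = {}

  # skip_until moves past the next '#' and returns nil when there are none left
  while scanner.skip_until(/#/)
    name = scanner.scan(/\w+/)          # the id right after the '#'
    next unless name                     # a '#' not followed by a word: keep looking

    body = scanner.scan_until(/\}/)     # everything up to and including the next '}'
    break unless body                    # unclosed block: stop

    content = body[/content:\s*"(.*?)"/, 1]
    result[name] = content if content
  end

  result
end

my_string = <<DOC
#hello {
   content: "hi there";
}
s
#goodbye {
   content: "fare well";
}
DOC

p blocks_with_scanner(my_string)
```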


This post has been edited by NotarySojac: 03 December 2011 - 05:41 PM


#3 Karel-Lodewijk

  • D.I.C Addict

Reputation: 449
  • Posts: 849
  • Joined: 17-March 11

Re: regex -- parse into a hash?

Posted 04 December 2011 - 06:03 AM

Pure regular expressions can get you there. Try this (with the m modifier, so that . also matches newlines):

/\s*#(\w+)\s*{\s*content:\s*"(.*?)".*?}\s*/m

http://rubular.com/r/hUWNdEJiS5

Some of the tricks I've used are

+? and *? are non-greedy (lazy) versions of + and *: instead of grabbing as much as possible, they match as little as possible and stop as soon as the rest of the pattern can match. They help you express things like .*?} (I don't care what else is there before the next curly brace) or "(.*?)" (match everything between two quotes).

Also, it's often useful to spread \s* liberally around, basically everywhere any amount of whitespace or newlines is allowed. It won't break anything, and it makes the regex more robust.
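A quick way to see the greedy/non-greedy difference is to run both on the same one-line string. The sample strings here are made up; the second pair shows why a greedy `.*` is risky for stripping `/* ... */` comments: it runs from the first `/*` to the last `*/` on the line.

```ruby
css = '#a { content: "one"; } #b { content: "two"; }'

# Greedy: .* grabs as much as it can, then backs off to the LAST quote.
greedy = css[/content:\s*"(.*)"/, 1]
# Non-greedy: .*? stops at the FIRST quote that lets the rest match.
lazy = css[/content:\s*"(.*?)"/, 1]

puts greedy   # one"; } #b { content: "two
puts lazy     # one

line = "a { color: red; } /* one */ b { color: blue; } /* two */"

# Greedy comment stripping also deletes the real rule between the comments.
puts line.gsub(%r{/\*.*\*/}, "")
# Non-greedy (plus /m so a comment may span lines) removes only the comments.
puts line.gsub(%r{/\*.*?\*/}m, "")
```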

If you're going to have more fields than just content, then I would recommend parsing it in two stages.

Something like:

/#(\w+?)\s*{(.*?)}/m

And then looping over the results, where result[0] will be the name and result[1] will be the contents between the {}, then something like

/(\w+?):\s*"(.*?)"\s*;/m

will further break the fields down.

This post has been edited by Karel-Lodewijk: 04 December 2011 - 07:12 AM


#4 NotarySojac

Re: regex -- parse into a hash?

Posted 05 December 2011 - 10:09 AM

Thanks for those tips, Karel-Lodewijk. What about the implementation in Ruby code? Does the MatchData object contain both sets of matches, or can it only hold one set at a time, so that I'll need to loop through?

my_string = <<DOC
#hello {
   content: "hi there";
}
s
#goodbye {
   content: "fare well";
}
DOC

m = my_string =~ /\s*#(\w+)\s*{\s*content:\s*"(.*?)".*?}\s*/m

p $~



The MatchData object seems to only know about the #hello part and has no information on the #goodbye part. But Rubular seems to imply that it can do both sets at once.
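`=~` (like `String#match`) only ever produces one MatchData, for the first match, so `$~` can't hold both blocks; as far as I can tell Rubular shows every match because it keeps applying the pattern across the whole string. One way to get a MatchData per block is to call `match` again from where the previous match ended, using its optional start-position argument. A sketch; the named groups `id` and `text` are my own addition to the pattern.

```ruby
my_string = <<DOC
#hello {
   content: "hi there";
}
s
#goodbye {
   content: "fare well";
}
DOC

# Same shape as the pattern above, with named groups added for readability.
pattern = /#(?<id>\w+)\s*\{\s*content:\s*"(?<text>.*?)".*?\}/m

matches = []
pos = 0
# String#match(pattern, pos) starts searching at pos and returns nil when nothing is left.
while (match_data = my_string.match(pattern, pos))
  matches << match_data
  pos = match_data.end(0)   # resume right after this match
end

matches.each do |md|
  puts "#{md[:id]} => #{md[:text]} (starts at offset #{md.begin(0)})"
end
```

If you only need the captured strings and not the MatchData objects, `String#scan` does the looping for you.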

This post has been edited by NotarySojac: 05 December 2011 - 10:10 AM


#5 Karel-Lodewijk

Re: regex -- parse into a hash?

Posted 05 December 2011 - 06:09 PM

m = my_string.scan(/\s*#(\w+)\s*{\s*content:\s*"(.*?)".*?}\s*/m)
p m



Or my second suggestion

top_levels = my_string.scan(/#(\w+?)\s*{(.*?)}/m)

top_levels.each do |name, body|
  p "name " + name
  tags = body.scan(/(\w+?):\s*"(.*?)"\s*;/m)
  p tags
end
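To get from those scan results to the hash shape from the first post (`{ :hello => "hi there", ... }`): `scan` with capture groups returns an array of `[id, content]` pairs, and `Hash[]` turns such pairs straight into a hash. A minimal sketch with the one-stage pattern; converting the ids to symbols is an assumption based on that myHash example.

```ruby
my_string = <<DOC
#hello {
   content: "hi there";
}
#goodbye {
   content: "fare well";
}
DOC

# Each match contributes one [id, content] pair, e.g. ["hello", "hi there"].
pairs = my_string.scan(/#(\w+)\s*\{\s*content:\s*"(.*?)".*?\}/m)

# Hash[] builds a hash from [key, value] pairs; ids become symbols like :hello.
my_hash = Hash[pairs.map { |id, text| [id.to_sym, text] }]

p my_hash
```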



It's more robust as it can parse things like

my_string = <<DOC
#hello {
   content: "hi there";
}
#goodbye {
   content: "fare well";
   other tag: "aaaaaaa";
}
DOC
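Carrying the two-stage idea through to a nested hash (one inner hash of fields per id) could look like this sketch; `parse_blocks` is a hypothetical name. Two things worth knowing: only quoted values are picked up, and because `\w` doesn't match a space, `other tag:` in the sample above comes out as a field named `tag`. Allowing hyphens in field names is my own assumption, aimed at CSS-style names like `font-family`.

```ruby
# Hypothetical helper: returns { :id => { :field => "value", ... }, ... }
def parse_blocks(css)
  result = {}
  # Stage 1: each "#id { ... }" block -> id, body
  css.scan(/#(\w+)\s*\{(.*?)\}/m) do |id, body|
    # Stage 2: each name: "value"; pair inside the body
    fields = body.scan(/([\w-]+)\s*:\s*"(.*?)"\s*;/)
    result[id.to_sym] = Hash[fields.map { |name, value| [name.to_sym, value] }]
  end
  result
end

my_string = <<DOC
#hello {
   content: "hi there";
}
#goodbye {
   content: "fare well";
   other tag: "aaaaaaa";
}
DOC

p parse_blocks(my_string)
```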



#6 NotarySojac

Re: regex -- parse into a hash?

Posted 06 December 2011 - 12:13 PM

Thanks for your help! I'm going to have to put some time into practicing these new patterns till they start feeling more natural to me.
