Why would the following regular expression produce the results below?

****input:
john,doe,"1,200.99",training,12345

****code executed on input:
values = line.split(/("\d*,\d+\.\d*")|,/)

****output:
["john","doe","","1,200.99","","training","12345"]

What's up with the two empty array elements that are generated during the split surrounding my dollar value?

The regular expression is pretty explicit, and the input string doesn't contain anything that should cause them when run through the expression, yet there they are, so obviously I'm doing something wrong in my regexp. Help.
--
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

On 19 Dec 2008, at 19:48, Corey Murphy <rails-mailing-list@andreas-s.net> wrote:
> Why would the following regular expression produce the results below?
>
> ****input:
> john,doe,"1,200.99",training,12345
>
> ****code executed on input:
> values = line.split(/("\d*,\d+\.\d*")|,/)
>
> ****output:
> ["john","doe","","1,200.99","","training","12345"]

Pretend you're split. You see the comma after doe, so you split. Then you see the other part of your regex, so you split again. Nothing sits between those two delimiters, which results in the empty array element.

Fred

> What's up with the two empty array indexes that are generated during the
> split surrounding my dollar value?
>
> The regular expression is pretty explicit and the input string doesn't
> contain anything that should cause them when run through the expression,
> yet there they are so obviously I'm doing something wrong in my regexp.
> Help.

Frederick Cheung wrote:
> Pretend you're split. You see the comma after doe so you split. Then
> you see the other part of your regex and so you split again, resulting
> in the empty array
>
> Fred

Ok, but the pipe character stands for alternation, much like an OR, so once it sees doe and splits, the next part of the CSV input it evaluates should match the monetary portion of the regex, so it splits around it. Maybe I'm just not getting something, because I want it to split my (comma-delimited) input line on a comma or when a value is wrapped in double quotes.

I can always handle the extra empty array positions by simply performing an array.delete_if with a block that looks for empty strings, but that seems like overkill if I could handle it with the regex instead.

Sorry if I'm still not getting the point.
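
The delete_if cleanup mentioned above might look like the following minimal sketch. The variable names are illustrative only, and the extra step that strips the double quotes is an addition for this sketch, needed because split returns the captured delimiter with its quotes still attached.

```ruby
line = 'john,doe,"1,200.99",training,12345'

# The capturing group makes split return the quoted amount, but the
# commas on either side of it still produce empty strings.
values = line.split(/("\d*,\d+\.\d*")|,/)

# Drop the empty strings that appear between adjacent delimiters.
values.delete_if { |value| value.empty? }

# Remove the double quotes that came along with the captured amount.
values = values.map { |value| value.delete('"') }

p values
```

One caveat with this approach: delete_if cannot tell an artifact of the regexp from a field that was genuinely empty in the input (as in `a,,b`), so real blank columns would be dropped as well.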

On Dec 19, 2008, at 2:58 PM, Frederick Cheung wrote:
> On 19 Dec 2008, at 19:48, Corey Murphy <rails-mailing-list@andreas-s.net> wrote:
>
>> Why would the following regular expression produce the results below?
>>
>> ****input:
>> john,doe,"1,200.99",training,12345
>>
>> ****code executed on input:
>> values = line.split(/("\d*,\d+\.\d*")|,/)
>>
>> ****output:
>> ["john","doe","","1,200.99","","training","12345"]
>>
> Pretend you're split. You see the comma after doe so you split. Then
> you see the other part of your regex and so you split again, resulting
> in the empty array
>
> Fred
>> What's up with the two empty array indexes that are generated during the
>> split surrounding my dollar value?
>>
>> The regular expression is pretty explicit and the input string doesn't
>> contain anything that should cause them when run through the expression,
>> yet there they are so obviously I'm doing something wrong in my regexp.
>> Help.

I think you actually get:

    => ["john", "doe", "", "\"1,200.99\"", "", "training", "12345"]

as your output. (At least I do.)

However, it looks like your input is CSV, so why not let a CSV library do the heavy lifting for you:

    irb> require 'rubygems'
    => true
    irb> require 'fastercsv'
    => true
    irb> FasterCSV.parse_line(line)
    => ["john", "doe", "1,200.99", "training", "12345"]

Note: FasterCSV becomes the CSV in the standard library for Ruby 1.9, but it's a gem before that, so look at the rdocs/ri for whatever you're using.

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob-xa9cJyRlE0mWcWVYNo9pwxS2lgjeYSpx@public.gmane.org
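
For readers on Ruby 1.9 or later, the same comparison can be run without the gem. This is a minimal sketch that assumes the `csv` library is available under that name (it replaced FasterCSV in the standard distribution); the variable names are illustrative.

```ruby
require 'csv'

line = 'john,doe,"1,200.99",training,12345'

# Splitting with the regexp from the original post: the quoted amount is
# treated as a delimiter and comes back, quotes included, only because of
# the capturing parentheses.
split_values = line.split(/("\d*,\d+\.\d*")|,/)
p split_values

# Parsing the same line as CSV: the quoted field is read as one value
# and its surrounding quotes are removed.
csv_values = CSV.parse_line(line)
p csv_values
```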

Rob Biedenharn wrote:
> However, it looks like your input is CSV so why not let a CSV library
> do the heavy-lifting for you:

Primarily because this is a budget application I'm writing, and the data coming from Accounts Payable (.csv files) needs to be read in and then massaged a bit before it moves into the database. A lot of the massaging relates to categorizing transactions and associating other information with a record that isn't part of the original input file. So I'm manually parsing this data into an array for Active Record column attribute assignment and the like.

Unless FasterCSV reads a file in for me and gives me the data in an array I can use in my model or controller for further processing, it doesn't fit my needs. So forgive my ignorance if it does do this and I'm simply not aware. However, if it does, then woohoo, and I'll be pulling down the gem pronto.

On Dec 19, 2008, at 3:26 PM, Corey Murphy wrote:
> Rob Biedenharn wrote:
>> However, it looks like your input is CSV so why not let a CSV library
>> do the heavy-lifting for you:
>
> Primarily because this is a budget application I'm writing and the data
> coming from Accounts Payable (.csv files) need to be read in and then
> massaged some before moving it into the database. A lot of the
> massaging relates to categorizing transactions and associating other
> information with a record that isn't a part of the original input file.
> So, I'm manually parsing this data into an array for active record
> column attribute assignment and the like.
>
> Unless fasterCSV reads a file in for me and give me the data in an array
> I can use in my model or controller for further processing, it doesn't
> fit my needs. So forgive my ignorance if it does do this and I'm simply
> not aware. However, if it does, then woohoo and I'll be pulling down
> the gem pronto.

Then start now!

    gem install fastercsv

I was just using one of the lower-level methods to parse your input line. I'll let you read the docs[1] for yourself, but it will read the file and give the data back to you as an array or a hash (you choose). You can configure whether there are first-row headers, too.

-Rob

[1] http://fastercsv.rubyforge.org/

Rob Biedenharn http://agileconsultingllc.com
Rob-xa9cJyRlE0mWcWVYNo9pwxS2lgjeYSpx@public.gmane.org
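
The array-or-hash choice described above can be sketched roughly as follows, using the `csv` library from Ruby 1.9 and later rather than the FasterCSV gem. The header names and the in-memory `data` string are invented for this example; the thread never shows the real file's columns.

```ruby
require 'csv'

# Hypothetical file contents; the header row is an assumption made
# purely for illustration.
data = "first_name,last_name,amount,category,account\n" \
       "john,doe,\"1,200.99\",training,12345\n"

# Without the headers option, every line (header included) is returned
# as an array of strings.
rows_as_arrays = CSV.parse(data)
p rows_as_arrays

# With headers: true, the first line names the columns and each
# remaining row can be converted to a hash keyed by column name.
rows_as_hashes = CSV.parse(data, headers: true).map(&:to_h)
p rows_as_hashes
```

For a file on disk, `CSV.foreach(path, headers: true)` yields one row at a time in the same way, which leaves room to categorize each transaction before assigning it to a model.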

On Dec 19, 3:08 pm, Corey Murphy <rails-mailing-l...-ARtvInVfO7ksV2N9l4h3zg@public.gmane.org> wrote:
> Frederick Cheung wrote:
> > Pretend you're split. You see the comma after doe so you split. Then
> > you see the other part of your regex and so you split again, resulting
> > in the empty array
> >
> > Fred
>
> Ok, but the pipe character stands for alternation much like an OR so
> once it sees doe and splits, the next part of the csv input it evaluates
> should match to the monetary portion of the regex so it splits around
> it. Maybe I'm just not getting somethings

Maybe not. :) Frederick is absolutely right. Here's what I believe happens when split is working on your input string.

The split routine starts parsing at the beginning of the string. (I will represent the cursor in these examples with a #.)

    #john,doe,"1,200.99",training,12345
    => []

At the moment, no results have been yielded, so the resulting array (after the =>) is still empty.

Split runs until it sees a delimiter.

    john#,doe,"1,200.99",training,12345
    => ['john']

Split discards the delimiter.

    john,#doe,"1,200.99",training,12345
    => ['john']

Repeat for the next element ('doe') and delimiter (',').

    john,doe,#"1,200.99",training,12345
    => ['john', 'doe']

Having just read a delimiter (','), split is now prepared to read the next value. But it doesn't see a value -- it sees another delimiter ('"1,200.99"', which matches your regex and is therefore considered a delimiter just like ','). Split therefore writes an empty array element.

    john,doe,#"1,200.99",training,12345
    => ['john', 'doe', '']

Split skips over the delimiter -- but since the regexp contains *capturing parentheses*, split inserts the parenthesized portion of the match (which happens to match the whole delimiter) into the result set.

    john,doe,"1,200.99"#,training,12345
    => ['john', 'doe', '', '"1,200.99"']

Split sees another delimiter (',') with no intervening value, so it writes another empty string.
This time, the part of the regexp that matches the delimiter has no parentheses, so split does not put it into the result set.

    john,doe,"1,200.99",#training,12345
    => ['john', 'doe', '', '"1,200.99"', '']

Split reads the last two values, skipping the delimiter in between.

    john,doe,"1,200.99",training,12345#
    => ['john', 'doe', '', '"1,200.99"', '', 'training', '12345']

Does that make it clearer? Since you defined the "1,200.99" string as a delimiter, the fact that it appears in the result set at all is actually accidental, and due 100% to the parentheses that you put around it in the regexp. Remember, the regexp in split defines the pattern for the *delimiter*, not the *values*.

If you're still confused about what the parentheses do, here's a minimal example:

    'aabbcc'.split(/b+/)
    => ['aa', 'cc']
    'aabbcc'.split(/(b+)/)
    => ['aa', 'bb', 'cc']

> because I want it to split my
> (comma delimited) input line based on comma or when a value is wrapped
> with double quotes.

I hate to break it to you, but a simple delimiter regexp like this one cannot handle this kind of parsing, because the meaning of a comma is *state-dependent* -- that is, a comma either acts as a delimiter or not, depending on whether an odd or even number of double quotes have been encountered since the beginning of the string. A plain delimiter pattern carries no such state, so that kind of logic is outside what it can express. You'll need to write some higher-level routines in Ruby -- or, as others have suggested, simply use an off-the-shelf CSV parser. (I'd choose the latter approach if I were you -- parsers are a pain to get right!)

Best,
--
Marnen Laibow-Koser
marnen-sbuyVjPbboAdnm+yROfE0A@public.gmane.org
http://www.marnen.org
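
To make the state-dependence point concrete, here is a minimal sketch of the kind of higher-level routine described above. The helper name `split_csv_line` is hypothetical, and the code handles only the simple case in this thread: it toggles a flag on every double quote and treats a comma as a delimiter only while the flag is off. It does not handle escaped quotes inside a field or quoted fields that span lines, which is part of why an off-the-shelf CSV parser remains the safer choice.

```ruby
# Hypothetical helper: split one line on commas that fall outside
# double quotes, tracking the "inside quotes" state by hand.
def split_csv_line(line)
  fields = [String.new]
  in_quotes = false

  line.each_char do |char|
    if char == '"'
      # An odd number of quotes so far means we are inside a quoted value.
      in_quotes = !in_quotes
    elsif char == ',' && !in_quotes
      # A comma outside quotes is a delimiter: start the next field.
      fields << String.new
    else
      # Anything else, including a comma inside quotes, belongs to the value.
      fields.last << char
    end
  end

  fields
end

p split_csv_line('john,doe,"1,200.99",training,12345')
```

Unlike the split-plus-delete_if approach, this keeps genuinely empty fields: `split_csv_line('a,,b')` returns three elements, the middle one an empty string.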