Hi,

On our web application we allow users to upload vCards, and, as you might expect, some vCards break the vPim parser. I've noticed that the vast majority of these come from Outlook.

Here is what I think the issue is: Outlook is not handling the "soft line break" correctly, and is not putting a space in front of the line that follows a soft line break.

Consider the following:

ADR;WORK;PREF:;;445 S. Longridge Road;Indianapolis;IN;46219;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=
Indianapolis, IN 46219

The last line is the obvious error: the second line ends with the soft break (the equals sign), and the third line does NOT have the space in front of it that it needs.

So the solution I cooked up was to run the whole file through this filter:

filedata.gsub!(/=\r\n/,"=")

This searches for any = that is followed directly by a line break and simply removes the line break, creating one long continuous line.

Now the questions are:

1. Is this safe?
2. Is this the right way to handle the Outlook issue?

I've run my solution through a bunch of examples and have yet to break it. However, I don't know enough about vCards (and Outlook) to conclude that this is the best solution. Please advise.

Grant Ammons
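For reference, a minimal Ruby sketch of what the filter above does to the sample lines. The `sample` string is reconstructed from the lines in the post, and CRLF line endings are assumed. Note that the replacement keeps the `=`, so the joined LABEL value reads `Road=0D=0A=Indianapolis`.

```ruby
# Sample lines from an Outlook (vCard 2.1) export. The LABEL value is
# quoted-printable and ends in a soft line break ("=" before CRLF), but the
# continuation line is not indented with a space.
sample = [
  "ADR;WORK;PREF:;;445 S. Longridge Road;Indianapolis;IN;46219;United States of America",
  "LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=",
  "Indianapolis, IN 46219",
  ""
].join("\r\n")

# The filter from the post: delete the CRLF after any "=" at end of line,
# keeping the "=" itself.
filedata = sample.dup
filedata.gsub!(/=\r\n/, "=")

# Print the resulting lines; the LABEL continuation is now on one line.
puts filedata.split("\r\n")
```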
On Thu, Oct 9, 2008 at 8:32 AM, Grant Ammons <grant at fakeboard.com> wrote:
> Here is what I think the issue is -- outlook is not respecting the
> 'soft line break' correctly, and is not putting a space in front of
> the line after a soft line break.
>
> Consider the following:
>
> ADR;WORK;PREF:;;445 S. Longridge Road;Indianapolis;IN;46219;United
> States of America
> LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=
> Indianapolis, IN 46219
>
> The last line is the obvious error, as the 2nd line has the soft break
> (the equals sign), and the 3rd line does NOT have a space in front of
> it, as it needs.

I can't tell which lines are which; can you send a sample card as an attachment instead of pasting it inline?

> So the solution I cooked up was to run the whole file through this filter:
>
> filedata.gsub!(/=\r\n/,"=")
>
> this will search for any = that is followed directly by a line break,
> and simply remove the line break, thus creating one long continuous
> line.

I think replacing it with "=\r\n " might be better. Send me a sample file and I can look more closely.

Sam
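A sketch of the suggested "=\r\n " replacement, assuming CRLF line endings. The helper names `fold_soft_breaks` and `unfold` are hypothetical, the `(?![ \t])` lookahead (which skips continuation lines that are already indented) is an addition made here, and `unfold` only illustrates generic RFC 2425-style unfolding, not vPim's internals.

```ruby
# Keep the quoted-printable soft break but add a space, so the next line
# becomes a folded continuation line (CRLF followed by a space).
# The (?![ \t]) lookahead is an assumption added here: it leaves lines that
# are already indented alone.
def fold_soft_breaks(data)
  data.gsub(/=\r\n(?![ \t])/, "=\r\n ")
end

# Generic RFC 2425-style unfolding: remove each CRLF that is followed by one
# space or tab, together with that whitespace character.
def unfold(data)
  data.gsub(/\r\n[ \t]/, "")
end

card = "LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=\r\n" \
       "Indianapolis, IN 46219\r\n"

folded = fold_soft_breaks(card)
p folded          # continuation line now starts with a space
p unfold(folded)  # one logical LABEL line after unfolding
```

Like the original filter, this still matches any line that ends in "=", whether or not it is a quoted-printable soft break.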
On Thu, Oct 9, 2008 at 8:32 AM, Grant Ammons <grant at fakeboard.com> wrote:
> So on our web application we allow users to upload vcards, and as you
> might expect, some vcards break the vpim parser. I've noticed that
> the vast majority of these are from outlook.
>
> Here is what I think the issue is -- outlook is not respecting the
> 'soft line break' correctly, and is not putting a space in front of
> the line after a soft line break.

These are actually version 2.1 vCards, and vPim implements 3.0. But vPim tries to be useful anyhow. I think I've looked at this before and concluded that it isn't even valid vCard 2.1, though :-(.

Your strategy won't work in general, because any field could just happen to end with an = sign; it's a perfectly valid character in any string field. Here's a card I exported from AddressBook.app. Assume "=" is the name of a conceptual artist.

BEGIN:VCARD
VERSION:3.0
N:=;=;;;
FN:=
CATEGORIES:f - Vancouver
X-ABUID:76D3ED90-4BB4-49A9-B9DE-3E99408CF03C\:ABPerson
END:VCARD

> Consider the following:
>
> ADR;WORK;PREF:;;445 S. Longridge Road;Indianapolis;IN;46219;United
> States of America
> LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=
> Indianapolis, IN 46219

> So the solution I cooked up was to run the whole file through this filter:
>
> filedata.gsub!(/=\r\n/,"=")

I think I've seen this a few times before, and you would be better off replacing

/=0D=0A=\r?\n([^ \t])/i

with '=0D=0A\1'.

In other words, catch the full pattern: a QP-escaped CRLF, followed by the QP soft break (=\r\n), followed by a continuation line that was not stuffed with a leading space.

Also, I'd suggest trying to parse without the mangling first, then catching an invalid encoding error, and retrying after gsubbing.

> Now the questions are:
>
> 1. is this safe?

I wouldn't do it on all cards, but for cards that fail, it's a decent heuristic; better than failing.

> 2. is this the right way to handle the outlook issue?
> I've run my solution through a bunch of examples and have yet to break
> it. However, I don't know about vcards (and outlook) to conclude that
> this is the best solution. Please advise.

My example is a bit contrived, but the gsub would break it. In general, an = can show up anywhere: notes fields, category names, locally defined fields, etc.

You will also have a problem with the blank line after the base64-encoded picture. That one might be easier to deal with, because it's always invalid. Actually, maybe vPim deals with that already; there are lots of cards around that have randomly inserted blank lines. Anyhow, consecutive CRLF pairs can safely be collapsed into a single one.

Any remaining cards that don't pass, I'd be interested in seeing.

Also, because it can damage valid cards, I can't make your behaviour the default, but if you'd like to submit a patch, maybe a Vcard.decode_broken() that implements your scheme, that would be really great. I'd merge something like that in a second, faster if it came as a unified diff with a unit test...

Thanks,
Sam
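A sketch comparing the two filters on the AddressBook card and on the Outlook LABEL line, assuming CRLF line endings (`naive_fix` and `targeted_fix` are illustrative names). The replacement string is single-quoted so that `\1` is a regex backreference; in a double-quoted Ruby string `"\1"` would be the control character `\x01`. `String#unpack1("M")` is used only to show the decoded quoted-printable value.

```ruby
# Naive filter from the original post: joins ANY line that ends in "=".
def naive_fix(data)
  data.gsub(/=\r\n/, "=")
end

# Narrower pattern: a QP-escaped CRLF ("=0D=0A"), then a soft line break
# ("=" plus CRLF or LF), then a continuation line that does not start with
# a space or tab. The soft break is dropped and the two lines are joined.
OUTLOOK_QP_BREAK = /=0D=0A=\r?\n([^ \t])/i

def targeted_fix(data)
  # Single quotes keep \1 as a backreference to the captured character.
  data.gsub(OUTLOOK_QP_BREAK, '=0D=0A\1')
end

outlook = "LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=\r\n" \
          "Indianapolis, IN 46219\r\n"

# The valid vCard 3.0 from this message, whose name fields are literally "=".
equals_card = [
  "BEGIN:VCARD", "VERSION:3.0", "N:=;=;;;", "FN:=",
  "CATEGORIES:f - Vancouver",
  "X-ABUID:76D3ED90-4BB4-49A9-B9DE-3E99408CF03C\\:ABPerson",
  "END:VCARD", ""
].join("\r\n")

fixed = targeted_fix(outlook)
p fixed                                    # LABEL value joined, soft break gone
p fixed[/:(.*)\r\n/, 1].unpack1("M")       # decoded quoted-printable value
p naive_fix(equals_card)                   # "FN:=" swallows the CATEGORIES line
p targeted_fix(equals_card) == equals_card # valid card left untouched
```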
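A sketch of the parse-first, repair-on-failure flow suggested above. `repair_broken_vcard` is a hypothetical helper in the spirit of the proposed Vcard.decode_broken(); it applies only the two heuristics from this message (the narrow Outlook pattern and collapsing consecutive CRLFs). The parser is passed in as a callable; wiring it to vPim (for example around `Vpim::Vcard.decode`) and choosing which exception class to rescue are assumptions, so the example uses a toy strict parser instead.

```ruby
# Hypothetical repair step applying only the heuristics from this thread.
def repair_broken_vcard(data)
  # Outlook: QP-escaped CRLF, then a soft break, then an unindented line.
  fixed = data.gsub(/=0D=0A=\r?\n([^ \t])/i, '=0D=0A\1')
  # Blank lines (consecutive CRLF pairs) collapse into a single CRLF.
  fixed.gsub(/(?:\r\n){2,}/, "\r\n")
end

# Parse the card unmodified first; only if that raises, repair it and parse
# once more. A second failure propagates to the caller.
def parse_with_fallback(data, parser, error_class = StandardError)
  parser.call(data)
rescue error_class
  parser.call(repair_broken_vcard(data))
end

# Toy stand-in for a strict parser (NOT vPim): every line must be non-empty
# and either contain a ":" or start with a space/tab (a folded line).
strict_parser = lambda do |data|
  lines = data.split("\r\n")
  bad = lines.find { |l| l.empty? || !(l.include?(":") || l.start_with?(" ", "\t")) }
  raise ArgumentError, "invalid line: #{bad.inspect}" if bad
  lines
end

broken = "BEGIN:VCARD\r\nVERSION:2.1\r\n" \
         "LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=\r\n" \
         "Indianapolis, IN 46219\r\n\r\nEND:VCARD\r\n"

p parse_with_fallback(broken, strict_parser)
```

Passing a narrower `error_class`, such as whatever invalid-encoding exception the real parser raises, would keep unrelated bugs from silently triggering the repair.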
On Thu, Oct 9, 2008 at 5:32 PM, Grant Ammons <grant at fakeboard.com> wrote:
> So on our web application we allow users to upload vcards, and as you
> might expect, some vcards break the vpim parser. I've noticed that
> the vast majority of these are from outlook.

We have the same problem. I feel your pain.

> Here is what I think the issue is -- outlook is not respecting the
> 'soft line break' correctly, and is not putting a space in front of
> the line after a soft line break.
>
> Consider the following:
>
> ADR;WORK;PREF:;;445 S. Longridge Road;Indianapolis;IN;46219;United
> States of America
> LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=
> Indianapolis, IN 46219
>
> The last line is the obvious error, as the 2nd line has the soft break
> (the equals sign), and the 3rd line does NOT have a space in front of
> it, as it needs.
>
> So the solution I cooked up was to run the whole file through this filter:
>
> filedata.gsub!(/=\r\n/,"=")
>
> this will search for any = that is followed directly by a line break,
> and simply remove the line break, thus creating one long continuous
> line.
>
> Now the questions are:
>
> 1. is this safe?

No, I don't think so.

> 2. is this the right way to handle the outlook issue?

I concluded the problem was only with quoted-printable encoded fields, so I made an exception handler for it. Parsing your example will raise an exception; we then conclude the vCard is broken and do our best to fix it. Our best guess is this:

# we will search for lines for which
if line =~ /ENCODING=QUOTED-PRINTABLE/i and line =~ /=$/ and lines.length>i+1
# is true
# then remove the soft-line break i.e. the last character (=)...
# (The decoder does not mind processing lines longer than the maximum of 76 chars
# that are allowed in a quoted-printable encoded string.)

and we concatenate those lines. This is similar to your approach; the big difference is that we only do it in the exception handler, i.e. for cards that have already failed to parse.
I posted the patch some time ago to this or another vPim mailing list. I can send it to you, of course; just drop me a mail.

Cheers,
Tijn
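A runnable Ruby sketch of the line-based repair described in this message (`join_qp_soft_breaks` is a hypothetical name; CRLF output is assumed). It only touches lines that declare `ENCODING=QUOTED-PRINTABLE`. In quoted-printable a literal `=` is always escaped as `=3D`, so an unescaped `=` at the end of such a line can only be a soft line break; that is what makes this narrower than joining every line that ends in `=`.

```ruby
# Line-based repair: for a quoted-printable property line that ends in a soft
# line break ("="), drop that "=" and append the next line. Loops in case a
# value spans several soft breaks. Other lines pass through unchanged.
def join_qp_soft_breaks(data)
  lines = data.split(/\r?\n/)
  out = []
  i = 0
  while i < lines.length
    line = lines[i]
    if line =~ /ENCODING=QUOTED-PRINTABLE/i
      while line.end_with?("=") && i + 1 < lines.length
        i += 1
        line = line.chomp("=") + lines[i]
      end
    end
    out << line
    i += 1
  end
  out.join("\r\n") + "\r\n"
end

outlook = "BEGIN:VCARD\r\nVERSION:2.1\r\n" \
          "LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE:445 S. Longridge Road=0D=0A=\r\n" \
          "Indianapolis, IN 46219\r\nEND:VCARD\r\n"

fixed = join_qp_soft_breaks(outlook)
label = fixed.lines.find { |l| l.start_with?("LABEL") }
p label.chomp.split(":", 2).last.unpack1("M")
# prints "445 S. Longridge Road\r\nIndianapolis, IN 46219"
```

Called from inside an exception handler, this would run only on cards that have already failed to parse, which keeps valid cards such as one with `FN:=` out of its path entirely.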