I'm having trouble with string parameter types in web services. I've got the following as part of a structured parameter that inherits from ActionWebService::Base.

    member :name, :string

Everything works great when :name contains regular ASCII text. But if :name contains UTF-8 encoded characters, AWS dynamically changes the return type from string to base64. Unfortunately, this confuses SOAP clients that have created services from the advertised WSDL file, which told the client to expect a regular string type.

Is this the expected behavior? If so, I'm a little confused as to why a conversion from string to base64 is required, since the entire XML response is encoded in UTF-8.

One option is to simply change all strings to base64 types, which I've done for now, but I'd prefer to switch back to strings, since dealing with base64 makes things a little more difficult for web service clients.

Tyler
On 7/12/05, Tyler Kovacs <tyler.kovacs-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I'm having trouble with string parameter types in web services. I've
> got the following as part of a structured parameter that inherits from
> ActionWebService::Base.
>
> member :name, :string
>
> Everything works great when :name contains regular ASCII text. But
> if :name contains UTF-8 encoded characters, AWS dynamically changes the
> return type from string to base64. Unfortunately, this confuses SOAP
> clients that have created services from the advertised WSDL file that
> told the client to expect a regular string type.

Hi Tyler,

Could you send me (off-list perhaps) the UTF-8 string you're having problems with? It would be useful if I could use it to test, and add it to the test suite once resolved.

I haven't intentionally added any conversion to base64 for strings that contain binary characters, so this isn't expected behaviour :)

Thanks
Leon
leon breedt
2005-Jul-13 10:47 UTC
Re: utf-8 string return types in action web services -- HEADSUP
On 7/13/05, Tyler Kovacs <tyler.kovacs-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> When the name field contains only normal ASCII chars, it is returned as
> type xsd:string but when it contains non-ASCII characters it gets
> encoded as base64.

Hi Tyler,

This was a really interesting problem, thanks for reporting it :)

As it turns out, SOAP4R validates the contents of strings before mapping them to the appropriate SOAP type. To do this, it checks the contents of the $KCODE global variable to determine what the encoding of Ruby strings *in your application* is. The default (on most systems, unless compiled with different options) is US-ASCII. Of course, your string was not valid US-ASCII, so its validation failed, and SOAP4R fell back to using Base64 to transport the string.

The fix, as it turns out, is quite simple: either start up the Ruby interpreter with the -Ku command-line option, or set $KCODE = 'UTF8' in your config/environment.rb. I prefer the latter approach; it's less work (you don't have to go modifying all your scripts).

Once you've set this, you'll need to be sure all your strings are valid UTF-8 before returning them from your ActionWebService methods if you don't want this problem to occur again.

The changeset is http://dev.rubyonrails.com/changeset/1822

As you'll see, I've also updated AWS to always return UTF-8, whereas before it wasn't using a reliable algorithm for the response encoding.

> I encountered some other issues during development, but I don't want to
> pester you with more than one issue at a time. I'd be happy to file
> them as bugs/feature requests...

Please do. If you still experience problems, that will probably result in a faster response, as the Rails list is quite high volume and I may miss something :)

Thanks,
Leon
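The validate-then-fall-back behaviour described above can be sketched roughly as follows. This is an illustration only, not SOAP4R's actual code: `soap_type_for` and its `app_encoding` parameter are hypothetical names, and the sketch uses the String encoding methods of Ruby 1.9 and later in place of the `$KCODE` global, which those versions no longer honour.

```ruby
# Illustration only: a hypothetical helper, not SOAP4R's real implementation.
# It mimics "validate the string against the application's encoding, and
# fall back to base64 when validation fails".
def soap_type_for(value, app_encoding)
  candidate = value.dup.force_encoding(app_encoding)
  if candidate.valid_encoding?
    [:string, candidate]
  else
    # pack("m0") base64-encodes the raw bytes without line breaks.
    [:base64, [value.b].pack("m0")]
  end
end

name = "Ren\u00e9e" # "Renée", stored as UTF-8 bytes

# Validated as US-ASCII, the non-ASCII bytes fail and the value goes out as base64.
type_ascii, _payload = soap_type_for(name, Encoding::US_ASCII)

# Validated as UTF-8, the same bytes pass and the value stays a string.
type_utf8, _text = soap_type_for(name, Encoding::UTF_8)

puts "US-ASCII: #{type_ascii}, UTF-8: #{type_utf8}"
```

The point of the sketch is that the bytes never change; only the encoding the validator assumes does, which is why setting the application encoding to UTF-8 is enough to keep the advertised string type.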
leon breedt
2005-Jul-14 00:04 UTC
Re: utf-8 string return types in action web services -- HEADSUP
On 7/14/05, Tyler Kovacs <tyler.kovacs-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> "Once you've set this, you'll need to be sure all your strings are
> valid UTF-8 before returning them from your ActionWebService methods
> if you don't want this problem to occur again."

I've updated http://dev.rubyonrails.com/ticket/1728 with my comments and added an example snippet to the ticket of how to do the conversion, assuming your data is all ISO-8859-1 (as it appears to be). (It will throw an exception if the input string is malformed.)

It seems either your database is encoding the data as ISO-8859-1 itself, or it was mangled into ISO-8859-1 before it entered the database, and the database is just storing the raw bytes.

See http://www.w3.org/International/questions/qa-forms-utf-8 for a discussion on encoding and forms, if the data was submitted by a form, as you'll really want to fix it at the source to normalize it into UTF-8 in the database and not rely on the Iconv hack.

Hope this helps!
Leon
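The snippet attached to the ticket is not reproduced in this thread. As a rough stand-in, here is a minimal sketch of the same kind of ISO-8859-1 to UTF-8 conversion, assuming Ruby 1.9 or later, where String#encode does the job that Iconv did at the time (Iconv was removed from the standard library in Ruby 2.0). The helper name `latin1_to_utf8` is made up for this example.

```ruby
# Hypothetical helper: reinterpret the raw bytes as ISO-8859-1,
# then transcode them to UTF-8.
def latin1_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# 0xE9 is "é" in ISO-8859-1, but is not a valid UTF-8 sequence on its own.
raw = "Ren\xE9e".b
converted = latin1_to_utf8(raw)

puts converted.encoding
puts converted.valid_encoding?
```

One caveat about this particular sketch: every byte is a valid ISO-8859-1 character, so it cannot detect bad input, and running it on data that is already UTF-8 will silently double-encode it. That is one more reason to fix the data at the source, as suggested above.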
Tyler Kovacs
2005-Jul-14 01:05 UTC
Re: utf-8 string return types in action web services -- HEADSUP
In going through your instructions, I came across a couple of gotchas that I thought I'd share with the list.

1. MySQL 4.1 and greater allows each column to have a different charset. I had my table set to UTF-8, but somehow the columns were set to latin1.
2. I'm importing a CSV file that contains valid UTF-8 using Ruby's csv.rb. When I examined the strings read in by the CSV reader, I noticed that they were no longer in UTF-8 format, so I used Iconv to convert them before loading into the database.

Thanks for the tips, I really appreciate the help!

Tyler
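For the CSV gotcha, one way to guard an import is to check each field before it reaches the database. The sketch below is an illustration under stated assumptions rather than what Tyler actually ran: it uses Ruby 1.9+ String methods instead of Iconv, the helper name `ensure_utf8` is hypothetical, and it assumes that anything that is not valid UTF-8 is ISO-8859-1.

```ruby
# Hypothetical helper: keep fields that are already valid UTF-8 and
# transcode everything else, assuming the fallback encoding is ISO-8859-1.
def ensure_utf8(field, fallback = Encoding::ISO_8859_1)
  as_utf8 = field.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  field.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

# One imported row: the first field holds UTF-8 bytes, the second latin1 bytes.
row = ["Ren\u00e9e".b, "Ren\xE9e".b]
clean = row.map { |field| ensure_utf8(field) }

puts clean.all?(&:valid_encoding?)
puts clean.uniq.length
```

Checking first avoids double-encoding fields that are already correct, which a blind latin1-to-UTF-8 conversion would do.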
Simen Brekken
2005-Aug-01 14:10 UTC
Re: utf-8 string return types in action web services -- HEADSUP
Hi,

I've also been getting the same problems you are discussing, and I just had a breakthrough:

- I prepared my db to use utf-8:

    ALTER TABLE articles CONVERT TO CHARACTER SET utf8

- Then I added $KCODE = 'UTF8' to config/environment.rb

However, I still got base64 encoded data back. After reading around, I decided to check whether my string really _was_ valid utf-8, so I tried to inspect the string with iconv. Weird: I didn't have iconv installed on my Windows machine, so I proceeded to download the .so and .dll files. Now, after just restarting the application, everything with utf-8 works perfectly.

So you _need_ the iconv library installed to get any utf-8 from your AWS. Hope this helps more people than just me!

SIMEN BREKKEN / born to synthesize.
Simen Brekken
2005-Aug-01 14:21 UTC
Re: utf-8 string return types in action web services -- HEADSUP
Updating myself: you actually don't have to perform the first step (converting the table with ALTER TABLE). As long as you have the iconv library installed, it's automagically converted.
It's been an uphill battle, but I now have the definitive answer to life, the universe and everything, all neatly formatted in UTF-8:

1. iconv is teh cool, you need it (otherwise the character detection in SOAP won't work, and it will just presume it's a dirty string and BASE64-molest your strings).

2. In your application.rb you need the following:

    before_filter :configure_charsets

    def configure_charsets
      @response.headers['Content-Type'] = 'text/html; charset=utf-8'
      suppress(ActiveRecord::StatementInvalid) do
        ActiveRecord::Base.connection.execute 'SET NAMES UTF8'
      end
    end

   ... this makes MySQL 4.1 (if you are running it) switch to UTF-8 for most output.

3. For AWS you also need to instruct SOAP on what encoding to use. This is done by putting the following in environment.rb:

    $KCODE = 'UTF8'

SIMEN BREKKEN / puppies love utf-8 too.
Simen Brekken
2005-Aug-10 20:32 UTC
Re: utf-8 string return types in action web services UPDATED with lib/utf8.rb
To make this behaviour easier to integrate in future applications, I came up with a small library that sets everything up for UTF-8 goodness:

    # lib/utf8.rb
    $KCODE = 'UTF8'

    class ActionController::Base
      before_filter :configure_charsets

      def configure_charsets
        @response.headers['Content-Type'] = 'text/html; charset=utf-8'
        suppress(ActiveRecord::StatementInvalid) do
          ActiveRecord::Base.connection.execute 'SET NAMES UTF8'
        end
      end
    end

... then in your environment.rb:

    require 'utf8'

SIMEN BREKKEN / born to synthesize.

_______________________________________________
Rails mailing list
Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org
http://lists.rubyonrails.org/mailman/listinfo/rails
I have done that, and when I edit, create or show records everything works fine. But when I export with pdf-writer or to Excel, or import from Excel, I get the weird characters again. Can anyone help me with that?