I am attempting to write a text editor using wxRuby, and I'm having character issues.

Strings are not binary-safe. Some characters are not allowed:

- newline / line feed (\n) and tab (\t) are displayed
- carriage return (\r) is stripped
- other control characters and high-ASCII characters cause control values to become empty

Affected controls include Wx::TextCtrl, Wx::StaticText, Wx::Clipboard, et al.

Most text editors allow editing of recognized characters (in whatever format is specified) without disrupting unknown characters. Known control/special characters are hidden unless they are usually printable (CR, LF, HT). When "highlight special characters" is enabled, they are displayed as their abbreviation or icon: "[CR]", "[LF]", "[HT]", "[NUL]". Examples include the editors Scite and ConTEXT.

The only solution I can come up with is to implement folding using XML or similar. The real document is sanitized before it is displayed in the control. When the control is edited, a compare is performed and the real document is modified accordingly.

Wx::Clipboard is even worse. If I copy text that is not UTF-8, the original encoding/binary data is lost. Ugh.

Thoughts? Thanks in advance.

-AH
--
Posted via http://www.ruby-forum.com/.
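The "highlight special characters" display described above can be sketched in plain Ruby, independent of any wxRuby control. This is only an illustration: the helper name `highlight_specials` and the abbreviation table are hypothetical, and bytes without an entry in the table fall back to their two-digit hex value.

```ruby
# Hypothetical abbreviation table; extend it as needed.
ABBREVIATIONS = {
  0x00 => "NUL", 0x09 => "HT", 0x0A => "LF", 0x0D => "CR", 0x1F => "US"
}.freeze

# Replace each ASCII control byte with a bracketed marker such as "[CR]".
# The work is done on a binary copy so that invalid UTF-8 does not raise.
def highlight_specials(text)
  marked = text.b.gsub(/[\x00-\x1F\x7F]/n) do |ch|
    "[#{ABBREVIATIONS.fetch(ch.ord) { format('%02X', ch.ord) }}]"
  end
  marked.force_encoding(text.encoding)
end

puts highlight_specials("foo\x00boo")
puts highlight_specials("one\r\ntwo\tthree")
```

The marked-up string is for display only; the original bytes have to be kept elsewhere if they are to be written back unchanged.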
Hi

Alexander Hawley wrote:
> I am attempting a text editor using wxRuby. I'm having character issues.
>
> Strings are not binary-safe.
>
> Some characters are not allowed.

Can you explain what you mean: does this happen when strings are set from Ruby on the control (eg TextCtrl#value=), or when retrieving user input from the GUI? Ideally, post a short test case or example bit of code which shows what's going on.

> - newline / line feed (\n), tab (\t) are displayed
> - carriage return (\r) is stripped
> - Other control characters and high-ascii cause control values to become
> empty.
>
> Affected controls include: Wx::TextCtrl, Wx::StaticText, Wx::Clipboard,
> et al.
>
> Most text editors allow editing of recognized characters (in whatever
> specified format without disrupting unknown characters.

How it's expected to work: all strings passed into wxRuby should be UTF-8; all strings returned from wxRuby will be UTF-8. If you have data (eg read from a file) in another encoding, you need to use Iconv or similar to convert it first.

> Known control/special characters are hidden unless usually printable
> (CR, LF, HT). When "highlight special characters" is enabled, they are
> displayed as their abbr or icon: "[CR]","[LF]","[HT]","[NUL]".
>
> For example, editors Scite, ConTEXT, et al.

If you want a code-oriented text editor, you probably want to use Wx::StyledTextCtrl as the base for your text control. It's the same editing/highlighting component used by Scite.

> The only solution I can come up with is to implement folding using
> XML or similar. The real document is sanitized before displayed in the
> control. When the control is edited, a compare is performed and the real
> document is modified accordingly.
>
> The Wx::Clipboard is even worse. If I copy text from non UTF-8, the
> original encoding/binary is lost. Ug.

A short test case would really help here: what DataObject, what platform, what version etc.

a
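The conversion step mentioned above ("use Iconv or similar") might look like the sketch below. It assumes Ruby 1.9 or later, where String#encode does the job Iconv did on 1.8. The helper name `to_utf8` is hypothetical, and ISO-8859-1 as the source encoding is an assumption; substitute whatever encoding the data really uses.

```ruby
# Hypothetical helper; the default source encoding is an assumption.
def to_utf8(bytes, source_encoding = Encoding::ISO_8859_1)
  # Interpret the raw bytes as source_encoding and transcode them to UTF-8.
  bytes.encode(Encoding::UTF_8, source_encoding)
end

legacy = "caf\xE9".b        # "café" as ISO-8859-1 bytes
utf8   = to_utf8(legacy)
puts utf8.valid_encoding?   # true
puts utf8.unpack("H2" * utf8.bytesize).join(" ").upcase   # 63 61 66 C3 A9
```

The result is a string that is valid UTF-8, which is the form wxRuby expects on the way in.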
Thanks for your quick response!

> Strings are not binary-safe.
> Some characters are not allowed.
> Can you explain what you mean - when they are being set from Ruby to the control (eg TextCtrl#value=) or retrieving user input from the GUI?
> Ideally a short test case or example bit of code which shows what's going on.

This script shows the different behavior for starting Wx::TextCtrl values.

........................................
require "wx"

class TheApp < Wx::App
  def on_init
    frame = Wx::Frame.new(nil, -1, "TheApp")
    sizer = Wx::FlexGridSizer.new(2,4)

    # Uncomment one string at a time to test it.
    string = "fooboo"
    # string = "foo\xC2\xA5boo"
    # string = "foo\xE2\x90\x80boo"
    # string = "foo\x1Fboo"
    # string = "foo\x00boo"
    # string = "foo\x95boo"
    puts "ruby string:\t#{string.length} #{string.inspect} #{string.unpack('H2' * string.length).join(" ").upcase}"

    @text = Wx::TextCtrl.new(frame, -1, string, :style => Wx::TE_MULTILINE)
    sizer.add(@text, 0, Wx::GROW|Wx::ALL, 4)

    # Read the value straight back to see what the control kept.
    value = @text.get_value
    puts "starting value:\t#{value.length} #{value.inspect} #{value.unpack('H2' * value.length).join(" ").upcase}"

    saveButton = Wx::Button.new(frame, -1, 'Save')
    saveButton.evt_button(saveButton.get_id) { | e | on_do_save }
    sizer.add(saveButton, 0, Wx::ALL, 4)

    frame.set_sizer(sizer)
    sizer.set_size_hints(frame)
    sizer.fit(frame)
    frame.show
  end

  # Dump the control's current value when Save is clicked.
  def on_do_save
    value = @text.get_value
    puts "saved value:\t#{value.length} #{value.inspect} #{value.unpack('H2' * value.length).join(" ").upcase}"
  end
end

TheApp.new.main_loop
........................................
C:\>ruby script.rb (different strings uncommented)

ruby string:    6 "fooboo" 66 6F 6F 62 6F 6F
starting value: 6 "fooboo" 66 6F 6F 62 6F 6F
saved value:    6 "fooboo" 66 6F 6F 62 6F 6F

ruby string:    8 "foo\302\245boo" 66 6F 6F C2 A5 62 6F 6F
starting value: 8 "foo\302\245boo" 66 6F 6F C2 A5 62 6F 6F
saved value:    8 "foo\302\245boo" 66 6F 6F C2 A5 62 6F 6F

ruby string:    9 "foo\342\220\200boo" 66 6F 6F E2 90 80 62 6F 6F
starting value: 9 "foo\342\220\200boo" 66 6F 6F E2 90 80 62 6F 6F
saved value:    9 "foo\342\220\200boo" 66 6F 6F E2 90 80 62 6F 6F

ruby string:    7 "foo\037boo" 66 6F 6F 1F 62 6F 6F
starting value: 7 "foo\037boo" 66 6F 6F 1F 62 6F 6F
saved value:    7 "foo\037boo" 66 6F 6F 1F 62 6F 6F

ruby string:    7 "foo\000boo" 66 6F 6F 00 62 6F 6F
starting value: 3 "foo" 66 6F 6F
saved value:    3 "foo" 66 6F 6F

ruby string:    7 "foo\225boo" 66 6F 6F 95 62 6F 6F
starting value: 0 ""
saved value:    0 ""
........................................

Some control characters work fine (e.g., \x1F). Other control characters (e.g., \x00) cause the value to be truncated before the character. Still other characters (e.g., \x95) cause the value to be altogether empty.

Is this behavior on purpose? Is there a list of which control characters do what?

> If you want a code-oriented text editor, you probably want to use Wx::StyledTextCtrl as the base for your text control. It's the same editing/highlighting component used by Scite.

I guess my noob side showed through. Thanks for pointing me to that control. I guess I need to read up before I open my mouth. Let the character testing begin.

> Wx::Clipboard
> what DataObject, what platform, what version

Wx::DF_TEXT
Windows API: CF_OEMTEXT, CF_TEXT, CF_UNICODETEXT
Windows XP

I was just testing someone else's script. This object is complex!
I suspect it's a combination of how Windows implements clipboard data formats and the Wx UTF-8 requirement.

From tests of Windows native clipboard API calls, it seems they get themselves confused about text display versus binary value.

It seems this is a hot issue for general wxWidgets as well.

Thanks.

-AH
It seems Wx::StyledTextCtrl is no better on the NUL character problem. This is really weird, because Scite can definitely handle NULs.

This script shows the different behavior for starting Wx::StyledTextCtrl values.

........................................
require "wx"

class TheApp < Wx::App
  def on_init
    frame = Wx::Frame.new(nil, -1, "TheApp")
    sizer = Wx::FlexGridSizer.new(2,4)

    string = "foo\x00boo"
    puts "ruby string:\t#{string.length} #{string.inspect} #{string.unpack('H2' * string.length).join(" ").upcase}"

    # Alternative test: read the same bytes from a file in binary mode.
    # file = 'null.txt'
    # fileContents = nil
    # File.open(file, 'rb') { |m_file|
    #   fileContents = m_file.read
    # }
    # puts "ruby file contents:\t#{fileContents.length} #{fileContents.inspect} #{fileContents.unpack('H2' * fileContents.length).join(" ").upcase}"

    @text = Wx::StyledTextCtrl.new(frame)
    @text.set_text(string)
    # @text.load_file(file)
    sizer.add(@text, 0, Wx::ALL, 4)

    # Read the text straight back to see what the control kept.
    value = @text.get_text
    puts "starting value:\t#{value.length} #{value.inspect} #{value.unpack('H2' * value.length).join(" ").upcase}"

    saveButton = Wx::Button.new(frame, -1, 'Save')
    saveButton.evt_button(saveButton.get_id) { | e | on_do_save }
    sizer.add(saveButton, 0, Wx::ALL, 4)

    frame.set_sizer(sizer)
    sizer.set_size_hints(frame)
    sizer.fit(frame)
    frame.show
  end

  # Dump the control's current text when Save is clicked.
  def on_do_save
    value = @text.get_text
    puts "saved value:\t#{value.length} #{value.inspect} #{value.unpack('H2' * value.length).join(" ").upcase}"
  end
end

TheApp.new.main_loop
........................................

C:\>ruby script.rb (string versus file uncommented)

ruby string:        7 "foo\000boo" 66 6F 6F 00 62 6F 6F
starting value:     3 "foo" 66 6F 6F
saved value:        3 "foo" 66 6F 6F

ruby file contents: 7 "foo\000boo" 66 6F 6F 00 62 6F 6F
starting value:     3 "foo" 66 6F 6F
saved value:        3 "foo" 66 6F 6F
........................................

NUL characters (\x00) cause the value to be truncated before the character. Both Wx::StyledTextCtrl#set_text and Wx::StyledTextCtrl#load_file exhibit the same problem.
Thanks

-AH
Alexander Hawley wrote:
> This script shows the different behavior for starting Wx::TextCtrl
> values.

Thanks for the sample code. It seems to me there are a couple of different issues here:

> # string = "foo\x00boo"

Embedded NUL characters. At the moment the wxRuby Ruby->C++ conversion for String relies on C conventions - ie that the NUL character terminates a string. So although both Ruby and wxWidgets permit Strings with embedded NUL, they get truncated in conversion.

This is a bug, and I think it's fairly easy to fix in the wrapping - but it's also quite far-reaching, so we need to check that the byte/character counts are right and that the fix doesn't cause regressions elsewhere.

> # string = "foo\x95boo"

This just isn't a valid UTF-8 string. Presumably it makes sense in some 8-bit encoding (eg ISO-8859-1), so you need to use Iconv or similar to convert it before feeding it to wxRuby.

> @text = Wx::TextCtrl.new(frame, -1, string, :style =>
> Wx::TE_MULTILINE)

The Wx::TextCtrl documentation states (although not very pointedly) that the only control character permitted is a newline. I think TextCtrl is aimed only at natural language text, so it has to be StyledTextCtrl (Scintilla) here.

I saw your email re STC, and have been trying something similar here. I'm not sure, but I think STC (wxWidgets' wrapping of Scintilla) is making assumptions about a NUL character terminating a string; even if I pass it the right stuff from Ruby, it's still truncating it.

>> Wx::Clipboard
>> what DataObject, what platform, what version
>
> Wx::DF_TEXT
> Windows API: CF_OEMTEXT, CF_TEXT, CF_UNICODETEXT
> Windows XP
>
> I was just testing someone else's script. This object is complex! I
> suspect it's a combination of how Windows implements clipboard data
> formats and the Wx UTF-8 requirement.
>
> From tests of Windows native clipboard API calls, it seems they get
> themselves confused about text display versus binary value.
>
> It seems this is a hot issue for general wxWidgets as well.

Yes, getting the clipboard to work across platforms is a messy business, because each platform uses different native encodings and a different scheme to denote data types. What I find on OS X is that the raw data is UTF16, but for higher-level calls, wxRuby & wxWidgets will do the conversion. Even if I place DF_TEXT on the clipboard, I can only retrieve DF_UNICODETEXT.

This isn't resolved in other Ruby GUI libraries either - of the other two most popular libraries, GNOME2 doesn't support Windows clipboard at all, and Shoes doesn't even try to offer that GUI convention.

Thanks for bringing this up. I'll see what we can fix for the next release, but I have a hunch it may not be possible to get it 100% perfect because of all the other components involved. Some test cases may really help, if possible, and I may focus on getting things most correct for Ruby 1.9. An example test for Clipboard is here:

http://wxruby.rubyforge.org/svn/trunk/wxruby2/tests/test_clipboard.rb

alex
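The two string problems discussed in this reply (embedded NUL, invalid UTF-8) can be checked in plain Ruby before a string is handed to a control. This is a minimal sketch, assuming Ruby 1.9 or later; the helper name `wx_string_problems` and its symbols are hypothetical and not part of wxRuby.

```ruby
# Hypothetical pre-flight check; not a wxRuby API.
# Returns a list of reasons the string may not survive the trip into a control.
def wx_string_problems(str)
  problems = []
  # wxRuby expects UTF-8, so test the bytes under that interpretation.
  as_utf8 = str.dup.force_encoding(Encoding::UTF_8)
  problems << :invalid_utf8 unless as_utf8.valid_encoding?
  # An embedded NUL is treated as a C string terminator during conversion.
  problems << :embedded_nul if str.bytes.include?(0)
  problems
end

p wx_string_problems("fooboo")       # []
p wx_string_problems("foo\x00boo")   # [:embedded_nul]
p wx_string_problems("foo\x95boo")   # [:invalid_utf8]
```

An empty list only means neither of these two problems was found; it says nothing about other control characters that a particular control may refuse.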
Alex Fenton wrote:
> Alexander Hawley wrote:
>> # string = "foo\x00boo"
>
> Embedded NUL characters. At the moment the wxRuby Ruby->C++ conversion
> for String relies on C conventions - ie that the NUL character
> terminates a string. So although both Ruby and wxWidgets permit
> Strings with embedded NUL, they get truncated in conversion.

I had a closer look at this. We can tweak the wrappings so that embedded NUL characters are preserved as they are passed between Ruby and wxWidgets.

However, the wxWidgets wrapping around Scintilla makes the assumption that strings are terminated by NUL - so even if a NUL character is entered, it can't be retrieved. See

http://lists.wxwidgets.org/pipermail/wxpython-users/2004-September/031993.html

I had a look at adding special methods to bypass this problem, but no joy yet. What I suggest as a workaround is gsub-bing the string as it goes in and out, and replacing NUL with the Unicode character SYMBOL FOR NULL (U+2400):

string.gsub(/\x00/, "\xE2\x90\x80")

By the way, other control characters are displayed as you describe in Scite.

I've filed a bug to track this:
http://rubyforge.org/tracker/index.php?func=detail&aid=23814&group_id=35&atid=218

alex
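The workaround above, applied in both directions, could be sketched as follows. The helper names `to_control` and `from_control` are hypothetical. Note the assumption that the document does not already contain U+2400: if it does, those characters would come back as NUL.

```ruby
NUL        = "\x00"
NUL_SYMBOL = "\u2400"   # same bytes as "\xE2\x90\x80"

# Swap NUL for the visible symbol before the text goes into the control.
def to_control(text)
  text.gsub(NUL, NUL_SYMBOL)
end

# Swap the symbol back to NUL when reading the text out of the control.
def from_control(text)
  text.gsub(NUL_SYMBOL, NUL)
end

original = "foo\x00boo"
shown    = to_control(original)
puts shown.unpack("H2" * shown.bytesize).join(" ").upcase   # 66 6F 6F E2 90 80 62 6F 6F
puts from_control(shown) == original                        # true
```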
> However, the wxWidgets wrapping around Scintilla makes the assumption that strings are terminated by NUL - so, even a NUL character is entered, it can't be retrieved.

A dependency/assumption in how wxWidgets uses C, I suspect. Scite must not have the same C dependencies/assumptions as wxWidgets. It can display most control characters okay.

> What I suggest as a workaround is gsub-bing the string as it goes in and out, and replacing NUL with the unicode character symbol-for-null.

Already there. I'm already doing a string search for existing characters that are not safe for UTF-8.

> By the way, other control characters are displayed as you describe in Scite.

Yup. I got the EOL display, whitespace display, and control character display methods working nicely.

Thanks for all the work looking into it. And the formal bug.

-AH