Broken REXML in Ruby 1.8

From: Alexander Bokovoy <a.bokovoy@...>
Date: 2003-08-06 13:33:29 UTC
List: ruby-core #1364

Greetings!

UTF-16 support in REXML in Ruby 1.8 is broken. The fix is simple: rename
arrayEnc to array_enc in from_utf_16. It looks like a copy-and-paste bug.

Another 'bug', or rather misfeature, is that REXML cannot use iconv(3)
when it is available and REXML has no native support for the specified
encoding. I made a simple patch so that REXML dynamically looks up the
encoding conversion through iconv(3) when there is no native support for
it.
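
The fallback described above can be sketched roughly as follows. This is a
minimal illustration, not the attached patch and not REXML's API: the
module name FallbackEncoding, the NATIVE list and ensure_converters are
hypothetical, and String#encode stands in for iconv(3), which current Ruby
versions no longer ship as a standard library.

```ruby
# Hypothetical stand-in for the patched encoding lookup; not REXML's API.
module FallbackEncoding
  # Encodings assumed to have hand-written support (assumption for the sketch).
  NATIVE = %w[UTF-8].freeze

  # Turn an encoding name into a method-name suffix, the same way the patch
  # does with tr('-:/.()', '______').downcase.
  def self.method_suffix(name)
    name.tr('-:/.()', '______').downcase
  end

  # Define to_<suffix>/from_<suffix> on demand when there is no native support.
  def self.ensure_converters(name)
    suffix = method_suffix(name)
    return suffix if NATIVE.include?(name.upcase) || respond_to?("to_#{suffix}")

    Encoding.find(name) # raises ArgumentError if Ruby does not know the name
    define_singleton_method("to_#{suffix}") do |utf8|
      utf8.encode(name, 'UTF-8')
    end
    define_singleton_method("from_#{suffix}") do |str|
      str.dup.force_encoding(name).encode('UTF-8')
    end
    suffix
  end
end
```

After FallbackEncoding.ensure_converters('ISO-8859-1'), the module responds
to to_iso_8859_1 and from_iso_8859_1. Using define_singleton_method instead
of eval is a choice made for the sketch only; the patch itself uses eval.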

The patch is attached. Sean, the initial patch I sent you was incomplete;
please use this one.

Also attached is a simple, naive test utility that tests every encoding
supported through iconv(3) and reports failures.

On my glibc 2.2.5+ system it fails for 182 encodings out of 918, which is
OK because most of the failures are for non-ASCII-compatible charsets.
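
One plausible reading of why such charsets fail: the test utility only
swaps the declared encoding name, while the document bytes themselves stay
ASCII. A charset that encodes ASCII characters differently cannot
meaningfully decode those bytes. The check below is a hypothetical helper,
not part of xmltest.rb, and uses String#encode rather than iconv(3).

```ruby
# Hypothetical helper; DECLARATION and ascii_bytes_preserved? are
# illustration names, not part of xmltest.rb or REXML.
DECLARATION = '<?xml version="1.0"?>'

# True when encoding the declaration into enc_name leaves its bytes
# unchanged, i.e. the charset is ASCII-compatible for this text.
def ascii_bytes_preserved?(enc_name)
  DECLARATION.encode(enc_name).bytes == DECLARATION.bytes
end

%w[ISO-8859-1 UTF-8 UTF-16LE UTF-32BE].each do |name|
  puts "#{name}: ASCII bytes preserved = #{ascii_bytes_preserved?(name)}"
end
```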

-- 
/ Alexander Bokovoy
---
It is all right to hold a conversation, but you should let go of it
now and then.
		-- Richard Armour

Attachments (2)

xmltest.rb (1.16 KB, text/x-ruby)
require 'rexml/document'

UnixCharsets = open("| iconv -l") do |f|
   f.readlines[5..-1].collect { |x| x.sub(/\/\/\n/,"").delete(' ') }
end

DATA = <<END
<?xml version="1.0" encoding='ENC'?>
<Ruby xmlns="http://www.ruby-lang.org/ruby/1.8">
</Ruby>
END

broken_encodings = 0
UnixCharsets.each do |enc|
	begin
		puts "Testing encoding #{enc}"
		data = DATA.dup
		data[/ENC/] = enc
		REXML::Document.new(data).root
	rescue REXML::ParseException, Errno::EINVAL, ArgumentError, NoMethodError => e
		# All four failure modes are counted and reported the same way.
		broken_encodings += 1
		puts "Encoding #{enc} does not work with REXML: #{e.message}"
	end
end

if broken_encodings > 0 
	puts "There were #{broken_encodings} encoding failures out of #{UnixCharsets.size} plus some REXML internal encodings"
else
	puts "There were no encoding failures"
end

puts "Full list of registered encodings in REXML:"
puts REXML::Encoding::ENCODING_CLAIMS.values.join(', ')
ruby-1.8-rexml.patch (1.89 KB, text/x-diff)
--- ./encoding.rb.orig	2003-08-06 14:41:20 +0300
+++ ./encoding.rb	2003-08-06 16:09:50 +0300
@@ -22,7 +22,30 @@
 		def encoding=( enc )
                 	enc = UTF_8 unless enc
                 	@encoding = enc.upcase
-                	require "rexml/encodings/#@encoding" unless @encoding == UTF_8
+			begin
+                		require "rexml/encodings/#@encoding" unless @encoding == UTF_8
+			rescue LoadError => e
+				# Encoding file is absent, try to use iconv if possible
+				begin
+					dc_enc = encoding.tr('-:/.()', '______').downcase
+					if ! self.respond_to? "to_#{dc_enc}"
+						require "iconv"
+						REXML::Encoding.claim(encoding)
+						eval <<END
+							def to_#{dc_enc} content
+								return Iconv::iconv(encoding, "utf-8", content).join('')
+							end
+						
+							def from_#{dc_enc} str
+								return Iconv::iconv("utf-8", encoding, str).join('')
+							end
+END
+					end
+				rescue LoadError => e
+					raise LoadError, e.message + "\nTried to load encoding through iconv but the latter was unavailable"
+				end
+				
+			end
 		end
 
 		def check_encoding str
--- ./source.rb.orig	2003-06-10 04:31:05 +0300
+++ ./source.rb	2003-08-06 16:10:22 +0300
@@ -40,8 +40,8 @@
 		def encoding=(enc)
 			super
 			eval <<-EOL
-				alias :encode :to_#{encoding.tr('-', '_').downcase}
-				alias :decode :from_#{encoding.tr('-', '_').downcase}
+				alias :encode :to_#{encoding.tr('-:/.()', '_____').downcase}
+				alias :decode :from_#{encoding.tr('-:/.()', '_____').downcase}
 			EOL
 			@line_break = encode( '>' )
 			if enc != UTF_8
--- ./encodings/UTF-16.rb.orig	2003-08-06 16:11:27 +0300
+++ ./encodings/UTF-16.rb	2003-08-06 16:23:00 +0300
@@ -18,7 +18,7 @@
 		def from_utf_16(str)
 			array_enc=str.unpack('C*')
 			array_utf8 = []
-			2.step(arrayEnc.size-1, 2){|i| 
+			2.step(array_enc.size-1, 2){|i| 
 				array_utf8 << (array_enc.at(i+1) + array_enc.at(i)*0x100)
 			}
 			array_utf8.pack('U*')
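
For readers who want to try the corrected loop from the UTF-16 hunk above
outside REXML, here is a minimal standalone sketch. The method name
utf16be_with_bom_to_utf8 is hypothetical, and the sketch assumes
big-endian input of even length that starts with a 2-byte byte order mark
and contains no surrogate pairs.

```ruby
# Hypothetical standalone version of the decoding loop; not REXML's code.
# Assumes big-endian UTF-16 with a leading 2-byte BOM and no surrogate pairs.
def utf16be_with_bom_to_utf8(str)
  bytes = str.unpack('C*')          # raw bytes as integers
  code_points = []
  # Start at index 2 to skip the BOM, then combine each high/low byte pair.
  2.step(bytes.size - 1, 2) do |i|
    code_points << (bytes[i] * 0x100 + bytes[i + 1])
  end
  code_points.pack('U*')            # re-encode the code points as UTF-8
end

puts utf16be_with_bom_to_utf8("\xFE\xFF\x00H\x00i".b)
```

With the original arrayEnc spelling the loop raises a NameError on its
first line, because no such local variable exists.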
