Fix ASCII/UTF-8 error. (#38)
* Add reproducible test for UTF-8/ASCII error.

* Change encoding according to `xml` tag.

* Add changelog entry.

* Add helper method to parse XML encoding.
reitermarkus authored and mattbrictson committed Apr 24, 2017
1 parent 1867b05 commit 39f3518
Showing 4 changed files with 48 additions and 6 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.rdoc
@@ -5,6 +5,7 @@
https://github.com/patsplat/plist/compare/dece870...HEAD

* Your contribution here!
* Fix ASCII/UTF-8 error (https://github.com/patsplat/plist/pull/38).
* Fix Fixnum, Bignum deprecations in Ruby 2.4
* Fix unused variable `e` warning

31 changes: 27 additions & 4 deletions lib/plist/parser.rb
@@ -73,10 +73,10 @@ def initialize( plist_data_or_file, listener )
end

TEXT = /([^<]+)/
-XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>*/um
-DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/um
-COMMENT_START = /\A<!--/u
-COMMENT_END = /.*?-->/um
+XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>*/m
+DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/m
+COMMENT_START = /\A<!--/
+COMMENT_END = /.*?-->/m


def parse
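Dropping the `u` flag is the core of this fix. In Ruby, a `/u` regexp has a fixed UTF-8 encoding, so matching it against a binary (ASCII-8BIT) string that contains non-ASCII bytes raises `Encoding::CompatibilityError`; an ASCII-only pattern without the flag is not tied to an encoding and can match any ASCII-compatible string. The minimal sketch below reproduces this with the old and new `XMLDECL_PATTERN` from the hunk. The helper `match_result` and the sample document are hypothetical, and the sketch uses `String#=~` for brevity rather than the parser's `StringScanner`.

```ruby
# Old and new XMLDECL_PATTERN, copied from the hunk above.
OLD_XMLDECL = /<\?xml\s+(.*?)\?>*/um # /u fixes the regexp's encoding to UTF-8
NEW_XMLDECL = /<\?xml\s+(.*?)\?>*/m  # ASCII-only source, no fixed encoding

# Hypothetical helper: report whether a pattern matched, missed, or raised.
def match_result(regexp, string)
  (regexp =~ string) ? :matched : :no_match
rescue Encoding::CompatibilityError
  :incompatible
end

# A UTF-8 document containing "™" (U+2122), re-tagged as binary the same
# way the new test does with force_encoding("ASCII-8BIT").
xml = %(<?xml version="1.0" encoding="UTF-8"?>\n<string>\u2122</string>)
binary = xml.dup.force_encoding(Encoding::ASCII_8BIT)

puts match_result(OLD_XMLDECL, binary) # prints "incompatible"
puts match_result(NEW_XMLDECL, binary) # prints "matched"
```

The raise only happens when the string actually contains non-ASCII bytes, which is why the error surfaced with the `™` asset and not with plain-ASCII plists.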
@@ -91,7 +91,14 @@ def parse
if @scanner.scan(COMMENT_START)
@scanner.scan(COMMENT_END)
elsif @scanner.scan(XMLDECL_PATTERN)
encoding = parse_encoding_from_xml_declaration(@scanner[1])
next if encoding.nil?

# use the specified encoding for the rest of the file
next unless String.method_defined?(:force_encoding)
@scanner.string = @scanner.rest.force_encoding(encoding)
elsif @scanner.scan(DOCTYPE_PATTERN)
next
elsif @scanner.scan(start_tag)
@listener.tag_start(@scanner[1], nil)
if (@scanner[2] =~ /\/$/)
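The new `XMLDECL_PATTERN` branch rebuilds the scanner's buffer: `@scanner.rest` copies the unread bytes, `force_encoding` re-tags that copy without transcoding it, and assigning `StringScanner#string=` resets the scanner so it starts again at position 0 of the re-tagged string. Below is a standalone sketch of those steps; it assumes a hard-coded `Encoding::UTF_8` in place of the lookup done by `parse_encoding_from_xml_declaration`, and reuses the parser's `TEXT` pattern to read a value.

```ruby
require "strscan"

xmldecl = /<\?xml\s+(.*?)\?>*/m # new XMLDECL_PATTERN
text    = /([^<]+)/             # TEXT pattern from the parser

xml = %(<?xml version="1.0" encoding="UTF-8"?><string>\u2122</string>)
scanner = StringScanner.new(xml.b) # binary input, as in the new test

if scanner.scan(xmldecl)
  # Assumption: the declaration names UTF-8 (the real parser looks it up).
  # rest returns a copy of the unread bytes; force_encoding re-tags that copy
  # and string= restarts the scanner at position 0 of it.
  scanner.string = scanner.rest.force_encoding(Encoding::UTF_8)
end

scanner.scan(/<string>/)
value = scanner.scan(text)

puts value.encoding    # prints "UTF-8"
puts value == "\u2122" # prints "true"
```

Without the re-tagging step, `value` would stay ASCII-8BIT and `value == "\u2122"` would be false: Ruby does not treat a binary string with high bytes as equal to a UTF-8 string holding the same bytes.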
@@ -106,6 +113,22 @@ def parse
end
end
end

private

def parse_encoding_from_xml_declaration(xml_declaration)
return unless defined?(Encoding)

xml_encoding = xml_declaration.match(/(?:\A|\s)encoding=(?:"(.*?)"|'(.*?)')(?:\s|\Z)/)

return if xml_encoding.nil?

begin
Encoding.find(xml_encoding[1])
rescue ArgumentError
nil
end
end
end

class PTag
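`parse_encoding_from_xml_declaration` extracts the `encoding` pseudo-attribute from the declaration text captured by `XMLDECL_PATTERN` and returns `nil` when the attribute is missing or names an encoding Ruby does not know. One detail is worth checking: the regexp captures a double-quoted value in group 1 and a single-quoted value in group 2, but the helper reads only `xml_encoding[1]`. For `encoding='UTF-8'` it therefore passes `nil` to `Encoding.find`, which raises `TypeError` rather than the rescued `ArgumentError`. The standalone variant below falls back to group 2; `encoding_from_declaration` and `ENCODING_ATTR` are illustrative names, not part of plist, and the sketch drops the `defined?(Encoding)` guard on the assumption of Ruby 1.9 or later.

```ruby
# Hypothetical variant of parse_encoding_from_xml_declaration that also
# accepts single-quoted values, e.g. <?xml version='1.0' encoding='UTF-8'?>.
# Assumes Ruby 1.9+, so the defined?(Encoding) guard is omitted.
ENCODING_ATTR = /(?:\A|\s)encoding=(?:"(.*?)"|'(.*?)')(?:\s|\Z)/

def encoding_from_declaration(declaration)
  match = declaration.match(ENCODING_ATTR)
  return nil if match.nil?

  # Group 1 holds a double-quoted name, group 2 a single-quoted one.
  Encoding.find(match[1] || match[2])
rescue ArgumentError
  # Encoding.find raises ArgumentError for names Ruby does not know.
  nil
end

p encoding_from_declaration(%(version="1.0" encoding="UTF-8"))
p encoding_from_declaration(%(version='1.0' encoding='ISO-8859-1'))
p encoding_from_declaration(%(version="1.0" encoding="no-such-charset"))
p encoding_from_declaration(%(version="1.0"))
```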
8 changes: 8 additions & 0 deletions test/assets/non-ascii-but-utf-8.plist
@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>non-ascii-but-utf8-character</key>
<string>™</string>
</dict>
</plist>
14 changes: 12 additions & 2 deletions test/test_parser.rb
@@ -90,6 +90,16 @@ def test_filename_or_xml_is_stringio
assert_nil data
end

end
def test_filename_or_xml_is_encoded_with_ascii_8bit
# skip if Ruby version does not support String#force_encoding
return unless String.method_defined?(:force_encoding)

xml = File.read("test/assets/non-ascii-but-utf-8.plist")
xml.force_encoding("ASCII-8BIT")

__END__
assert_nothing_raised do
data = Plist::parse_xml(xml)
assert_equal("\u2122", data["non-ascii-but-utf8-character"])
end
end
end
