[bug] CSS nth-of-type and friends does not match browser behavior #3238

Open
flavorjones opened this issue Jun 19, 2024 · 0 comments

Please describe the bug

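While working on #2560 (re-implementing the CSS-selector-to-XPath-expression subsystem), I realized that the current code implements the pseudo-class functions nth-of-type et al. incorrectly. For nth-of-type, the generated XPath puts the position check in a separate predicate, so it counts only the nodes that already matched the rest of the selector.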

This should be fixed by the ongoing work, but I wanted to capture this bug for posterity (and to help support the value of the rewrite!).

Help us reproduce what you're seeing

#! /usr/bin/env ruby

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
end

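# A container with 17 children. The first 15 cycle through three element types (span, b, i) and
# five rotating classes, and every child also has the class "common". Two extra
# "span.second-in-5n" elements follow at the end.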
html = <<~HTML
  <html>
    <body>
      <div id="container-1">
        <span class="common first-in-5n" >1</span>
        <b    class="common second-in-5n">2</b>
        <i    class="common third-in-5n" >3</i>
        <span class="common fourth-in-5n">4</span>
        <b    class="common fifth-in-5n" >5</b>
        <i    class="common first-in-5n" >6</i>
        <span class="common second-in-5n">7</span>
        <b    class="common third-in-5n" >8</b>
        <i    class="common fourth-in-5n">9</i>
        <span class="common fifth-in-5n" >10</span>
        <b    class="common first-in-5n" >11</b>
        <i    class="common second-in-5n">12</i>
        <span class="common third-in-5n" >13</span>
        <b    class="common fourth-in-5n">14</b>
        <i    class="common fifth-in-5n" >15</i>

        <span class="common second-in-5n">99</span>
        <span class="common second-in-5n">100 here just to prove a point</span>
      </div>
    </body>
  </html>
HTML

doc = Nokogiri::HTML5(html)

# In a browser, "#container-1 span.second-in-5n:nth-of-type(3)" returns:
#
#   <span class="common second-in-5n">7</span>
#
# This is the correct behavior that Nokogiri should exhibit. It's asking for the third "span" but
# only if it ALSO has the class `second-in-5n`. That is, the returned node must match BOTH
# conditions.

# But Nokogiri returns a different element:
doc.css("#container-1 span.second-in-5n:nth-of-type(3)").to_html
# => "<span class=\"common second-in-5n\">100 here just to prove a point</span>"

# Let's look at the XPath generated by Nokogiri ...
Nokogiri::CSS.xpath_for("#container-1 span.second-in-5n:nth-of-type(3)")
# => ["//*[@id='container-1']//span[contains(concat(' ',normalize-space(@class),' '),' second-in-5n ')][position()=3]"]
#
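# Ruh roh! Because this uses a separate predicate for the position check:
#
#   [position()=3]
#
# the position is evaluated against the nodes that already passed the class predicate, so this
# query returns the third "span.second-in-5n" rather than the third "span"!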

# Let's try to fix this by using "and" to combine the conditions into a single complex predicate,
# so that position() is evaluated against all the "span" siblings and not just the class matches.
#
# Here's the original expression for easy comparison:
# 
#                  "//*[@id='container-1']//span[contains(concat(' ',normalize-space(@class),' '),' second-in-5n ')][position()=3]"
fixed_xpath_expr = "//*[@id='container-1']//span[contains(concat(' ',normalize-space(@class),' '),' second-in-5n ') and position()=3]"

doc.xpath(fixed_xpath_expr).to_html
# => "<span class=\"common second-in-5n\">7</span>"
#
# Woot! This results in the correct behavior.

# For comparison, this bug doesn't appear to affect `nth-child` queries. For example:
#
#   document.querySelectorAll('#container-1 span.second-in-5n:nth-child(7)')
#
# returns
#
#   <span class="common second-in-5n">7</span>
#
# and so does Nokogiri:
doc.css('#container-1 span.second-in-5n:nth-child(7)').to_html
# => "<span class=\"common second-in-5n\">7</span>"

# The XPath generated by Nokogiri is:
Nokogiri::CSS.xpath_for("#container-1 span.second-in-5n:nth-child(7)")
# => ["//*[@id='container-1']//span[contains(concat(' ',normalize-space(@class),' '),' second-in-5n ') and count(preceding-sibling::*)=6]"]
#
# This is right! It counts the preceding siblings of any element type, and puts that condition
# into the same predicate as the class check.

This will be fixed by the work being done for #2560.
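
The difference between the two predicate forms can be modeled without Nokogiri at all. The following is a minimal plain-Ruby sketch, not Nokogiri's implementation: the `Child` struct and the arrays are hypothetical stand-ins for the children of `#container-1`, and the expressions only mimic how XPath evaluates chained versus combined predicates.

```ruby
# Hypothetical model of the children of #container-1 (this is not Nokogiri code).
Child = Struct.new(:tag, :klass, :text)

tags    = %w[span b i]
classes = %w[first-in-5n second-in-5n third-in-5n fourth-in-5n fifth-in-5n]

# Children 1..15 rotate through three tags and five classes, as in the HTML above.
children = (1..15).map do |n|
  Child.new(tags[(n - 1) % 3], classes[(n - 1) % 5], n.to_s)
end
# The two extra spans at the end of the container.
children << Child.new("span", "second-in-5n", "99")
children << Child.new("span", "second-in-5n", "100")

# The "span" step selects every span child, in document order.
spans = children.select { |c| c.tag == "span" }

# span[class][position()=3]: filter by class first, then take the third survivor.
chained = spans.select { |c| c.klass == "second-in-5n" }[2]

# span[class and position()=3]: take the third span, keep it only if the class also matches.
third    = spans[2]
combined = third if third && third.klass == "second-in-5n"

# span[class and count(preceding-sibling::*)=6]: the seventh child of any type.
seventh   = children[6]
nth_child = seventh if seventh.tag == "span" && seventh.klass == "second-in-5n"

puts chained.text   # prints 100 (the buggy result)
puts combined.text  # prints 7   (the browser's result)
puts nth_child.text # prints 7
```

In this model the chained form indexes into an already-filtered list, which is why the two trailing spans change the answer, while the combined form indexes into the full list of spans first.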
