Fix the css selector #15

Open
Krugloff opened this issue Oct 23, 2024 · 3 comments
Labels
invalid This doesn't seem right

Comments


Krugloff commented Oct 23, 2024

The ~ selector is not working as expected.

I'm trying to extract only the blocks that appear before the .more-news element. This works in the browser but doesn't behave as expected in my code.

Environment

  • OS: macOS Ventura 13.5
  • Ruby version: 2.7.8
  • Nokolexbor version: 0.5.4

Additional context

test_string = <<-STR
<div>
<div class="newscard position1"></div>
<div class="newscard position2"></div>
<div class="more-news"></div>
<div class="newscard position3"></div>
<div class="newscard position4"></div>
<div>
STR

require 'nokolexbor'

doc = Nokolexbor::HTML(test_string)
doc.css(".newscard:not(.more-news ~ .newscard)").count # => 4 (should be 2)
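To make the expected count explicit, here is a minimal sketch of what the selector asks for, written in plain Ruby with no HTML library. It only models the semantics of `~` (an element matches `A ~ B` when it matches `B` and some earlier sibling matches `A`). The helper names `follows_more_news?` and `newscards_before_more_news` are hypothetical and are not part of Nokolexbor or lexbor.

```ruby
# Each sibling is modelled as the list of its CSS classes, in document order.
# This mirrors the children of the outer <div> in the snippet above.
siblings = [
  %w[newscard position1],
  %w[newscard position2],
  %w[more-news],
  %w[newscard position3],
  %w[newscard position4],
  []                       # the trailing unclosed <div>, which has no class
]

# True when the element at `index` matches `.more-news ~ .newscard`: it has
# class "newscard" and some earlier sibling has class "more-news".
def follows_more_news?(siblings, index)
  siblings[index].include?("newscard") &&
    siblings[0...index].any? { |classes| classes.include?("more-news") }
end

# Models `.newscard:not(.more-news ~ .newscard)` and returns matching indices.
def newscards_before_more_news(siblings)
  siblings.each_index.select do |i|
    siblings[i].include?("newscard") && !follows_more_news?(siblings, i)
  end
end

matched = newscards_before_more_news(siblings)
puts matched.map { |i| siblings[i].join(" ") }
puts "Count: #{matched.size}"
```

Under this model only position1 and position2 survive the `:not(...)` filter, which is the count of 2 expected from the selector.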


Krugloff added the invalid (This doesn't seem right) label Oct 23, 2024

lexborisov commented Oct 23, 2024

@Krugloff

I think we should just update the lexbor sources in nokolexbor.

In lexbor:

<div><div class="newscard position1"></div><div class="newscard position2"></div><div class="more-news"></div><div class="newscard position3"></div><div class="newscard position4"></div><div></div></div>

Selectors: .newscard:not(.more-news ~ .newscard)

1) <div class="newscard position1">
2) <div class="newscard position2">
Count: 2

zyc9012 (Collaborator) commented Dec 17, 2024

@lexborisov The reason I didn't update lexbor is that the new versions were not as performant as the current one.

I just ran the benchmark again.

Benchmark        Lexbor at b2c0a61   Lexbor at 9677d13   New vs Old
parse (367 KB)   957.4 i/s           1028.7 i/s          1.07x faster
parse (1100 B)   64706.6 i/s         65730.1 i/s         1.01x faster
at_css           84346.4 i/s         49911.8 i/s         1.68x slower
css              10116.3 i/s         8010.6 i/s          1.26x slower
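
For anyone who wants to re-derive the last column, here is a small sketch that recomputes the ratios from the rates in the summary above. The `describe_change` helper is hypothetical. Because it starts from the already rounded i/s figures and rounds to two decimals, its output may differ from the posted column in the last digit.

```ruby
# Throughput in iterations per second, copied from the summary above:
# [old (b2c0a61), new (9677d13)].
results = {
  "parse (367 KB)" => [957.4, 1028.7],
  "parse (1100 B)" => [64_706.6, 65_730.1],
  "at_css"         => [84_346.4, 49_911.8],
  "css"            => [10_116.3, 8_010.6]
}

# Express the change as "N.NNx faster" or "N.NNx slower", new relative to old.
def describe_change(old_ips, new_ips)
  if new_ips >= old_ips
    format("%.2fx faster", new_ips / old_ips)
  else
    format("%.2fx slower", old_ips / new_ips)
  end
end

results.each do |name, (old_ips, new_ips)|
  puts format("%-15s %s", name, describe_change(old_ips, new_ips))
end
```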
Raw data:

Lexbor at b2c0a61

Warming up --------------------------------------
Nokolexbor parse (367 KB)
                        96.000  i/100ms
Nokogiri parse (367 KB)
                        19.000  i/100ms
Calculating -------------------------------------
Nokolexbor parse (367 KB)
                        957.363  (± 0.5%) i/s -     19.200k in  20.055619s
Nokogiri parse (367 KB)
                        212.086  (±12.3%) i/s -      4.180k in  20.088497s

Comparison:
Nokolexbor parse (367 KB):      957.4 i/s
Nokogiri parse (367 KB):      212.1 i/s - 4.51x  (± 0.00) slower

Warming up --------------------------------------
Nokolexbor parse (1100 B)
                         6.479k i/100ms
Nokogiri parse (1100 B)
                         2.994k i/100ms
Calculating -------------------------------------
Nokolexbor parse (1100 B)
                         64.707k (± 1.4%) i/s -      1.296M in  20.029948s
Nokogiri parse (1100 B)
                         28.328k (± 3.8%) i/s -    565.866k in  20.004329s

Comparison:
Nokolexbor parse (1100 B):    64706.6 i/s
Nokogiri parse (1100 B):    28327.9 i/s - 2.28x  (± 0.00) slower

Warming up --------------------------------------
   Nokolexbor at_css     8.326k i/100ms
     Nokogiri at_css    13.000  i/100ms
Calculating -------------------------------------
   Nokolexbor at_css     84.346k (± 0.8%) i/s -      1.690M in  20.039873s
     Nokogiri at_css    139.870  (± 0.0%) i/s -      2.808k in  20.076093s

Comparison:
   Nokolexbor at_css:    84346.4 i/s
     Nokogiri at_css:      139.9 i/s - 603.04x  (± 0.00) slower

Warming up --------------------------------------
      Nokolexbor css     1.019k i/100ms
        Nokogiri css    14.000  i/100ms
Calculating -------------------------------------
      Nokolexbor css     10.116k (± 1.1%) i/s -    202.781k in  20.047377s
        Nokogiri css    139.903  (± 0.0%) i/s -      2.800k in  20.014070s

Comparison:
      Nokolexbor css:    10116.3 i/s
        Nokogiri css:      139.9 i/s - 72.31x  (± 0.00) slower

Lexbor at 9677d13

Warming up --------------------------------------
Nokolexbor parse (367 KB)
                       102.000  i/100ms
Nokogiri parse (367 KB)
                        19.000  i/100ms
Calculating -------------------------------------
Nokolexbor parse (367 KB)
                          1.029k (± 1.0%) i/s -     20.604k in  20.030938s
Nokogiri parse (367 KB)
                        211.331  (±12.3%) i/s -      4.161k in  20.066281s

Comparison:
Nokolexbor parse (367 KB):     1028.7 i/s
Nokogiri parse (367 KB):      211.3 i/s - 4.87x  (± 0.00) slower

Warming up --------------------------------------
Nokolexbor parse (1100 B)
                         6.654k i/100ms
Nokogiri parse (1100 B)
                         2.969k i/100ms
Calculating -------------------------------------
Nokolexbor parse (1100 B)
                         65.730k (± 0.7%) i/s -      1.317M in  20.044978s
Nokogiri parse (1100 B)
                         28.221k (± 3.8%) i/s -    564.110k in  20.018026s

Comparison:
Nokolexbor parse (1100 B):    65730.1 i/s
Nokogiri parse (1100 B):    28220.7 i/s - 2.33x  (± 0.00) slower

Warming up --------------------------------------
   Nokolexbor at_css     4.984k i/100ms
     Nokogiri at_css    13.000  i/100ms
Calculating -------------------------------------
   Nokolexbor at_css     49.912k (± 0.5%) i/s -      1.002M in  20.071530s
     Nokogiri at_css    132.617  (± 0.8%) i/s -      2.665k in  20.095940s

Comparison:
   Nokolexbor at_css:    49911.8 i/s
     Nokogiri at_css:      132.6 i/s - 376.36x  (± 0.00) slower

Warming up --------------------------------------
      Nokolexbor css   806.000  i/100ms
        Nokogiri css    13.000  i/100ms
Calculating -------------------------------------
      Nokolexbor css      8.011k (± 1.3%) i/s -    160.394k in  20.026210s
        Nokogiri css    132.339  (± 0.8%) i/s -      2.652k in  20.039860s

Comparison:
      Nokolexbor css:     8010.6 i/s
        Nokogiri css:      132.3 i/s - 60.53x  (± 0.00) slower

The newer version shows a small improvement in parsing but a significant regression in selection: at_css is 1.68x slower and css is 1.26x slower. The regression first appears at commit lexbor/lexbor@9677d13. I suspect that CSS parsing is slowing down the whole selection process.
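
One way to narrow this down could be to time a trivial selector and a complex one separately and compare how each changes between the two lexbor revisions. The following is a minimal stdlib-only sketch of such a timer. `measure_ips` is a hypothetical helper and not part of Nokolexbor or of the tool that produced the raw data above, and the commented usage assumes a document already parsed with `Nokolexbor::HTML`.

```ruby
# Hypothetical helper: run the block repeatedly for roughly `seconds`
# and return the observed iterations per second.
def measure_ips(seconds: 1.0)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations = 0
  elapsed = 0.0
  while elapsed < seconds
    yield
    iterations += 1
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  iterations / elapsed
end

# Stand-in workload so the sketch runs without any gems installed.
ips = measure_ips(seconds: 0.2) { "a b c".split.map(&:upcase) }
puts format("%.1f i/s", ips)

# With Nokolexbor loaded, the same helper could wrap the calls from this thread:
#   doc = Nokolexbor::HTML(html)
#   measure_ips { doc.at_css(".newscard") }
#   measure_ips { doc.css(".newscard:not(.more-news ~ .newscard)") }
```

If the slowdown is similar for both selectors, that would point at a fixed per-call cost such as selector parsing rather than at matching itself.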

Is there something I can do to recover the performance?

lexborisov commented

Hi @zyc9012

Okay, I'll take a look at it. It's odd that the parser slowed down; nothing there seems to have changed.
