-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an option to use to_text
instead of to_html
to FullSanitizer
#154
Comments
@Earlopain this seems like a good idea to me, and I'd consider merging something like this. Do you want to try to implement this in a pull request? I'd be happy to support and mentor you if that's something that would be helpful! |
@flavorjones I'd like to give it a shot. Thank you for the mentoring offer, I really appreciate that. I'll see how far I get and if I get stuck anywhere I'll get back to you on that. I should have time for this at the end of the week by the latest. |
OK, I'm going to step back and ask whether your use case could be addressed by just using Loofah's built-in scrubber: test "ActionView::Helpers::SanitizeHelper#sanitize with a Loofah builtin scrubber" do
input = %(<div>hello</div><p>world</p>)
assert_equal(%(\nhello\n\nworld\n), @subject.sanitize(input, scrubber: :newline_block_elements))
end That test passes today! The symbol I'm also curious why you're not just using Loofah directly? |
Thanks for the reply. I'm mostly looking to improve the functionality in the rails I occasionally find myself wanting to scrub html, loosing whitespace was never what I wanted. I know I can use Loofah (though I wasn't aware of the scrubber syntax) but the native rails method just doesn't work for me. I think it would be great if rails could support this behaviour directly. |
This is the use case I'm curious to know more about. What kind of view is it, what kind of user agent is rendering the output? I think understanding the use case will be good to make the case upstream to change or add features in SanitizeHelper. |
Of course, I'm hoping I'm not getting to long-winded here: Consider a page where users can write text in the form of markdown, like in Github Issues. There is a bunch of decorative text like the issue title or who wrote the issue, but the only relevant part is what the user provided, like this for example: # Cool title
Very descriptive text In html terms this would mean something like this: <h1 id="cool-title">Cool title</h1><p>Very descriptive text</p> Usually search engines are pretty good about what text they include in their search results but sometimes the text from outside gets included. Something like Another use case is for web scraping where you consume and store arbitrary HTML from websites. I'd like to display that to users without embeding untrusted input while at least preserving the general structure of the text without making it unreadable. I realize that these things are probably very specific to me, I Just thought a few others might find a quick solution useful as well. I'm perfectly fine with just using Loofah myself. |
@Earlopain Thank you for this context! Very helpful, I get it now. Let me think about this a bit. I think you're making a good case that the existing behavior of |
Hi there @flavorjones, did you have a chance to give this some more thought? |
@Earlopain Thanks for the ping. This is still on my to-do list. So many Loofah and RHS changes have gone into Rails 7.1 that I've been kicking this can down the road until after 7.1 ships (which should be pretty soon). I've also learned a bit more about how changes like this can be framed/proposed, and I have an idea of how I want to drive that conversation. Thanks for your patience. |
Waiting for after Rails 7.1 is totally fair. Thanks for letting me know |
@flavorjones Thanks for sharing the |
Tagging this on the Rails Conf 2024 hack day project board. If someone wants to think/talk about this that would be great! |
I'm looking at writing some tests/PR in actionview for this! |
Rails has the
strip_tags
method, which is great. Howerever for input like<p>a</p><p>b</p>
the output isab
which doesn't follow what is displayed in the browser.Loofah has the function
to_text
which handles newlines like this. I would like to introduce an option likepreserve_whitespace
(or similar) to theFullSanitizer
to do just that.Is that something you would consider adding?
The text was updated successfully, but these errors were encountered: