Weird hybrid pages which show content from mixed locations #136

miohtama · 2012-11-15T11:24:11Z

Not sure what to do with this

I want to report an issue with Plone that has been bugging me for a very long time now. There are a lot of issue trackers around, though, and I don't know which one is appropriate. I am sure there has been some talk about this already, but I can't find it anywhere. The problem appears when you visit a Plone site and type a URL that combines more than one valid path for that site. It's hard to explain, but it's pretty simple with an example:
https://plone.org/documentation (GOOD)
https://plone.org/support (GOOD)
https://plone.org/documentation/support (BAD: weird hybrid page that shows mixed contents from different locations)
The expected result for https://plone.org/documentation/support would be a 404 page.
DavidJonas: hi

miohtama · 2012-11-15T11:34:33Z

Not that I know of. It happens on any Plone website (plone.org is just an example). It might happen when there are relative links on the site that appear on more than one page. My problem is that some of those pages have been showing up in Google searches.
I don't think there are such links within the site, but maybe somebody mistyped or mixed up URLs in a link on some other site and it ended up on Google.
<Moo^_^> robots.txt is the easiest way to eliminate them from google
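
As a rough illustration of the robots.txt suggestion, the sketch below checks a single Disallow rule with Python's standard urllib.robotparser. The rule itself is an assumption made up for this example, not the actual plone.org robots.txt, and every hybrid path (or a common prefix) would have to be listed this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /documentation/support
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The two valid pages stay crawlable; the hybrid URL is blocked.
good_docs = parser.can_fetch("*", "https://plone.org/documentation")
good_support = parser.can_fetch("*", "https://plone.org/support")
hybrid = parser.can_fetch("*", "https://plone.org/documentation/support")
print(good_docs, good_support, hybrid)
```

This only asks crawlers to stay away from the listed path. The URL itself would still answer with 200 OK.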

miohtama · 2012-11-15T11:35:14Z

Tuning robots.txt needs to be assigned to someone with Plone god privileges. I can take it if we cannot find anyone else.

davidjonas · 2012-11-15T11:56:34Z

I still think that robots.txt will only hide part of the problem, since the wrong link would still be on the internet somewhere. The real problem is that Plone allows this type of traversal through the URL. Any combination of two or more valid paths in the URL ends up on a 200 OK page with unpredictable, broken content. This happens on any Plone website out there.

I think the problem is somewhere in either acquisition or traversal that allows this behavior. I think it might be actually a Zope bug instead of a Plone bug. Unfortunately I don't know how to go deeper into this.

It can result in really weird URLs being valid such as:

https://plone.org/news/plone-framework-team-accepts-new-members/news/plone-tune-up-scheduled-for-friday-november-16th

These end up as almost normal-looking pages with small, seemingly random differences that drive developers insane. The page above, for example, looks exactly like the valid page https://plone.org/news/plone-tune-up-scheduled-for-friday-november-16th, but if you are logged in you will not see the published state of the page. That would be very hard to debug if you didn't notice that the URL was actually wrong.

davisagli · 2012-11-17T20:37:13Z

Yes, this is because Zope's DefaultPublishTraverse class uses acquisition: it first tries traversing using __bobo_traverse__, then tries an attribute lookup on the aq_base of the object (i.e. without acquisition), then tries a view lookup, then tries an attribute lookup with acquisition.
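
The lookup order described above can be sketched in plain Python. This is a toy model, not Zope code: Node, acquire, publish_traverse and resolve are hypothetical names, the hook attribute only stands in for __bobo_traverse__, and real acquisition also re-wraps the found object in the new context, which this sketch does not model.

```python
class Node:
    """Toy content object: direct children, named views, and a parent pointer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.views = {}
        self.hook = None  # optional callable(name), stands in for __bobo_traverse__
        if parent is not None:
            parent.children[name] = self


def acquire(node, name):
    """Look for `name` among the children of each ancestor, nearest first."""
    ancestor = node.parent
    while ancestor is not None:
        if name in ancestor.children:
            return ancestor.children[name]
        ancestor = ancestor.parent
    return None


def publish_traverse(node, name):
    """Return (result, step) following the four-step order, simplified."""
    if node.hook is not None:            # 1. custom traversal hook
        found = node.hook(name)
        if found is not None:
            return found, "hook"
    if name in node.children:            # 2. own attribute, no acquisition
        return node.children[name], "direct"
    if name in node.views:               # 3. view lookup
        return node.views[name], "view"
    found = acquire(node, name)          # 4. attribute lookup with acquisition
    if found is not None:
        return found, "acquired"
    raise LookupError(name)              # nothing matched: this would be a 404


def resolve(root, path):
    """Traverse `path` from `root`, recording which step matched each name."""
    node, steps = root, []
    for name in path.strip("/").split("/"):
        node, step = publish_traverse(node, name)
        steps.append(step)
    return node, steps


root = Node("")
documentation = Node("documentation", root)
support = Node("support", root)

node, steps = resolve(root, "/documentation/support")
print(node.name, steps)  # support ['direct', 'acquired']
```

In this model "support" is not a child of "documentation", so the last step finds it on the site root, and the hybrid URL resolves instead of failing.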

We could try experimenting with registering a replacement IBrowserPublisher adapter that doesn't try acquisition, but I suspect that we've got things that depend on it (traversing to items in CMF skin layers, for example, though I haven't confirmed that).

djay · 2012-11-17T20:57:26Z

On 18/11/2012, at 7:37 AM, David Glick wrote:

Yes, this is because Zope's DefaultPublishTraverse class uses acquisition: it first tries traversing using __bobo_traverse__, then tries an attribute lookup on the aq_base of the object (i.e. without acquisition), then tries a view lookup, then tries an attribute lookup with acquisition.

We could try experimenting with registering a replacement IBrowserPublisher adapter that doesn't try acquisition, but I suspect that we've got things that depend on it (traversing to items in CMF skin layers, for example, though I haven't confirmed that).

There are some pretty weird bugs caused by it, so it would be worth seeing what does depend on acquisition. For example, you get no 404 pages or redirections for anything that has the same name as something elsewhere in the acquisition path, such as the id of another Plone site.


davisagli · 2012-11-17T21:13:12Z

I tried it, and as I suspected skin layer items can't be found without getting acquired. We can revisit this once the PLIP to remove skin layers is complete (at which point an option could be added to Zope to turn off acquisition during traversal).
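
The trade-off can be illustrated with a small toy model in plain Python. It is only a sketch under assumptions: SITE, NotFound and traverse are hypothetical names, and skin layer items are modeled as a name on the site root that nested content can reach only through acquisition, which simplifies how CMF skins actually work.

```python
class NotFound(LookupError):
    """Toy stand-in for a 404 response."""


# Toy site: every object is a dict of its direct children.
SITE = {
    "documentation": {},
    "support": {},
    "logo.png": {},  # assumption: a skin item modeled as a name on the root
}


def traverse(path, use_acquisition=True):
    """Resolve `path`, optionally falling back to enclosing containers."""
    chain = [SITE]  # containers from the root down to the current object
    for name in path.strip("/").split("/"):
        # Always check the current object first; with acquisition enabled,
        # continue outward through the enclosing containers.
        candidates = reversed(chain) if use_acquisition else [chain[-1]]
        for container in candidates:
            if name in container:
                chain.append(container[name])
                break
        else:
            raise NotFound(path)
    return chain[-1]


def is_found(path, use_acquisition):
    """Return True if the path resolves, False if it would be a 404."""
    try:
        traverse(path, use_acquisition)
    except NotFound:
        return False
    return True


for path in ("/documentation/support", "/documentation/logo.png"):
    print(path, is_found(path, True), is_found(path, False))
```

With acquisition turned off in this model, the hybrid URL becomes a 404 as hoped, but the skin item is no longer reachable from inside a folder either.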

k-j-kleist · 2012-11-28T20:15:23Z

see http://dev.plone.org/ticket/13354
