Weird hybrid pages which show content from mixed locations #136

Open
miohtama opened this issue Nov 15, 2012 · 7 comments

Comments

@miohtama
Contributor

Not sure what to do with this

I want to report an issue with Plone that has been bugging me for a very long time. There are a lot of issue trackers around, though, and I don't know which one is appropriate. I am sure there has been some discussion about this already, but I can't find it anywhere. The problem occurs when you visit a Plone site and type a URL that combines more than one valid path for that site. It's hard to explain, but simple to show with an example:
https://plone.org/documentation (GOOD)
https://plone.org/support (GOOD)
https://plone.org/documentation/support (BAD: weird hybrid page that shows mixed contents from different locations)
The expected result for https://plone.org/documentation/support would be a 404 page.
DavidJonas: hi

@miohtama
Contributor Author

Not that I know of. It happens on any Plone website (plone.org is just an example). It might happen whenever a relative link appears on more than one page of the site. My problem is that some of those pages have been showing up in Google searches.
I don't think there are such links within the site, but maybe somebody mistyped or mixed up URLs in a link on some other site, and it ended up on Google.
<Moo^_^> robots.txt is the easiest way to eliminate them from google

@miohtama
Contributor Author

Tuning robots.txt needs to be assigned to someone with Plone god privileges. I can take it if we cannot find anyone else.

@davidjonas

I still think that robots.txt will only hide part of the problem, since the wrong link would still be on the internet somewhere. The real problem is that Plone allows this type of traversal through the URL. Any combination of two or more valid paths in the URL ends up on a 200 OK page with unpredictable, broken content, on any Plone website out there.

I think the problem is somewhere in either acquisition or traversal. It might actually be a Zope bug rather than a Plone bug. Unfortunately, I don't know how to dig deeper into this.

It can result in really weird URLs being valid such as:

https://plone.org/news/plone-framework-team-accepts-new-members/news/plone-tune-up-scheduled-for-friday-november-16th

These URLs end up on almost normal-looking pages with random slight differences that drive developers insane. The page above, for example, looks exactly like the valid page https://plone.org/news/plone-tune-up-scheduled-for-friday-november-16th, but if you are logged in you will not see the published state of the page. That would be very hard to debug if you didn't notice that the URL was actually wrong.
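
To check how many of these hybrid URLs a given site answers, one could generate candidates from a list of known-good paths and then request each one. The following is only a sketch of the generation step; `hybrid_urls` and its parameters are hypothetical names, not part of Plone or Zope, and it makes no HTTP requests.

```python
from itertools import permutations


def hybrid_urls(base, paths):
    """Return URLs built by appending one known-good path to another.

    `base` is the site root and `paths` are paths that are valid on their
    own. Each ordered pair (a, b) yields base/a/b. Hypothetical helper.
    """
    cleaned = [p.strip("/") for p in paths]
    root = base.rstrip("/")
    return [f"{root}/{a}/{b}" for a, b in permutations(cleaned, 2)]


candidates = hybrid_urls("https://plone.org", ["documentation", "support"])
for url in candidates:
    print(url)
```

Each candidate that comes back as 200 OK rather than 404 is an instance of the behavior described above.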

@davisagli
Member

Yes, this is because Zope's DefaultPublishTraverse class uses acquisition: it first tries traversing using bobo_traverse, then tries an attribute lookup on the aq_base of the object (i.e. without acquisition), then tries a view lookup, then tries an attribute lookup with acquisition.
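
A toy model may make the effect easier to see. This is not Zope's code: the `Item`, `lookup`, and `traverse` names are made up for illustration, and the four steps described above are collapsed into two (a direct lookup on the object, then an acquired lookup through its parents).

```python
class NotFound(Exception):
    """Stands in for a 404 response."""


class Item:
    """Toy content object with a containment parent (not a Zope class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def physical_path(self):
        # Path by containment, independent of the URL used to reach the item.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


def lookup(context, name):
    # Direct lookup: the name is a real child of the context.
    if name in context.children:
        return context.children[name]
    # Acquired lookup: fall back to the context's parents.
    node = context.parent
    while node is not None:
        if name in node.children:
            return node.children[name]
        node = node.parent
    raise NotFound(name)


def traverse(root, url_path):
    obj = root
    for name in url_path.strip("/").split("/"):
        obj = lookup(obj, name)
    return obj


site = Item("site")
documentation = Item("documentation", site)
support = Item("support", site)

# The hybrid URL resolves because "support" is acquired from the site root.
found = traverse(site, "/documentation/support")
print(found.physical_path())
```

In this model the hybrid URL `/documentation/support` resolves to the object whose real location is `/support`, instead of raising `NotFound`.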

We could try experimenting with registering a replacement IBrowserPublisher adapter that doesn't try acquisition, but I suspect that we've got things that depend on it (traversing to items in CMF skin layers, for example, though I haven't confirmed that).
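
The trade-off can be sketched with a toy traverser that has an `acquire` switch. Everything here is hypothetical (nested dicts instead of Zope objects, and `logo.png` standing in for a resource that is only defined at the site root); it is not the IBrowserPublisher adapter itself.

```python
class NotFound(Exception):
    """Stands in for a 404 response."""


def traverse(tree, url_path, acquire=True):
    """Walk nested dicts; with acquire=True, fall back to outer containers."""
    chain = [tree]  # containers visited so far, outermost first
    for name in url_path.strip("/").split("/"):
        current = chain[-1]
        if name in current:
            found = current[name]
        elif acquire:
            # Search the enclosing containers, innermost first.
            for container in reversed(chain[:-1]):
                if name in container:
                    found = container[name]
                    break
            else:
                raise NotFound(url_path)
        else:
            raise NotFound(url_path)
        chain.append(found)
    return chain[-1]


site = {
    "documentation": {"manual": {}},
    "support": {},
    "logo.png": {},  # stand-in for a resource defined only at the root
}


def resolves(url_path, acquire):
    try:
        traverse(site, url_path, acquire=acquire)
        return True
    except NotFound:
        return False


for path in ("/documentation/support", "/documentation/logo.png"):
    print(path, resolves(path, acquire=True), resolves(path, acquire=False))
```

With `acquire=False` the hybrid URL is rejected, but so is `/documentation/logo.png`, which is the kind of dependency suspected above.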

@djay
Member

djay commented Nov 17, 2012

On 18/11/2012, at 7:37 AM, David Glick wrote:

Yes, this is because Zope's DefaultPublishTraverse class uses acquisition: it first tries traversing using bobo_traverse, then tries an attribute lookup on the aq_base of the object (i.e. without acquisition), then tries a view lookup, then tries an attribute lookup with acquisition.

We could try experimenting with registering a replacement IBrowserPublisher adapter that doesn't try acquisition, but I suspect that we've got things that depend on it (traversing to items in CMF skin layers, for example, though I haven't confirmed that).

There are some pretty weird bugs caused by it, so it would be worth seeing what actually depends on acquisition. For example, you get no 404 pages or redirects for anything that has the same name as something elsewhere in the acquisition path, such as the id of another Plone site.


@davisagli
Member

I tried it, and as I suspected skin layer items can't be found without getting acquired. We can revisit this once the PLIP to remove skin layers is complete (at which point an option could be added to Zope to turn off acquisition during traversal).

@k-j-kleist
Contributor
