Implement shz.de decryption #2
Comments
The text scrambling seems to happen server-side. When visiting as a logged-in user, the HTML response does not contain any The mentioned script (here is a deobfuscated version btw) seems to only deal with displaying the paywall / registration options. However, as the scrambled text still closely resembles the original words, there may be a reversible algorithm at work here. Do we have any indication of what software they're using? Also, it bugs me why the scrambled original text is part of the raw response anyway... what could be the advantage of doing so (SEO)?
Yup, also I figured out that the text gets rescrambled when opening the same article in a new tab. As I said, they randomly move all chars within each space-separated string.
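To check the rescrambling observation on two copies of the same paragraph, a small comparison along these lines could help. It is only a sketch: `same_scramble_family` is a made-up name, and it assumes the scrambler keeps word boundaries and only reorders characters inside each space-separated word, which is what the thread describes but has not been confirmed.

```python
from collections import Counter


def same_scramble_family(a: str, b: str) -> bool:
    """Return True if a and b could be two scrambles of the same text.

    Assumption: scrambling only permutes characters inside each
    space-separated word, so both texts must have the same number of
    words and each word pair must contain the same characters.
    """
    words_a = a.split(" ")
    words_b = b.split(" ")
    if len(words_a) != len(words_b):
        return False
    # Compare character counts word by word, ignoring character order.
    return all(Counter(x) == Counter(y) for x, y in zip(words_a, words_b))
```

If two responses for the same article pass this check but differ as strings, that would be consistent with the text being rescrambled between requests.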
Damn, you're right.
Actually, I don't even need to delete cookies. It's enough to open the article in a new tab and open dev tools with "Deactivate Caching" enabled on the network tab, and it gets rescrambled. At other times, just opening a new tab gets it rescrambled. I noticed the Not sure if this is interconnected or purely related to tracking.
Seems to be related. C1 probably stands for CeleraOne, which is a Berlin-based company focused on creating paywalls. On their page, you can find a review by Nicolas L. Fromm, who is the CEO at Digital (?) of medienhaus:nord, the company behind shz and others.
Maybe free access for 1 month would be helpful.
Just wanted to read an RP Online article, and when I looked at the HTML, the way the text is scrambled looked pretty familiar. Different CSS classes though, and the global window object to init & track the user paywall is called Also, when I go back to shz.de as a logged-in user, this is the cookie being sent with every request:
So, seeing creid, it looks very likely that they use one and the same product.
The RP Online CMS is definitely InterRed Online.
So how do we get ahold of their scrambling algorithm? Applying for a demo account probably won't help, as we would most likely need to get a peek into the server-side code.
It seems like this kind of obfuscation is indeed coming from InterRed CMS, which is used by a bunch of German newspapers: I'm listing their domains here so others can find this issue.
As @Philzen already pointed out, there does not seem to be any client-side deobfuscation. The very first GET response already includes the plain text when logged in. Makes you wonder why they include the scrambled text at all. Perhaps just as some kind of gimmick to make the layout of the blurred text align with that of the plain text? I also noticed that Google does find paywalled articles when searching for parts of the obfuscated plain text. I've tried imitating the request headers of Googlebot and Googlebot-News, but always got the obfuscated text. Setting
There is a predefined set of scrambled text variations. SHZ seems to have two, other newspapers four. I assume this merely depends on the number of caching backends used by the respective newspaper, where each of those backends holds one version of the scrambled text. Apart from Some observations:
This issue is for discussing the decryption of shz.de articles.
They are saved with an "encryption" that moves all letters within each space-separated string to another position. The position changes per article, but not per reload. The decryption probably happens in this script.
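For reference, the scrambling as described above could be modelled roughly like this. This is a guess at the behaviour, not the site's actual code: `scramble_words` is a hypothetical helper, and the explicit `random.Random` argument is only there to make runs repeatable. Whether the real shuffle is seeded or reversible is unknown.

```python
import random


def scramble_words(text: str, rng: random.Random) -> str:
    """Shuffle the characters of each space-separated word independently.

    Hypothetical model of the observed obfuscation: word boundaries and
    word lengths are kept, only the order of characters inside each
    word changes.
    """
    scrambled = []
    for word in text.split(" "):
        chars = list(word)
        rng.shuffle(chars)  # in-place shuffle of this word's characters
        scrambled.append("".join(chars))
    return " ".join(scrambled)


if __name__ == "__main__":
    sample = "Der Artikel ist hinter einer Paywall"
    print(scramble_words(sample, random.Random(1)))
```

Note that a plain random shuffle like this one throws away the original order, so if the real algorithm works this way without a recoverable seed, it cannot be undone from the scrambled text alone.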
When loading without uBlock Origin, the text gets replaced by Lorem Ipsum.