
[Question]: huge table e107_session #5262

Open
ericc-dream opened this issue May 23, 2024 · 12 comments
Labels
type: question An ask about behavior that is not found documented anywhere

Comments

@ericc-dream

Question

Hi,

I have a question. I'm moving my site to different hosting and I noticed something annoying.
When I exported my database from my old host's server, it was around 8 MB, give or take.
I restored everything less than a week ago, and today I tried to export it again: the size is now 25 MB, more than 3 times as large.
More specifically, the table e107_session alone is now more than 17 MB.
It had 6 rows; it now has 17,334, and 17,333 of them have "session_user = 0".

I wonder, do you store a session ID for every visitor of the site in this table?
That seems a bit much, considering that my site has only 2 users, including myself.
How can I limit or reduce that?

I see a lot of bots hitting my site and crawling the forum, mainly from "Semrush".
I guess they are the reason for the explosion of my database.
Is there a simple way to block these bots?

A more general question:
When I export my database, are there tables that I can exclude?
If a table doesn't exist, can the code recreate it?
(I could export the "CREATE" statement but not the content.)

Expected Source

I checked the content of the current DB with the previous export

Additional Context

No response

@ericc-dream ericc-dream added the type: question An ask about behavior that is not found documented anywhere label May 23, 2024
@Jimmi08
Contributor

Jimmi08 commented May 23, 2024

Hi, this was discussed on Gitter some time ago. A session record is created for bots too. I reported it; I'm not sure if an issue was created. It was around Thu, May 11, 2023; you can search for the word "sessions".
I am copying the solution by @Deltik:

Although there currently isn't a way to clean up guest sessions sooner, you could create a cron job script like this:

<?php
// Cron script: purge stale guest sessions from the e107 session table.
require_once('e107_config.php'); // defines $mySQLserver, $mySQLuser, etc.

$gc_maxlifetime = (int) ini_get('session.gc_maxlifetime');
if (!$gc_maxlifetime) {
    $gc_maxlifetime = 3600;
}

// session_expires holds the expiry timestamp (creation time + lifetime),
// so sessions created more than one hour ago satisfy:
//     session_expires < now + lifetime - 3600
$expiration_time = time() + $gc_maxlifetime - 3600; // one hour

$dsn = "mysql:host=$mySQLserver;dbname=$mySQLdefaultdb;charset=$mySQLcharset";
$pdo = new PDO($dsn, $mySQLuser, $mySQLpassword);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$table_name = $mySQLprefix . "session";

// Delete in batches of 1000 to keep each statement short; prepare once,
// execute until no rows are left to delete.
$query = "DELETE FROM `$table_name`
          WHERE (session_user = 0 OR session_user IS NULL)
            AND session_expires < ?
          LIMIT 1000";
$stmt = $pdo->prepare($query);

$total_deleted = 0;
do {
    $stmt->execute([$expiration_time]);
    $deleted_rows = $stmt->rowCount();
    $total_deleted += $deleted_rows;
} while ($deleted_rows > 0);

echo "Deleted " . $total_deleted . " rows from the $table_name table.\n";

(I haven't tested it myself, but it should delete guest sessions older than 1 hour.)
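To run such a cleanup script on a schedule, it could be registered in the server's crontab. A sketch (the script path, PHP binary path, and hourly interval are illustrative assumptions, not values from this thread):

```shell
# Run the guest-session cleanup at the top of every hour (paths are examples)
0 * * * * /usr/bin/php /var/www/html/cleanup_sessions.php >> /var/log/e107_session_cleanup.log 2>&1
```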

@ericc-dream
Author

I have blocked them in .htaccess !

# allow all except those indicated here
<Files *>
order allow,deny
allow from all
#  SEMrushBot ban
deny from 46.229.164.98
deny from 46.229.164.100
deny from 46.229.164.101
deny from 85.208.96.192/27
deny from 185.191.171.0/27
</Files>

I don't see them anymore in the admin panel, and the number of entries in the table seems to be decreasing... but that was only 10 minutes ago, so it will take time.

@rica-carv
Contributor

@ericc-dream You'll have a lot of IPs to block; that's just a very, very small tip of the iceberg...

I've used some third-party software to block those bots right from the start, and I'll try to use it again once I get my site back up and running, because I had the same issue you have, and my DB was just growing every day...

For me, that software was the best and easiest approach.

@Jimmi08
Contributor

Jimmi08 commented May 24, 2024

@rica-carv I will add the Bad Behavior script; users of other CMSes are very satisfied with it too.

The topic above (at least part of it) was about the lifetime of the guest session; I complained about this a long time ago. They should be deleted after an hour by the core. Or you can set the session lifetime to an hour and you will not have this issue.

@ericc-dream
Author

So I don't really mind bots... in the end, they can help bring some traffic.
But this one, Semrush, was way too aggressive!
I always had at least 15 connections from it.
Since I added the range in .htaccess, I haven't seen any, and the size of my DB is decreasing...

But could you share the name of the third-party software that you used, in case of need?

@rica-carv
Contributor

rica-carv commented May 24, 2024

But could you share the name of the third-party software that you used, in case of need?

@ericc-dream CIDRAM.
But supposedly e107 has a forum spam blocker...
Not the same thing, I know, but who knows if it might help?

@Moc
Member

Moc commented May 24, 2024

The SFS plugin is used on registrations (or on already existing accounts) to check against the stopforumspam database. If a match is found, the registration is denied.

It does not block visitors/guests (bots or not), so it will not help solve this issue. Sessions will still be created.

@Alex-e107nl
Contributor

And expanding the .htaccess file, would these lines block bad bots?

# Block Bad Bots
SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(libwww-perl|aesop_com_spiderman) HTTP_SAFE_BADBOT
Deny from env=HTTP_SAFE_BADBOT

@Moc
Member

Moc commented May 24, 2024

You'll never get all bots blocked by using htaccess as you're always one step behind. It has become an entire business model to sell packages of IP ranges for bots to use, or for individuals to buy to then block those same IP addresses.

Additionally, using htaccess to block bots strains server performance and affects loading times.

The solution to this issue is not blocking but implementing proper session handling and cleanup.

Please refrain from further comments on blocking bots in this issue report.

@Deltik do you have some enhancements planned for session handling?

@ericc-dream
Author

To give an update:
After blocking Semrush in .htaccess, the number of sessions dropped quickly.
At one point, I deleted all remaining rows with "user=0"...
Today, 2 days later, I have 304 rows, which is still high for a site with only 2 members,
but a little more acceptable than the 17,333 from the first message.

I agree with @Moc: we can't win at this game. The .htaccess change was an emergency measure against an overly aggressive bot, not a definitive solution.

Is there a way to change this session lifetime from the admin pages?
Could we consider having 2 different session lifetimes: a short one for non-logged-in visitors and a longer one for members?

And for my personal knowledge: what would be the consequence for a non-logged-in visitor if the session lifetime were, for example, 2 minutes?
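The two-lifetimes idea could be sketched like this (an illustration only; `sessionLifetime` and the specific durations are hypothetical, not an existing e107 setting). A guest with a 2-minute session would mainly lose transient state such as an unanswered CAPTCHA; a logged-in member would keep a longer session:

```php
<?php
// Hypothetical sketch: pick a session lifetime in seconds based on
// whether the visitor is logged in (user ID 0 = guest in e107's table).
function sessionLifetime(int $userId): int
{
    $guestLifetime  = 120;    // 2 minutes for guests (example value)
    $memberLifetime = 86400;  // 1 day for logged-in members (example value)
    return $userId > 0 ? $memberLifetime : $guestLifetime;
}
```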

@CaMer0n
Member

CaMer0n commented May 31, 2024

The method e_user_model::isBot() could help with this.
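As an illustration of how such a check might be used to avoid creating guest sessions for crawlers (a hedged sketch: `looksLikeBot` and its pattern list are hypothetical stand-ins, not e107 core code or the actual body of `e_user_model::isBot()`):

```php
<?php
// Hypothetical sketch: skip starting a DB-backed session for requests
// whose User-Agent matches common crawler signatures.
function looksLikeBot(string $userAgent): bool
{
    if ($userAgent === '') {
        return true; // a missing UA header is almost always automation
    }
    return (bool) preg_match(
        '/bot|crawl|spider|slurp|semrush|ahrefs/i',
        $userAgent
    );
}

// Usage: only start a session for likely humans.
if (!looksLikeBot($_SERVER['HTTP_USER_AGENT'] ?? '')) {
    // session_start(); // or the application's own session bootstrap
}
```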

@Deltik
Member

Deltik commented May 31, 2024

@Moc: Last I knew, guest sessions were triggered by the e107 CAPTCHA system, which stores the CAPTCHA answer in a session. I want to eliminate sessions for guests by storing the CAPTCHA answer somewhere else. (More info)

7 participants