Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

500 Internal Server Error while using Admin REST API #181

Open
Kh-Oleg opened this issue May 2, 2023 · 6 comments
Open

500 Internal Server Error while using Admin REST API #181

Kh-Oleg opened this issue May 2, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@Kh-Oleg
Copy link

Kh-Oleg commented May 2, 2023

Describe the bug

Fief responds with "500 Internal Server Error" while using different Admin REST API from a script.

To Reproduce

Steps to reproduce the behavior:

  1. Start Fief instance with one default tenant, one default client, one Admin API Key
  2. Execute a script doing the following:
    • GET /tenants/ # Get default tenant id
    • POST /permissions/ # Create permission1
    • POST /permissions/ # Create permission2
    • POST /users/ # Create user1
    • POST /users/ # Create user2
    • POST /users/{user_id1}/permissions # Assign permission1 to user1
    • POST /users/{user_id2}/permissions # Assign permission2 to user2
      Assuming that requests have all needed and correct POST payloads, headers and Authorization Bearer.

Expected behavior

No "500 Internal server error" responses from Fief.

Configuration

self-hosted, Fief version: 0.24.2

Additional context

The error is non-deterministic, it's not related to any particular request, any request can return 500.
When I added 1 second delay between requests, everything went OK. You can see the successful log below.

xxx@xxx:/opt/src# fief run-server --port 8001 2>&1
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Main Fief workspace already exists
Main Fief user already exists
2023-05-02 15:08:24.725 | INFO     | fief.lifespan:lifespan:31 - Fief Server started - {"version": "0.24.2"}
INFO:     127.0.0.1:50620 - "GET /tenants/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:50630 - "POST /permissions/ HTTP/1.1" 201 Created
2023-05-02 15:08:31.188 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:08:31.274 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
INFO:     127.0.0.1:50632 - "POST /permissions/ HTTP/1.1" 201 Created
2023-05-02 15:08:31.636 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:08:31.647 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
INFO:     127.0.0.1:50644 - "POST /users/ HTTP/1.1" 201 Created
2023-05-02 15:08:32.796 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "on_after_register"}
2023-05-02 15:08:32.871 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "on_after_register"}
2023-05-02 15:08:32.877 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:08:32.889 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
INFO:     127.0.0.1:50650 - "POST /users/ HTTP/1.1" 201 Created
2023-05-02 15:08:33.847 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "on_after_register"}
2023-05-02 15:08:33.900 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "on_after_register"}
2023-05-02 15:08:33.921 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:08:33.936 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
INFO:     127.0.0.1:50662 - "POST /users/40284c05-9b18-431d-a2ef-b9f1707d76d3/permissions HTTP/1.1" 201 Created
2023-05-02 15:08:34.871 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:08:34.886 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
INFO:     127.0.0.1:50668 - "POST /users/90e4cecd-7898-43a3-84d6-feacdeebaab6/permissions HTTP/1.1" 201 Created
2023-05-02 15:08:35.884 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:08:35.912 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
^C

Without delays, however, I could not execute my script fully, not getting 500 at some of the requests. The log in this case looks like this:

xxx@xxx:/opt/src# fief run-server --port 8001 2>&1
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Main Fief workspace already exists
Main Fief user already exists
2023-05-02 15:12:02.562 | INFO     | fief.lifespan:lifespan:31 - Fief Server started - {"version": "0.24.2"}
INFO:     127.0.0.1:60742 - "GET /tenants/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:52424 - "POST /permissions/ HTTP/1.1" 201 Created
INFO:     127.0.0.1:52434 - "POST /permissions/ HTTP/1.1" 201 Created
INFO:     127.0.0.1:52450 - "POST /users/ HTTP/1.1" 201 Created
2023-05-02 15:12:08.245 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
INFO:     127.0.0.1:52456 - "POST /users/ HTTP/1.1" 500 Internal Server Error
2023-05-02 15:12:08.344 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 15:12:08.351 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:12:08.381 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 15:12:08.402 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:12:08.437 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 15:12:08.443 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "on_after_register"}
2023-05-02 15:12:08.519 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "on_after_register"}
^C

xxx@xxx:/opt/src# fief run-server --port 8001 2>&1
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Main Fief workspace already exists
Main Fief user already exists
2023-05-02 15:36:08.331 | INFO     | fief.lifespan:lifespan:31 - Fief Server started - {"version": "0.24.2"}
INFO:     127.0.0.1:44862 - "GET /tenants/ HTTP/1.1" 200 OK
INFO:     127.0.0.1:44870 - "POST /permissions/ HTTP/1.1" 201 Created
INFO:     127.0.0.1:44874 - "POST /permissions/ HTTP/1.1" 500 Internal Server Error
2023-05-02 15:36:12.437 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 15:36:12.511 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
^C

It looks like, there is a race related to webhooks somewhere.

@Kh-Oleg Kh-Oleg added the bug Something isn't working label May 2, 2023
@frankie567
Copy link
Member

That's strange indeed.

Would it be possible for you to share the full script?

Meanwhile, a few things to try, would be glad to see the effect on your issue:

  1. Set the environment variable WEB_CONCURRENCY to 1
  2. Set the environment variable TELEMETRY_ENABLED to False

@Kh-Oleg
Copy link
Author

Kh-Oleg commented May 2, 2023

TELEMETRY_ENABLED was set to 0 before. I tried also with WEB_CONCURRENCY=1 - nothing has changed.
My real script does more, but I could reproduce the bug with a smaller one:

import requests
import json
import time

fief_url = 'http://localhost:8001/admin/api'

# Assuming Admin API Key exists in Fief's database
API_KEY='*************************************'

def post(route, data):
    # time.sleep(1) # Required, since Fief 0.24.2 has a race condition somewhere
    headers = {"Content-Type": "application/json", 
               "accept": "application/json",
               "Authorization": f"Bearer {API_KEY}"}

    response = requests.post(f"{fief_url}/{route}", headers=headers, data=json.dumps(data), timeout=0.5)
    if response.status_code >= 400:
        print(f"Error! data: {json.dumps(data)}")
        raise Exception(response.content)
    return response.json()

def create_permission(data):
    response = post('permissions/', data)
    return response['id']

# Create permissions
for i in range(1, 10):
    permission_id = create_permission({'name': f"Create Castle{i}", 'codename': f"createCastle{i}"})
    print(f"permission created, id: {permission_id}")

Fief log output:

INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Main Fief workspace already exists
Main Fief user already exists
2023-05-02 19:46:15.955 | INFO     | fief.lifespan:lifespan:31 - Fief Server started - {"version": "0.24.2"}
xxx@xxx:/opt/src# INFO:     127.0.0.1:40426 - "POST /permissions/ HTTP/1.1" 201 Created
INFO:     127.0.0.1:40436 - "POST /permissions/ HTTP/1.1" 201 Created
INFO:     127.0.0.1:40450 - "POST /permissions/ HTTP/1.1" 201 Created
INFO:     127.0.0.1:40462 - "POST /permissions/ HTTP/1.1" 201 Created
INFO:     127.0.0.1:40476 - "POST /permissions/ HTTP/1.1" 500 Internal Server Error
2023-05-02 19:46:43.756 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.825 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.831 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.844 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.878 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.903 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.909 | INFO     | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.937 | INFO     | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}

I see, that each time, when the problem doesn't occur, both 'Start task' and 'Done task' are executed before the next request arrives. The opposite is also true: when the 500 occurs, web hooks appear in the log after one or more following requests. This makes me thinking that there is, probably, some kind of deferred web hook trigger, which doesn't handle properly the case, when many requests arrive one after another.

500 most likely means some exception inside Fief, but log output doesn't show it. I would capture the Exception stack trace as well, if you'll tell me how to enable it.

@frankie567
Copy link
Member

I was able to reproduce it (especially when creating users, not when creating permissions). Here is the error I got:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked

It seems there is a lock on the SQLite database, probably because several processes try to access it concurrently. I'll try to see what can be done about it.

@frankie567
Copy link
Member

I was able to track down the problem: it comes from the AuditLogger sink, which makes requests to the DB in a background thread.

My intuition is that, sometimes, when a subsequent request comes to create a user/permission, the AuditLog from the previous request is currently being written to the DB, which raises the database lock error.

I'll see how to mitigate this. Maybe by deferring the AuditLogger completely in the background worker.

@Kh-Oleg
Copy link
Author

Kh-Oleg commented May 17, 2023

The bug still exists, but is only less frequent now.

I'd suggest reopening the bug and investigating it further, GitHub doesn't allow me to reopen it.

@frankie567 frankie567 reopened this Sep 5, 2023
@matteius
Copy link

I am getting a similar error I think just trying to login to admin now -- I was recently able to login though, still trying to debug it at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants