Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial context updates #93

Open
wants to merge 340 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 250 commits
Commits
Show all changes
340 commits
Select commit Hold shift + click to select a range
83547a6
update property added
pseusys May 23, 2023
89bdf54
typos fixed
pseusys May 24, 2023
6c17b1e
new scheme proposal
pseusys May 24, 2023
e263354
new subscript type (no subscript)
pseusys May 31, 2023
9d00610
private key, 3 default values and test fix
pseusys Jun 1, 2023
ec543d2
sample example added
pseusys Jun 1, 2023
8950e60
example db cleaned
pseusys Jun 1, 2023
157a18a
mongo completed
pseusys Jun 6, 2023
38b06f0
sql operational
pseusys Jun 6, 2023
d1e1b4f
ydb operational (for current test set)
pseusys Jun 8, 2023
acfb571
sql redefinition fixed
pseusys Jun 9, 2023
02eedff
attributes moved to vars
pseusys Jun 11, 2023
9e77d90
type checks and restrictions added
pseusys Jun 13, 2023
fcfaf06
function order fixed
pseusys Jun 13, 2023
885c899
policies tests added
pseusys Jun 13, 2023
6ae19ef
_hilarious_ YDB random bug fixed **again** just for `FUN`
pseusys Jun 13, 2023
155867d
lint applied
pseusys Jun 14, 2023
2a81e93
tests restored
pseusys Jun 14, 2023
cf2e8db
Merge branch 'dev' into feat/partial_context_updates
pseusys Jun 14, 2023
d4ad968
errors fixed
pseusys Jun 14, 2023
ad9eb62
fixed two more errors
pseusys Jun 14, 2023
e57cfd1
docstrings added
pseusys Jun 14, 2023
0dc2070
mongo bug fixed
pseusys Jun 16, 2023
04e27c6
redis optimized
pseusys Jun 16, 2023
bea2155
mongo indexes added
pseusys Jun 16, 2023
9eef2ca
one less query for redis
pseusys Jun 16, 2023
ab0aaf8
sql requests made async
pseusys Jun 20, 2023
98f427d
one other option for SQL storages
pseusys Jun 23, 2023
5588dc6
sqls fixed
pseusys Jun 23, 2023
8121a25
duplicate indexes removed
pseusys Jun 25, 2023
5fe4d31
sqlite async error fixed
pseusys Jun 26, 2023
74e76b5
sync writes
pseusys Jun 27, 2023
b00fc30
async disabling possibility added, query parameters overflow fixed
pseusys Jun 28, 2023
5755d06
sql log read finished
pseusys Jun 28, 2023
5d70793
load test added, all items fixed
pseusys Jun 29, 2023
221fa01
ydb implemented
pseusys Jun 29, 2023
dd5c4d0
mongo finished
pseusys Jun 29, 2023
b44484d
mongo passes all tests correctly
pseusys Jun 29, 2023
ecc92bf
single log behavior added as default
pseusys Jun 29, 2023
8ce83ce
limit removed
pseusys Jul 3, 2023
08f2999
sql reworked
pseusys Jul 5, 2023
77a3d79
overread disabled
pseusys Jul 10, 2023
6bd931c
and now really updated
pseusys Jul 10, 2023
1c5e170
sparse logging
pseusys Jul 10, 2023
84a84ad
double writing disabled
pseusys Jul 11, 2023
9b94df9
faster (probably) serialization setup
pseusys Jul 11, 2023
b32aa73
faster pickle fixed
pseusys Jul 11, 2023
01f8b46
potential data loss prevented
pseusys Jul 12, 2023
c28b792
serializer interface added, datetime args added
pseusys Jul 13, 2023
5cb1b45
mongo ready
pseusys Jul 14, 2023
ccbc07a
redis done + active_ctx returned
pseusys Jul 18, 2023
49435da
ydb ready
pseusys Jul 18, 2023
39d0da7
file-based
pseusys Jul 19, 2023
1ca66ed
with_stem removed
pseusys Jul 19, 2023
9bb3eb7
ydb ??? again??
pseusys Jul 19, 2023
a8c6497
len and prune
pseusys Jul 21, 2023
2bbf6e4
redis delete number of args changed
pseusys Jul 21, 2023
7aefa5b
Update community.rst, revert some changes
pseusys Jul 21, 2023
6fa0542
one line reverted
pseusys Jul 21, 2023
fa9359f
double serialization removed
pseusys Jul 21, 2023
9fdf5bd
no_dependencies_tests_fixed
pseusys Jul 21, 2023
c70157a
serializer changed
pseusys Jul 21, 2023
05f0d94
serializer unchanged (example)
pseusys Jul 21, 2023
95ba296
partial tutorials started
pseusys Jul 30, 2023
cd020c9
context storages made async
pseusys Jul 31, 2023
687ba7e
tutorials added
pseusys Jul 31, 2023
425a744
example context storage removed
pseusys Jul 31, 2023
2403aed
docs added
pseusys Aug 1, 2023
b4546b4
storages docs updated
pseusys Aug 1, 2023
bdda5ff
reviewed problems fixed
pseusys Aug 2, 2023
414e4a0
file-based dbs made sync
pseusys Aug 3, 2023
e5357fc
quickle removed
pseusys Aug 3, 2023
edeb376
Excessive description removed
pseusys Aug 4, 2023
895011c
Merge branch 'dev' into feat/partial_context_updates
pseusys Aug 4, 2023
78d2ccc
migrated to pydantic 2.0
pseusys Aug 4, 2023
d4bff86
Documentation building fixes (#186)
pseusys Aug 4, 2023
821713c
add patch for json context storage
ruthenian8 Aug 4, 2023
fca7c42
json storage fixed
pseusys Aug 4, 2023
33dabc8
Merge branch 'feat/partial_context_updates' of https://github.com/dee…
pseusys Aug 4, 2023
c5ad6d5
test pickle save and load with logging
pseusys Aug 4, 2023
8deaabd
timestamp conversion test for windows
pseusys Aug 7, 2023
cbe7c70
time in nanoseconds for windows
pseusys Aug 7, 2023
b14239e
ok ok windows take this
pseusys Aug 7, 2023
ed888d4
some other idea to trick windows
pseusys Aug 7, 2023
998fb2c
excessive logging removed
pseusys Aug 8, 2023
12f938e
config dicts fixed + module docstrings added
pseusys Aug 9, 2023
dbc8928
linting and formatting fixed
pseusys Aug 9, 2023
ab43a98
s's removed from docstrings
pseusys Aug 10, 2023
cc18acc
Merge branch 'dev' into feat/partial_context_updates
pseusys Aug 10, 2023
7f850ee
type defined
pseusys Aug 11, 2023
9fe28c9
property docstring added
pseusys Aug 14, 2023
6e4eb75
Merge branch 'dev' into feat/partial_context_updates
pseusys Aug 30, 2023
c25c48d
dff installation cell added to tutorial 8
pseusys Aug 30, 2023
6856ee5
shelve improved
pseusys Sep 5, 2023
5314e31
partial review reaction
pseusys Sep 19, 2023
7f1835e
more documentation added
pseusys Sep 20, 2023
cd76105
finished review
pseusys Sep 24, 2023
4238f9b
Merge branch 'dev' into feat/partial_context_updates
RLKRo Mar 22, 2024
50cda47
put benchmark tutorial after partial updates one
RLKRo Mar 22, 2024
4cc055a
Merge branch 'dev' into feat/partial_context_updates
pseusys Jul 4, 2024
7f77c8f
context storages updated
pseusys Jul 4, 2024
2617255
old naming reset
pseusys Jul 4, 2024
4fb8f67
context merge fixed
pseusys Jul 4, 2024
1230d16
context ids removed
pseusys Jul 4, 2024
c3d82da
context equality tested
pseusys Jul 4, 2024
0bd6347
framework data comparison removed
pseusys Jul 4, 2024
4a15bf0
context id removed from everywhere
pseusys Jul 4, 2024
9b3dd80
lint applied
pseusys Jul 4, 2024
4f0562a
documentation building fixed
pseusys Jul 4, 2024
ef0a9ee
RST syntax fixed
pseusys Jul 4, 2024
3d364bc
context dict added
pseusys Jul 29, 2024
e7ad269
async + pydantic
pseusys Jul 30, 2024
be34714
fixes
pseusys Jul 31, 2024
b8701a0
hashes manipulation only on `write_full_diff`
pseusys Jul 31, 2024
a58eace
ctx_dict + ctx updated
pseusys Aug 5, 2024
33f2823
setting removed
pseusys Aug 5, 2024
c4f9fce
sets added
pseusys Aug 6, 2024
e892a52
serialization added, sample context storage class created
pseusys Aug 6, 2024
1b8aa0d
iterative async access made synchronous
pseusys Aug 6, 2024
173b1fe
sql prototype
pseusys Aug 6, 2024
9665038
context API updated proposal
pseusys Aug 7, 2024
3468af5
context schema and serializer removed
pseusys Aug 7, 2024
71bd9f3
context API updated once again
pseusys Aug 7, 2024
2e6b334
review notes fixed
pseusys Aug 8, 2024
830ea40
ContextDictView made mutable
pseusys Aug 8, 2024
5d3dd95
context dict file split
pseusys Aug 8, 2024
f00ba02
turn introduction reverted
pseusys Aug 8, 2024
1af24db
turns separated (again)
pseusys Aug 13, 2024
3616ac0
key deletion now nullifies value
pseusys Aug 13, 2024
81ce7ba
memory storage
pseusys Aug 16, 2024
1f9e653
ctx_dict tests done
pseusys Aug 17, 2024
c981cc5
general context storages tests created
pseusys Aug 27, 2024
5002dda
ctx_dict updated not to use serializer
pseusys Sep 18, 2024
3e6a8f4
merge dev
RLKRo Sep 18, 2024
6991fb6
Merge branch 'refs/heads/dev' into feat/partial_context_updates
RLKRo Sep 18, 2024
5b80818
fix imports in newly added files
RLKRo Sep 19, 2024
96af9bc
hide circular imports behind type checking
RLKRo Sep 19, 2024
000fb0d
fix imports in test files
RLKRo Sep 19, 2024
2c2ab9d
merge context.init into context.connected
RLKRo Sep 19, 2024
2eb5a2c
remove get_last_index imports
RLKRo Sep 19, 2024
06d54b9
update pipeline.context_storage type
RLKRo Sep 19, 2024
f80e6a3
fix bug with setting sequence type values under a single key
RLKRo Sep 19, 2024
c5311f6
revert primary_id renaming
RLKRo Sep 19, 2024
d43752a
memory test (almost!) finished
pseusys Sep 23, 2024
1ae3e4f
ctx_dict tests fixed
pseusys Sep 23, 2024
85315a6
add overload for getitem
RLKRo Sep 23, 2024
351a43e
split typevar definitions
RLKRo Sep 23, 2024
e9eb2fb
remove asyncio mark
RLKRo Sep 23, 2024
6d93399
allow using negative indexes for context dict
RLKRo Sep 23, 2024
e2053dc
add validation on setitem for context dict
RLKRo Sep 24, 2024
acdcd3c
fixes
RLKRo Sep 24, 2024
16a3d77
allow non-str context ids
RLKRo Sep 24, 2024
9a76ae3
add current_turn_id
RLKRo Sep 24, 2024
5e37651
fix tests
RLKRo Sep 24, 2024
d376e49
update doc
RLKRo Sep 24, 2024
256e296
integer keysreversed
pseusys Sep 24, 2024
e2ffa0a
sql storage update function fix
pseusys Sep 24, 2024
9043dca
move context factory and pipeline fixtures to global conftest
RLKRo Sep 24, 2024
d58ce7c
unbound V from BaseModel
RLKRo Sep 24, 2024
6905bcd
remove default marker; return None by default
RLKRo Sep 24, 2024
0ac3c1e
fix key slicing
RLKRo Sep 24, 2024
3956348
use current_turn_id in check_happy_path
RLKRo Sep 24, 2024
d37c4e2
use context_factory to initialize context in non-core tests
RLKRo Sep 24, 2024
2bf82f9
fix: await misc get
RLKRo Sep 24, 2024
8a4d8be
update pipeline tutorials
RLKRo Sep 24, 2024
6404eb4
allow initializing MemoryContextStoraeg via context_storage_factory
RLKRo Sep 25, 2024
240cded
move all db tests into a single parametrized test class
RLKRo Sep 25, 2024
535d524
SQL testing fixed
pseusys Sep 27, 2024
6e0a103
Merge branch 'feat/partial_context_updates' of https://github.com/dee…
pseusys Sep 27, 2024
862e7d3
test_dbs fixed
pseusys Sep 27, 2024
e82d086
file context storages implemented
pseusys Sep 27, 2024
59f91c1
file and sql fixed
pseusys Sep 28, 2024
1c97303
async file dependency removed
pseusys Sep 30, 2024
f5ceb2f
rename delete_main_info to delete_context
RLKRo Sep 30, 2024
cf27afa
fix load_field_items typing
RLKRo Oct 1, 2024
c1a24ee
rewrite db tests
RLKRo Oct 1, 2024
f2ec013
Merge branch 'refs/heads/dev' into feat/partial_context_updates
RLKRo Oct 1, 2024
cb22d12
small None checking update
pseusys Oct 3, 2024
8ba5aed
Merge branch 'feat/partial_context_updates' of https://github.com/dee…
pseusys Oct 3, 2024
d9b95f6
tests updated
pseusys Oct 3, 2024
7277bf9
mongo done
pseusys Oct 3, 2024
e1cb50d
redis done
pseusys Oct 4, 2024
782bf66
ydb finished
pseusys Oct 4, 2024
0fb487b
raise error in abstract method
RLKRo Oct 4, 2024
ff70324
update service tests
RLKRo Oct 7, 2024
b59cf95
Merge remote-tracking branch 'origin/feat/partial_context_updates' in…
RLKRo Oct 7, 2024
d3af3b2
update lock file
RLKRo Oct 7, 2024
e38e2d4
fieldconfig removed
pseusys Oct 10, 2024
de739f2
update benchmark utils
RLKRo Oct 11, 2024
eaa8a87
aiofile reverted
pseusys Oct 13, 2024
53bf877
misc tables removed
pseusys Oct 13, 2024
7629fbc
Merge branch 'feat/partial_context_updates' of https://github.com/dee…
pseusys Oct 13, 2024
757fe48
denchmark awaiting removed
pseusys Oct 17, 2024
a001c27
Merge branch 'refs/heads/dev' into feat/partial_context_updates
RLKRo Oct 18, 2024
96d05dc
update lock file
RLKRo Oct 18, 2024
1430544
fix context size calculation
RLKRo Oct 18, 2024
403e2e1
change model_dump mode
RLKRo Oct 18, 2024
5340256
key filter implementation
pseusys Oct 21, 2024
9aad1bb
Merge branch 'feat/partial_context_updates' of https://github.com/dee…
pseusys Oct 21, 2024
b32b367
ctx_dict hashes update added
pseusys Oct 24, 2024
edc85bd
added and removed sets cleared upon storage
pseusys Oct 24, 2024
e61b1b7
Revert "key filter implementation"
RLKRo Oct 24, 2024
d114d42
sql and file logging added
pseusys Oct 28, 2024
3619125
Merge branch 'feat/partial_context_updates' of https://github.com/dee…
pseusys Oct 28, 2024
5618484
debug logging added
pseusys Oct 28, 2024
5e6e223
use standard logging practices
RLKRo Oct 30, 2024
4323871
make logging more uniform across the methods and collapse long lists
RLKRo Oct 31, 2024
93144df
fix potential error in prefix parsing
RLKRo Oct 31, 2024
83c7b33
Merge branch 'refs/heads/dev' into feat/partial_context_updates
RLKRo Oct 31, 2024
b763f21
create tmp file only for file dbs
RLKRo Nov 1, 2024
69d1520
add test for load_field_items
RLKRo Nov 2, 2024
291396f
test fix: misc no longer context dict
RLKRo Nov 2, 2024
c3d8c73
test fix: load_field_items no longer returns dict
RLKRo Nov 2, 2024
4bb6ca7
test fix: field config was removed
RLKRo Nov 2, 2024
dbbbb28
remove debug artefact
RLKRo Nov 2, 2024
710554c
all user input escapedin ydb
pseusys Nov 6, 2024
20b6b5f
ctx_dict moved
pseusys Nov 8, 2024
2b6eebf
async lock introduced
pseusys Nov 8, 2024
6c458c6
codestyle fixed
pseusys Nov 14, 2024
46e0112
Merge branch 'dev' into feat/partial_context_updates
pseusys Nov 14, 2024
e263fa1
SOME of the errors FIXED!!!
pseusys Nov 20, 2024
1f96f6d
rebuild script updated
pseusys Nov 22, 2024
ce6c8b6
turns added, empty ctx_dict method also added
pseusys Nov 22, 2024
9e7cf47
context creation field set removed
pseusys Nov 22, 2024
c34f8e7
contex storage class splitted
pseusys Nov 22, 2024
1d3859c
rebuild was cleaned (once again)
pseusys Nov 22, 2024
5514c7b
turns added and tested
pseusys Nov 25, 2024
2b9b947
splitted database methods + locks and validations
pseusys Nov 27, 2024
86d745c
insert limit removed
pseusys Nov 27, 2024
214fb92
_locks removed from subclasses
pseusys Nov 27, 2024
5a8d0d5
lazy connection
pseusys Nov 27, 2024
abbd920
uuid length and name changed
pseusys Nov 28, 2024
b9a0680
logs location changed
pseusys Nov 28, 2024
0115b83
none and empty subscript forbidden
pseusys Nov 28, 2024
0587881
names extracted to a special class
pseusys Nov 28, 2024
e756f75
set strings removed
pseusys Nov 28, 2024
61619e3
configuration name changed
pseusys Nov 28, 2024
aad2c49
literal keys instead of strings
pseusys Nov 28, 2024
539005d
loggers from SQL removed
pseusys Nov 29, 2024
2feb094
connect before load in file
pseusys Nov 29, 2024
2ac91a2
logging moved to commect
pseusys Nov 29, 2024
f4e5f33
context dict made abstract
pseusys Nov 29, 2024
68a1c5f
connect moved to pipeline.run
pseusys Nov 29, 2024
8671233
ctx_dict overloads fixed
pseusys Nov 29, 2024
48b6444
configuration renamed
pseusys Nov 29, 2024
e40786c
context_info dataclass added
pseusys Nov 29, 2024
a54df18
test-time comparison fixed
pseusys Nov 29, 2024
49d3bff
lock staticmethod extracted
pseusys Nov 29, 2024
6fd0e1a
initial locking system fixed
pseusys Nov 29, 2024
47edbda
codestyle
pseusys Nov 29, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
dist/
venv/
build/
dbs/
docs/source/apiref
docs/source/_misc
docs/source/release_notes.rst
Expand Down Expand Up @@ -33,3 +34,4 @@ dbs
benchmarks
benchmark_results_files.json
uploaded_benchmarks
chatsky/utils/logging/*.log
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
5 changes: 5 additions & 0 deletions chatsky/__rebuild_pydantic_models__.py
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,18 @@
from chatsky.core.script import Node
from chatsky.core.pipeline import Pipeline
from chatsky.slots.slots import SlotManager
from chatsky.context_storages import DBContextStorage, MemoryContextStorage
from chatsky.core.ctx_dict import ContextDict
from chatsky.context_storages.file import SerializableStorage
from chatsky.core.context import FrameworkData, ServiceState
from chatsky.core.service import PipelineComponent

ContextDict.model_rebuild()
PipelineComponent.model_rebuild()
Pipeline.model_rebuild()
Script.model_rebuild()
Context.model_rebuild()
ExtraHandlerRuntimeInfo.model_rebuild()
FrameworkData.model_rebuild()
ServiceState.model_rebuild()
SerializableStorage.model_rebuild()
2 changes: 1 addition & 1 deletion chatsky/conditions/standard.py
Original file line number Diff line number Diff line change
Expand Up @@ -202,7 +202,7 @@ def __init__(
super().__init__(flow_labels=flow_labels, labels=labels, last_n_indices=last_n_indices)

async def call(self, ctx: Context) -> bool:
labels = list(ctx.labels.values())[-self.last_n_indices :] # noqa: E203
labels = await ctx.labels[-self.last_n_indices :] # noqa: E203
for label in labels:
if label.flow_name in self.flow_labels or label in self.labels:
return True
Expand Down
7 changes: 3 additions & 4 deletions chatsky/context_storages/__init__.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,10 @@
# -*- coding: utf-8 -*-

from .database import DBContextStorage, threadsafe_method, context_storage_factory
from .json import JSONContextStorage, json_available
from .pickle import PickleContextStorage, pickle_available
from .database import DBContextStorage, context_storage_factory
from .file import JSONContextStorage, PickleContextStorage, ShelveContextStorage, json_available, pickle_available
from .sql import SQLContextStorage, postgres_available, mysql_available, sqlite_available, sqlalchemy_available
from .ydb import YDBContextStorage, ydb_available
from .redis import RedisContextStorage, redis_available
from .memory import MemoryContextStorage
from .mongo import MongoContextStorage, mongo_available
from .shelve import ShelveContextStorage
from .protocol import PROTOCOLS, get_protocol_install_suggestion
256 changes: 118 additions & 138 deletions chatsky/context_storages/database.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,181 +8,153 @@
This class implements the basic functionality and can be extended to add additional features as needed.
"""

import asyncio
import importlib
import threading
from functools import wraps
from abc import ABC, abstractmethod
from typing import Callable, Hashable, Optional
from asyncio import Lock
from importlib import import_module
from pathlib import Path
from typing import Any, Callable, Coroutine, Dict, List, Literal, Optional, Set, Tuple, Union

from .protocol import PROTOCOLS
from chatsky.core import Context

_SUBSCRIPT_TYPE = Union[Literal["__all__"], int, Set[str]]
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
_SUBSCRIPT_DICT = Dict[str, Union[_SUBSCRIPT_TYPE, Literal["__none__"]]]
RLKRo marked this conversation as resolved.
Show resolved Hide resolved

class DBContextStorage(ABC):
r"""
An abstract interface for `chatsky` DB context storages.
It includes the most essential methods of the python `dict` class.
Can not be instantiated.

:param path: Parameter `path` should be set with the URI of the database.
It includes a prefix and the required connection credentials.
Example: postgresql+asyncpg://user:password@host:port/database
In the case of classes that save data to hard drive instead of external databases
you need to specify the location of the file, like you do in sqlite.
Keep in mind that in Windows you will have to use double backslashes '\\'
instead of forward slashes '/' when defining the file path.

"""

def __init__(self, path: str):
class DBContextStorage(ABC):
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
_main_table_name: Literal["main"] = "main"
_turns_table_name: Literal["turns"] = "turns"
_key_column_name: Literal["key"] = "key"
_id_column_name: Literal["id"] = "id"
_current_turn_id_column_name: Literal["current_turn_id"] = "current_turn_id"
_created_at_column_name: Literal["created_at"] = "created_at"
_updated_at_column_name: Literal["updated_at"] = "updated_at"
_misc_column_name: Literal["misc"] = "misc"
_framework_data_column_name: Literal["framework_data"] = "framework_data"
_labels_field_name: Literal["labels"] = "labels"
_requests_field_name: Literal["requests"] = "requests"
_responses_field_name: Literal["responses"] = "responses"
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
_default_subscript_value: int = 3
RLKRo marked this conversation as resolved.
Show resolved Hide resolved

def __init__(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An idea about removing asyncio.run from init:
add connect method to DBContextStorage that will be called in pipeline.run and also in every other db method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So have we decided to implement lazy connection? Not connection during pipeline initialization?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call it lazy.
The entry point to a chatsky program is usually pipeline.run where interface is connected.
My proposal is to connect db at the same time (in the entry point of a program).

The lazy part of this implementation is connecting db inside methods if it is not already connected which is done for the cases where pipeline.run is, for some reason, not used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant in the pipeline.run method (which currently calls messenger_interface.connect and is an entry point for every chatsky bot), not pipeline._run_pipeline.

Which is what I meant when I said that it isn't that lazy:
DB will be initialized before the first request is received.

self,
path: str,
rewrite_existing: bool = False,
configuration: Optional[_SUBSCRIPT_DICT] = None,
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
):
_, _, file_path = path.partition("://")
configuration = configuration if configuration is not None else dict()
self.full_path = path
"""Full path to access the context storage, as it was provided by user."""
self.path = file_path
"""`full_path` without a prefix defining db used"""
self._lock = threading.Lock()
"""Threading for methods that require single thread access."""

def __getitem__(self, key: Hashable) -> Context:
"""
Synchronous method for accessing stored Context.

:param key: Hashable key used to store Context instance.
:return: The stored context, associated with the given key.
"""
return asyncio.run(self.get_item_async(key))
self.path = Path(file_path)
"""`full_path` without a prefix defining db used."""
self.rewrite_existing = rewrite_existing
"""Whether to rewrite existing data in the storage."""
self._subscripts = dict()
self._sync_lock = Lock()
for field in (self._labels_field_name, self._requests_field_name, self._responses_field_name):
value = configuration.get(field, self._default_subscript_value)
self._subscripts[field] = 0 if value == "__none__" else value
RLKRo marked this conversation as resolved.
Show resolved Hide resolved

@staticmethod
def _synchronously_lock(method: Coroutine):
def setup_lock(condition: Callable[["DBContextStorage"], bool] = lambda _: True):
async def lock(self: "DBContextStorage", *args, **kwargs):
if condition(self):
async with self._sync_lock:
return await method(self, *args, **kwargs)
else:
return await method(self, *args, **kwargs)

return lock

return setup_lock

@staticmethod
def _verify_field_name(method: Callable):
def verifier(self: "DBContextStorage", *args, **kwargs):
field_name = args[1] if len(args) >= 1 else kwargs.get("field_name", None)
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
if field_name is None:
raise ValueError(f"For method {method.__name__} argument 'field_name' is not found!")
elif field_name not in (self._labels_field_name, self._requests_field_name, self._responses_field_name):
raise ValueError(f"Invalid value '{field_name}' for method '{method.__name__}' argument 'field_name'!")
else:
return method(self, *args, **kwargs)

return verifier

@abstractmethod
async def get_item_async(self, key: Hashable) -> Context:
async def load_main_info(self, ctx_id: str) -> Optional[Tuple[int, int, int, bytes, bytes]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create dataclass for main info:
It would look cleaner for the signatures and better annotation for what the values actually mean.

"""
Asynchronous method for accessing stored Context.

:param key: Hashable key used to store Context instance.
:return: The stored context, associated with the given key.
Load main information about the context storage.
"""
raise NotImplementedError

def __setitem__(self, key: Hashable, value: Context):
"""
Synchronous method for storing Context.

:param key: Hashable key used to store Context instance.
:param value: Context to store.
"""
return asyncio.run(self.set_item_async(key, value))

@abstractmethod
async def set_item_async(self, key: Hashable, value: Context):
async def update_main_info(
self, ctx_id: str, turn_id: int, crt_at: int, upd_at: int, misc: bytes, fw_data: bytes
) -> None:
"""
Asynchronous method for storing Context.

:param key: Hashable key used to store Context instance.
:param value: Context to store.
Update main information about the context storage.
"""
raise NotImplementedError

def __delitem__(self, key: Hashable):
"""
Synchronous method for removing stored Context.

:param key: Hashable key used to identify Context instance for deletion.
"""
return asyncio.run(self.del_item_async(key))

@abstractmethod
async def del_item_async(self, key: Hashable):
async def delete_context(self, ctx_id: str) -> None:
"""
Asynchronous method for removing stored Context.

:param key: Hashable key used to identify Context instance for deletion.
Delete context from context storage.
"""
raise NotImplementedError

def __contains__(self, key: Hashable) -> bool:
"""
Synchronous method for finding whether any Context is stored with given key.

:param key: Hashable key used to check if Context instance is stored.
:return: True if there is Context accessible by given key, False otherwise.
"""
return asyncio.run(self.contains_async(key))

@abstractmethod
async def contains_async(self, key: Hashable) -> bool:
async def load_field_latest(self, ctx_id: str, field_name: str) -> List[Tuple[int, bytes]]:
"""
Asynchronous method for finding whether any Context is stored with given key.

:param key: Hashable key used to check if Context instance is stored.
:return: True if there is Context accessible by given key, False otherwise.
Load the latest field data.
"""
raise NotImplementedError

def __len__(self) -> int:
@abstractmethod
async def load_field_keys(self, ctx_id: str, field_name: str) -> List[int]:
"""
Synchronous method for retrieving number of stored Contexts.

:return: The number of stored Contexts.
Load all field keys.
"""
return asyncio.run(self.len_async())
raise NotImplementedError

@abstractmethod
async def len_async(self) -> int:
async def load_field_items(self, ctx_id: str, field_name: str, keys: List[int]) -> List[Tuple[int, bytes]]:
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
"""
Asynchronous method for retrieving number of stored Contexts.

:return: The number of stored Contexts.
Load field items.
"""
raise NotImplementedError

def get(self, key: Hashable, default: Optional[Context] = None) -> Context:
@abstractmethod
async def update_field_items(self, ctx_id: str, field_name: str, items: List[Tuple[int, bytes]]) -> None:
"""
Synchronous method for accessing stored Context, returning default if no Context is stored with the given key.

:param key: Hashable key used to store Context instance.
:param default: Optional default value to be returned if no Context is found.
:return: The stored context, associated with the given key or default value.
Update field items.
"""
return asyncio.run(self.get_async(key, default))

async def get_async(self, key: Hashable, default: Optional[Context] = None) -> Context:
"""
Asynchronous method for accessing stored Context, returning default if no Context is stored with the given key.

:param key: Hashable key used to store Context instance.
:param default: Optional default value to be returned if no Context is found.
:return: The stored context, associated with the given key or default value.
"""
try:
return await self.get_item_async(str(key))
except KeyError:
return default
raise NotImplementedError

def clear(self):
@_verify_field_name
async def delete_field_keys(self, ctx_id: str, field_name: str, keys: List[int]) -> None:
"""
Synchronous method for clearing context storage, removing all the stored Contexts.
Delete field keys.
"""
return asyncio.run(self.clear_async())
await self.update_field_items(ctx_id, field_name, [(k, None) for k in keys])

@abstractmethod
async def clear_async(self):
async def clear_all(self) -> None:
"""
Asynchronous method for clearing context storage, removing all the stored Contexts.
Clear all the chatsky tables and records.
"""
raise NotImplementedError


def threadsafe_method(func: Callable):
"""
A decorator that makes sure methods of an object instance are threadsafe.
"""

@wraps(func)
def _synchronized(self, *args, **kwargs):
with self._lock:
return func(self, *args, **kwargs)

return _synchronized
def __eq__(self, other: Any) -> bool:
RLKRo marked this conversation as resolved.
Show resolved Hide resolved
if not isinstance(other, DBContextStorage):
return False
return (
self.full_path == other.full_path
and self.path == other.path
and self.rewrite_existing == other.rewrite_existing
)


def context_storage_factory(path: str, **kwargs) -> DBContextStorage:
Expand All @@ -209,20 +181,28 @@ def context_storage_factory(path: str, **kwargs) -> DBContextStorage:
json://file.json
When using sqlite backend your prefix should contain three slashes if you use Windows, or four in other cases:
sqlite:////file.db

For MemoryContextStorage pass an empty string as ``path``.

If you want to use additional parameters in class constructors, you can pass them to this function as kwargs.

:param path: Path to the file.
"""
prefix, _, _ = path.partition("://")
if "sql" in prefix:
prefix = prefix.split("+")[0] # this takes care of alternative sql drivers
assert (
prefix in PROTOCOLS
), f"""
URI path should be prefixed with one of the following:\n
{", ".join(PROTOCOLS.keys())}.\n
For more information, see the function doc:\n{context_storage_factory.__doc__}
"""
_class, module = PROTOCOLS[prefix]["class"], PROTOCOLS[prefix]["module"]
target_class = getattr(importlib.import_module(f".{module}", package="chatsky.context_storages"), _class)
if path == "":
module = "memory"
_class = "MemoryContextStorage"
else:
prefix, _, _ = path.partition("://")
if any(prefix.startswith(sql_prefix) for sql_prefix in ("sqlite", "mysql", "postgresql")):
prefix = prefix.split("+")[0] # this takes care of alternative sql drivers
if prefix not in PROTOCOLS:
raise ValueError(
f"""
URI path should be prefixed with one of the following:\n
{", ".join(PROTOCOLS.keys())}.\n
For more information, see the function doc:\n{context_storage_factory.__doc__}
"""
)
_class, module = PROTOCOLS[prefix]["class"], PROTOCOLS[prefix]["module"]
target_class = getattr(import_module(f".{module}", package="chatsky.context_storages"), _class)
return target_class(path, **kwargs)
Loading
Loading