Skip to content
This repository has been archived by the owner on Jun 12, 2018. It is now read-only.

Allow to set the order of endpoints #21

Open
pp23 opened this issue Nov 16, 2015 · 12 comments
Open

Allow to set the order of endpoints #21

pp23 opened this issue Nov 16, 2015 · 12 comments

Comments

@pp23
Copy link

pp23 commented Nov 16, 2015

The endpoint member of a server is actually a std::map, which gets sorted automatically each time after an element was inserted. When starting the server, the endpoint map is copied into a vector without ordering. So the order of the map is kept.

If you have two endpoints "^/abc/?$" and "^/([a-z]+)/?$" in the map, then the more greedy endpoint will be used first because of the order of the map elements. Because the endpoint "abc" is contained in the regexp "[a-z]+", the endpoint functions (onOpen, onMessage, etc.) of "abc" will be never set.

Sorting the regexp-vector after copying the endpoints-map with respect to an endpoint-order set by the user should fix the problem. For instance I have added a member

        int order = 0;

into the public section of the endpoint class. After starting the server and copying the map a sort-function will be applied on the vector:

        std::sort(opt_endpoint.begin(), opt_endpoint.end(), [](const std::pair<boost::regex, Endpoint*>& a,
                                                               const std::pair<boost::regex, Endpoint*>& b)
        {
            return a.second->order < b.second->order;
        });

Do you see any problems adding this feature to the code?

@eidheim
Copy link
Owner

eidheim commented Nov 16, 2015

Are you sure this is needed? The regular expressions are just a general way to set a specific path and/or retrieve some parameters, but I might be wrong and there is actually need of an orderering of the enpoints. Is there a practical example where there would be a good reason to introduce ordering of the endpoints, instead of setting appropriate endpoint paths in the regular expressions?

@pp23
Copy link
Author

pp23 commented Nov 16, 2015

I want to create endpoints at runtime. So after updating to v1.2 I had to use more general regexpressions like "^/([a-zA-Z0-9]/?$". Some endpoints should be set appropriately. The problem was, that all connections get parsed by the greedy regexp because it's at the first position because of sorting. So the appropriate endpoint gets never parsed.

Btw: Is there a way to copy endpoints? So for instance I want to connect to "/abc" I get an endpoint parsed by the greedy regexp. When I want to connect to "/xyz" too it uses the same endpoint of "/abc" with the same onmessage()-functions. Can I get a new endpoint for "/xyz" to set other onmessage-functions than in "/abc"? This corresponds to issue #20

When I want to realize that with v1.2 I had to keep own maps for the connections-paths. I think there is no other way since I cannot add endpoints at runtime into the map, right? That means, I have to search for each incoming message the connection-paths-map for the current connection to process the message. I think it would be more efficient to get for each path a new endpoint processing the messages on this connection-path. So in v1.1 I created for each path a new endpoint which I could use to set specific onMessage-functions only for this path. It would be nice if this will also work in v1.2.x :-)

@eidheim
Copy link
Owner

eidheim commented Nov 17, 2015

Do not worry about performance issues on your own connection-path map, it's pretty fast, especially if you do lookups using unordered_map.

I still think you can add two different paths for these two endpoints, one "^/someendpoint/([a-zA-Z0-9]+)/?$" and one "^/someotherendpoint/([0-9]+)/?$".

What do you think?

@pp23
Copy link
Author

pp23 commented Nov 17, 2015

Well, that's right, setting these two different endpoints would be a good solution.

If I will use a connection-path map, I had to lookup a connection-path each time a message is coming in because of having only one real endpoint. If I could use the endpoint map to set endpoints at runtime I could manage the connections and messages for each endpoint separatly without an additional connection-path lookup. I think this will be faster and easier to handle.

In my application there won't be a lot of concurrent creations of new endpoints. So maybe I make the opt_endpoint-vector accessible and push new endpoints into it through a concurrent queue via a producer/consumer running in my application code? What do you think?

@eidheim
Copy link
Owner

eidheim commented Nov 17, 2015

You only need to check the connection path at onconnect, and do a close there if its not a valid path I guess.

@eidheim
Copy link
Owner

eidheim commented Nov 17, 2015

I would try avoid adding new endpoints, even if you add a mutex and do it safely, it locks down the asio pool.

Edit: changed asio queue to pool, also I guess it would not matter much if it did.

@eidheim
Copy link
Owner

eidheim commented Nov 17, 2015

It would not matter much I think if you do the endpoint path check at onmessage (in case you save a state or something in your application), or you do it using endpoints. The runtime would be the same is my guess.

Edit: or wait, the above is not entirely correct, I see the point of creating endpoints on the fly here, although most likely one would need to check some stored state of the endpoint, so some kind of lookup might be needed anyway?

@pp23
Copy link
Author

pp23 commented Nov 17, 2015

Ok, I will give it a try ;-) I think of a code like this:

You have an endpoint ep="^/a/[0-9]+/?$". Now I want to connect to "/a/1", "/a/2",...
Code would be:

ep.onopen = [](connection){
connections.insert(connection.path, DoSomethingObject);
};

ep.onmessage=[](connection,message){
connections.at(connection.path)->doSomething(message);
};

I think I need to make the connections-map also thread-safe to handle multiple concurrent messages? This would make messaging very slow I guess.

This is the way I would like to do it:

ep.onopen = [](connection){
ep2 = endpoint[connection.path];
ep2.onmessage = [](connection,message){
doSomething(message);
};
};

So each time a new path gets accessed, I create a new endpoint for it managing all connections and incoming messages only for this endpoint. I guess if there will be much more messages than endpoint-creations this would be faster than a lookup in a concurrent connection-map? But I will give the connection-map a chance ;-)

@eidheim
Copy link
Owner

eidheim commented Nov 17, 2015

I think it is very common to do it like this, and the slight delay you get from the lookup is very small compared to the time it takes to transfer data back and forth. Yes, you should add a mutex lock/unlock around connections.insert and connections.at.

@pp23
Copy link
Author

pp23 commented Nov 17, 2015

I will try it...but I still don't really see a problem to replace the std::vector of opt_endpoints with a concurrent vector like

https://www.threadingbuildingblocks.org/docs/help/reference/containers_overview/concurrent_vector.htm

and make it accessible. With a concurrent opt_endpoints vector you just need to lock it when a new endpoint will be created or a new connection is coming in. But with the mutex for the connection-path map I need to lock it always when a new connection gets opened or a new message is coming in. So if there will be many many messages, the connections map will be most of the time locked and no new connections can be inserted anymore.

When I have enough time, I will try both solutions and make some performance tests.

@eidheim
Copy link
Owner

eidheim commented Nov 17, 2015

I would say its a design issue (for instance having a set of defined endpoints with corresponding functions, instead of creating these on the fly) more than creating computationally optimal code. I would focus on the former in this case, since the bottleneck will be reading from file/database, network messages, and processing of messages and responses.

Also, most likely, one would have to access some stored resource like the endpoint path within the same mutex lock/unlock in an onmessage function. I doubt that this lock will affect the server notably, but your tests might prove me wrong:)

By the way, do you know of any other websocket library that makes it possible to create endpoints on the fly? I am still considering this, but am trying to keep the library as simple as possible and still support websocket use that similar libraries support in other languages.

@pp23
Copy link
Author

pp23 commented Nov 17, 2015

Also, most likely, one would have to access some stored resource like the endpoint path within the same mutex lock/unlock in an onmessage function. I doubt that this lock will affect the server notably, but your tests might prove me wrong:)

I'm a bit concerned that a lot of messages could lock the connections-map too long, so that some connections have to be refused. I'll test it and let you know about the results ;-)

By the way, do you know of any other websocket library that makes it possible to create endpoints on the fly? I am still considering this, but am trying to keep the library as simple as possible and still support websocket use that similar libraries support in other languages.

No, I don't. But your lib is the one and only I know for C++ and that one I like since the first time I have used it :-) (maybe because of the possibility to add endpoints at runtime in v1.1... I know, this was not thread-safe but for some first tests this unwanted feature let me smile)

Well, to keep simplicity maybe a little helper class as a proxy for the emplace_back-function and iterator functions of std::vector doing the locking stuff would be helpful. It would be even easier to use the concurrent_vector of tbb, but this needs the tbb-lib and would probably break your only-header-lib concept.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants