add optional datetime range filter when parsing large calendar files #474
Codecov Report
@@             Coverage Diff              @@
##             master     #474      +/-   ##
============================================
+ Coverage     98.68%   98.76%   +0.07%
- Complexity     1796     1820      +24
============================================
  Files            65       65
  Lines          4272     5184     +912
============================================
+ Hits           4216     5120     +904
- Misses           56       64       +8
Doing this 'right' might be more difficult than you think, especially when things like recurrence and exceptions come into play. You might need to do 2 passes over the file: the first to create an index of related events, and the second to do the actual filtering. I'm a little on the fence here. I also feel that the use case (reducing 30MB of memory to 5MB) is not incredibly significant if this is mostly used for one-off purposes. It's definitely a significant number in terms of memory usage, but is this something you're dealing with very often?
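A minimal sketch of the two-pass idea described above, in Python rather than against the sabre/vobject API. Events are plain dicts, and the names `two_pass_filter`, `touches_window`, and the `uid`/`dtstart`/`dtend`/`rrule` keys are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

def two_pass_filter(events, group_matches):
    """Keep or drop whole UID groups, never individual members."""
    # Pass 1: index related events by UID; members need not be adjacent.
    index = defaultdict(list)
    for ev in events:
        index[ev["uid"]].append(ev)
    # Decide once per group, with every related event available.
    wanted = {uid for uid, group in index.items() if group_matches(group)}
    # Pass 2: the actual filtering, preserving the original order.
    return [ev for ev in events if ev["uid"] in wanted]

start, end = datetime(2020, 3, 1), datetime(2020, 4, 1)

def touches_window(group):
    # Hypothetical predicate: recurring masters are kept without expansion.
    return any(
        "rrule" in ev or (ev["dtstart"] < end and ev["dtend"] > start)
        for ev in group
    )

events = [
    {"uid": "a", "dtstart": datetime(2020, 1, 1, 9), "dtend": datetime(2020, 1, 1, 10)},
    {"uid": "b", "dtstart": datetime(2020, 3, 11, 9), "dtend": datetime(2020, 3, 11, 10)},
    {"uid": "c", "dtstart": datetime(2020, 3, 5, 9), "dtend": datetime(2020, 3, 5, 10)},
    {"uid": "b", "dtstart": datetime(2019, 12, 2, 9), "dtend": datetime(2019, 12, 2, 10),
     "rrule": "FREQ=WEEKLY"},
]
kept_uids = [ev["uid"] for ev in two_pass_filter(events, touches_window)]
```

Note that this index holds the events themselves, so it does not reduce peak memory. A streaming version would presumably index only UIDs and dates in the first pass and re-read the file in the second.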
Basically what I'm working on is a personalised teacher/student booking site. I don't particularly want to go down the route of self-hosting a CalDAV server, but that might be the next best option. The plan was to allow teachers (of which there are only 2 at the moment) to pass a public iCal link to the server, to be parsed to find the free/busy times when students can create bookings. Most of the calendars I'm dealing with are hosted on the Apple servers, so I'm struggling to find a sensible way to query these with CalDAV requests. As mentioned, one of the calendars has over 2000 events, and once it is parsed with Reader::read and everything is put into memory, it's around 30MB. Once reduced down to the events I'm interested in (I was wrong about 5MB, that was also including another calendar) it's closer to 1.5MB.
I guess where I'm sort of coming from is that even at the 30MB point, you can still run more than 100 in parallel in theory. If it takes 5 seconds to parse one of these larger ones, 1200 this system should be able to handle 100K of these calendar updates per hour. I'm just wondering: is that really your bottleneck?
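The back-of-the-envelope numbers in this comment can be worked through as follows, taking 100 parallel jobs, 5 seconds per parse, and 30MB per parsed calendar at face value. The variable names are illustrative only.

```python
parallel_jobs = 100        # "more than 100 in parallel"
seconds_per_parse = 5      # one large calendar
mb_per_calendar = 30       # parsed size quoted earlier in the thread

per_second = parallel_jobs / seconds_per_parse   # 20 calendars per second
per_minute = per_second * 60                     # 1,200 calendars per minute
per_hour = per_minute * 60                       # 72,000 calendars per hour
peak_memory_mb = parallel_jobs * mb_per_calendar # 3,000 MB held at once
```

Under these assumptions the throughput comes to 72,000 updates per hour, the same order of magnitude as the quoted 100K. The stray "1200" in the comment may be the per-minute figure, though that is a guess.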
You're right, it's probably not a bottleneck at all. I just thought a method like this could've been a quick win. I was assuming the functionality would more or less be the same as the
@8633brown there is a difference with the FreeBusyIterator, because this object right now actually needs the full calendar in memory to do its thing. It specifically uses ... Specifically, one issue here is recurring events and exceptions. To calculate all instances of a recurrence, you need to have all VEVENT objects that have a matching UID. UIDs are not unique, and when multiple events share a UID it means they belong together. There's no guarantee that events with the same UID appear in sequence.
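To make the exception problem concrete, here is a toy expansion in Python, not the sabre/vobject EventIterator. It supports weekly recurrence only, and the `weekly` and `recurrence_id` keys are hypothetical stand-ins for RRULE and RECURRENCE-ID.

```python
from datetime import date, timedelta

def expand(group, until):
    """Expand one UID group into the dates on which it actually occurs."""
    # Map each overridden instance to the date it was moved to.
    overrides = {
        ev["recurrence_id"]: ev["dtstart"]
        for ev in group if "recurrence_id" in ev
    }
    days = []
    for ev in group:
        if "recurrence_id" in ev:
            continue  # overrides only replace instances of their master
        day = ev["dtstart"]
        while day <= until:
            days.append(overrides.get(day, day))
            if not ev.get("weekly"):
                break
            day += timedelta(weeks=1)
    return sorted(days)

master = {"uid": "lesson", "dtstart": date(2020, 3, 2), "weekly": True}
moved = {"uid": "lesson", "dtstart": date(2020, 3, 11),
         "recurrence_id": date(2020, 3, 9)}
until = date(2020, 3, 16)

alone = expand([master], until)            # master evaluated in isolation
together = expand([master, moved], until)  # full UID group
```

Evaluated alone, the master reports an occurrence on 9 March. With the override present, that instance has moved to 11 March, so a range check on the master by itself would give the wrong answer for either date.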
Ahh, interesting. So when calling the event iterator within the isInTimeRange function on this line, if I were to call this on a recurring event without all the other occurrences with the same UID, it could give me the wrong result, as isInTimeRange only passes a single VEvent to the event iterator? Would it be an option to not filter recurring events, and only filter events which don't have an RRULE?
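The proposal in this comment could look roughly like the single-pass sketch below (Python, dict-based events, all names hypothetical). One assumption added here: an override carries a RECURRENCE-ID but no RRULE of its own, so checking for RRULE alone would not be enough to leave whole recurrence sets untouched.

```python
from datetime import datetime

# Properties that mark an event as part of a recurrence set (assumed keys).
RECURRENCE_KEYS = ("rrule", "rdate", "recurrence_id")

def filter_simple_events(events, start, end):
    """Range-filter one-off events; pass anything recurrence-related through."""
    kept = []
    for ev in events:
        if any(key in ev for key in RECURRENCE_KEYS):
            kept.append(ev)  # never filtered, left for a later precise check
        elif ev["dtstart"] < end and ev["dtend"] > start:
            kept.append(ev)  # one-off event overlapping the window
    return kept

start, end = datetime(2020, 3, 1), datetime(2020, 4, 1)
events = [
    {"uid": "old", "dtstart": datetime(2019, 5, 1, 9), "dtend": datetime(2019, 5, 1, 10)},
    {"uid": "series", "dtstart": datetime(2019, 5, 6, 9), "dtend": datetime(2019, 5, 6, 10),
     "rrule": "FREQ=WEEKLY"},
    {"uid": "series", "dtstart": datetime(2019, 6, 4, 9), "dtend": datetime(2019, 6, 4, 10),
     "recurrence_id": datetime(2019, 6, 3, 9)},
    {"uid": "march", "dtstart": datetime(2020, 3, 5, 9), "dtend": datetime(2020, 3, 5, 10)},
]
simple_kept = [ev["uid"] for ev in filter_simple_events(events, start, end)]
```

The trade-off is that the memory saving depends entirely on how many events in the calendar are one-off: every recurring series is retained whether or not it reaches the window.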
I don't consider myself a maintainer anymore, so ultimately it's up to others. However, I would lean towards 'no' because I feel it's a bit of an edge case, and somewhat imperfect due to the 'recurrence exception' issue. I can see a need for this at some point, but maybe there are other ways to solve it. The VCardSplitter class, for example, stemmed from a similar issue with large vCard exports. That was a bit more grounded in real issues, because address book exports can be really massive due to pictures being stored in vCards. That's possible in iCalendar too, but rarer. For what it's worth, CalDAV has a similar problem and also has a popular 'filter by time range' feature. This is solved in sabre/dav by first creating indexes on events, recording the 'first time an event occurs' and the 'last time an event occurs'. After that imprecise filter is applied, events are filtered again with a slower, precise filter that expands every occurrence.
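The coarse-then-precise approach described above can be sketched as follows. This is an illustration in Python, not the sabre/dav code: the helper names are invented, only weekly recurrences with a fixed `count` are modelled, and in a real server the first/last bounds would presumably be computed once when the event is stored rather than per query.

```python
from datetime import date, timedelta

def build_index(ev):
    """Coarse bounds: the first and last day on which the event can occur."""
    first = ev["dtstart"]
    last = first + timedelta(weeks=ev.get("count", 1) - 1)
    return first, last

def occurrences(ev):
    """Precise but slower: expand every occurrence."""
    return [ev["dtstart"] + timedelta(weeks=i) for i in range(ev.get("count", 1))]

def in_range(events, start, end):
    hits = []
    for ev in events:
        first, last = build_index(ev)
        if last < start or first > end:
            continue  # cheap rejection using the index only
        if any(start <= day <= end for day in occurrences(ev)):
            hits.append(ev["uid"])  # confirmed by full expansion
    return hits

events = [
    {"uid": "weekly", "dtstart": date(2020, 1, 6), "count": 10},
    {"uid": "oneoff", "dtstart": date(2020, 1, 10)},
    {"uid": "old", "dtstart": date(2019, 5, 1)},
]
hits = in_range(events, date(2020, 1, 8), date(2020, 1, 12))
```

Here "old" is rejected by the index alone, while "weekly" passes the coarse check (its span is 6 January to 9 March) and is only rejected once expansion shows that no occurrence falls between 8 and 12 January.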
Closes #473. Managed to figure out the Component\VEvent::isInTimeRange method and figured it was the right method for this task.