Recommended data structures/algorithms for checking people's availability schedules

Question

I'm working on a platform that allows assigning users to events manually. Every user provides their general availability (Mondays 2PM - 8PM, Tuesdays not at all, Wednesdays 3:30PM-7PM, and so on). Multiple events can take place in parallel and can overlap. The restrictions for finding available users for a particular event are:

The event must fully fall into the user's defined availability slots.
A user can attend only one event at a time, so no overlap.

At the moment, we run through all the users, their availability slots, and their assigned events in order to determine whether they're available for a specific event. This search can obviously take some time. Is there a "good" way to perform this kind of computation? What are the optimal data structures here? (If it makes a difference, we're talking about a Java Spring service on top of a PostgreSQL DB. But all of this can be challenged if required.)

It feels like a special case of the Nurse Scheduling problem. I haven't found a better solution yet other than comparing bit by bit and breaking early in case of a clash.

EDIT: Some have been asking for example numbers. The largest ones we have so far are around 3500 users for 600 events, with each event requiring between 1 and 10 users. We are expecting scenarios of up to 15000 users for 5000 events.

EDIT 2: As for the availability slots: users specify their availability per weekday in half-hour blocks, and there's an option to mark specific dates as completely unavailable (intended primarily for holidays). Events are scheduled with start and end times at a granularity of 5 minutes.
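For reference, the per-user check described in the question can be sketched as one function over the model from EDIT 2. This is a minimal sketch, assuming a hypothetical in-memory shape (weekday number mapped to a list of `datetime.time` pairs, a set of blocked dates, and a list of already assigned `(start, end)` datetimes) and events that start and end on the same day, before midnight; the name `is_available` is illustrative only.

```python
from datetime import date, datetime, time


def is_available(weekly, blocked_dates, assigned, start, end):
    """Return True if the event [start, end) fits into one weekly slot,
    is not on a blocked date, and overlaps no assigned [s, e) interval.

    weekly: dict weekday (0 = Monday) -> list of (time, time) slots.
    Assumes start and end fall on the same day (end before midnight).
    """
    if start.date() in blocked_dates:
        return False
    # Restriction 1: the event must lie entirely inside one slot of that weekday.
    if not any(s <= start.time() and end.time() <= e
               for s, e in weekly.get(start.weekday(), [])):
        return False
    # Restriction 2: no overlap; half-open intervals overlap iff each
    # starts before the other ends.
    return not any(s < end and start < e for s, e in assigned)


# Available Mondays 14:00-20:00 and Wednesdays 15:30-19:00.
weekly = {0: [(time(14), time(20))], 2: [(time(15, 30), time(19))]}
blocked = {date(2025, 12, 1)}  # a Monday marked as holiday
assigned = [(datetime(2025, 12, 8, 15), datetime(2025, 12, 8, 16))]
print(is_available(weekly, blocked, assigned,
                   datetime(2025, 12, 8, 16, 0), datetime(2025, 12, 8, 17, 5)))  # True
```

Running this for every user on every query is the linear scan the question wants to avoid; the answers below are different ways of making one or both restrictions cheaper.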

EDIT 3: To clarify the idea: we've got many "assignee" users that can be assigned to events by a much smaller number of "master" users. This works by the master users selecting an event to which they want to assign one or more assignees. They are then presented with a selection of available assignees to choose from. The point of this question is determining this selection in an efficient manner. Aside from this, master users can select a group of events and open them up for assignees to self-assign. For this purpose, assignee users are presented with a selection of events open for self-assignment that fit their availability.

EDIT 4: Since this has been asked, I'm concerned with the issue of finding available assignees for particular events. The DB model for the base data is straightforward enough.

Ray, give us some realistic numbers. How many users are we talking of: 10, 1000, 110 million? How many events? How many availability slots per user on average? What is the time granularity: 15, 30 or single minutes? How many event queries per hour do you expect? How long does a query currently take on average? You will need these numbers not just for us, but also for yourself, to compare the effectiveness of any measures you try. — Doc Brown
– Doc Brown, Commented Nov 26 at 6:27
Note, this is NOT an NP-hard scheduling problem! As each assignment is done manually, the system does not need to decide which event to assign a user to. All decisions are made by the dispatcher during previous assignments, or are being made now given the available users. I.e. the computational task is to show available users, not to optimize their assignments. — Basilevs
– Basilevs, Commented Nov 26 at 12:02
After rereading the question, I'm a little confused. You start with "allows assigning users to events manually" and then you say "we run through all the users, their availability slots, and their assigned events". Are you trying to list events for a user to choose or is this some sort of bulk assignment process? — JimmyJames
– JimmyJames, Commented Nov 26 at 22:06
Great update. Thanks. So if I understand correctly, given an event, you are trying to find an available set of people that can be assigned to that event? — JimmyJames
– JimmyJames, Commented Nov 27 at 3:46

Arseni Mourzenko · Accepted Answer · 2025-11-28 00:12:22Z

One way to do it is to represent each user and each event as a bitmap. Essentially, each half hour corresponds to a bit, so an entire day takes 48 bits, or 6 bytes. For a user, 1 means the person is available and 0 that they are not; for an event, 1 marks the half hours the event occupies.

Now, all operations become operations on bitmaps (AND, etc.), which are usually very fast and still relatively space efficient: you can use SIMD, or do the computation on a GPU, essentially relying on any of the techniques that are used when working with graphics. Chances are, you can just load everything in memory at once, do the computation you need, then save the result.
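Both restrictions from the question reduce to a couple of bitwise operations per user. A minimal sketch, assuming the same encoding as the script below (bit *i* is half hour *i*; a 1 in a user bitmap means available, a 1 in an event bitmap means the event occupies that half hour) plus a hypothetical `assigned` bitmap holding the half hours the user is already booked for; the helper names are illustrative:

```python
def slot_mask(first, last):
    """Ones for half hours first .. last-1 (e.g. 28..39 = 14:00-20:00)."""
    return ((1 << (last - first)) - 1) << first


def can_attend(availability, assigned, event):
    # Clear the half hours the user is already booked for...
    free = availability & ~assigned
    # ...then every bit of the event must still be set.
    return (free & event) == event


availability = slot_mask(28, 40)  # 14:00-20:00
assigned = slot_mask(30, 32)      # already booked 15:00-16:00
print(can_attend(availability, assigned, slot_mask(32, 34)))  # 16:00-17:00 -> True
print(can_attend(availability, assigned, slot_mask(31, 33)))  # 15:30-16:30 -> False
```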

Now, if you decide to go this way, there are two important points:

Do it as an alternative to the naive, most basic implementation. This way, you'll be able to (1) measure the performance of the original implementation to decide if it is acceptable, and if not, (2) compare the performance of both implementations to see how much you saved.
Decide at which point you need to go from the actual structured data to bitmaps. Very likely, you want the data to be stored in the database in a human-readable format: as highlighted by Flater in the comment below, bitmaps are not a good fit for things like recurring schedules (and they are obviously anything but human-readable: looking at, say, 0xC4399F2A03, it is not obvious that the person is available tomorrow from 3 p.m. to 3:30 p.m., but then busy until 5 p.m.). This means that you'll need to convert the data back and forth. This conversion also takes time, which is important to measure as well.
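The conversion itself is short in both directions. A minimal sketch, assuming a hypothetical structured form of `(weekday, start_half_hour, end_half_hour)` tuples (weekday 0 = Monday, end exclusive) and the 336-bit weekly layout used later in this answer; note that adjacent slots come back merged, and a run crossing midnight comes back as two slots:

```python
HALF_HOURS_PER_DAY = 48


def to_bitmap(slots):
    """slots: iterable of (weekday, start, end) in half hours, end exclusive.
    Returns a 336-bit weekly bitmap where 1 means available."""
    bitmap = 0
    for day, start, end in slots:
        base = day * HALF_HOURS_PER_DAY
        bitmap |= ((1 << (end - start)) - 1) << (base + start)
    return bitmap


def from_bitmap(bitmap):
    """Inverse of to_bitmap: maximal runs of ones per weekday."""
    slots = []
    day_mask = (1 << HALF_HOURS_PER_DAY) - 1
    for day in range(7):
        day_bits = (bitmap >> (day * HALF_HOURS_PER_DAY)) & day_mask
        start = None
        # One extra iteration acts as a 0 sentinel closing a run at 24:00.
        for i in range(HALF_HOURS_PER_DAY + 1):
            bit = (day_bits >> i) & 1 if i < HALF_HOURS_PER_DAY else 0
            if bit and start is None:
                start = i
            elif not bit and start is not None:
                slots.append((day, start, i))
                start = None
    return slots


# Monday 14:00-20:00 and Wednesday 15:30-19:00.
print(from_bitmap(to_bitmap([(0, 28, 40), (2, 31, 38)])))  # [(0, 28, 40), (2, 31, 38)]
```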

I was curious to see if this approach is viable, so I decided to make a quick test. As I was poking at the code, I made a few discoveries that could be useful to share.

But let's start with the actual code. I wanted to start with something small, just to see if it works correctly: fifteen users, five events, and only a one-day span instead of a week. And I'm doing it in Python, which turned out to be handy for one particular aspect.

import random
import time

half_hours = 1 * 24 * 2  # 48 bits.
assert half_hours % 8 == 0
bitmap_size = int(half_hours / 8)  # 6 bytes.
total_users = 15
total_events = 5

To create sample data, there is no need to be particularly realistic: users' schedules are completely random (say, the person is available from 3 a.m. to 3:30 a.m., then busy for half an hour, then available for two hours, then busy... and so on), while events are always consecutive blocks that span 30 minutes to 3 hours and start at a random time.

def create_users():
    for _ in range(total_users):
        # Explicit byte order: it is only optional from Python 3.11 on.
        yield int.from_bytes(random.randbytes(bitmap_size), 'big')

def create_events():
    for _ in range(total_events):
        duration = random.randint(1, 6)  # 30 minutes to 3 hours.
        pattern = {
            1: 0b1,
            2: 0b11,
            3: 0b111,
            4: 0b1111,
            5: 0b11111,
            6: 0b111111,
        }[duration]
        start_time = random.randint(0, half_hours - duration)
        yield pattern << start_time

Finally, here is the code that takes those users and events and determines, for each event, how many participants could join.

def b(value):
    return bin(value)[2:].rjust(half_hours, '0')

def run():
    users_slots = list(create_users())
    events_slots = list(create_events())

    print("Users:")
    for u in users_slots:
        print(b(u))

    users_per_event = []

    for e in events_slots:
        count = 0
        for u in users_slots:
            if u & e == e:
                count += 1
        users_per_event.append((e, count))

    most_popular_counter = 0
    for _, count in users_per_event:
        if count > most_popular_counter:
            most_popular_counter = count

    print(f'Luckiest event has {most_popular_counter} potential participant(s).')

    print("Events:")
    for e, count in users_per_event:
        print(f'{b(e)}: {count}')

if __name__ == '__main__':
    run()

Here's an example of an output. Naturally, since the data is random, the binary patterns differ on each run, and so do the counters.

Users:
010101110001001101000011010110110110100001101000
001011111110111111101001010011011001110011111000
111010010101101000110001010011010010010111111101
111001101001100010000000000011001001100101001000
011011110100001011100100101100011110111001000010
010001110010111011001100110000111100101101011010
001111110111100011001110101100101100111110000010
111100111111100101111111101110010001010111101110
000100101000011101100010110101110010100111101111
010100111000001110110010110011111111110000100100
000000100111001011111101001100001001110101010101
111001011010000011010000001111110010100000100101
100110111000110100111011111111000101110000100011
010000111110000001001001101100101011010011010111
010100100101000100110101111101110001111000111011
Luckiest event has 3 potential participant(s).
Events:
000000000000000000000011000000000000000000000000: 3
000000000000000000000000000000000011111000000000: 0
000000000000000011100000000000000000000000000000: 3
000000000000000000000001111100000000000000000000: 2
000000000000000011111000000000000000000000000000: 1

The number at the right of each event indicates how many users can attend it. It is relatively easy to see visually how it works. Taking, for instance, the third event, one can spot that the second user in the list has ones at all the positions where the event has ones. So does the fifth user, and another one towards the middle of the list. The algorithm seems to be working.

Now, let's modify the script a bit, by:

Increasing the span from one day to a week, as well as increasing the number of users and events.
Removing the binary output.
Adding the thing that measures how long the code spends doing stuff.

half_hours = 7 * 24 * 2  # 336 bits.
assert half_hours % 8 == 0
bitmap_size = int(half_hours / 8)  # 42 bytes.
total_users = 15000
total_events = 5000

[...]

def run():
    users_slots = list(create_users())
    events_slots = list(create_events())

    start = time.time()

    users_per_event = []

    for e in events_slots:
        count = 0
        for u in users_slots:
            if u & e == e:
                count += 1
        users_per_event.append((e, count))

    end = time.time()

    print(f'Spent {end - start:.3f} seconds.')

    most_popular_counter = 0
    for _, count in users_per_event:
        if count > most_popular_counter:
            most_popular_counter = count

    print(f'Luckiest event has {most_popular_counter} potential participant(s).')

Running it on an Intel i5-10600 3.30 GHz CPU, using a single core, the output is:

Spent 3.466 seconds.
Luckiest event has 7680 potential participant(s).

Observations:

Three and a half seconds (YMMV) is not too bad for a naive approach that doesn't use SIMD or even parallel computing.
I tried a parallel approach with concurrent.futures.ProcessPoolExecutor, but it made the time increase to seven seconds, because the large list of users is copied. One needs to use shared memory or find another approach, but I didn't have the patience to poke at it long enough.
I'm surprised how easy it is to work with the data, actually. Originally, I didn't agree with Hans-Martin Mosner's comment, as I believed that bitmaps should be very constrained: limited to the actual computations, and converted to a human-readable format as soon as possible. After playing with the Python script, I would rather think that the binary approach is quite clear. If enough quality tooling is developed around it, it is definitely a viable approach, even with the bitmaps stored in the database.
The algorithm, too, is at its simplest. I mean, it's literally u & e == e!
I'm cheating a bit, since Python has no maximum limit for integers. This being said, manipulation of bytes in other languages should work similarly.
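In a language with fixed-size integers, the bitmap becomes an array of words: 336 bits need six 64-bit words (a `long[]` in Java, for instance), and `u & e == e` is applied word by word, which also makes it possible to stop at the first clash. A minimal Python sketch that mimics 64-bit words; `to_words` and `contains` are hypothetical names:

```python
WORD_BITS = 64  # mimics a Java long / C uint64_t


def to_words(value, n_bits):
    """Split a Python int into ceil(n_bits / 64) unsigned 64-bit words,
    least significant word first."""
    n_words = (n_bits + WORD_BITS - 1) // WORD_BITS
    mask = (1 << WORD_BITS) - 1
    return [(value >> (i * WORD_BITS)) & mask for i in range(n_words)]


def contains(user_words, event_words):
    """Word-by-word version of `u & e == e`, stopping at the first clash."""
    for u, e in zip(user_words, event_words):
        if u & e != e:
            return False
    return True


user = ((1 << 336) - 1) ^ (1 << 200)  # free all week except half hour 200
event = 0b1111 << 62                  # straddles the first two words
print(contains(to_words(user, 336), to_words(event, 336)))  # True
```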

OP is using examples of recurring schedules, which don't naturally jibe with your suggestion of a pre-baked bit string, given that a recurring availability schedule would need to be applied dynamically to any and all future dates/events. Not saying it can't be done, but this answer requires more elaboration on that point for it to meaningfully answer the OP's question. — Flater
– Flater, Commented Nov 26 at 23:21
In addition to being a very viable solution when keeping the bitmaps in memory, this would likely even work well in the database as PostgreSQL has bit string data types with boolean operations (including aggregate functions) which may lend themselves well to combining recurrent schedules, assigned events and holiday exemptions for users through SQL statements. The list of people available to be booked on an event could be returned from a single database query. — Hans-Martin Mosner
– Hans-Martin Mosner, Commented Nov 27 at 10:24
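One way to reconcile a recurring schedule with a pre-baked bit string, along the lines of both comments, is to expand the weekly pattern over a planning horizon when needed and clear the blocked dates. A minimal sketch, assuming a 336-bit weekly pattern (bit `weekday * 48 + half_hour`, weekday 0 = Monday, 1 = available) and a set of blocked `date`s; `horizon_bitmap` is a hypothetical helper:

```python
from datetime import date, timedelta

SLOTS_PER_DAY = 48  # half-hour granularity


def horizon_bitmap(weekly, first_day, n_days, blocked_dates):
    """Bit (d * 48 + h) is set if the weekly pattern makes the user available
    in half hour h of day first_day + d, unless that day is blocked."""
    day_mask = (1 << SLOTS_PER_DAY) - 1
    result = 0
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        if day in blocked_dates:
            continue  # holiday: the whole day stays 0
        pattern = (weekly >> (day.weekday() * SLOTS_PER_DAY)) & day_mask
        result |= pattern << (offset * SLOTS_PER_DAY)
    return result


monday_afternoons = ((1 << 12) - 1) << 28  # Mondays 14:00-20:00
# Two weeks starting Monday 2025-12-01, with that first Monday blocked.
two_weeks = horizon_bitmap(monday_afternoons, date(2025, 12, 1), 14,
                           {date(2025, 12, 1)})
```

Assigned events for the same horizon can then be removed with `two_weeks & ~assigned`, exactly as in the weekly case.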
This sounds intriguing and is one of these crazy tricks I come across very rarely which is why I didn't consider it even remotely. Just for my understanding: As the events can be scheduled to a granularity of 5 minute slots, we'd require a bit for each of that, right? So that's 12*24 = 288 bits or 36 bytes. That's a considerably larger number than with the half hour example. But it would still fit into a long in Java, so it should be technically possible. I'll consider it (probably only for computation/caching purposes, persisting this sounds awkward). Thanks. :-) — Ray
– Ray, Commented Nov 27 at 17:02
@Ray: with a granularity of 30 minute slots, you're at 336 bits for a week, or 42 bytes. If you need 5 minute slots, that's 2016 bits (12×24×7), or 252 bytes. I also added an example in my answer, with the actual performance data. — Arseni Mourzenko
– Arseni Mourzenko, Commented Nov 28 at 0:16
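Note that a Java `long` holds only 64 bits, so a 288-bit day needs five longs and a 2016-bit week needs 32. Since availability is entered in half hours, one option is to keep it at that granularity and expand it to 5-minute bits only when comparing against events. A minimal sketch of that expansion; `expand` is a hypothetical helper in which each half-hour bit simply becomes six 5-minute bits:

```python
def expand(bitmap, n_slots, factor=6):
    """Map n_slots half-hour bits to n_slots * factor five-minute bits:
    every set bit i becomes bits i*factor .. i*factor + factor - 1."""
    block = (1 << factor) - 1
    result = 0
    for i in range(n_slots):
        if (bitmap >> i) & 1:
            result |= block << (i * factor)
    return result


# Monday 14:00-15:00 at half-hour granularity (bits 28 and 29 of 336).
availability = expand(0b11 << 28, 336)
# Event on Monday 14:05-14:35: five-minute slots 169..174 (14 * 12 + 1 = 169).
event = ((1 << 6) - 1) << 169
print((availability & event) == event)  # True
```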

Doc Brown · Accepted Answer · 2025-11-27 06:05:40Z


At the moment, we run through all the users, their availability slots, and their assigned events in order to determine whether they're available for a specific event.

The first thing you want to get out of this process is the repeated checking of the assigned events of a user:

A user has availability slots (time intervals, pairs of from-to values).
When a user gets assigned to an event, their availability slots are reduced (maybe one availability slot will be split). As a result, the user has a changed set of availability slots. Full stop. There is no further need to look at where these availability slots came from.
The availability slots of a user don't overlap, so they can be strictly ordered in a simple array. This makes it possible to look for the starting time of the specific event by a binary search and find out if it is falling into one of the availability slots (since availability slots don't overlap, this results in one slot at maximum).
It remains to check whether the end time of the specific event fits into the found availability slot, or not.
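The lookup described in the list can be sketched with a sorted array and a binary search. A minimal sketch, assuming times are plain numbers (for example minutes since some epoch), slots are half-open `[start, end)` and stored as two parallel sorted lists, and the event is non-empty; `fits` is a hypothetical name. On the JVM, a `TreeMap` keyed by slot start, queried with `floorEntry`, would play the same role.

```python
from bisect import bisect_right


def fits(slot_starts, slot_ends, start, end):
    """True if the non-empty event [start, end) lies inside one of the sorted,
    non-overlapping half-open slots [slot_starts[i], slot_ends[i])."""
    # Index of the last slot starting at or before the event start: O(log A).
    i = bisect_right(slot_starts, start) - 1
    # Only that slot can contain the start; check that it also covers the end.
    return i >= 0 and end <= slot_ends[i]


# Monday 14:00-20:00 and Tuesday 10:00-12:00, in minutes since Monday 00:00.
starts = [840, 2040]
ends = [1200, 2160]
print(fits(starts, ends, 900, 960))    # True
print(fits(starts, ends, 1150, 1210))  # False, runs past 20:00
```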

Average order of running time per single query: in theory, this will be O(log(A) × U), where A is the average number of availability slots per user, and U the number of users. Beware: if you need to retrieve all the availability slots of each user for every query, the retrieval itself becomes O(A), and the binary search will gain you nothing. Ideally, you will cache the availability slots for the users once beforehand and then run multiple event queries. If you can't, the running time order becomes O(A × U), which is what you seem to have now.

As long as you don't have millions of users where only a small percentage are eligible for a certain event, I don't see any need for further optimizing the data structures. Just run through each user, apply the test above (which should be quick), and return the found users. Whenever an event gets assigned to a user, update the availability slots of that user.
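Updating a user's slots on assignment is the split mentioned in the list above. A minimal sketch, assuming the same half-open numeric slots kept as a sorted list of `(start, end)` tuples; `book` is a hypothetical helper, and it scans linearly for clarity, although the containing slot could be located with the same binary search:

```python
def book(slots, start, end):
    """Remove [start, end) from a sorted list of non-overlapping half-open
    (slot_start, slot_end) tuples, splitting the slot that contains it.
    Assumes the event has already been checked to fit into one slot."""
    result = []
    for s, e in slots:
        if s <= start and end <= e:
            if s < start:
                result.append((s, start))  # part of the slot before the event
            if end < e:
                result.append((end, e))    # part of the slot after the event
        else:
            result.append((s, e))
    return result


slots = [(840, 1200), (2040, 2160)]
print(book(slots, 900, 960))  # [(840, 900), (960, 1200), (2040, 2160)]
```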

According to your edit, it seems you do not even need to run your queries against the full set of all users, only against smaller subsets. That makes it even more feasible to determine available users by just iterating over the candidates.

edited Nov 27 at 6:05

answered Nov 26 at 6:48

Doc Brown


This would require converting the availability data into individual dates covering the range from the first to the last events. Still sounds worthwhile exploring. As I'm on the JVM, I'd prefer a TreeSet over a simple array, but otherwise I'll experiment with this idea. Thanks. :-)

– Ray, Commented Nov 27 at 16:27
@Ray: not necessarily. All that is required is to sort the data via some total ordering; your library probably offers a sort function that accepts a comparison function as an optional argument to determine the order. You can sort the data in place in an array, which may provide some performance benefits due to cache locality. If that's not convenient for some reason, you can also leave the data as is and sort an array of pointers or indices instead (possibly not the best approach, but it works), again via a custom comparison function that receives the actual objects and compares them.

– Filip Milovanović, Commented Nov 27 at 16:39
@Ray: how is your availability data modeled currently (if not by individual dates, or date pairs)?

– Doc Brown, Commented Nov 27 at 16:40
I thought I mentioned this in the post: per user, it's a set of days of the week with time slots, plus blocked individual dates (e.g. for vacations). So an assignee can configure something like: "I'm available every Monday from 14:00-21:00, every Tuesday from 10:00-12:00 and 14:30-21:00, not at all on Wednesdays, ... (and so on for the remaining weekdays). On top of that, I'm on vacation from the 27th of November to the 4th of December this year." Does this make sense?

– Ray, Commented Nov 27 at 16:49

@Ray: you mentioned your users can specify their availability this way, not that the data is actually modeled that way. Yes, this makes perfect sense. Converting these descriptions to time intervals may simplify the test of whether a user is available for a certain event. Still, I think you need to verify whether the combined time for the conversion plus the test(s) is really lower than using the availability descriptions directly.

– Doc Brown, Commented Nov 27 at 19:07
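Converting the weekly description into concrete intervals for a date range is a small loop. A minimal sketch, assuming a hypothetical in-memory form of Ray's example (weekday 0 = Monday mapped to `(start, end)` `datetime.time` pairs, plus a set of vacation dates); `concrete_slots` is an illustrative name, and its sorted output could feed a binary search or a `TreeSet`/`TreeMap`:

```python
from datetime import date, datetime, time, timedelta


def concrete_slots(weekly, blocked_dates, first_day, last_day):
    """Sorted (start, end) datetimes for every non-blocked day in
    [first_day, last_day], following the weekly pattern."""
    slots = []
    day = first_day
    while day <= last_day:
        if day not in blocked_dates:
            for s, e in sorted(weekly.get(day.weekday(), [])):
                slots.append((datetime.combine(day, s), datetime.combine(day, e)))
        day += timedelta(days=1)
    return slots


weekly = {0: [(time(14), time(21))],
          1: [(time(10), time(12)), (time(14, 30), time(21))]}
vacation = {date(2025, 11, 27) + timedelta(days=d) for d in range(8)}  # Nov 27 - Dec 4
slots = concrete_slots(weekly, vacation, date(2025, 11, 24), date(2025, 12, 9))
print(len(slots))  # 6: Nov 24-25 and Dec 8-9; Dec 1-2 fall in the vacation
```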


Basilevs · Accepted Answer · 2025-11-28 15:38:34Z


Doing the search service-side is a very bad idea: you would have to keep assignments synchronized with the DB, doing error-prone cache invalidation and potentially moving significant volumes of data around.

Do not do full scans service-side. Use DB to do searches:

-- Needed so that "worker WITH =" can be part of a GiST exclusion constraint.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE workers
(
  name VARCHAR NOT NULL PRIMARY KEY
);
CREATE TABLE assignments
(
  worker VARCHAR NOT NULL REFERENCES workers(name),
  during tsrange NOT NULL,
  CONSTRAINT overlapping_times EXCLUDE USING GIST (
        worker WITH =,
        during WITH &&
    )
);
INSERT INTO workers VALUES
  ('John'),
  ('Mary');
INSERT INTO assignments VALUES
  ('John', '[2021-01-01 00:00:01, 2021-01-01 00:09:01)'),
  ('John', '[2021-01-01 00:13:01, 2021-01-01 00:14:01)'),
  ('Mary', '[2021-01-01 00:05:01, 2021-01-01 00:16:01)');

Find busy workers:

SELECT worker from assignments WHERE during && tsrange('[2021-01-01 00:10:01, 2021-01-01 00:11:01)')

worker
Mary

Find available workers:

SELECT name FROM workers LEFT JOIN assignments
  ON assignments.worker = workers.name AND during && tsrange('[2021-01-01 00:10:01, 2021-01-01 00:11:01)')
  WHERE assignments.worker IS NULL

name
John

fiddle
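For comparison, the same anti-join can be written in memory. For `[...)` ranges, `&&` is true when the ranges share a point, which for half-open intervals means each starts before the other ends. A minimal Python sketch reproducing the example data above; the function names are hypothetical:

```python
from datetime import datetime


def overlaps(a, b):
    """Half-open [start, end) intervals overlap iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def available(workers, assignments, event):
    """Workers without an assignment overlapping the event (LEFT JOIN ... IS NULL)."""
    busy = {worker for worker, during in assignments if overlaps(during, event)}
    return [w for w in workers if w not in busy]


def t(minute, second=1):
    return datetime(2021, 1, 1, 0, minute, second)


assignments = [("John", (t(0), t(9))), ("John", (t(13), t(14))), ("Mary", (t(5), t(16)))]
print(available(["John", "Mary"], assignments, (t(10), t(11))))  # ['John']
```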

I'm not sure which additional indexes would be optimal for this case, but the btree_gist module surely has the necessary operators to build them. There is a chance that all the indexes needed are already created by the CONSTRAINT directives.

edited Nov 28 at 15:38

answered Nov 28 at 15:08

Basilevs


I like this answer, especially because I have learned something new about PostgreSQL-specific data types, constraints and indexes. Whether it is applicable, however, depends on whether the OP can manage to restructure their current DB model using tsrange for availability intervals. This isn't obvious, and the OP, though we asked them more than once about it, stayed very vague in the clarifications.

– Doc Brown, Commented Nov 29 at 9:38
@DocBrown GIST indexes work without ranges too: Postgres date overlapping constraint

– Basilevs, Commented Nov 29 at 10:02
I don't see where this addresses the requirement that "The event must fully fall into the user's defined availability slots." It just seems to address the scheduling constraint. Am I missing something?

– JimmyJames, Commented Nov 30 at 17:40
@JimmyJames a worker can be either busy or available at any given moment. Availability slots are mapped to the absence of assignments. Assignments may include lunch, vacation, PTO or just "Unavailable".

– Basilevs, Commented Nov 30 at 17:46
That still doesn't seem to address the issue, though. The first requirement is that the assignments must fit into one or more given availability ranges. The question doesn't specify what that looks like, but I understand it might be something like Fridays from 04:00-22:00 or January 26 through May 26. For the former, would you add a 'busy' entry for every week from Friday 22:00 through the next Friday 04:00?

– JimmyJames, Commented Nov 30 at 18:01
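To make the question in this exchange concrete: mapping availability onto "busy" rows means storing the gaps between availability slots over some horizon. A minimal sketch of that complement, assuming numeric half-open intervals (here hours from Monday 00:00) that are sorted, non-overlapping and inside the horizon; `busy_intervals` is a hypothetical helper:

```python
def busy_intervals(available, horizon_start, horizon_end):
    """Gaps between sorted, non-overlapping half-open availability intervals
    inside [horizon_start, horizon_end), i.e. the 'busy' rows to store."""
    busy = []
    cursor = horizon_start
    for start, end in available:
        if start > cursor:
            busy.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < horizon_end:
        busy.append((cursor, horizon_end))
    return busy


# Fridays 04:00-22:00 over two weeks, in hours from Monday 00:00 (Friday = day 4).
fridays = [(4 * 24 + 4, 4 * 24 + 22), (11 * 24 + 4, 11 * 24 + 22)]
print(busy_intervals(fridays, 0, 14 * 24))  # [(0, 100), (118, 268), (286, 336)]
```

With this model, the number of busy rows grows with the horizon, so they would need to be extended as the horizon moves forward.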


JimmyJames · Accepted Answer · 2025-12-01 16:36:08Z

I've been overthinking this problem. I think there's a simple answer that is easy to implement and should perform more than adequately for your needs.

This is easily solved using Postgres. The high-level logical structure is this:

I have my doubts about whether it's worthwhile but if you want to normalize this structure, you can add another table:

And then what you want is the difference of two sets:

1. users with availability that contains the event
2. users with assignments that conflict with the event

That is, you want the list of users for 1 without any of the users from 2.

Postgres has an EXCEPT clause for this purpose. It works similarly to a UNION clause in that you use it between two compatible select statements. But instead of combining the results, the results from the second query are removed. With the simple design above (the first diagram) the query would look like this:
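The set difference itself is easy to picture in memory. A minimal sketch, assuming hypothetical `(user_id, start, end)` rows with half-open numeric times (minutes since midnight here); it mirrors the idea of the `EXCEPT` query rather than reproducing it exactly:

```python
def candidates(availability, assignments, event_start, event_end):
    """availability / assignments: lists of (user_id, start, end) rows with
    half-open [start, end) times. Returns the users with a slot containing
    the event, minus the users with an assignment overlapping it."""
    fits = {u for u, s, e in availability if s <= event_start and event_end <= e}
    clashes = {u for u, s, e in assignments if s < event_end and event_start < e}
    return fits - clashes  # the EXCEPT step


availability = [(1, 840, 1200), (2, 900, 1000), (3, 600, 700)]
assignments = [(1, 960, 1020), (2, 800, 900)]
print(sorted(candidates(availability, assignments, 930, 990)))  # [2]
```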

SELECT a.id
FROM availability a
WHERE a.start <= :event_start AND :event_start < a.end
  AND a.start < :event_end AND :event_end <= a.end
EXCEPT
SELECT a.id
FROM assignment a
WHERE (:event_start <= a.start_date AND a.start_date < :event_end)
   OR (:event_start < a.end_date AND a.end_date <= :event_end)
   OR (a.start_date <= :event_start AND :event_start < a.end_date
     AND a.start_date < :event_end AND :event_end <= a.end_date)

The above is more clearly written as:

SELECT av.id
FROM availability av
WHERE (:event_start BETWEEN av.start AND av.end)
  AND (:event_end BETWEEN av.start AND av.end)
EXCEPT
SELECT asg.id
FROM assignment asg
WHERE (asg.start_date BETWEEN :event_start AND :event_end)
   OR (asg.end_date BETWEEN :event_start AND :event_end)
   OR ((:event_start BETWEEN asg.start_date AND asg.end_date)
    AND (:event_end BETWEEN asg.start_date AND asg.end_date))

Note: the BETWEEN operator treats both endpoints as inclusive. The first SQL query above (without BETWEEN) treats the end as exclusive. I would generally prefer exclusive endpoints, but this fact might tip the balance in favor of inclusive end dates.
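The practical difference shows up with back-to-back bookings. A minimal sketch with hypothetical helper names, times in hours:

```python
def clash_inclusive(a_start, a_end, b_start, b_end):
    # Closed intervals [start, end], as BETWEEN treats them: touching ends clash.
    return a_start <= b_end and b_start <= a_end


def clash_exclusive(a_start, a_end, b_start, b_end):
    # Half-open intervals [start, end): back-to-back bookings do not clash.
    return a_start < b_end and b_start < a_end


# An assignment from 9:00 to 10:00 followed by an event from 10:00 to 11:00.
print(clash_inclusive(9, 10, 10, 11))  # True
print(clash_exclusive(9, 10, 10, 11))  # False
```

With inclusive end dates, the stored end would therefore have to be the last instant actually taken (e.g. 09:59:59) to allow consecutive bookings.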

For performance, you should add range searchable indexes to these tables. Some index types will only work with exact matches and will not be used in this kind of query. If you have the appropriate indexes on this, it should execute efficiently. You can also use a NOT IN condition instead of an EXCEPT clause if you prefer. I expect the general performance of the two should be the same. Consult a DBA or the DBA stack for more detailed performance guidance for Postgres.

Technically, that last OR condition only needs to check whether either the start or the end of the new event falls within an assigned period. I'm leaving both for clarity.
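The claim can be checked exhaustively on small integers. A minimal sketch, assuming half-open intervals, that compares the three-way conflict test (with the last condition reduced to the event's start, as suggested above) against the usual two-comparison overlap test `a_start < e_end AND e_start < a_end`:

```python
from itertools import product


def three_way(a_s, a_e, e_s, e_e):
    return ((e_s <= a_s < e_e)      # assignment starts inside the event
            or (e_s < a_e <= e_e)   # assignment ends inside the event
            or (a_s <= e_s < a_e))  # event starts inside the assignment


def two_comparisons(a_s, a_e, e_s, e_e):
    return a_s < e_e and e_s < a_e


# Every pair of non-empty intervals with endpoints in 0..5.
cases = [c for c in product(range(6), repeat=4) if c[0] < c[1] and c[2] < c[3]]
print(all(three_way(*c) == two_comparisons(*c) for c in cases))  # True
```

The shorter form is also what range overlap operators such as PostgreSQL's `&&` compute for half-open ranges.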

This answer uses scalar types for simplicity. You should consider the range types in Basilevs's answer. They look to be especially useful for constraints around this kind of problem. And if you do go with a single ranged value for the periods, I don't think there's any reason to use a fourth table as in the second relationship diagram above.

Kain0_0 · Accepted Answer · 2025-11-28 05:01:40Z

Interesting problem. My university used simulated annealing to shift lectures, rooms, times, and students around to form a schedule which they manually tweaked before publishing for the semester.

Sounds to me like you want this to be somewhat more dynamic though.

A spatial tree feels right for this. An R*-tree would be ideal, but others can do the job: that is, a region tree with leaf-node walking.

We need two of them for a given period of time (a day, a week, a month, up to you really). The first stores the events, the second the attendees' availability.

Both trees start with a single node containing no entries, marked for the entire time that the tree represents. This is so that later we don't have to check whether nodes have consecutive time stamps: we know that the structure is dense, therefore the next node is always for the slot of time after this one.

When a user says "I'm available between time x and time y", you insert them into the tree for that region of time. Follow the tree down and break apart the node found if it doesn't start at the same time the attendee becomes available. Add the attendee to that node and to each subsequent leaf node until you reach the node that matches the attendee's end of availability (breaking it apart if that node extends past it).

Repeat for each interval of each user's time. Given your system is probably online, this allows users to add new availability easily at their leisure. Bulk inserts are also possible.

Removing time is similar, just avoid the merge step. Because time is always marching forward, you'll pay more to remove unnecessary nodes. Perhaps rebuild the tree offline once some threshold of deletes has been reached, or when a tree built for, say, two years from now becomes next month's tree.

Do all of the above for the events as well.

To find the list of available attendees for an event, look up the attendees tree at the start of the time period, and grab the list of each leaf from there to the leaf containing the end time. Intersect the lists of those nodes: whoever is left is available.
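The attendee side can be sketched without a full R*-tree by keeping only the leaf level: a sorted list of boundaries that partition the period into contiguous segments, each holding the attendees available throughout it. This is a simplified stand-in for the answer's tree, not an R*-tree; the class and method names are hypothetical, times are plain numbers, and all intervals are assumed to lie inside the period.

```python
from bisect import bisect_right


class Timeline:
    """Leaf level only: bounds[i]..bounds[i+1] is segment i (half-open), and
    members[i] holds the attendees available for the whole segment."""

    def __init__(self, start, end):
        self.bounds = [start, end]
        self.members = [set()]  # one empty segment covering the whole period

    def _split(self, t):
        """Make t a segment boundary; return the index of the segment starting at t."""
        i = bisect_right(self.bounds, t) - 1
        if self.bounds[i] != t:
            self.bounds.insert(i + 1, t)
            # Both halves inherit the attendees of the segment being split.
            self.members.insert(i + 1, set(self.members[i]))
            i += 1
        return i

    def add(self, attendee, start, end):
        first = self._split(start)
        last = self._split(end)
        for i in range(first, last):
            self.members[i].add(attendee)

    def available(self, start, end):
        """Intersect the attendee sets of every segment overlapping [start, end)."""
        i = bisect_right(self.bounds, start) - 1
        result = set(self.members[i])
        i += 1
        while self.bounds[i] < end:
            result &= self.members[i]
            i += 1
        return result


tl = Timeline(0, 100)
tl.add("ann", 10, 50)
tl.add("bob", 30, 80)
tl.add("cid", 0, 40)
print(sorted(tl.available(35, 45)))  # ['ann', 'bob']
```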

To find the events available to an attendee, use the events tree. For each interval of available time, find the first leaf node that starts after the attendee becomes available, and union it with each next node until reaching the one that ends after the attendee becomes unavailable. Grab the nodes on either side of the union and subtract them from the union. This is the list of available events for that attendee.

Why?

For the events finding attendees: the fact that attendees are listed as available in a region of time is all that is needed. Intersecting with each leaf node that makes up that time ensures they are available for the entire block of time. If that's a single node great.
For attendees finding events: the event needs to be present at least once in the period of time they are available. But the event might start earlier or end later, that is why we subtract the node starting before and the node ending after the attendee's availability - even if those nodes are also partially in the region of availability with the attendee. Their presence there means they must start before or end after.
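The events side (union the leaves inside the availability window, then subtract the neighbouring leaves) can be sketched the same way. A minimal sketch over a plain list of leaves, where a linear scan stands in for the tree walk, assuming the leaf boundaries include every event's start and end; `fitting_events` is a hypothetical name:

```python
def fitting_events(segments, avail_start, avail_end):
    """segments: contiguous, sorted (start, end, event_ids) leaves whose
    boundaries include every event's start and end. Returns the events
    lying entirely inside [avail_start, avail_end)."""
    inside, outside = set(), set()
    for s, e, ids in segments:
        if avail_start <= s and e <= avail_end:
            inside |= ids  # leaf fully within the availability window
        elif s < avail_start <= e or s <= avail_end < e:
            outside |= ids  # neighbouring leaf: its events spill over an edge
    return inside - outside


# Events A [10,20), B [15,40), C [30,35), D [50,60) over the period [0,100).
segments = [(0, 10, set()), (10, 15, {"A"}), (15, 20, {"A", "B"}),
            (20, 30, {"B"}), (30, 35, {"B", "C"}), (35, 40, {"B"}),
            (40, 50, set()), (50, 60, {"D"}), (60, 100, set())]
print(sorted(fitting_events(segments, 12, 55)))  # ['B', 'C']
```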

An optimisation might be to make a tree for each set of prerequisites that an attendee has or an event needs. This allows you to reduce the sets to be explored upfront. The downside is that the event/attendee will need to be added to multiple trees for the same time period, as attendees should always show up in the no-prerequisites tree.

Another optimisation is to make the lists of events/users a bit table, and have a secondary entry table to properly reference them. This allows quick bit operations for intersection, union, and subtraction. It also makes node splitting a faster operation (copying a binary blob versus duplicating references). It does add a translation step from mask to actual data, but that isn't much of a slowdown.
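The bit table idea can be sketched with an index table that assigns each attendee a bit position, so that a leaf's list becomes a single integer. A minimal sketch; `BitTable`, `mask` and `decode` are hypothetical names:

```python
class BitTable:
    """Secondary entry table: each attendee gets a bit position, so a leaf's
    attendee list becomes one int and set algebra becomes & | ~."""

    def __init__(self):
        self.ids = []        # bit position -> attendee id
        self.positions = {}  # attendee id -> bit position

    def mask(self, attendees):
        m = 0
        for a in attendees:
            if a not in self.positions:
                self.positions[a] = len(self.ids)
                self.ids.append(a)
            m |= 1 << self.positions[a]
        return m

    def decode(self, m):
        """The translation step from mask back to attendee ids."""
        return [self.ids[i] for i in range(len(self.ids)) if (m >> i) & 1]


table = BitTable()
leaf_a = table.mask(["ann", "bob", "cid"])
leaf_b = table.mask(["bob", "dan"])
print(table.decode(leaf_a & leaf_b))   # ['bob'] (intersection)
print(table.decode(leaf_a & ~leaf_b))  # ['ann', 'cid'] (subtraction)
```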

This works great for one off events, but you probably want to offer some better services like highlighting if there is enough event capacity for those who wish to attend, and how flexible your attendees are.

So attendees wishlist an event type.

An organiser wants to know if a given set of times satisfies demand. When an attendee wishes for the event type we add their availabilities to a new tree just for the event type. The organiser can now see this tree plotted out as a graph over a day/week/month and can position events over periods of time where availability is enough to warrant it.

Even better, with a set of classes in mind we can ask for lists of attendees. We just count how many times an attendee is available for an entire class, which gives us a flexibility measure for each attendee. The organiser knows who to prioritise, or how good their offering of times is.
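The flexibility measure can be sketched as a count per attendee. A minimal sketch, assuming hypothetical half-open numeric slots per attendee and a list of class session times; `flexibility` is an illustrative name:

```python
def flexibility(attendee_slots, class_times):
    """attendee_slots: attendee -> list of half-open (start, end) slots.
    class_times: list of (start, end) sessions. Returns attendee -> number
    of sessions that fit entirely inside one of their slots."""
    return {
        who: sum(any(s <= cs and ce <= e for s, e in slots)
                 for cs, ce in class_times)
        for who, slots in attendee_slots.items()
    }


sessions = [(10, 20), (45, 55), (50, 70)]
print(flexibility({"ann": [(0, 100)], "bob": [(40, 60)]}, sessions))
# {'ann': 3, 'bob': 1}
```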

You should be able to do something similar for the availability of events given conflicting wishes, to guide an attendee into choosing events that give them more flexibility later.
