Five-signal retrieval formula that blends semantic similarity, recency, importance, intent match, and entity overlap to rank the context passed to the AI.
Every piece of context retrieved for the AI is scored using five signals: semantic similarity (cosine similarity from Vectorize), recency (exponential decay from last update), importance (type-weighted: spend data scores 0.85, board data 0.55), intent match (how well the item matches the classified intent), and entity overlap (clients, projects, or tasks shared with the query). The signals are combined using per-intent weighting profiles.
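As a rough illustration of how the five signals might be combined, the sketch below assumes a plain weighted sum with every signal normalised to the 0..1 range. The type names, field names, and the even-weighted profile are hypothetical; only the importance values for spend and board data come from the description above.

```typescript
// Sketch only: names and the example profile are illustrative assumptions.
type Signals = {
  semantic: number;   // cosine similarity, assumed normalised to 0..1
  recency: number;    // exponential decay from last update, 0..1
  importance: number; // type-weighted, e.g. spend 0.85, board 0.55
  intent: number;     // match against the classified intent, 0..1
  entity: number;     // overlap of clients, projects, or tasks, 0..1
};

// A profile holds one weight per signal.
type WeightProfile = Signals;

// Importance values quoted in the text; other item types are not specified there.
const IMPORTANCE_BY_TYPE: { spend: number; board: number } = {
  spend: 0.85,
  board: 0.55,
};

// Assumed combination rule: a weighted sum of the five signals.
function combinedScore(signals: Signals, weights: WeightProfile): number {
  return (
    weights.semantic * signals.semantic +
    weights.recency * signals.recency +
    weights.importance * signals.importance +
    weights.intent * signals.intent +
    weights.entity * signals.entity
  );
}

// Hypothetical even-weighted profile, used only to exercise the function.
const evenWeights: WeightProfile = {
  semantic: 0.2,
  recency: 0.2,
  importance: 0.2,
  intent: 0.2,
  entity: 0.2,
};

// Hypothetical spend item with made-up signal values.
const spendItem: Signals = {
  semantic: 0.7,
  recency: 0.9,
  importance: IMPORTANCE_BY_TYPE.spend,
  intent: 0.8,
  entity: 0.5,
};

console.log(combinedScore(spendItem, evenWeights).toFixed(2)); // prints "0.75"
```

Because the weights sum to 1.0 and each signal is assumed to lie in 0..1, the combined score also stays in 0..1, which keeps scores comparable across items.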
Different query types weight the signals differently. Financial queries emphasise recency (0.30 weight, 7-day half-life) because spend data becomes stale quickly. Process queries emphasise semantic similarity (0.35 weight, 90-day half-life) because documentation stays relevant longer. Search queries emphasise entity overlap (0.35 weight) to surface exact matches. Within each profile, the weights sum to 1.0.
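The half-life figures can be read as a standard exponential decay in which the recency score halves every half-life. The sketch below shows that reading together with a check that a profile's weights total 1.0. In the example profile, only the recency weight of 0.30 and the 7-day half-life come from the text; the other four weights are placeholders, and all names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Exponential decay: the score is 1 for a fresh item and halves every `halfLifeDays`.
function recencyScore(lastUpdatedMs: number, nowMs: number, halfLifeDays: number): number {
  const ageDays = Math.max(0, nowMs - lastUpdatedMs) / DAY_MS;
  return Math.pow(0.5, ageDays / halfLifeDays);
}

type IntentProfile = {
  halfLifeDays: number;
  weights: {
    semantic: number;
    recency: number;
    importance: number;
    intent: number;
    entity: number;
  };
};

// Only recency = 0.30 and the 7-day half-life are taken from the text;
// the remaining weights are placeholders chosen so the total is 1.0.
const financialProfile: IntentProfile = {
  halfLifeDays: 7,
  weights: { semantic: 0.25, recency: 0.3, importance: 0.2, intent: 0.15, entity: 0.1 },
};

// Guards against a profile whose weights drift away from 1.0.
function weightsSumToOne(profile: IntentProfile, tolerance: number = 1e-9): boolean {
  const w = profile.weights;
  const total = w.semantic + w.recency + w.importance + w.intent + w.entity;
  return Math.abs(total - 1) <= tolerance;
}

// A week-old item: half its recency under a 7-day half-life,
// but still close to 1 under a 90-day half-life.
const weekOldFinancial = recencyScore(0, 7 * DAY_MS, financialProfile.halfLifeDays);
const weekOldProcess = recencyScore(0, 7 * DAY_MS, 90);
console.log(weekOldFinancial, weekOldProcess.toFixed(3), weightsSumToOne(financialProfile));
```

The same one-week-old item therefore loses half its recency score under the financial profile but only around five percent under a 90-day half-life, which matches the stated intent that documentation stays relevant longer than spend data.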
To prevent the AI context from being dominated by a single type of information, a diversity penalty of 0.08 is subtracted per item beyond the third of the same type. If the top results are all "spend" items, the fourth and fifth spend items get progressively penalised, letting other types (tasks, briefs, clients) surface. The resulting context is more balanced, which leads to more useful AI responses.
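One way to read the diversity penalty is sketched below. It assumes the penalty is cumulative (the fourth item of a type loses 0.08, the fifth loses 0.16) and that it is applied in a single pass over the initial ranking; both points are interpretations of "progressively penalised" rather than confirmed behaviour. The item shape, ids, and scores are hypothetical.

```typescript
type ScoredItem = { id: string; type: string; score: number };

const FREE_ITEMS_PER_TYPE = 3;       // items of one type allowed before the penalty starts
const PENALTY_PER_EXTRA_ITEM = 0.08; // subtracted per item beyond the free allowance

function applyDiversityPenalty(items: ScoredItem[]): ScoredItem[] {
  const seenByType = new Map<string, number>();
  // Walk items from best to worst so the penalty hits the weakest items of a type.
  const byScore = [...items].sort((a, b) => b.score - a.score);
  const penalised = byScore.map((item) => {
    const rankWithinType = (seenByType.get(item.type) ?? 0) + 1;
    seenByType.set(item.type, rankWithinType);
    const extra = Math.max(0, rankWithinType - FREE_ITEMS_PER_TYPE);
    return { ...item, score: item.score - PENALTY_PER_EXTRA_ITEM * extra };
  });
  // Re-sort so penalised items drop below unpenalised items of other types.
  return penalised.sort((a, b) => b.score - a.score);
}

// Made-up scores: five spend items initially outrank a task and a brief.
const ranked = applyDiversityPenalty([
  { id: "s1", type: "spend", score: 0.9 },
  { id: "s2", type: "spend", score: 0.88 },
  { id: "s3", type: "spend", score: 0.86 },
  { id: "s4", type: "spend", score: 0.84 },
  { id: "s5", type: "spend", score: 0.82 },
  { id: "t1", type: "task", score: 0.8 },
  { id: "b1", type: "brief", score: 0.78 },
]);

console.log(ranked.map((item) => item.id).join(", "));
// prints "s1, s2, s3, t1, b1, s4, s5"
```

In this example the fourth and fifth spend items fall to roughly 0.76 and 0.66, so the task and the brief move up into fourth and fifth place.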
Items sourced from the database are reranked by Vectorize cosine similarity to the user query. A single Vectorize query generates embeddings and attaches a semanticScore to each item. This hybrid approach — database retrieval plus vector reranking — delivers better results than either approach alone, with graceful degradation if Vectorize is unavailable.
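The reranking step with graceful degradation could look roughly like the sketch below. The scorer is passed in as a function, a hypothetical stand-in for the Vectorize call, so no real Vectorize API signature is assumed. Falling back to the original database order when the scorer fails is one plausible interpretation of "graceful degradation", not a confirmed detail.

```typescript
type Candidate = { id: string; text: string; semanticScore?: number };

// Hypothetical stand-in for the vector lookup: resolves to a map of
// item id -> cosine similarity against the user query.
type SemanticScorer = (query: string, ids: string[]) => Promise<Map<string, number>>;

async function rerankBySimilarity(
  query: string,
  candidates: Candidate[],
  scoreWithVectorIndex: SemanticScorer,
): Promise<Candidate[]> {
  try {
    // One call scores every database candidate.
    const scores = await scoreWithVectorIndex(query, candidates.map((c) => c.id));
    return candidates
      .map((c) => ({ ...c, semanticScore: scores.get(c.id) ?? 0 }))
      .sort((a, b) => b.semanticScore - a.semanticScore);
  } catch {
    // Vector index unavailable: keep the database order and leave semanticScore unset.
    return [...candidates];
  }
}
```

Injecting the scorer keeps the fallback path easy to exercise: a scorer that throws simulates an outage, and the caller still receives the database results in their original order.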