Formulas and Definitions

There are multiple metrics used in both information retrieval and recommender systems that are analogous to the standard precision and recall. Precision and recall at k (also written precision@k and recall@k) both answer the simple question: “what is the precision/recall if I retrieve k documents using my system?”

$$ p@k = \frac{|\{\text{relevant documents}\}\cap\{\text{top k retrieved documents}\}|}{k} $$
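
Recall@k has the same numerator, but divides by the total number of relevant documents instead of k:

$$ r@k = \frac{|\{\text{relevant documents}\}\cap\{\text{top k retrieved documents}\}|}{|\{\text{relevant documents}\}|} $$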

But with both of these metrics, the constant k has to be fixed in advance. A user who only interacted with 3 items can score at most a p@5 of 3/5, while for a user with hundreds of interactions, scoring a good p@5 would be too easy for our system. Conversely, if every user has hundreds of interactions, recall@5 will be very low, since at most 5 of those hundreds of relevant items can appear in the top 5.

If only we could vary the k for each user… That’s where R-precision comes in handy. It is like precision@k, but the k is different for each user: it equals the number of relevant items for that user. Compared with plain recall, the difference is that the number of retrieved documents is fixed to the number of relevant documents, so at this cutoff precision and recall coincide.

$$ \text{R-precision} = \frac{\left|\{\text{relevant documents}\}\cap \{\text{top R retrieved documents}\}\right| }{R} $$

Where \(R = |\{\text{relevant documents}\}|\)
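
As a quick sanity check, here is a tiny hand-computable example in plain Python (the documents and ranking are made up for illustration): a user with R = 3 relevant documents, and a system that retrieves 5, of which the 1st and 4th are relevant.

relevant = {"a", "b", "c"}             # R = 3 relevant documents
retrieved = ["a", "x", "y", "b", "z"]  # top 5 retrieved documents, best first

k = 5
R = len(relevant)

hits_at_k = len(relevant & set(retrieved[:k]))  # 2 hits in the top 5
hits_at_R = len(relevant & set(retrieved[:R]))  # 1 hit in the top 3

precision_at_k = hits_at_k / k  # 2/5 = 0.4
recall_at_k = hits_at_k / R     # 2/3 ≈ 0.67
r_precision = hits_at_R / R     # 1/3 ≈ 0.33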

Implementation and PyTorch Geometric code

Let’s imagine the graph is stored as a tensor edge_index of size [2, n_edges], where edge_index[0] holds the source of each edge and edge_index[1] the destination. We also have a model with a method model.recommend (like PyTorch Geometric’s LightGCN) that, given a value k, returns the top k recommended nodes for each user.

The following function is thoroughly commented to make it easier to understand. It receives a k and returns the precision at k, the recall at k, and the R-precision.

I assume that the model and the graph are out of the scope of the function (they are global variables, or this function is nested inside a bigger one).
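
For context, this is a minimal sketch of what those globals could look like. The names match the ones used in the function below, but the random edges, sizes and LightGCN hyperparameters are placeholders, not the setup from my experiment.

import torch
import torch_geometric as PyG
from torch_geometric.data import HeteroData
from torch_geometric.nn.models import LightGCN

device = "cuda" if torch.cuda.is_available() else "cpu"
n_users, n_items = 100, 50

# Users and items share a single id space: users first, then items
# (this is why the function below subtracts n_users from item ids)
users = torch.arange(n_users, device=device)
items = torch.arange(n_users, n_users + n_items, device=device)

# original: the full bipartite graph (ground truth)
original = HeteroData()
original['voter', 'votes', 'proposal'].edge_index = torch.stack([
    torch.randint(0, n_users, (500,)),
    torch.randint(n_users, n_users + n_items, (500,)),
]).to(device)
# validation: the edges fed to the model for message passing
# (here the same graph; in practice, a separate split)
validation = original

model = LightGCN(num_nodes=n_users + n_items, embedding_dim=32, num_layers=2).to(device)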

@torch.no_grad()
def prec_rec(k: int):
    # gt: ground truth (all edges)
    gt_index = original['voter', 'votes', 'proposal'].edge_index
    edge_index = validation['voter', 'votes', 'proposal'].edge_index

    # First, we need to obtain the R value for each user
    # In graph terms, this is just the degree of each user node
    # (the number of items that user interacted with)
    R = item_count = PyG.utils.degree(gt_index[0], num_nodes=n_users)
    # Then, we get the top max(R) recommendations. This is a bit
    # expensive, but cheaper than sorting all the recommendations
    topr = model.recommend(edge_index, src_index=users, dst_index=items, k=int(R.max()), sorted=True)
    
    # We transform the edge-list (pairs of vertices) format into a
    # bipartite adjacency matrix. Items share the users' id space,
    # so we subtract n_users to get 0-based column indices
    ground_truth = torch.full((n_users, n_items), False, dtype=torch.bool, device=device)
    ground_truth[gt_index[0], gt_index[1] - n_users] = True

    # Then, we look up that matrix at the positions of the top
    # recommendations obtained before, getting a hit/miss matrix
    isin_rmat = ground_truth.gather(1, topr - n_users)
    # For p@k and r@k we just need the first k recommendations
    isin_mat = isin_rmat[:, :k]

    # We calculate mean precision and recall using the formulas
    prec = (isin_mat.sum(dim=-1) / k).sum() / n_users
    rec = (isin_mat.sum(dim=-1) / item_count).sum() / n_users

    # We can't do isin_rmat[:, :R] because R is not a scalar
    # My solution is to build a mask that, for each user, marks the
    # recommendations beyond their first R, and zero them out
    msk = torch.arange(1, R.max()+1, device=device) > R.unsqueeze(1)
    isin_rmat[msk] = 0
    # Calculate the mean R-precision using the formula
    rprec = (isin_rmat.sum(dim=-1) / R).sum() / n_users

    # Finally, we convert the 0-dimensional (scalar) tensors to Python floats
    return float(prec), float(rec), float(rprec)
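
With those globals in place, calling the function is straightforward (k=20 is just an example value):

p_at_20, r_at_20, r_prec = prec_rec(k=20)
print(f"precision@20: {p_at_20:.4f} | recall@20: {r_at_20:.4f} | R-precision: {r_prec:.4f}")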

Even if you don’t use PyTorch Geometric and prefer another library, the code should still be useful. Just adapt the edge index and write a similar recommend method, and you will be able to calculate R-precision.
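
For example, if your model simply produces a dense user-item score matrix, a recommend-style helper can be sketched with plain PyTorch (the name recommend_from_scores and the dense-scores assumption are mine; note it returns local item indices starting at 0, so the - n_users offsets above would not be needed):

import torch

@torch.no_grad()
def recommend_from_scores(scores: torch.Tensor, k: int) -> torch.Tensor:
    # scores: [n_users, n_items] predicted relevance for every user-item pair
    # Returns a [n_users, k] tensor with the indices of the top k items
    # per user, sorted from most to least relevant (same shape as topr above)
    return scores.topk(k, dim=-1, sorted=True).indices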

Sources and more information