---
license: cc-by-nc-sa-4.0
task_categories:
- text-retrieval
language:
- en
extra_gated_fields:
  Full Name: text
  Affiliation (Organization/University): text
  Designation/Status in Your Organization: text
  Country: country
  I want to use this model for (please provide the reason(s)): text
  IL-PCSR models are free for research use but NOT for commercial use; do you agree if you are provided with the model, you will NOT use for any commercial purposes? Also do you agree that you will not be sharing this dataset further or uploading it anywhere else on the internet: checkbox
  DISCLAIMER The dataset is released for research purposes only and authors do not take any responsibility for any damage or loss arising due to usage of data or any system/model developed using the dataset: checkbox
tags:
- legal
- indian law
- legal retrieval
- statute retrieval
- precedent retrieval
---

# IL-PCSR (Indian Legal — Precedent & Statute Retrieval)

**Ensemble Model:** A hybrid approach that combines lexical features (BM25 5-gram) with semantic/distributional features (Para-GNN), using dynamic weighting between the two. It is effective for both legal statute retrieval (LSR) and prior case retrieval (PCR).
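
To make the idea of weighting lexical against semantic scores concrete, here is a minimal sketch of score fusion. It is not the authors' implementation: this card does not say how the ensemble computes its weights, so the min-max normalization, the convex combination, and the names `min_max`, `fuse`, and `alpha` are all assumptions made for illustration. In a dynamically weighted system, `alpha` would be chosen per query rather than fixed.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to [0, 1]; constant inputs map to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(lexical, semantic, alpha):
    """Rank candidates by alpha * lexical + (1 - alpha) * semantic.

    `alpha` is a hypothetical weight in [0, 1]; how the real ensemble
    derives its weights is not described in this card.
    """
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    fused = {d: alpha * lex.get(d, 0.0) + (1 - alpha) * sem.get(d, 0.0) for d in docs}
    # Sort by descending fused score, breaking ties by document id
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy scores for three candidate documents (made-up numbers)
bm25_scores = {"c1": 12.0, "c2": 7.0, "c3": 2.0}
gnn_scores = {"c1": 0.2, "c2": 0.9, "c3": 0.4}

print(fuse(bm25_scores, gnn_scores, alpha=0.8))  # lexical-heavy weighting
print(fuse(bm25_scores, gnn_scores, alpha=0.2))  # semantic-heavy weighting
```

Shifting `alpha` changes which signal dominates the ranking, which is the behaviour a per-query weight is meant to exploit.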

## Summary of Model files

We have 5 files for 3 different types of models:

- `only_secs_model.bin`, `only_precs_model.bin`: separate models for LSR and PCR, fine-tuned independently
- `multi_task_model.bin`: a single model for both LSR and PCR, fine-tuned jointly in a multi-task setup
- `pipeline_secs.bin`, `pipeline_precs.bin`: separate models for LSR and PCR obtained via transfer learning (`pipeline_secs.bin` is obtained by LSR training starting from `only_precs_model.bin`, i.e., transfer PCR → LSR; `pipeline_precs.bin` is obtained by PCR training starting from `only_secs_model.bin`, i.e., transfer LSR → PCR)

All of these models were trained on summaries of queries and precedents, not on full documents.
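
The mapping from task and training setup to file can be written down as a small lookup table. This is only a convenience sketch: the helper `checkpoint_for`, the dictionary `CHECKPOINTS`, and the setup labels `independent`, `multi_task`, and `transfer` are names invented here, and the file names are copied from the list above, so they should be confirmed against the repository's file listing. Calling `checkpoint_for("lsr", "transfer")` returns the file to request from the Hub.

```python
# File names are copied from the list above; confirm them against the
# repository's file listing before downloading.
CHECKPOINTS = {
    ("lsr", "independent"): "only_secs_model.bin",
    ("pcr", "independent"): "only_precs_model.bin",
    ("lsr", "multi_task"): "multi_task_model.bin",   # one shared file for both tasks
    ("pcr", "multi_task"): "multi_task_model.bin",
    ("lsr", "transfer"): "pipeline_secs.bin",        # PCR -> LSR transfer
    ("pcr", "transfer"): "pipeline_precs.bin",       # LSR -> PCR transfer
}


def checkpoint_for(task, setup):
    """Return the checkpoint file name for a task ('lsr'/'pcr') and training setup."""
    key = (task.lower(), setup.lower())
    if key not in CHECKPOINTS:
        valid = ", ".join(f"{t}/{s}" for t, s in sorted(CHECKPOINTS))
        raise ValueError(f"Unknown combination {task}/{setup}; expected one of: {valid}")
    return CHECKPOINTS[key]


print(checkpoint_for("LSR", "transfer"))
```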

---

## How to Use

All of the examples assume you already have access to the repository (i.e., your gate request has been accepted). Use `huggingface_hub` to download a model file to a local path, after which the weights can be loaded like any standard PyTorch checkpoint.

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint to a local file and get its path.
# The filename must match a file in the repository exactly.
file_path = hf_hub_download(
    repo_id="Exploration-Lab/IL-PCSR-Models",
    filename="multitask_model.bin",
)
print("Model weights downloaded to:", file_path)

# Load the state dict onto the CPU with PyTorch
trained_state_dict = torch.load(file_path, map_location=torch.device("cpu"))
```
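
Once the state dict is in memory, it is usually worth checking what it contains before loading it into a model. The sketch below is an assumption-laden illustration: this card does not ship a model class, so `model` stands for any `torch.nn.Module` whose architecture matches the checkpoint, and the helper names `summarize_state_dict` and `load_weights` are invented here. `load_weights` relies on the standard `load_state_dict(..., strict=False)` call, which reports missing and unexpected keys instead of raising. A typical use would be `summarize_state_dict(trained_state_dict)` followed by `load_weights(model, trained_state_dict)`.

```python
def summarize_state_dict(state_dict, limit=5):
    """Return (parameter name, shape) pairs for the first `limit` entries."""
    rows = []
    for name, tensor in state_dict.items():
        if len(rows) == limit:
            break
        # Entries without a `shape` attribute are reported with an empty shape
        rows.append((name, tuple(getattr(tensor, "shape", ()))))
    return rows


def load_weights(model, state_dict):
    """Load weights non-strictly and return (missing_keys, unexpected_keys).

    `model` is a hypothetical torch.nn.Module matching the checkpoint;
    empty lists mean every parameter name lined up.
    """
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)
```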

---

## Citation

```bibtex
@inproceedings{il-pcsr2025,
  title = "IL-PCSR: Legal Corpus for Prior Case and Statute Retrieval",
  author = "Paul, Shounak and Ghumare, Dhananjay and Goyal, Pawan and Ghosh, Saptarshi and Modi, Ashutosh",
  booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
  month = nov,
  year = "2025",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  note = "To Appear"
}
```