Smart Contract Intent Detection with Pre-trained Programming Language Model
Paper: arXiv:2508.20086
SmartBERT V3 is a pre-trained programming language model, originally initialized from CodeBERT-base-mlm. It continues pre-training from SmartBERT V2 on an additional 64,000 smart contracts, improving the robustness of its function-level representations of smart contract code.
from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline

# Load the SmartBERT V3 model and tokenizer from the Hugging Face Hub
model = RobertaForMaskedLM.from_pretrained('web3se/SmartBERT-v3')
tokenizer = RobertaTokenizer.from_pretrained('web3se/SmartBERT-v3')

# Predict the masked token in a Solidity function signature
code_example = "function totalSupply() external view <mask> (uint256);"
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

outputs = fill_mask(code_example)
print(outputs)
All newline (\n) and tab (\t) characters in the function code were replaced with a single space to ensure consistency in the input data format.
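A minimal sketch of this normalization, under one possible reading (runs of newlines and tabs collapse into a single space); the helper name `normalize_code` is illustrative, not from the original:

```python
import re

def normalize_code(code: str) -> str:
    # Collapse runs of newlines and tabs into a single space
    # so each function occupies one line of input
    return re.sub(r"[\n\t]+", " ", code).strip()

code = "function totalSupply()\n\texternal view returns (uint256);"
print(normalize_code(code))
# function totalSupply() external view returns (uint256);
```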
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=64,
    save_steps=10000,
    save_total_limit=2,
    evaluation_strategy="steps",
    eval_steps=10000,
    resume_from_checkpoint=checkpoint
)
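For context, arguments like these are typically passed to a `Trainer` together with a masked-language-modeling data collator. The sketch below is an assumption about the surrounding wiring, not the authors' training script; `train_dataset`, `eval_dataset`, and the 15% masking ratio are placeholders:

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# Randomly mask tokens for the MLM objective; 0.15 is the library
# default, assumed here rather than taken from the paper
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,  # placeholder: tokenized contract functions
    eval_dataset=eval_dataset,    # placeholder: held-out evaluation split
)
trainer.train(resume_from_checkpoint=checkpoint)
```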
To train and deploy the SmartBERT V3 model for Web API services, please refer to our GitHub repository: web3se-lab/SmartBERT.
@article{huang2025smart,
  title={Smart Contract Intent Detection with Pre-trained Programming Language Model},
  author={Huang, Youwei and Li, Jianwen and Fang, Sen and Li, Yao and Yang, Peng and Hu, Bin},
  journal={arXiv preprint arXiv:2508.20086},
  year={2025}
}
Base model: microsoft/codebert-base-mlm