Commit Messages Completion paper
Intro
Contributions
New task?
Dataset
Baselines?
Zero-shot English GPT-2
Vanilla Transformer trained on the available Commit Messages dataset
Metrics? (Actually taken from Google Compose and IntelliCode Compose)
Propose model improvements
Personalization with history
Using only changed lines (practically useful for reducing context size)
Initializing model from pretrained CodeBERT and GPT-2
Text
There is a task of generating commit messages
It seems we are still pretty far from quality that is suitable for production
Also, there are many completion systems that are useful for end users: code completion, documentation completion, and so on
We can reformulate the generation task as a completion task and gain profit from it
For completion, the quality threshold at which a system becomes useful is much lower
Main part
Task definition
Integration?
KInference
Partial server-side
Privacy concerns
Explain architecture here
Using only changed lines for context size reduction
Dataset should be multi-lingual
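The "only changed lines" idea above could be sketched as a simple filter over a unified diff: drop file headers, hunk markers, and context lines, keeping just added/removed lines. A minimal sketch (the helper name and input format are assumptions, not the paper's implementation):

```python
def changed_lines(diff_text: str) -> list[str]:
    """Keep only added/removed lines from a unified diff,
    dropping context lines, file headers, and hunk markers."""
    kept = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@", "diff ", "index ")):
            continue  # file headers and hunk markers
        if line.startswith(("+", "-")):
            kept.append(line)  # an actually changed line
    return kept

diff = """\
diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1,3 +1,3 @@
 import os
-print("hello")
+print("hello, world")
"""
print(changed_lines(diff))  # ['-print("hello")', '+print("hello, world")']
```

Even this naive filter shrinks a diff considerably, since context lines usually dominate.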
Methodology and results
Metrics
Completion of commit messages is quite similar to Google Compose, so we can reuse metrics from there
Maybe add a single-token scenario and measure standard completion metrics like accuracy and MRR?
Hmm, is the PrefixMatch metric from Google Compose trying to capture the same thing?
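For the single-token scenario mentioned above, MRR is straightforward: average of 1/rank of the ground-truth token among the model's ranked candidates (0 when absent). A minimal sketch; the function name and input shapes are illustrative assumptions:

```python
def mrr(ranked_candidates: list[list[str]], targets: list[str]) -> float:
    """Mean Reciprocal Rank for single-token completion:
    1/rank of the ground-truth token, 0 if it is not suggested."""
    total = 0.0
    for cands, target in zip(ranked_candidates, targets):
        if target in cands:
            total += 1.0 / (cands.index(target) + 1)  # ranks are 1-based
    return total / len(targets)

# Hypothetical top-k predictions for two completion requests
preds = [["fix", "add", "update"], ["remove", "fix"]]
gold = ["add", "fix"]
print(mrr(preds, gold))  # 0.5: (1/2 + 1/2) / 2
```

Top-1 accuracy falls out of the same data by checking `cands[0] == target`.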
Data
Recap the dataset described above
Deduplication
How to deduplicate Commit Messages? TODO: check how they did this in Generation paper
TODO: revisit this part
Split by projects into train/val/test
In this project we used the "incomplete context" technique; not sure yet where to place this part...
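The deduplication and by-project split above could look roughly like the sketch below: exact-match dedup on normalized message text (a placeholder until the Generation paper's procedure is checked), then assigning whole projects to train/val/test so no project leaks across subsets. Function name, sample schema, and split fractions are assumptions:

```python
import hashlib
import random

def dedup_and_split(samples, seed=0, val_frac=0.1, test_frac=0.1):
    """Drop exact-duplicate messages (after whitespace/case
    normalization), then split by project so that no project
    appears in more than one subset."""
    seen, unique = set(), []
    for s in samples:
        norm = " ".join(s["message"].lower().split())
        key = hashlib.md5(norm.encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(s)

    # Shuffle projects deterministically, then carve off test/val projects
    projects = sorted({s["project"] for s in unique})
    random.Random(seed).shuffle(projects)
    n_test = max(1, int(len(projects) * test_frac))
    n_val = max(1, int(len(projects) * val_frac))
    test_p = set(projects[:n_test])
    val_p = set(projects[n_test:n_test + n_val])

    split = {"train": [], "val": [], "test": []}
    for s in unique:
        name = ("test" if s["project"] in test_p
                else "val" if s["project"] in val_p
                else "train")
        split[name].append(s)
    return split
```

Exact-match dedup will miss near-duplicates ("Fix typo." vs "Fix typo"), which is exactly why the Generation paper's procedure should be revisited.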
Can we use history to improve quality?
Some words about diversity of styles in commit messages and benefits of personalization in different tasks
Propose a method of personalization: using history of user's commit messages as part of the prompt
RQ (ver. a): can we improve quality of completion using history of user's commit messages as part of prompt?
RQ (ver. b): is personalization beneficial for quality of commit message completion?
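The history-in-the-prompt idea could be as simple as prepending the author's last few messages before the diff and the typed prefix. A minimal sketch; the separator tokens and argument names are assumptions, not a fixed format:

```python
def build_prompt(history: list[str], diff: str, prefix: str,
                 max_history: int = 3) -> str:
    """Prepend the author's most recent commit messages so the model
    can pick up their style; <msg>/<diff>/<prefix> separators are
    assumed special tokens, not a standard."""
    parts = [f"<msg> {m}" for m in history[-max_history:]]
    parts.append(f"<diff> {diff}")
    parts.append(f"<prefix> {prefix}")
    return " ".join(parts)

print(build_prompt(["Fix typo", "Add tests"], "+ new line", "Upd"))
# <msg> Fix typo <msg> Add tests <diff> + new line <prefix> Upd
```

The `max_history` cap matters because history competes with the diff for the limited context window, which ties back to the changed-lines-only idea.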
Are the pretrained models helpful?
Some words about the benefits of using pretrained models and how to use them
RQ: is using pre-trained models (CodeBERT and GPT-2) as encoder and decoder in Commit Message Completion task beneficial in terms of quality / steps to converge?
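For the RQ above, Hugging Face Transformers can wire a pretrained encoder and decoder into one seq2seq model; a sketch of how CodeBERT + GPT-2 could be combined (checkpoint choices from the outline; the cross-attention weights are freshly initialized and still need fine-tuning):

```python
from transformers import EncoderDecoderModel

def build_model():
    """Seq2seq model with CodeBERT as encoder and GPT-2 as decoder.
    Only the cross-attention layers start from random weights."""
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "microsoft/codebert-base", "gpt2"
    )
    # GPT-2 has no dedicated pad token, so reuse its EOS token
    model.config.decoder_start_token_id = model.config.decoder.bos_token_id
    model.config.pad_token_id = model.config.decoder.eos_token_id
    return model
```

Measuring both final quality and steps to converge (as the RQ states) against the vanilla Transformer baseline would isolate what the pretraining actually buys.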
Baselines
Describe why zero-shot with an English model is suitable here
Should we place a model trained on the old dataset here???
Vanilla seq2seq Transformer trained on our data
Dataset
Highlight differences and why the current dataset isn't suitable for the Completion task
Describe what's the difference between the Generation and the Completion tasks
Describe why the current dataset is suitable for the Generation task
Describe our dataset: how collected, stats, link
TODO: multi-lingual dataset???
Conclusion
Threats to validity
Absolute values of metrics are low
Maybe we used the wrong pre-trained models (PLBART is now available and more suitable)