Improved Fine-Tuned Reinforcement Learning From Human Feedback Using Prompting Methods for News Summarization

Author
Keywords
Abstract
ChatGPT uses a generative pretrained transformer neural network model, which falls under the larger umbrella of generative models. One major development following ChatGPT is the advent of prompt engineering, a critical component of working with Large Language Models (LLMs) that helps ChatGPT produce the desired outputs based on the style and tone of the interactions carried out with it. Reinforcement learning from human feedback (RLHF) has been the principal technique for fine-tuning LLM-based models. This work proposes a human selection strategy, incorporated into the RLHF process, to ensure the rightful choice of human reviewers for feedback and to prevent the undesirable consequences of a poor choice. H-Rouge, a new metric for humanized AI systems, is also proposed. A detailed evaluation of state-of-the-art summarization algorithms and prompt-based methods is provided as part of the article. The proposed methods introduce a strategy for human selection in RLHF that employs multi-objective optimization, together with H-Rouge, to balance the various goals encountered during the process. This article will help novice readers conducting research in the field of text summarization to start with prompt engineering, and the discussion of future work will help them proceed in the right direction.
Year of Publication
In Press
Journal
International Journal of Interactive Multimedia and Artificial Intelligence
Volume
In Press
Start Page
1
Issue
In Press
Number
In Press
Number of Pages
1-9
Date Published
02/2025
ISSN Number
1989-1660
URL
DOI