The Pre-Informing Approach and a Synthesis of Existing Prompting Techniques for Improving Output Quality in Black-Box Large Language Models
DOI: https://doi.org/10.63556/tisej.2026.1731

Keywords: Large Language Models, Prompt Design, Prompt Engineering, Synthetic Data Generation, Black-box LLMs

Abstract
This study proceeds from the premise that the effectiveness of Large Language Models (LLMs) in synthetic data generation depends not only on model capacity but also on the quality of human-guided prompting strategies. Within this framework, it reviews twenty-three prompt-writing techniques reported in the literature and proposes a hybrid approach, pre-informing, which aims to improve output quality through structured contextual preparation delivered before the main instruction. The study comparatively evaluates zero-shot and pre-informing prompting across twenty education-domain text-generation tasks, using three black-box LLMs: ChatGPT 5.2 Standard, Gemini 3 Fast, and Claude Sonnet 2.6. This design yields a total of 120 generated outputs.
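To make the contrast between the two conditions concrete, the following sketch shows one plausible way to assemble a zero-shot prompt versus a pre-informed prompt. The field names (Background, Audience, Constraints) and the sample wording are illustrative assumptions, not the study's exact prompt templates.

```python
def zero_shot_prompt(task: str) -> str:
    # Zero-shot condition: the bare instruction, with no preparatory context.
    return task

def pre_informed_prompt(task: str, context: str, audience: str, constraints: str) -> str:
    # Pre-informing condition: structured contextual preparation placed
    # before the main instruction (section labels here are hypothetical).
    return (
        f"Background: {context}\n"
        f"Audience: {audience}\n"
        f"Constraints: {constraints}\n\n"
        f"Task: {task}"
    )

# Illustrative education-domain task (invented for this sketch).
prompt = pre_informed_prompt(
    task="Write a 300-word explainer on formative assessment.",
    context="Secondary-school pedagogy; assessment literacy.",
    audience="Pre-service teachers.",
    constraints="Use domain terminology consistently.",
)
```

The only structural difference between conditions is the context block prepended to the task, which keeps the comparison attributable to pre-informing rather than to changes in the instruction itself.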
The outputs are evaluated through a common quantitative framework consisting of Word Count, Unique Word Count, Average Sentence Length, and Technical Terms Count. The internal coherence of this indicator set is supported by a standardized Cronbach’s alpha of .887. Descriptive comparisons show that the pre-informed condition yields higher values across all four indicators. These differences are further confirmed by Wilcoxon signed-rank tests, which show statistically significant improvements across all four measures. In addition, Friedman test results indicate that the magnitude of improvement differs significantly across models, with Claude showing the largest overall gains, Gemini the most limited gains, and ChatGPT generally occupying an intermediate position.
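The four indicators named above can be computed with simple text processing. The sketch below is a minimal, assumed implementation: the tokenization rules and the idea of matching technical terms against a domain lexicon are this sketch's own choices, not the study's reported procedure. Paired per-task values from the two conditions would then feed a Wilcoxon signed-rank test (e.g., `scipy.stats.wilcoxon`).

```python
import re

def indicators(text: str, technical_lexicon: set) -> dict:
    # Tokenize into lowercase word tokens and rough sentence spans.
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "unique_word_count": len(set(words)),
        # Average sentence length in words; guard against empty input.
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # Technical Terms Count via a domain lexicon (an assumption here).
        "technical_terms": sum(w in technical_lexicon for w in words),
    }
```

Computing this dictionary once per output, for both the zero-shot and pre-informed condition of each task, yields the paired samples that the signed-rank comparisons require.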
The findings indicate that pre-informing is associated with stronger output quality as reflected in measurable indicators, and that its effects are observable across multiple task types rather than being limited to isolated examples. At the same time, the results show that the effectiveness of pre-informing is not uniform across black-box LLMs and remains partly model-sensitive. The study positions pre-informing as a structured and reproducible prompt-based framework for improving quality-related output characteristics and underscores the continuing importance of human guidance in shaping LLM outputs.
License
Copyright (c) 2026 Third Sector Social Economic Review

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.