The Pre-Informing Approach and a Synthesis of Existing Prompting Techniques for Improving Output Quality in Black-Box Large Language Models

Authors

DOI:

https://doi.org/10.63556/tisej.2026.1731

Keywords:

Large Language Models; Prompt Design; Prompt Engineering; Synthetic Data Generation; Black-Box LLMs

Abstract

Starting from the premise that the effectiveness of Large Language Models (LLMs) in synthetic data generation depends not only on model capacity but also on the quality of the human-guided prompt structure, this study reviews twenty-three prompt-writing techniques from the literature and proposes a hybrid approach, termed pre-informing, that aims to improve output quality through structured contextual preparation delivered before the main instruction. The study comparatively evaluates the zero-shot and pre-informing prompting approaches across 20 text-generation tasks from the education domain, using three black-box LLMs (ChatGPT 5.2 Standard, Gemini 3 Fast, and Claude Sonnet 2.6). This design yields 120 outputs in total (2 conditions × 20 tasks × 3 models).
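The abstract does not reproduce the study's prompt templates, so the sketch below is only an illustration of the two conditions being compared: a bare zero-shot instruction versus the same instruction preceded by a structured context block. The task wording and the context fields are assumptions; only the 2 × 20 × 3 design is taken from the abstract.

```python
# Hypothetical sketch of the two prompting conditions compared in the study.
# The task text and the pre-informing fields are illustrative assumptions;
# only the 2-condition x 20-task x 3-model design comes from the abstract.

ZERO_SHOT = "Write an instructional text on renewable energy for undergraduates."

PRE_INFORMING = """\
Context (pre-informing, issued before the main instruction):
- Audience: undergraduate students in an introductory course
- Purpose: instructional text for self-study
- Register: formal, terminology-rich, technical terms defined on first use

Task:
Write an instructional text on renewable energy for undergraduates.
"""

MODELS = ["ChatGPT 5.2 Standard", "Gemini 3 Fast", "Claude Sonnet 2.6"]
TASKS = 20  # text-generation tasks from the education domain

print(2 * TASKS * len(MODELS))  # 2 conditions x 20 tasks x 3 models -> 120 outputs
```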

The generated outputs are evaluated through a common quantitative framework consisting of four indicators: Word Count, Unique Word Count, Average Sentence Length, and Technical Terms Count. The internal consistency of this indicator set is supported by a standardized Cronbach's alpha of .887. Descriptive comparisons show that the pre-informed condition produces higher values on all four indicators. These differences are further confirmed by Wilcoxon signed-rank tests, which indicate statistically significant improvements on all four measures. In addition, Friedman test results reveal that the magnitude of improvement differs significantly across models: Claude shows the largest overall gains, Gemini the most limited, and ChatGPT generally occupies an intermediate position.
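A minimal sketch, assuming plain-text outputs, of how the four indicators, the standardized Cronbach's alpha, and the two nonparametric tests named above could be computed with SciPy and NumPy. The TECH_TERMS lexicon and all numeric arrays are placeholder values, not the study's data.

```python
# Illustrative computation of the abstract's evaluation pipeline.
# TECH_TERMS and all numbers below are placeholders, not the study's data.
import re
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare

TECH_TERMS = {"photovoltaic", "turbine", "biomass"}  # hypothetical lexicon

def indicators(text: str) -> list[float]:
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(words),                           # Word Count
        len(set(words)),                      # Unique Word Count
        len(words) / max(len(sentences), 1),  # Average Sentence Length
        sum(w in TECH_TERMS for w in words),  # Technical Terms Count
    ]

def standardized_alpha(X: np.ndarray) -> float:
    # Standardized Cronbach's alpha from the mean inter-item correlation r_bar:
    # alpha = k * r_bar / (1 + (k - 1) * r_bar), with k items (here, 4 indicators).
    k = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    r_bar = (corr.sum() - k) / (k * (k - 1))
    return k * r_bar / (1 + (k - 1) * r_bar)

# Rows: outputs; columns: the four indicators (placeholder values).
X = np.array([[210, 120, 18.0, 5],
              [250, 140, 21.0, 8],
              [190, 105, 16.5, 4],
              [270, 150, 22.5, 9],
              [230, 130, 19.5, 6],
              [205, 118, 17.5, 5]])
print(standardized_alpha(X))

# Paired per-task values of one indicator under the two conditions.
zero_shot    = [212, 198, 240, 225, 190, 230]
pre_informed = [265, 240, 280, 270, 233, 275]
print(wilcoxon(zero_shot, pre_informed))           # condition effect, per indicator

# Per-task gains (pre-informed minus zero-shot) for each model.
chatgpt = [53, 42, 40, 45, 43, 45]
gemini  = [20, 15, 18, 22, 17, 19]
claude  = [60, 55, 58, 62, 57, 61]
print(friedmanchisquare(chatgpt, gemini, claude))  # model-dependence of gains
```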

The findings indicate that the pre-informing approach is associated with higher output quality, as reflected in the measurable indicators, and that its effects are not limited to isolated examples but are observable across multiple task types. At the same time, the results show that the effect of pre-informing is not evenly distributed across black-box LLMs and remains partly model-sensitive. The study positions pre-informing as a structured, reproducible prompt-based framework for improving quality-related output characteristics and underscores the continuing importance of human guidance in shaping LLM outputs.


Published

20-03-2026

How to Cite

YENİKAYA, M. A., & ODABAŞOĞLU, M. L. (2026). The Pre-Informing Approach and a Synthesis of Existing Prompting Techniques for Improving Output Quality in Black-Box Large Language Models. Üçüncü Sektör Sosyal Ekonomi Dergisi, 61(1), 1243–1270. https://doi.org/10.63556/tisej.2026.1731

Issue

Section

Research Article
