Commit eeafd69, an initial commit with 0 parents, showing 18 changed files with 1,037 additions and 0 deletions.
Gemfile
@@ -0,0 +1,15 @@
source "https://rubygems.org"

git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }

gem 'jekyll'

group :jekyll_plugins do
  gem 'github-pages'
  gem 'jekyll-remote-theme'
  gem 'jekyll-include-cache'
  gem 'webrick'
end

# gem "rails"
README.md
@@ -0,0 +1,25 @@
# PMLR 264

To suggest fixes to this volume, please make a pull request containing the requested changes and a justification for them.

To edit the details of this conference, edit the [_config.yml](./_config.yml) file and submit a pull request.

To make changes to the individual paper details, edit the associated paper file in the [./_posts](./_posts) subdirectory.

For details of how to publish in PMLR, please check https://proceedings.mlr.press/faq.html

For details of what is required to submit a proceedings, please check https://proceedings.mlr.press/spec.html

Published as Volume 264 by the Proceedings of Machine Learning Research on 28 January 2025.

Volume Edited by:
* Sheng Li
* Zhongmin Cui
* Jiasen Lu
* Deborah Harris
* Shumin Jing

Series Editors:
* Neil D. Lawrence
_config.yml
@@ -0,0 +1,93 @@
---
booktitle: Proceedings of Large Foundation Models for Educational Assessment
shortname: FM-EduAssess2024
year: '2024'
volume: '264'
start: &1 2024-12-15
end: 2024-12-16
published: 2025-01-28
sections:
- name: Preface
  title: Preface
- name: Contributed Papers
  title: Contributed Papers
layout: proceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: FM-EduAssess2024
month: 0
cycles: false
bibtex_editor: Li, Sheng and Cui, Zhongmin and Lu, Jiasen and Harris, Deborah and
  Jing, Shumin
editor:
- given: Sheng
  family: Li
- given: Zhongmin
  family: Cui
- given: Jiasen
  family: Lu
- given: Deborah
  family: Harris
- given: Shumin
  family: Jing
title: Proceedings of Machine Learning Research
description: |
  Proceedings of Large Foundation Models for Educational Assessment
  Held in Vancouver, BC, Canada on 15-16 December 2024
  Published as Volume 264 by the Proceedings of Machine Learning Research on 28 January 2025.
  Volume Edited by:
    Sheng Li
    Zhongmin Cui
    Jiasen Lu
    Deborah Harris
    Shumin Jing
  Series Editors:
    Neil D. Lawrence
date_str: 15--16 Dec
url: https://proceedings.mlr.press
author:
  name: PMLR
baseurl: "/v264"
twitter_username: MLResearchPress
github_username: mlresearch
markdown: kramdown
exclude:
- README.md
- Gemfile
- ".gitignore"
plugins:
- jekyll-feed
- jekyll-seo-tag
- jekyll-remote-theme
remote_theme: mlresearch/jekyll-theme
style: pmlr
permalink: "/:title.html"
ghub:
  edit: true
  repository: v264
display:
  copy_button:
    bibtex: true
    endnote: true
    apa: true
  comments: false
volume_type: Volume
volume_dir: v264
email: ''
conference:
  name: Large Foundation Models for Educational Assessment
  url: https://neurips2024edu.github.io/
  location: Vancouver, BC, Canada
  dates:
  - *1
  - 2024-12-16
analytics:
  google:
    tracking_id: UA-92432422-1
orig_bibfile: "/Users/neil/mlresearch/v264/FMEduAssess2024_corrected.bib"
# Site settings
# Original source: /Users/neil/mlresearch/v264/FMEduAssess2024_corrected.bib
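One detail worth noting in this config: `start: &1 2024-12-15` defines a YAML anchor, and the `- *1` under `conference.dates` is an alias that refers back to it, so the start date is written once and reused. A minimal sketch of how a parser resolves the alias, using a trimmed excerpt of the config (assumes PyYAML is installed):

```python
# Trimmed excerpt of _config.yml showing the &1 anchor and *1 alias.
import yaml

snippet = """
start: &1 2024-12-15
end: 2024-12-16
conference:
  dates:
  - *1
  - 2024-12-16
"""

config = yaml.safe_load(snippet)
# The *1 alias resolves to the value anchored by &1, so the first
# conference date equals the top-level start date.
assert config["conference"]["dates"][0] == config["start"]
print(config["conference"]["dates"])
# -> [datetime.date(2024, 12, 15), datetime.date(2024, 12, 16)]
```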
Paper entry bleiweiss25a (in ./_posts)
@@ -0,0 +1,42 @@
---
title: A Large Foundation Model for Assessing Spatially Distributed Personality Traits
abstract: We explored emulating textually encoded personality information in a large
  language model. Given its predominant empirical validation, we chose the five-factor
  model of personality compiled for a broad range of natural languages. Our study
  assessed personality traits from a multicultural viewpoint over a diverse set of
  thirty universal contexts, thus contributing to the wider comprehension of generalizing
  relationships among personality traits across cultures. We administered psychometric
  tests to the language model, examined links between location and personality, and
  cross-validated measures at various levels of the trait hierarchy.
section: Contributed Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: bleiweiss25a
month: 0
tex_title: A Large Foundation Model for Assessing Spatially Distributed Personality
  Traits
firstpage: 173
lastpage: 185
page: 173-185
order: 173
cycles: false
bibtex_author: Bleiweiss, Avi
author:
- given: Avi
  family: Bleiweiss
date: 2025-01-28
address:
container-title: Proceedings of Large Foundation Models for Educational Assessment
volume: '264'
genre: inproceedings
issued:
  date-parts:
  - 2025
  - 1
  - 28
pdf: https://raw.githubusercontent.com/mlresearch/v264/main/assets/bleiweiss25a/bleiweiss25a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
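Every paper in the volume is stored this way: a Jekyll post whose YAML front matter, between the `---` delimiters, carries all of the citation metadata. A minimal sketch of pulling those fields out with Python; it assumes PyYAML, and the filename is hypothetical (actual post names follow Jekyll's date-prefixed convention):

```python
# Minimal sketch: extract the YAML front matter from a paper entry.
# Assumes PyYAML; the path below is hypothetical, not from the repo.
import yaml

def read_front_matter(path):
    """Return the YAML front matter of a Jekyll post as a dict."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # The front matter sits between the first two '---' delimiters.
    _, block, _ = text.split("---", 2)
    return yaml.safe_load(block)

meta = read_front_matter("_posts/2025-01-28-bleiweiss25a.md")  # hypothetical name
print(meta["title"])
print(meta["page"], meta["pdf"])
```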
Paper entry deroy25a (in ./_posts)
@@ -0,0 +1,55 @@
---
title: 'MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question
  Generation'
abstract: Automatic question generation is a critical task that involves evaluating
  question quality by considering factors such as engagement, pedagogical value, and
  the ability to stimulate critical thinking. These aspects require human-like understanding
  and judgment, which automated systems currently lack. However, human evaluations
  are costly and impractical for large-scale samples of generated questions. Therefore,
  we propose a novel system, MIRROR (Multi-LLM Iterative Review and Response for Optimized
  Rating), which leverages large language models (LLMs) to automate the evaluation
  process for questions generated by automated question generation systems. We experimented
  with several state-of-the-art LLMs, such as GPT-4, Gemini, and Llama2-70b. We observed
  that the scores of human evaluation metrics, namely relevance, appropriateness,
  novelty, complexity, and grammaticality, improved when using the feedback-based
  approach called MIRROR, tending to be closer to the human baseline scores. Furthermore,
  we observed that Pearson’s correlation coefficient between GPT-4 and human experts
  improved when using our proposed feedback-based approach, MIRROR, compared to direct
  prompting for evaluation. Error analysis shows that our proposed approach, MIRROR,
  significantly helps to improve relevance and appropriateness.
section: Contributed Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: deroy25a
month: 0
tex_title: 'MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question
  Generation'
firstpage: 3
lastpage: 32
page: 3-32
order: 3
cycles: false
bibtex_author: Deroy, Aniket and Maity, Subhankar and Sarkar, Sudeshna
author:
- given: Aniket
  family: Deroy
- given: Subhankar
  family: Maity
- given: Sudeshna
  family: Sarkar
date: 2025-01-28
address:
container-title: Proceedings of Large Foundation Models for Educational Assessment
volume: '264'
genre: inproceedings
issued:
  date-parts:
  - 2025
  - 1
  - 28
pdf: https://raw.githubusercontent.com/mlresearch/v264/main/assets/deroy25a/deroy25a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
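The deroy25a abstract reports that Pearson’s correlation between GPT-4’s scores and human experts’ scores improved under MIRROR. For readers unfamiliar with the metric, a minimal sketch of computing it over two score vectors; the numbers are made up and only illustrate the calculation (assumes numpy):

```python
# Minimal sketch: Pearson's correlation between model and human scores,
# the agreement metric cited in the deroy25a abstract. Data is made up.
import numpy as np

human = np.array([4.0, 3.5, 5.0, 2.0, 4.5, 3.0])  # hypothetical expert ratings
model = np.array([4.2, 3.0, 4.8, 2.5, 4.4, 3.2])  # hypothetical GPT-4 ratings

r = np.corrcoef(human, model)[0, 1]  # off-diagonal entry of the 2x2 matrix
print(f"Pearson r = {r:.3f}")
```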
Paper entry gao25a (in ./_posts)
@@ -0,0 +1,67 @@
---
title: 'Towards Scalable Automated Grading: Leveraging Large Language Models for Conceptual
  Question Evaluation in Engineering'
abstract: This study explores the feasibility of using large language models (LLMs),
  specifically GPT-4o (ChatGPT), for automated grading of conceptual questions in
  an undergraduate Mechanical Engineering course. We compared the grading performance
  of GPT-4o with that of human teaching assistants (TAs) on ten quiz problems from
  the MEEN 361 course at Texas A&M University, each answered by approximately 225
  students. Both the LLM and TAs followed the same instructor-provided rubric to ensure
  grading consistency. We evaluated performance using Spearman’s rank correlation
  coefficient and Root Mean Square Error (RMSE) to assess the alignment between rankings
  and the accuracy of scores assigned by GPT-4o and TAs under zero- and few-shot grading
  settings. In the zero-shot setting, GPT-4o demonstrated a strong correlation with
  TA grading, with Spearman’s rank correlation coefficient exceeding 0.6 in seven
  out of ten datasets and reaching a high of 0.9387. Our analysis reveals that GPT-4o
  performs well when grading criteria are straightforward but struggles with nuanced
  answers, particularly those involving synonyms not present in the rubric. The model
  also tends to grade more stringently in ambiguous cases compared to human TAs. Overall,
  ChatGPT shows promise as a tool for grading conceptual questions, offering scalability
  and consistency.
section: Contributed Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: gao25a
month: 0
tex_title: 'Towards Scalable Automated Grading: Leveraging Large Language Models for
  Conceptual Question Evaluation in Engineering'
firstpage: 186
lastpage: 206
page: 186-206
order: 186
cycles: false
bibtex_author: Gao, Rujun and Guo, Xiaosu and Li, Xiaodi and Narayanan, Arun Balajiee
  Lekshmi and Thomas, Naveen and Srinivasa, Arun R.
author:
- given: Rujun
  family: Gao
- given: Xiaosu
  family: Guo
- given: Xiaodi
  family: Li
- given: Arun Balajiee Lekshmi
  family: Narayanan
- given: Naveen
  family: Thomas
- given: Arun R.
  family: Srinivasa
date: 2025-01-28
address:
container-title: Proceedings of Large Foundation Models for Educational Assessment
volume: '264'
genre: inproceedings
issued:
  date-parts:
  - 2025
  - 1
  - 28
pdf: https://raw.githubusercontent.com/mlresearch/v264/main/assets/gao25a/gao25a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
Paper entry lee25a (in ./_posts)
@@ -0,0 +1,54 @@
---
title: 'Gemini Pro Defeated by GPT-4V: Evidence from Education'
abstract: This study compared the classification performance of Gemini Pro and GPT-4V
  in educational settings. Employing visual question-answering (VQA) techniques, the
  study examined both models’ ability to read text-based rubrics and automatically
  score student-drawn models in science education. We employed quantitative and qualitative
  analyses using a dataset derived from student-drawn scientific models and NERIF
  (Notation-Enhanced Rubrics for Image Feedback) prompting methods. The findings reveal
  that GPT-4V significantly outperforms Gemini Pro in terms of scoring accuracy and
  quadratic weighted kappa. The qualitative analysis shows that the differences may
  be due to the models’ ability to process fine-grained texts in images and overall
  image classification performance. Even when the NERIF approach was adapted by further
  down-sizing the input images, Gemini Pro was unable to perform as well as GPT-4V.
  The findings suggest GPT-4V’s superior capability in handling complex multimodal
  educational tasks. The study concludes that while both models represent advancements
  in AI, GPT-4V’s higher performance makes it a more suitable tool for educational
  applications involving multimodal data interpretation.
section: Contributed Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: lee25a
month: 0
tex_title: 'Gemini Pro Defeated by GPT-4V: Evidence from Education'
firstpage: 33
lastpage: 60
page: 33-60
order: 33
cycles: false
bibtex_author: Lee, Gyeonggeon and Shi, Lehong and Latif, Ehsan and Zhai, Xiaoming
author:
- given: Gyeonggeon
  family: Lee
- given: Lehong
  family: Shi
- given: Ehsan
  family: Latif
- given: Xiaoming
  family: Zhai
date: 2025-01-28
address:
container-title: Proceedings of Large Foundation Models for Educational Assessment
volume: '264'
genre: inproceedings
issued:
  date-parts:
  - 2025
  - 1
  - 28
pdf: https://raw.githubusercontent.com/mlresearch/v264/main/assets/lee25a/lee25a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
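lee25a compares the two models on scoring accuracy and quadratic weighted kappa, a chance-corrected agreement statistic that penalizes large disagreements on an ordinal scale more heavily than small ones. A minimal sketch using scikit-learn’s implementation, on made-up rubric levels:

```python
# Minimal sketch: quadratic weighted kappa, the agreement statistic the
# lee25a abstract uses to compare GPT-4V and Gemini Pro scoring.
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 0, 1, 2, 1, 0, 2, 1]  # hypothetical rubric levels
model_scores = [2, 0, 1, 1, 1, 0, 2, 2]  # hypothetical model output

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"Quadratic weighted kappa = {qwk:.3f}")
```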