Commit eeafd69: Add pages for volume v264

lawrennd committed Jan 28, 2025
Showing 18 changed files with 1,037 additions and 0 deletions.
101 changes: 101 additions & 0 deletions FMEduAssess2024.bib

Large diffs are not rendered by default.

101 changes: 101 additions & 0 deletions FMEduAssess2024_corrected.bib

Large diffs are not rendered by default.

15 changes: 15 additions & 0 deletions Gemfile
@@ -0,0 +1,15 @@
source "https://rubygems.org"

git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }

gem 'jekyll'

group :jekyll_plugins do
gem 'github-pages'
gem 'jekyll-remote-theme'
gem 'jekyll-include-cache'
gem 'webrick'
end

# gem "rails"

25 changes: 25 additions & 0 deletions README.md
@@ -0,0 +1,25 @@
# PMLR 264

To suggest fixes to this volume, please make a pull request containing the requested changes and a justification for them.

To edit the details of this conference volume, edit the [_config.yml](./_config.yml) file and submit a pull request.

To make changes to the individual paper details, edit the associated paper file in the [./_posts](./_posts) subdirectory.

For details of how to publish in PMLR, please check https://proceedings.mlr.press/faq.html

For details of what is required to submit a proceedings, please check https://proceedings.mlr.press/spec.html

Published as Volume 264 by the Proceedings of Machine Learning Research on 28 January 2025.

Volume Edited by:
* Sheng Li
* Zhongmin Cui
* Jiasen Lu
* Deborah Harris
* Shumin Jing

Series Editors:
* Neil D. Lawrence
93 changes: 93 additions & 0 deletions _config.yml
@@ -0,0 +1,93 @@
---
booktitle: Proceedings of Large Foundation Models for Educational Assessment
shortname: FM-EduAssess2024
year: '2024'
volume: '264'
start: &1 2024-12-15
end: 2024-12-16
published: 2025-01-28
sections:
- name: Preface
title: Preface
- name: Contributed Papers
title: Contributed Papers
layout: proceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: FM-EduAssess2024
month: 0
cycles: false
bibtex_editor: Li, Sheng and Cui, Zhongmin and Lu, Jiasen and Harris, Deborah and
Jing, Shumin
editor:
- given: Sheng
family: Li
- given: Zhongmin
family: Cui
- given: Jiasen
family: Lu
- given: Deborah
family: Harris
- given: Shumin
family: Jing
title: Proceedings of Machine Learning Research
description: |
Proceedings of Large Foundation Models for Educational Assessment
Held in Vancouver, BC, Canada on 15-16 December 2024
Published as Volume 264 by the Proceedings of Machine Learning Research on 28 January 2025.
Volume Edited by:
Sheng Li
Zhongmin Cui
Jiasen Lu
Deborah Harris
Shumin Jing
Series Editors:
Neil D. Lawrence
date_str: 15--16 Dec
url: https://proceedings.mlr.press
author:
name: PMLR
baseurl: "/v264"
twitter_username: MLResearchPress
github_username: mlresearch
markdown: kramdown
exclude:
- README.md
- Gemfile
- ".gitignore"
plugins:
- jekyll-feed
- jekyll-seo-tag
- jekyll-remote-theme
remote_theme: mlresearch/jekyll-theme
style: pmlr
permalink: "/:title.html"
ghub:
edit: true
repository: v264
display:
copy_button:
bibtex: true
endnote: true
apa: true
comments: false
volume_type: Volume
volume_dir: v264
email: ''
conference:
name: Large Foundation Models for Educational Assessment
url: https://neurips2024edu.github.io/
location: Vancouver, BC, Canada
dates:
- *1
- 2024-12-16
analytics:
google:
tracking_id: UA-92432422-1
orig_bibfile: "/Users/neil/mlresearch/v264/FMEduAssess2024_corrected.bib"
# Site settings
# Original source: /Users/neil/mlresearch/v264/FMEduAssess2024_corrected.bib
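
One technical detail in this config is easy to miss: `start: &1 2024-12-15` defines a YAML anchor, and the `- *1` entry under `conference.dates` is an alias that dereferences it, so the first conference date always tracks the volume start date. A minimal sketch of how a parser resolves this, assuming PyYAML is available:

import yaml

snippet = """
start: &1 2024-12-15
conference:
  dates:
  - *1
  - 2024-12-16
"""

config = yaml.safe_load(snippet)
# The *1 alias dereferences the anchored value, so the first conference
# date is identical to the volume start date (both datetime.date objects).
assert config["conference"]["dates"][0] == config["start"]
print(config["conference"]["dates"])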
42 changes: 42 additions & 0 deletions _posts/2025-01-28-bleiweiss25a.md
@@ -0,0 +1,42 @@
---
title: A Large Foundation Model for Assessing Spatially Distributed Personality Traits
abstract: We explored emulating textually encoded personality information in a large
language model. Given its predominant empirical validation, we chose the five-factor
model of personality compiled for a broad range of natural languages. Our study
assessed personality traits from a multicultural viewpoint over a diverse set of
thirty universal contexts, thus contributing to the wider comprehension of how
relationships among personality traits generalize across cultures. We administered
psychometric tests to the language model, examined links between location and
personality, and cross-validated measures at various levels of the trait hierarchy.
section: Contributed Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: bleiweiss25a
month: 0
tex_title: A Large Foundation Model for Assessing Spatially Distributed Personality
Traits
firstpage: 173
lastpage: 185
page: 173-185
order: 173
cycles: false
bibtex_author: Bleiweiss, Avi
author:
- given: Avi
family: Bleiweiss
date: 2025-01-28
address:
container-title: Proceedings of Large Foundation Models for Educational Assessment
volume: '264'
genre: inproceedings
issued:
date-parts:
- 2025
- 1
- 28
pdf: https://raw.githubusercontent.com/mlresearch/v264/main/assets/bleiweiss25a/bleiweiss25a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
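
The abstract above describes administering five-factor (Big Five) psychometric tests to a language model. As background, a minimal sketch of how Likert-scale responses to such tests are conventionally scored — reverse-keyed items are flipped, then items are averaged per trait. The item inventory and responses here are hypothetical, not the paper's instrument:

from statistics import mean

LIKERT_MAX = 5  # responses on a 1-5 agreement scale

# trait -> (item_id, reverse_keyed) pairs; a hypothetical two-item inventory
TRAIT_ITEMS = {
    "openness":          [("o1", False), ("o2", True)],
    "conscientiousness": [("c1", False), ("c2", True)],
    "extraversion":      [("e1", False), ("e2", True)],
    "agreeableness":     [("a1", False), ("a2", True)],
    "neuroticism":       [("n1", False), ("n2", True)],
}

def score_traits(responses):
    """Average Likert responses per trait, flipping reverse-keyed items."""
    return {
        trait: mean(
            (LIKERT_MAX + 1 - responses[item]) if reverse else responses[item]
            for item, reverse in items
        )
        for trait, items in TRAIT_ITEMS.items()
    }

# Hypothetical responses parsed from a language model's answers
llm_responses = {"o1": 4, "o2": 2, "c1": 5, "c2": 1, "e1": 3,
                 "e2": 3, "a1": 4, "a2": 2, "n1": 2, "n2": 4}
print(score_traits(llm_responses))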
55 changes: 55 additions & 0 deletions _posts/2025-01-28-deroy25a.md
@@ -0,0 +1,55 @@
---
title: 'MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question
Generation'
abstract: Automatic question generation is a critical task that involves evaluating
question quality by considering factors such as engagement, pedagogical value, and
the ability to stimulate critical thinking. These aspects require human-like understanding
and judgment, which automated systems currently lack. However, human evaluations
are costly and impractical for large-scale samples of generated questions. Therefore,
we propose a novel system, MIRROR (Multi-LLM Iterative Review and Response for Optimized
Rating), which leverages large language models (LLMs) to automate the evaluation
process for questions generated by automated question generation systems. We experimented
with several state-of-the-art LLMs, such as GPT-4, Gemini, and Llama2-70b. We observed
that scores on human evaluation metrics, namely relevance, appropriateness,
novelty, complexity, and grammaticality, improved when using the feedback-based
approach, MIRROR, moving closer to the human baseline scores. Furthermore,
we observed that Pearson’s correlation coefficient between GPT-4 and human experts
improved when using our proposed feedback-based approach, MIRROR, compared to direct
prompting for evaluation. Error analysis shows that our proposed approach, MIRROR,
significantly helps to improve relevance and appropriateness.
section: Contributed Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: deroy25a
month: 0
tex_title: 'MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question
Generation'
firstpage: 3
lastpage: 32
page: 3-32
order: 3
cycles: false
bibtex_author: Deroy, Aniket and Maity, Subhankar and Sarkar, Sudeshna
author:
- given: Aniket
family: Deroy
- given: Subhankar
family: Maity
- given: Sudeshna
family: Sarkar
date: 2025-01-28
address:
container-title: Proceedings of Large Foundation Models for Educational Assessment
volume: '264'
genre: inproceedings
issued:
date-parts:
- 2025
- 1
- 28
pdf: https://raw.githubusercontent.com/mlresearch/v264/main/assets/deroy25a/deroy25a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
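
The headline comparison in the abstract above is Pearson's correlation between model-assigned and human-assigned question-quality scores, computed with and without the feedback-based approach. A minimal sketch of that comparison, with hypothetical score arrays (SciPy assumed available); the paper's actual data and prompting pipeline are not reproduced here:

from scipy.stats import pearsonr

human_scores = [4.0, 3.5, 5.0, 2.0, 4.5, 3.0]   # expert ratings per question
llm_direct   = [3.0, 4.5, 4.0, 3.5, 3.0, 4.0]   # direct-prompting ratings
llm_mirror   = [4.0, 3.0, 5.0, 2.5, 4.0, 3.5]   # ratings after iterative feedback

for name, scores in [("direct", llm_direct), ("feedback-based", llm_mirror)]:
    r, p = pearsonr(human_scores, scores)
    print(f"{name}: Pearson r = {r:.3f} (p = {p:.3f})")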
67 changes: 67 additions & 0 deletions _posts/2025-01-28-gao25a.md
@@ -0,0 +1,67 @@
---
title: 'Towards Scalable Automated Grading: Leveraging Large Language Models for Conceptual
Question Evaluation in Engineering'
abstract: This study explores the feasibility of using large language models (LLMs),
specifically GPT-4o (ChatGPT), for automated grading of conceptual questions in
an undergraduate Mechanical Engineering course. We compared the grading performance
of GPT-4o with that of human teaching assistants (TAs) on ten quiz problems from
the MEEN 361 course at Texas A&M University, each answered
by approximately 225 students. Both the LLM and TAs followed the same instructor-provided
rubric to ensure grading consistency. We evaluated performance using Spearman’s
rank correlation coefficient and Root Mean Square Error (RMSE) to assess the alignment
between rankings and the accuracy of scores assigned by GPT-4o and TAs under zero-
and few-shot grading settings. In the zero-shot setting, GPT-4o demonstrated a strong
correlation with TA grading, with Spearman’s rank correlation coefficient exceeding
0.6 in seven out of ten datasets and reaching a high of 0.9387. Our analysis reveals
that GPT-4o performs well when grading criteria are straightforward but struggles
with nuanced answers, particularly those involving synonyms not present in the rubric.
The model also tends to grade more stringently in ambiguous cases compared to human
TAs. Overall, ChatGPT shows promise as a tool for grading conceptual questions,
offering scalability and consistency.
section: Contributed Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: gao25a
month: 0
tex_title: 'Towards Scalable Automated Grading: Leveraging Large Language Models for
Conceptual Question Evaluation in Engineering'
firstpage: 186
lastpage: 206
page: 186-206
order: 186
cycles: false
bibtex_author: Gao, Rujun and Guo, Xiaosu and Li, Xiaodi and Narayanan, Arun Balajiee
Lekshmi and Thomas, Naveen and Srinivasa, Arun R.
author:
- given: Rujun
family: Gao
- given: Xiaosu
family: Guo
- given: Xiaodi
family: Li
- given: Arun Balajiee Lekshmi
family: Narayanan
- given: Naveen
family: Thomas
- given: Arun R.
family: Srinivasa
date: 2025-01-28
address:
container-title: Proceedings of Large Foundation Models for Educational Assessment
volume: '264'
genre: inproceedings
issued:
date-parts:
- 2025
- 1
- 28
pdf: https://raw.githubusercontent.com/mlresearch/v264/main/assets/gao25a/gao25a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
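
The abstract above reports two metrics: Spearman's rank correlation (agreement in how students are ranked) and RMSE (accuracy of the assigned scores) between GPT-4o and TA grades. A minimal sketch of both, with hypothetical grade arrays (SciPy assumed available):

import math
from scipy.stats import spearmanr

ta_grades  = [10, 8, 9, 4, 7, 10, 6, 3]   # human TA scores per student
llm_grades = [ 9, 8, 8, 5, 6, 10, 7, 2]   # LLM scores under the same rubric

rho, p = spearmanr(ta_grades, llm_grades)
rmse = math.sqrt(sum((t - l) ** 2 for t, l in zip(ta_grades, llm_grades))
                 / len(ta_grades))
print(f"Spearman rho = {rho:.4f} (p = {p:.4f}), RMSE = {rmse:.3f}")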
54 changes: 54 additions & 0 deletions _posts/2025-01-28-lee25a.md
@@ -0,0 +1,54 @@
---
title: 'Gemini Pro Defeated by GPT-4V: Evidence from Education'
abstract: This study compared the classification performance of Gemini Pro and GPT-4V
in educational settings. Employing visual question-answering (VQA) techniques, the
study examined both models’ ability to read text-based rubrics and automatically
score student-drawn models in science education. We employed quantitative and qualitative
analyses using a dataset derived from student-drawn scientific models and NERIF
(Notation-Enhanced Rubrics for Image Feedback) prompting methods. The findings reveal
that GPT-4V significantly outperforms Gemini Pro regarding scoring accuracy and
quadratic weighted kappa. The qualitative analysis shows that the differences may
be due to the models’ ability to process fine-grained texts in images and overall
image classification performance. Even when adapting the NERIF approach by further downsizing
the input images, Gemini Pro seems unable to perform as well as GPT-4V. The findings
suggest GPT-4V’s superior capability in handling complex multimodal educational
tasks. The study concludes that while both models represent advancements in AI,
GPT-4V’s higher performance makes it a more suitable tool for educational applications
involving multimodal data interpretation.
section: Contributed Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: lee25a
month: 0
tex_title: 'Gemini Pro Defeated by GPT-4V: Evidence from Education'
firstpage: 33
lastpage: 60
page: 33-60
order: 33
cycles: false
bibtex_author: Lee, Gyeonggeon and Shi, Lehong and Latif, Ehsan and Zhai, Xiaoming
author:
- given: Gyeonggeon
family: Lee
- given: Lehong
family: Shi
- given: Ehsan
family: Latif
- given: Xiaoming
family: Zhai
date: 2025-01-28
address:
container-title: Proceedings of Large Foundation Models for Educational Assessment
volume: '264'
genre: inproceedings
issued:
date-parts:
- 2025
- 1
- 28
pdf: https://raw.githubusercontent.com/mlresearch/v264/main/assets/lee25a/lee25a.pdf
extras: []
# Format based on Martin Fenner's citeproc: https://blog.front-matter.io/posts/citeproc-yaml-for-bibliographies/
---
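
The abstract above compares the two models on quadratic weighted kappa, a chance-corrected agreement metric that penalizes large disagreements more heavily than small ones. A minimal sketch with hypothetical rubric labels (scikit-learn assumed available):

from sklearn.metrics import cohen_kappa_score

human_scores = [0, 1, 2, 2, 1, 0, 2, 1]   # rubric levels per student drawing
model_scores = [0, 1, 2, 1, 1, 0, 2, 2]   # model-assigned levels

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"Quadratic weighted kappa = {qwk:.3f}")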