Feature: DBLP service #2339

Closed
wants to merge 28 commits into from
28 commits
9d23a44
feat: add dockerfile for dblp service
akshatchhabra Sep 12, 2024
a8fa500
feat: add dblp publication service files
akshatchhabra Sep 12, 2024
62418ca
Merge branch 'master' into feature/dblp-service
akshatchhabra Sep 12, 2024
a0441de
Merge branch 'master' into feature/dblp-service
akshatchhabra Oct 7, 2024
b985658
docs: Add README
akshatchhabra Oct 10, 2024
661d97d
fix: fix paths
akshatchhabra Oct 10, 2024
32c8e38
fix: stream json data
akshatchhabra Oct 11, 2024
5f42cb6
Merge branch 'master' into feature/dblp-service
akshatchhabra Oct 11, 2024
1615240
fix: skip users with no modified publications
akshatchhabra Oct 18, 2024
82656df
feat: get dblp.xml and dblp.dtd
akshatchhabra Oct 18, 2024
1d4d108
fix: increase max heap size
akshatchhabra Oct 18, 2024
449e174
Merge branch 'master' into feature/dblp-service
akshatchhabra Oct 18, 2024
9eaf27f
fix: remove file from gitignore
akshatchhabra Oct 18, 2024
f4e5290
Merge branch 'feature/dblp-service' of github.com:openreview/openrevi…
akshatchhabra Oct 18, 2024
b9aaa31
fix: use dblp files from bucket
akshatchhabra Oct 21, 2024
90920bc
Merge branch 'master' into feature/dblp-service
akshatchhabra Jan 17, 2025
3f7c17b
fix: request failing and escaped xml
akshatchhabra Jan 28, 2025
d9bea87
test: update dblp publication service
akshatchhabra Jan 28, 2025
2a09bcd
feat: update dblp publication service
akshatchhabra Jan 28, 2025
e286e00
Merge branch 'master' into feature/dblp-service
akshatchhabra Jan 28, 2025
8d76419
fix: use new payload
akshatchhabra Feb 13, 2025
e3d442e
fix: change return payload schema
akshatchhabra Feb 13, 2025
005a6a4
Merge branch 'feature/dblp-service' of github.com:openreview/openrevi…
akshatchhabra Feb 13, 2025
c02c752
test: tests for different cases for importing dblp publications
akshatchhabra Feb 19, 2025
b37e9e6
test: fix dblp service tests
akshatchhabra Feb 20, 2025
fb7ab17
test: fix get_note_edits param
akshatchhabra Feb 21, 2025
6f05197
test: update assert statements
akshatchhabra Feb 21, 2025
01cebdd
Merge branch 'master' into feature/dblp-service
carlosmondra Mar 5, 2025
38 changes: 35 additions & 3 deletions openreview/profile/management.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
import os
import requests
import openreview
from openreview.stages import *
import xml.etree.ElementTree as ET

class ProfileManagement():

@@ -1239,9 +1241,39 @@ def set_merge_profiles_invitations(self):
)
)


    @classmethod
    def update_dblp_publications(cls, client, date):
        # Assumption: the service is reachable over HTTPS. requests raises
        # MissingSchema when the URL has no scheme.
        res = requests.get(
            f"https://dblp.openreview.net/dblp-records/{date}", timeout=(10, 600)
        )
        res.raise_for_status()
        recently_modified_publications = res.json()

        for auth in recently_modified_publications:
            publications = recently_modified_publications[auth]
            for pub_xml in publications:
                # Parse the DBLP record XML
                xml_tree = ET.fromstring(pub_xml)
                title = xml_tree.find("title").text
                authors = [author.text for author in xml_tree.findall("author")]

                note_content = {
                    "title": {"value": title},
                    "authors": {"value": authors},
                }
                # Not every DBLP record has a <journal> element (for example,
                # conference papers), so only set the venue when it is present.
                journal = xml_tree.find("journal")
                if journal is not None:
                    note_content["venue"] = {"value": journal.text}

                client.post_note_edit(
                    invitation="DBLP.org/-/Record",
                    signatures=["DBLP.org"],
                    content={"xml": {"value": pub_xml}},
                    note=openreview.api.Note(content=note_content),
                )
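The parsing step in `update_dblp_publications` can be tried in isolation. The following is a minimal sketch, assuming each record is a single DBLP element such as `<article>` or `<inproceedings>`. The helper name `parse_dblp_record`, the sample record, and the fallback from `<journal>` to `<booktitle>` are illustrative assumptions and are not part of this PR.

```python
import xml.etree.ElementTree as ET


def parse_dblp_record(pub_xml):
    """Extract title, authors and venue from one DBLP record (hypothetical helper)."""
    record = ET.fromstring(pub_xml)

    def text_of(tag):
        element = record.find(tag)
        if element is None:
            return None
        # itertext() also collects text inside nested markup such as <i> or <sub>,
        # which a plain .text lookup would cut off.
        return "".join(element.itertext()).strip()

    return {
        "title": text_of("title"),
        "authors": ["".join(a.itertext()).strip() for a in record.findall("author")],
        # Assumption: use <booktitle> as the venue when there is no <journal>.
        "venue": text_of("journal") or text_of("booktitle"),
    }


# Made-up record used only for illustration.
example = (
    '<inproceedings key="conf/x/Doe24" mdate="2024-10-01">'
    "<author>Jane Doe</author><author>John Roe</author>"
    "<title>Learning <i>fast</i> parsers.</title>"
    "<booktitle>XConf</booktitle></inproceedings>"
)
print(parse_dblp_record(example))
```

Reading the title with `itertext()` matters because DBLP titles can contain inline markup, in which case `.text` returns only the part before the first nested tag.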
3 changes: 3 additions & 0 deletions services/dblp_publication_service/README.md
@@ -0,0 +1,3 @@
# DBLP Publication Service

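This service returns the DBLP publication records modified on or after a date supplied by the caller. The date is passed in ISO format (`YYYY-MM-DD`) to `GET /dblp-records/{date}`.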
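A minimal sketch of how a client might build the request URL for this service. The helper name `records_url` and the base URL `http://localhost:8080` are assumptions for illustration (the port matches the one exposed in the dockerfile); only the `/dblp-records/{date}` path comes from the service code.

```python
from datetime import date


def records_url(base_url, since):
    """Build the URL for records modified on or after `since` (hypothetical helper)."""
    # date.fromisoformat raises ValueError for strings it cannot parse as an
    # ISO date, so malformed input fails before any request is sent.
    since_date = date.fromisoformat(since)
    return f"{base_url.rstrip('/')}/dblp-records/{since_date.isoformat()}"


# Assumed local deployment; replace the base URL with the real host.
print(records_url("http://localhost:8080", "2024-10-18"))
```

The resulting URL can be passed to `requests.get` with a generous read timeout, since the service downloads and parses the full DBLP dump before it responds.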
130 changes: 130 additions & 0 deletions services/dblp_publication_service/app/DblpParser.java
@@ -0,0 +1,130 @@
//
// Copyright (c)2015, dblp Team (University of Trier and
// Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik GmbH)
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// (1) Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// (2) Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// (3) Neither the name of the dblp team nor the names of its contributors
// may be used to endorse or promote products derived from this software
// without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
// ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
// WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL DBLP TEAM BE LIABLE FOR ANY
// DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
// (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
// ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
package app;

import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.time.LocalDate;

import org.dblp.mmdb.Field;
import org.dblp.mmdb.Person;
import org.dblp.mmdb.PersonName;
import org.dblp.mmdb.Publication;
import org.dblp.mmdb.RecordDb;
import org.dblp.mmdb.RecordDbInterface;
import org.dblp.mmdb.TableOfContents;
import org.dblp.mmdb.Mmdb;
import org.xml.sax.SAXException;

import com.google.gson.Gson;
// import com.google.gson.GsonBuilder;

@SuppressWarnings("javadoc")
class DblpParser {

    public static void main(String[] args) {

        // we need to raise entityExpansionLimit because the dblp.xml has millions of
        // entities
        System.setProperty("entityExpansionLimit", "10000000");

        if (args.length != 3) {
            System.err.format("Usage: java %s <dblp-xml-file> <dblp-dtd-file> <last-retrieved-date>\n",
                    DblpParser.class.getName());
            // exit with a non-zero status so the calling process can detect the failure
            System.exit(1);
        }
        String dblpXmlFilename = args[0];
        String dblpDtdFilename = args[1];
        LocalDate date = LocalDate.parse(args[2]);

        // maps the XML of each recently modified publication to the pids of its authors
        HashMap<String, List<String>> recentlyModifiedHashMap = new HashMap<String, List<String>>();

        System.out.println("building the dblp main memory DB ...");
        RecordDbInterface dblp;
        try {
            dblp = new RecordDb(dblpXmlFilename, dblpDtdFilename, false);
        } catch (final IOException ex) {
            System.err.println("cannot read dblp XML: " + ex.getMessage());
            System.exit(1);
            return;
        } catch (final SAXException ex) {
            System.err.println("cannot parse XML: " + ex.getMessage());
            System.exit(1);
            return;
        }
        System.out.format("MMDB ready: %d publs, %d pers\n\n", dblp.numberOfPublications(), dblp.numberOfPersons());

        System.out.println("Finding all publications modified on or after " + date.toString());
        Collection<Person> allPeople = dblp.getPersons();

        for (Person person : allPeople) {
            // get the latest mdate; if it is on or after the given date, check all
            // records for that person.
            LocalDate mDate = LocalDate.parse(person.getAggregatedMdate());
            if (mDate.equals(date) || mDate.isAfter(date)) {
                List<Publication> publications = person.getPublications();
                for (Publication p : publications) {
                    LocalDate publicationMDate = LocalDate.parse(p.getMdate());
                    if (publicationMDate.equals(date) || publicationMDate.isAfter(date)) {
                        recentlyModifiedHashMap
                                .computeIfAbsent(p.getXml(), k -> new ArrayList<String>())
                                .add(person.getPid());
                    }
                }
            }
        }

        String filePath = "app/data/recentlyModified.json";
        System.out.println("Writing publications modified on or after " + date.toString() + " to " + filePath);

        String jsonString = new Gson().toJson(recentlyModifiedHashMap);

        try (FileWriter file = new FileWriter(filePath)) {
            file.write(jsonString);
            System.out.println("Successfully wrote JSON string to file: " + filePath);
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }

        System.out.println("done.");
    }
}
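The selection logic in `DblpParser` can be sketched in Python for readers who want to check it without the dblp `mmdb` library. This is only an illustration under stated assumptions: people and publications are modeled as plain dictionaries with hypothetical keys (`pid`, `aggregated_mdate`, `publications`, `mdate`, `xml`) that stand in for the `Person` and `Publication` accessors used above.

```python
from datetime import date


def recently_modified(people, since):
    """Map publication XML to the pids of its authors, for records modified on or after `since`."""
    result = {}
    for person in people:
        # Skip people whose most recently modified record is older than the cutoff.
        if date.fromisoformat(person["aggregated_mdate"]) < since:
            continue
        for pub in person["publications"]:
            if date.fromisoformat(pub["mdate"]) >= since:
                # A co-authored publication is reached once per author, so the
                # pids accumulate under the same XML key.
                result.setdefault(pub["xml"], []).append(person["pid"])
    return result


# Made-up sample data for illustration only.
people = [
    {"pid": "p/1", "aggregated_mdate": "2024-10-18", "publications": [
        {"mdate": "2024-10-18", "xml": "<article>A</article>"},
        {"mdate": "2024-01-01", "xml": "<article>B</article>"},
    ]},
    {"pid": "p/2", "aggregated_mdate": "2024-10-20", "publications": [
        {"mdate": "2024-10-18", "xml": "<article>A</article>"},
    ]},
    {"pid": "p/3", "aggregated_mdate": "2023-01-01", "publications": [
        {"mdate": "2023-01-01", "xml": "<article>C</article>"},
    ]},
]
print(recently_modified(people, date(2024, 10, 18)))
```

The per-person check is a shortcut: a person whose aggregated modification date is before the cutoff cannot have any qualifying publication, so their records are never inspected.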
Empty file.
15 changes: 15 additions & 0 deletions services/dblp_publication_service/app/dblp_publication_controller.py
@@ -0,0 +1,15 @@
# This script creates an endpoint that streams the DBLP records modified on or after a given date
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from app.dblp_publication_model import generate_data

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.get("/dblp-records/{date}")
def latest_dblp_records(date: str):
    # application/json is the registered media type for JSON
    return StreamingResponse(generate_data(date), media_type="application/json")
82 changes: 82 additions & 0 deletions services/dblp_publication_service/app/dblp_publication_model.py
@@ -0,0 +1,82 @@
import os
import json
import subprocess
import requests
import gzip
import html


def json_generator(data):
    first_key = True
    yield "{"  # Start of the JSON object
    for key, articles in data.items():  # Iterate over each key in the dictionary
        if not first_key:
            yield ", "  # Add a comma between keys if not the first key
        else:
            first_key = False
        # json.dumps quotes the key and escapes any quotes or backslashes inside it
        yield f"{json.dumps(key)}: ["
        for i, article in enumerate(articles):
            if i > 0:
                yield ", "  # Add comma between articles
            # Stream each article as a JSON string
            yield json.dumps(html.unescape(article.encode('utf-8').decode('unicode_escape')))
        yield "]"
    yield "}"  # End of the JSON object


def decompress_and_save_gz(gz_file_path, output_file_path):
    print(f"Decompressing {gz_file_path}")
    with gzip.open(gz_file_path, 'rb') as gz_file:
        with open(output_file_path, 'wb') as output_file:
            # Copy in 1 MiB chunks so the large dblp.xml is never held in memory at once
            for chunk in iter(lambda: gz_file.read(1024 * 1024), b""):
                output_file.write(chunk)
    print(f"Saved decompressed file as {output_file_path}")

def get_dblp_file(url, filename):
    directory = os.path.dirname(filename)
    if directory:
        os.makedirs(directory, exist_ok=True)
    print(f"Downloading {filename}")
    # Stream the response to disk instead of buffering the whole file in memory
    with requests.get(url, stream=True) as res:
        if res.status_code == 200:
            with open(filename, "wb") as file:
                for chunk in res.iter_content(chunk_size=1024 * 1024):
                    file.write(chunk)
            print(f"{filename} downloaded successfully.")
        else:
            print(f"Failed to download {filename}. Status code: {res.status_code}")


def generate_data(date):
    get_dblp_file("https://storage.googleapis.com/openreview-public/DBLP/dblp.dtd", "app/data/dblp.dtd")
    get_dblp_file("https://storage.googleapis.com/openreview-public/DBLP/dblp.xml.gz", "app/data/dblp.xml.gz")
    decompress_and_save_gz("app/data/dblp.xml.gz", "app/data/dblp.xml")
    try:
        # The Java program creates app/data/recentlyModified.json

        # Step 1: Compile the Java program
        javac_command = ["javac", "-cp", "app/libs/*", "app/DblpParser.java"]
        subprocess.run(javac_command, check=True)

        # Step 2: Run the Java program with the necessary arguments
        java_command = [
            "java",
            "-Xmx24G",
            "-cp",
            ".:app/libs/*",
            "app.DblpParser",
            "app/data/dblp.xml",
            "app/data/dblp.dtd",
            date,
        ]
        subprocess.run(java_command, check=True)

        # Step 3: Open the recentlyModified.json file
        json_file_path = os.path.join("app/data", "recentlyModified.json")

        with open(json_file_path, "r") as f:
            json_data = json.load(f)
            return json_generator(json_data)

    except subprocess.CalledProcessError as e:
        print(f"An error occurred while running Java commands: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
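A quick way to check a streaming JSON generator like `json_generator` is to join its chunks and parse the result. The sketch below is a simplified, hypothetical variant (`stream_json_object`) that leaves out the unescaping step and uses `json.dumps` for both keys and values. It is not the service's implementation, and the sample data is made up.

```python
import json


def stream_json_object(data):
    """Yield a JSON object of string lists piece by piece (hypothetical variant)."""
    yield "{"
    for i, (key, values) in enumerate(data.items()):
        if i > 0:
            yield ", "
        # json.dumps escapes quotes and backslashes, so keys that contain XML
        # attributes such as key="a/b" still produce valid JSON.
        yield json.dumps(key) + ": ["
        yield ", ".join(json.dumps(value) for value in values)
        yield "]"
    yield "}"


# Made-up sample: one key containing double quotes, one with an empty list.
sample = {'<article key="a/b">T</article>': ["p/1", "p/2"], "plain": []}

# Joining the chunks must give a document that parses back to the input.
assert json.loads("".join(stream_json_object(sample))) == sample
```

The round-trip assertion is the useful part: it fails as soon as a key or value is emitted without proper escaping, which is the easiest mistake to make when JSON is assembled by hand.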
Binary file not shown.
Binary file not shown.
41 changes: 41 additions & 0 deletions services/dblp_publication_service/dockerfile
@@ -0,0 +1,41 @@
# Use the official Ubuntu base image
FROM ubuntu:20.04

# Set the environment to non-interactive (prevents user input during package installation)
ENV DEBIAN_FRONTEND=noninteractive

# Update the package list and install Python, Java, and other necessary tools
RUN apt-get update && \
    apt-get install -y \
        python3 \
        python3-pip \
        openjdk-8-jdk \
        curl \
        unzip \
    && apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Set environment variables for Java
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV PATH="$JAVA_HOME/bin:$PATH"

# Verify installations (optional)
RUN java -version && python3 --version && pip3 --version

# Set the working directory inside the container
WORKDIR /app

# Copy requirements file if Python dependencies are needed
COPY requirements.txt .

# Install Python dependencies
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the rest of the application code into the container
COPY . .

# Expose the port the app will run on (change as needed)
EXPOSE 8080

# Define the command to run the application (update this based on your project)
CMD ["uvicorn", "app.dblp_publication_controller:app", "--host", "0.0.0.0", "--port", "8080"]
36 changes: 36 additions & 0 deletions services/dblp_publication_service/requirements.txt
@@ -0,0 +1,36 @@
annotated-types==0.7.0
anyio==4.4.0
certifi==2024.7.4
charset-normalizer==3.4.0
click==8.1.7
dnspython==2.6.1
email_validator==2.2.0
fastapi==0.112.2
fastapi-cli==0.0.5
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.2
idna==3.8
Jinja2==3.1.4
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
pydantic==2.8.2
pydantic_core==2.20.1
Pygments==2.18.0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.2
requests==2.32.3
rich==13.8.0
shellingham==1.5.4
sniffio==1.3.1
starlette==0.38.2
typer==0.12.5
typing_extensions==4.12.2
urllib3==2.2.3
uvicorn==0.30.6
uvloop==0.20.0
watchfiles==0.24.0
websockets==13.0.1