Commit 5c661d6 (parent 08bcab2): updated README.md

File tree: 3 files changed (+201, -17 lines)

README.md (15 additions, 15 deletions)
@@ -48,13 +48,13 @@ To demonstrate the use of the package, we consider a dataset with two levels of

We further consider a pretrained 'main' model, for example one which employed the small version of [Meta's _DINO_V2_ architecture](https://dinov2.metademolab.com/) and was fine-tuned on ImageNet50, a subset of the [ImageNet1K dataset](https://www.image-net.org/index.php) with 50 classes (which can be found [here](https://huggingface.co/datasets/lab-v2/ImageNet50)), and whose ability to classify both levels of the hierarchy we want to analyze. An instance of such a model (which can be found [here](https://huggingface.co/lab-v2/dinov2_vits14_imagenet_lr1e-06_BCE)) has the following performance:

- Fine-grain prior combined accuracy: <span style="color:green">76.57</span>% , fine-grain prior combined macro f1: <span style="color:green">76.1</span>%\
- Fine-grain prior combined macro precision: <span style="color:green">76.96</span>% , fine-grain prior combined macro recall: <span style="color:green">76.57</span>%
+ Fine-grain prior combined accuracy: <code style="color:green">76.57%</code> , fine-grain prior combined macro f1: <code style="color:green">76.1%</code>\
+ Fine-grain prior combined macro precision: <code style="color:green">76.96%</code> , fine-grain prior combined macro recall: <code style="color:green">76.57%</code>

- Coarse-grain prior combined accuracy: <span style="color:green">87.14</span>%, coarse-grain prior combined macro f1: <span style="color:green">85.77</span>%\
- Coarse-grain prior combined macro precision: <span style="color:green">87.36</span>%, coarse-grain prior combined macro recall: <span style="color:green">84.64</span>%
+ Coarse-grain prior combined accuracy: <code style="color:green">87.14%</code>, coarse-grain prior combined macro f1: <code style="color:green">85.77%</code>\
+ Coarse-grain prior combined macro precision: <code style="color:green">87.36%</code>, coarse-grain prior combined macro recall: <code style="color:green">84.64%</code>

- Total prior inconsistencies <span style="color:red">133/2100</span> (<span style="color:red">6.33</span>%)
+ Total prior inconsistencies <code style="color:red">133/2100</code> (<code style="color:red">6.33%</code>)

We also consider a 'secondary' model (which can be found [here](https://huggingface.co/lab-v2/dinov2_vitl14_imagenet_lr1e-06_BCE)), which employed the large version of the DINO_V2 architecture and was also fine-tuned on the ImageNet50 dataset, along with binary models trained on each class of the dataset.
Consider the following code snippet to run the `run_experiment` function from PyEDCR.py:
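(The snippet itself sits in README lines 61-80, which this commit does not touch, so it is not shown in the hunks here. Purely for orientation, below is a hypothetical sketch of what such a call could look like; the import path and every `imagenet_config` field are illustrative assumptions, not PyEDCR's confirmed API. Only the call `run_experiment(config=imagenet_config)` is attested, by the hunk header that follows.)

```python
# Hypothetical sketch only: the field names below are illustrative assumptions,
# not PyEDCR's confirmed API; see the actual README snippet for the real config.
from PyEDCR import run_experiment  # assumed import path

imagenet_config = dict(
    data='imagenet',                     # dataset key (assumed)
    main_model='dinov2_vits14',          # small DINOv2 main model (assumed)
    secondary_model='dinov2_vitl14',     # large DINOv2 secondary model (assumed)
    use_binary_models=True,              # per-class binary models (assumed)
)

run_experiment(config=imagenet_config)   # this call is attested in the diff below
```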
@@ -81,15 +81,15 @@ run_experiment(config=imagenet_config)

The code will initiate the rule-learning pipeline, use the learned rules to mark errors in the main model's predictions, and print the algorithm's performance metrics on the error class after running the f-EDR algorithm, which in this case will be:

-```
-error_accuracy: 89.0%
-error_balanced_accuracy: 84.23%
-error_precision: 81.65%
-error_recall: 74.31%
-error_f1: 77.81%
-recovered_constraints_precision: 100.0%
-recovered_constraints_recall: 59.36%
-recovered_constraints_f1_score: 74.5%
+```python
+error_accuracy: 89.0
+error_balanced_accuracy: 84.23
+error_precision: 81.65
+error_recall: 74.31
+error_f1: 77.81
+recovered_constraints_precision: 100.0
+recovered_constraints_recall: 59.36
+recovered_constraints_f1_score: 74.5
```

For further details about the rule-learning algorithm and the noise tolerance experiments, please refer to the [paper](https://arxiv.org/abs/2407.15192).
@@ -99,7 +99,7 @@ For further details about the rule learning algorithm, and noise tolerance exper

This research was funded by ARO grant W911NF-24-1-0007.

<p align="center">
-  <a href="https://arl.devcom.army.mil/who-we-are/aro/">
+  <a href="https://scai.engineering.asu.edu/">
    <img src="https://cdn.shopify.com/s/files/1/1095/6418/files/ASU-sun-devils-new-logo.jpg?v=1481918145" height="150" alt=""/>
  </a>
&emsp;

src/PyEDCR/utils/google_sheets_api.py (186 additions, 0 deletions)

@@ -0,0 +1,186 @@
```python
import os
import typing
import time
import numpy as np

import google_auth_oauthlib.flow
import google.auth.transport.requests
import google.oauth2.credentials
import googleapiclient.discovery
import googleapiclient.errors

from src.PyEDCR.utils import paths

with open(fr'{paths.CREDENTIALS_FOLDER}/spreadsheet_id.txt', 'r') as file:
    # Read the first line and strip any extra whitespace or newline characters
    spreadsheet_id = file.readline().strip()


def initiate_api() -> googleapiclient.discovery.Resource:
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time.
    scopes = ["https://www.googleapis.com/auth/spreadsheets"]
    if os.path.exists("../../../token.json"):
        creds = google.oauth2.credentials.Credentials.from_authorized_user_file(
            filename="../../../token.json", scopes=scopes)

    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(google.auth.transport.requests.Request())
        else:
            flow = google_auth_oauthlib.flow.InstalledAppFlow.from_client_secrets_file(
                client_secrets_file="../../../credentials/credentials.json",
                scopes=scopes)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open("../../../token.json", "w") as token:
            token.write(creds.to_json())

    service = googleapiclient.discovery.build(serviceName="sheets",
                                              version="v4",
                                              credentials=creds)
    sheet = service.spreadsheets()

    return sheet


__sheet: googleapiclient.discovery.Resource = initiate_api()


def get_sheet_tab_name(main_model_name: str,
                       data_str: str,
                       secondary_model_name: typing.Optional[str] = None,
                       binary: bool = False) -> str:
    """Build the spreadsheet tab name for a given model/dataset combination."""
    models_dict = {'vit_b_16': 'VIT_b_16',
                   'dinov2_vits14': 'DINO V2 VIT14_s',
                   'dinov2_vitl14': 'DINO V2 VIT14_l',
                   'tresnet_m': 'Tresnet M',
                   'vit_l_16': 'VIT_l_16'}
    data_dict = {'military_vehicles': 'Military Vehicles',
                 'imagenet': 'ImageNet',
                 'openimage': 'OpenImage',
                 'coco': 'COCO'}
    main_model_name_str = models_dict[main_model_name]
    data_set_str = data_dict[data_str]

    secondary_model_str = ((' with ' + models_dict[secondary_model_name])
                           if secondary_model_name is not None else '')
    binary_str = ' with Binary' if binary else ''

    return f"{main_model_name_str} on {data_set_str}{binary_str}{secondary_model_str}"


def exponential_backoff(func: typing.Callable) -> typing.Callable:
    """Decorator to retry with exponential backoff when rate limited."""

    def wrapper(*args, **kwargs):
        wait = 30  # Start with 30 seconds
        while True:
            try:
                return func(*args, **kwargs)
            except googleapiclient.errors.HttpError as e:
                error_code = e.resp.status
                if error_code == 429:
                    print(f"Rate limit exceeded, waiting {wait} seconds...")
                    time.sleep(wait)
                    wait *= 1.1  # Grow the wait geometrically (x1.1 per retry)
                else:
                    # Any other HTTP error: log it, pause, and retry
                    print(e)
                    time.sleep(60)

    return wrapper


@exponential_backoff
def update_sheet(range_: str,
                 body: typing.Dict[str, typing.List[typing.List[typing.Union[float, str]]]]):
    """Update a Google Sheet range and handle retries on rate limits."""
    result = __sheet.values().update(
        spreadsheetId=spreadsheet_id,
        range=range_,
        valueInputOption='USER_ENTERED',
        body=body).execute()

    print(f"{result.get('updatedCells')} cells updated in {range_}")


@exponential_backoff
def find_empty_rows_in_column(sheet_tab_name: str,
                              column_letter: str):
    # Fetch the column data
    values = __sheet.values().get(
        spreadsheetId=spreadsheet_id,
        range=f'{sheet_tab_name}!{column_letter}:{column_letter}').execute().get('values', [])

    total_value_num = len(values)

    # Identify empty rows
    empty_row_indices = []
    for index, value in enumerate(values, start=1):  # Starts counting from 1 (Google Sheets row numbers)
        if not value:  # If the list is empty, the row is empty
            empty_row_indices.append(index)

    return empty_row_indices, total_value_num


@exponential_backoff
def get_values_from_columns(sheet_tab_name: str,
                            column_letters: typing.List[str]):
    # Fetch each requested column (skipping the header row) in one batch call
    ranges = [f'{sheet_tab_name}!{letter}2:{letter}' for letter in column_letters]
    response = __sheet.values().batchGet(
        spreadsheetId=spreadsheet_id,
        ranges=ranges
    ).execute()

    # Convert each column to a float array: strip '%' signs, map the literal
    # string 'None' to 0, and skip '#N/A' cells as well as fully empty rows
    # (the API returns those as empty lists, which would otherwise raise IndexError)
    return [np.array([e[0].strip('%') if e[0] != 'None' else 0
                      for e in response_i.get('values', []) if e and e[0] != '#N/A'],
                     dtype=float) for response_i in response['valueRanges']]


@exponential_backoff
def get_maximal_epsilon(sheet_tab_name: str):
    # Specify the separate ranges to fetch
    data_range_b_to_e = f'{sheet_tab_name}!B2:E'
    data_range_g = f'{sheet_tab_name}!G2:G'
    column_a_range = f'{sheet_tab_name}!A2:A'

    # Fetch the data using batchGet
    response = __sheet.values().batchGet(
        spreadsheetId=spreadsheet_id,
        ranges=[data_range_b_to_e, data_range_g, column_a_range]
    ).execute()

    # Extract the values for each range
    data_values_b_to_e = response['valueRanges'][0].get('values', [])
    data_values_g = response['valueRanges'][1].get('values', [])
    column_a_values = response['valueRanges'][2].get('values', [])

    # Standardize the length of each row
    max_length_b_to_e = max((len(row) for row in data_values_b_to_e), default=0)
    data_values_b_to_e = [row + [None] * (max_length_b_to_e - len(row)) for row in data_values_b_to_e]

    max_length_g = max((len(row) for row in data_values_g), default=0)
    data_values_g = [row + [None] * (max_length_g - len(row)) for row in data_values_g]

    # Convert data to NumPy arrays, handling percentages and missing values
    data_array_b_to_e = np.array(
        [[float(item.strip('%')) if isinstance(item, str) and item else 0 for item in row]
         for row in data_values_b_to_e])
    data_array_g = np.array([[float(row[0]) if row and row[0] else 0] for row in data_values_g])

    # Concatenate columns B-E with column G
    data_array = np.hstack((data_array_b_to_e, data_array_g))

    # Calculate the sum of each row along axis 1 (rows)
    row_sums = np.sum(data_array, axis=1)

    # Find the index of the row with the maximum sum
    max_index = np.argmax(row_sums)

    # Retrieve the value from column A for the row with the maximum sum
    if max_index < len(column_a_values):
        return column_a_values[max_index][0]
    else:
        return None
```
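To see how these pieces fit together, here is a minimal usage sketch. It assumes the credential files the module expects (`credentials.json`, `token.json`, `spreadsheet_id.txt`) are already in place; the model names and ranges are just examples, and the `{'values': [[...]]}` body is the standard Sheets v4 `values.update` payload.

```python
# Minimal usage sketch (assumes credentials.json, token.json and
# spreadsheet_id.txt exist where google_sheets_api.py expects them).
from src.PyEDCR.utils import google_sheets_api

# Resolve the tab that stores results for a given main/secondary model pair.
tab = google_sheets_api.get_sheet_tab_name(main_model_name='dinov2_vits14',
                                           data_str='imagenet',
                                           secondary_model_name='dinov2_vitl14',
                                           binary=True)
# -> 'DINO V2 VIT14_s on ImageNet with Binary with DINO V2 VIT14_l'

# Write one example row of values; rate-limit retries are handled
# transparently by the exponential_backoff decorator.
google_sheets_api.update_sheet(range_=f'{tab}!A2:C2',
                               body={'values': [[0.1, '89.0%', '77.81%']]})

# Find gaps in column A, then read columns B and C back as float arrays.
empty_rows, total = google_sheets_api.find_empty_rows_in_column(tab, 'A')
col_b, col_c = google_sheets_api.get_values_from_columns(tab, ['B', 'C'])
```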

src/PyEDCR/utils/paths.py (0 additions, 2 deletions)
@@ -1,7 +1,5 @@
import pathlib

- from google.auth.environment_vars import CREDENTIALS
-
ROOT_PATH = pathlib.Path(__file__).parent.parent.parent.parent
DATA_FOLDER = rf'{ROOT_PATH}/data'
RESULTS_FOLDER = rf'{ROOT_PATH}/results'
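Note that google_sheets_api.py (above) reads `paths.CREDENTIALS_FOLDER`, which is not visible in the lines this hunk displays; presumably it is defined further down in paths.py, while the dropped `CREDENTIALS` import was simply unused. A hypothetical sketch of such a definition, assuming it follows the same pattern as the constants shown:

```python
# Hypothetical: google_sheets_api.py reads paths.CREDENTIALS_FOLDER, so
# paths.py presumably defines it below the displayed hunk, e.g.:
CREDENTIALS_FOLDER = rf'{ROOT_PATH}/credentials'
```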
