-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to API #76
base: main
Are you sure you want to change the base?
Update to API #76
Conversation
WalkthroughThe changes introduce a new GET endpoint in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant API
participant StaffService
participant Transfermarkt
User->>API: GET /{club_id}/staffs
API->>StaffService: get_club_staffs(club_id)
StaffService->>Transfermarkt: Request staff data
Transfermarkt-->>StaffService: Return staff data
StaffService-->>API: Return staff information
API-->>User: Return staff information
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
🧹 Outside diff range and nitpick comments (4)
app/utils/regex.py (1)
7-7
: LGTM: New regex pattern for staff ID extraction.The new
REGEX_STAFF_ID
pattern correctly extracts numeric staff IDs from URL paths. However, there are two minor suggestions for improvement:
- Consider using double quotes for consistency with other regex patterns in this file.
- Add a newline at the end of the file.
Here's a suggested modification:
-REGEX_STAFF_ID: str = r'/small/(\d+)-' +REGEX_STAFF_ID: str = r"/small/(\d+)-" +This change addresses both the quote style and the missing newline.
🧰 Tools
🪛 Ruff
7-7: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
7-7: No newline at end of file
Add trailing newline
(W292)
app/api/endpoints/clubs.py (2)
Line range hint
1-7
: Update imports to support robust implementationTo support the suggested improvements for the new endpoint, consider updating the imports section as follows:
from typing import Optional from fastapi import APIRouter, HTTPException from app.services.clubs.players import TransfermarktClubPlayers from app.services.clubs.profile import TransfermarktClubProfile from app.services.clubs.search import TransfermarktClubSearch from app.services.clubs.staff import TransfermarktClubStaffs # New import from app.models.staff import StaffResponse # New import (create this Pydantic model)These additional imports will support the use of a dedicated staff class, error handling, and a specific response model for the new endpoint.
Line range hint
1-37
: Consider overall API improvementsThe new endpoint integrates well with the existing structure. However, there are some overall improvements to consider for the entire API:
Implement consistent error handling across all endpoints. This includes handling potential exceptions and returning appropriate HTTP status codes.
Introduce Pydantic models for the responses of all endpoints. This will improve type hinting, validation, and API documentation.
Consider implementing authentication or rate limiting to protect the API from abuse.
Here's a high-level suggestion for implementing these improvements:
- Create a base exception handler that can be used across all endpoints.
- Define Pydantic models for each endpoint's response.
- Implement an authentication middleware using FastAPI's built-in security utilities.
Example of a base exception handler:
from fastapi import HTTPException from starlette.requests import Request from starlette.responses import JSONResponse @app.exception_handler(Exception) async def generic_exception_handler(request: Request, exc: Exception): return JSONResponse( status_code=500, content={"message": "An unexpected error occurred"}, )This approach will enhance the robustness and consistency of your API.
app/services/clubs/staff.py (1)
48-57
: Ensure consistent naming conventions for dictionary keys.In the staff dictionary, the key
"lastClub"
is in camelCase, whereas other keys are in snake_case. For consistency, consider using the same naming convention for all keys.Option 1: Convert all keys to snake_case:
"age": age, "nationality": nationality, "appointed": appointed, "contract": contract, - "lastClub": last_club, + "last_club": last_club,Option 2: Convert all keys to camelCase:
- "id": idx, + "staffId": idx, "name": name, "job": job, "age": age, "nationality": nationality, "appointed": appointed, "contract": contract, "lastClub": last_club,Choose the convention that aligns with the rest of your codebase.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- app/api/endpoints/clubs.py (1 hunks)
- app/services/clubs/profile.py (1 hunks)
- app/services/clubs/staff.py (1 hunks)
- app/utils/regex.py (1 hunks)
- app/utils/xpath.py (5 hunks)
🧰 Additional context used
🪛 Ruff
app/services/clubs/staff.py
6-6:
app.utils.utils.extract_from_url
imported but unusedRemove unused import:
app.utils.utils.extract_from_url
(F401)
app/utils/regex.py
7-7: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
7-7: No newline at end of file
Add trailing newline
(W292)
app/utils/xpath.py
99-99: Line too long (137 > 120)
(E501)
138-138: Line too long (126 > 120)
(E501)
139-139: Line too long (126 > 120)
(E501)
187-187: Line too long (128 > 120)
(E501)
188-188: Line too long (126 > 120)
(E501)
189-189: Line too long (128 > 120)
(E501)
190-190: Line too long (151 > 120)
(E501)
192-192: Line too long (122 > 120)
(E501)
193-193: Line too long (127 > 120)
(E501)
196-196: Line too long (132 > 120)
(E501)
197-197: Line too long (123 > 120)
(E501)
219-219: No newline at end of file
Add trailing newline
(W292)
🔇 Additional comments (7)
app/utils/regex.py (2)
5-5
: LGTM: Improved flexibility for country ID matching.The update to
REGEX_COUNTRY_ID
now allows matching one or more digits instead of a single digit. This change improves the flexibility of the pattern and can accommodate a wider range of country IDs.
Line range hint
1-7
: Overall assessment: Improved regex patterns with minor style suggestions.The changes to this file enhance the flexibility of country ID matching and introduce a new pattern for staff ID extraction. These modifications align well with the PR objectives of updating the API. The suggested minor style improvements will maintain consistency within the file.
🧰 Tools
🪛 Ruff
7-7: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
7-7: No newline at end of file
Add trailing newline
(W292)
app/services/clubs/profile.py (2)
89-89
: 🛠️ Refactor suggestionConsider using UTC time for
updatedAt
The change to set
updatedAt
to the current time is good for tracking when the profile was last updated. However,datetime.now()
returns the local time, which may cause inconsistencies in a distributed system or when dealing with different time zones.Consider using
datetime.utcnow()
instead to ensure consistency:- self.response["updatedAt"] = datetime.now() + self.response["updatedAt"] = datetime.utcnow()This change will ensure that the timestamp is always in UTC, regardless of the server's local time zone.
Likely invalid or redundant comment.
86-87
: LGTM! New league fields added consistently.The new fields "position" and "inLeagueSince" have been added to the
self.response["league"]
dictionary in a manner consistent with the existing code structure. The use ofself.get_text_by_xpath()
with constants fromClubs.Profile
aligns with the coding style of the rest of the method.To ensure these new fields are properly utilized, let's verify their usage in the codebase:
app/services/clubs/staff.py (3)
79-81
: Ensureself.response
is initialized before use.The
self.response
dictionary is being updated, but there is no initialization within this class. Verify thatself.response
is initialized in theTransfermarktBase
class.If not initialized in the base class, you can initialize it in
__post_init__
:def __post_init__(self) -> None: self.URL = self.URL.format(club_id=self.club_id) + self.response = {} self.page = self.request_url_page()
36-46
: Verify that all extracted lists have the same length before zipping.When creating the staff list,
zip()
will truncate to the shortest input iterable. If any list is shorter due to missing or extra data, some staff members' information might be incomplete or omitted. Consider adding a check to ensure all lists have the same length.Add a length check before zipping:
list_lengths = list(map(len, [ staffs_ids, staffs_names, staffs_jobs, staffs_ages, staffs_nationalities, staffs_appointed, staffs_contracts, staffs_last_club, ])) if len(set(list_lengths)) != 1: raise ValueError(f"Data lists have inconsistent lengths: {list_lengths}")
40-40
:⚠️ Potential issueUse the correct XPath for staff job titles.
The variable
staffs_jobs
is currently usingClubs.Players.PAGE_INFOS
, which may not be appropriate for staff data. Ensure that the correct XPath fromClubs.Staff
is used to extract staff job titles accurately.Consider updating the XPath as follows:
-staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS) +staffs_jobs = self.page.xpath(Clubs.Staff.JOB_TITLE)Run the following script to verify the available XPath expressions for staff:
✅ Verification successful
Use the correct XPath for staff job titles.
The proposed change to use
Clubs.Staff.JOB
is appropriate and ensures that the correct XPath is used for extracting staff job titles.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: List available XPaths under Clubs.Staff grep -A 5 'class Staff' app/utils/xpath.pyLength of output: 543
@router.get("/{club_id}/staffs") | ||
def get_club_staffs(club_id: str) -> dict: | ||
tfmkt = TransfermarktClubPlayers(club_id=club_id) | ||
club_staffs = tfmkt.get_club_staffs() | ||
return club_staffs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider using a dedicated TransfermarktClubStaffs class
The new endpoint for retrieving club staffs is a good addition and follows the existing pattern. However, there are a few points to consider:
-
The endpoint is using
TransfermarktClubPlayers
to get staff information. It would be more appropriate to use a dedicatedTransfermarktClubStaffs
class for better separation of concerns. -
There's no error handling or input validation for the
club_id
. Consider adding try-except blocks and validating the input to ensure robustness. -
The return type is a generic
dict
. For better type hinting and API documentation, consider creating a Pydantic model for the staff data structure.
Here's a suggested refactoring:
from fastapi import APIRouter, HTTPException
from app.services.clubs.staff import TransfermarktClubStaffs
from app.models.staff import StaffResponse # Create this Pydantic model
@router.get("/{club_id}/staffs", response_model=StaffResponse)
def get_club_staffs(club_id: str) -> StaffResponse:
try:
tfmkt = TransfermarktClubStaffs(club_id=club_id)
club_staffs = tfmkt.get_club_staffs()
return StaffResponse(**club_staffs)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
raise HTTPException(status_code=500, detail="An unexpected error occurred")
This refactoring introduces error handling, uses a dedicated staff class, and improves type hinting with a Pydantic model.
|
||
from app.services.base import TransfermarktBase | ||
from app.utils.regex import REGEX_STAFF_ID | ||
from app.utils.utils import clean_response, extract_from_url, safe_regex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import extract_from_url
.
The extract_from_url
function is imported but not used anywhere in the code. Removing unused imports helps keep the codebase clean and maintainable.
Apply this diff to remove the unused import:
-from app.utils.utils import clean_response, extract_from_url, safe_regex
+from app.utils.utils import clean_response, safe_regex
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from app.utils.utils import clean_response, extract_from_url, safe_regex | |
from app.utils.utils import clean_response, safe_regex |
🧰 Tools
🪛 Ruff
6-6:
app.utils.utils.extract_from_url
imported but unusedRemove unused import:
app.utils.utils.extract_from_url
(F401)
""" | ||
self.response["id"] = self.club_id | ||
self.response["staffs"] = self.__parse_club_staffs() | ||
self.response["updatedAt"] = datetime.now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use timezone-aware datetime for updatedAt
.
Storing timezone-aware timestamps can prevent potential issues related to timezone discrepancies. It's recommended to use UTC for consistent time representation.
Apply this diff to make updatedAt
timezone-aware:
-from datetime import datetime
+from datetime import datetime, timezone
self.response["updatedAt"] = datetime.now()
+self.response["updatedAt"] = datetime.now(timezone.utc)
Committable suggestion was skipped due to low confidence.
club_id: str = None | ||
URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate club_id
during initialization.
The club_id
attribute defaults to None
, but it is essential for formatting the URL in __post_init__
. If club_id
is None
, formatting the URL will raise a TypeError
. Consider adding a validation check to ensure club_id
is provided.
Apply this diff to add validation:
def __post_init__(self) -> None:
+ if not self.club_id:
+ raise ValueError("The 'club_id' must be provided and cannot be None.")
self.URL = self.URL.format(club_id=self.club_id)
self.page = self.request_url_page()
Alternatively, make club_id
a required field without a default value:
- club_id: str = None
+ club_id: str
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
club_id: str = None | |
URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1" | |
club_id: str | |
URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1" | |
def __post_init__(self) -> None: | |
if not self.club_id: | |
raise ValueError("The 'club_id' must be provided and cannot be None.") | |
self.URL = self.URL.format(club_id=self.club_id) | |
self.page = self.request_url_page() |
CLUB_NAME = "//header//h1//text()" | ||
ID = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@rowspan='2']/img/@src" | ||
NAME = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/a/text()" | ||
JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line exceeds maximum allowed length (137 > 120 characters).
Line 99 is too long, exceeding the project's maximum line length of 120 characters. This can affect code readability and maintainability.
Consider splitting the XPath expression into multiple lines for better readability:
- JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
+ JOB = (
+ "//div[@class='large-12 columns']//table[@class='inline-table']"
+ "//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()" | |
JOB = ( | |
"//div[@class='large-12 columns']//table[@class='inline-table']" | |
"//td[@class='hauptlink']/following-sibling::tr[1]/td/text()" | |
) |
🧰 Tools
🪛 Ruff
99-99: Line too long (137 > 120)
(E501)
@@ -124,6 +135,8 @@ class Profile: | |||
LEAGUE_COUNTRY_ID = "//div[@class='data-header__club-info']//img[contains(@class, 'flaggenrahmen')]//@data-src" | |||
LEAGUE_COUNTRY_NAME = "//div[@class='data-header__club-info']//img[contains(@class, 'flaggenrahmen')]//@title" | |||
LEAGUE_TIER = "//div[@class='data-header__club-info']//strong//text()//following::span[1]/a/text()[2]" | |||
LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line exceeds maximum allowed length (126 > 120 characters).
Line 138 exceeds the maximum line length of 120 characters, which may hinder code readability.
Consider reformatting the line:
- LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
+ LEAGUE_POSITION = (
+ "//div[@class='data-header__club-info']//following::span[3]//text()"
+ "//following::span[1]/a/text()[1]"
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]" | |
LEAGUE_POSITION = ( | |
"//div[@class='data-header__club-info']//following::span[3]//text()" | |
"//following::span[1]/a/text()[1]" | |
) |
🧰 Tools
🪛 Ruff
138-138: Line too long (126 > 120)
(E501)
NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()" | ||
NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines exceed maximum allowed length in Competitions.Profile
class.
- Line 192: 122 characters
- Line 193: 127 characters
Consider breaking these lines to enhance readability. For instance, line 193 can be refactored:
- NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()"
+ NUMBER_OF_FOREIGNERS = (
+ "//div[@class='data-header__info-box']//div[@class='data-header__details']"
+ "//li[3]/span/a/text()"
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()" | |
NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()" | |
NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()" | |
NUMBER_OF_FOREIGNERS = ( | |
"//div[@class='data-header__info-box']//div[@class='data-header__details']" | |
"//li[3]/span/a/text()" | |
) |
🧰 Tools
🪛 Ruff
192-192: Line too long (122 > 120)
(E501)
193-193: Line too long (127 > 120)
(E501)
AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()" | ||
AVERAGE_AGE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[2]/span/text()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines exceed maximum allowed length in Competitions.Profile
class.
- Line 196: 132 characters
- Line 197: 123 characters
Please refactor these lines for better readability. Example for line 196:
- AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()"
+ AVERAGE_MARKET_VALUE = (
+ "//div[@class='data-header__info-box']//div[@class='data-header__details']"
+ "//ul[2]//li[1]/span/text()"
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()" | |
AVERAGE_AGE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[2]/span/text()" | |
AVERAGE_MARKET_VALUE = ( | |
"//div[@class='data-header__info-box']//div[@class='data-header__details']" | |
"//ul[2]//li[1]/span/text()" | |
) | |
AVERAGE_AGE = ( | |
"//div[@class='data-header__info-box']//div[@class='data-header__details']" | |
"//ul[2]//li[2]/span/text()" | |
) |
🧰 Tools
🪛 Ruff
196-196: Line too long (132 > 120)
(E501)
197-197: Line too long (123 > 120)
(E501)
REIGNING_CHAMPION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]" | ||
RECORD_HOLDING_CHAMPION = "//div[@class='data-header__club-info']//following::span[4]//text()//following::a/text()[1]" | ||
UEFA_COEFFICIENT_POSITION = "//div[@class='data-header__club-info']//following::span[5]//text()//following::a/text()[1]" | ||
UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines exceed maximum allowed length in Competitions.Profile
class.
Lines 187 to 190 exceed the maximum line length of 120 characters:
- Line 187: 128 characters
- Line 188: 126 characters
- Line 189: 128 characters
- Line 190: 151 characters
Please split these XPath expressions into multiple lines. For example, line 190 can be refactored as:
- UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]"
+ UEFA_COEFFICIENT_POINT = (
+ "//div[@class='data-header__club-info']//following::span[5]//text()"
+ "//following::span[1]//following::span[1]/text()[1]"
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
REIGNING_CHAMPION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]" | |
RECORD_HOLDING_CHAMPION = "//div[@class='data-header__club-info']//following::span[4]//text()//following::a/text()[1]" | |
UEFA_COEFFICIENT_POSITION = "//div[@class='data-header__club-info']//following::span[5]//text()//following::a/text()[1]" | |
UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]" | |
REIGNING_CHAMPION = ( | |
"//div[@class='data-header__club-info']//following::span[3]//text()" | |
"//following::span[1]/a/text()[1]" | |
) | |
RECORD_HOLDING_CHAMPION = ( | |
"//div[@class='data-header__club-info']//following::span[4]//text()" | |
"//following::a/text()[1]" | |
) | |
UEFA_COEFFICIENT_POSITION = ( | |
"//div[@class='data-header__club-info']//following::span[5]//text()" | |
"//following::a/text()[1]" | |
) | |
UEFA_COEFFICIENT_POINT = ( | |
"//div[@class='data-header__club-info']//following::span[5]//text()" | |
"//following::span[1]//following::span[1]/text()[1]" | |
) |
🧰 Tools
🪛 Ruff
187-187: Line too long (128 > 120)
(E501)
188-188: Line too long (126 > 120)
(E501)
189-189: Line too long (128 > 120)
(E501)
190-190: Line too long (151 > 120)
(E501)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
app/services/clubs/staff.py (1)
56-56
: Inconsistent naming convention forlastClub
The key
"lastClub"
uses camelCase, whereas the other keys use snake_case. For consistency, consider renaming"lastClub"
to"last_club"
.Apply this diff to make the naming consistent:
- "lastClub": last_club, + "last_club": last_club,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- app/services/clubs/staff.py (1 hunks)
🧰 Additional context used
🪛 Ruff
app/services/clubs/staff.py
6-6:
app.utils.utils.extract_from_url
imported but unusedRemove unused import:
app.utils.utils.extract_from_url
(F401)
🔇 Additional comments (1)
app/services/clubs/staff.py (1)
6-6
: Remove unused importextract_from_url
.The
extract_from_url
function is imported but not used anywhere in the code. Removing unused imports helps keep the codebase clean and maintainable.🧰 Tools
🪛 Ruff
6-6:
app.utils.utils.extract_from_url
imported but unusedRemove unused import:
app.utils.utils.extract_from_url
(F401)
app/services/clubs/staff.py
Outdated
staffs_ids = [ | ||
safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID) | ||
] | ||
staffs_names = self.page.xpath(Clubs.Staff.NAME) | ||
staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS) | ||
staffs_ages = self.page.xpath(Clubs.Staff.AGE) | ||
staffs_nationalities = self.page.xpath(Clubs.Staff.NATIONALITIES) | ||
staffs_appointed = self.page.xpath(Clubs.Staff.APPOINTED) | ||
staffs_contracts = self.page.xpath(Clubs.Staff.CONTRACT) | ||
staffs_last_club = self.page.xpath(Clubs.Staff.LAST_CLUB) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure extracted lists have equal lengths before zipping
When zipping the extracted staff data lists, if the lists have unequal lengths, it could lead to data misalignment or runtime errors. Consider adding a check to verify that all lists are of the same length before proceeding.
Apply this diff to add a length check:
# Add this check before zipping the lists
list_lengths = [
len(staffs_ids),
len(staffs_names),
len(staffs_jobs),
len(staffs_ages),
len(staffs_nationalities),
len(staffs_appointed),
len(staffs_contracts),
len(staffs_last_club),
]
if len(set(list_lengths)) != 1:
raise ValueError("Extracted staff data lists have inconsistent lengths.")
Also applies to: 58-68
app/services/clubs/staff.py
Outdated
safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID) | ||
] | ||
staffs_names = self.page.xpath(Clubs.Staff.NAME) | ||
staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential incorrect XPath reference: Clubs.Players.PAGE_INFOS
In the context of parsing club staff information, staffs_jobs
is assigned using Clubs.Players.PAGE_INFOS
. This might be incorrect. Consider using Clubs.Staff.JOB
or the appropriate attribute under Clubs.Staff
to ensure accurate data extraction.
Apply this diff to correct the XPath reference:
-staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
+staffs_jobs = self.page.xpath(Clubs.Staff.JOB)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS) | |
staffs_jobs = self.page.xpath(Clubs.Staff.JOB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- app/services/clubs/staff.py (1 hunks)
🧰 Additional context used
🪛 Ruff
app/services/clubs/staff.py
6-6:
app.utils.utils.extract_from_url
imported but unusedRemove unused import:
app.utils.utils.extract_from_url
(F401)
🔇 Additional comments (2)
app/services/clubs/staff.py (2)
1-82
: Overall, well-structured implementation with minor improvements suggested.The
TransfermarktClubStaffs
class provides a solid foundation for retrieving and parsing football club staff information. The suggested improvements (removing unused imports, validatingclub_id
, ensuring data integrity in parsing, and using timezone-aware timestamps) will enhance the robustness and reliability of the code.Great job on implementing this new feature! The code is well-organized and follows good practices for web scraping and data parsing.
🧰 Tools
🪛 Ruff
6-6:
app.utils.utils.extract_from_url
imported but unusedRemove unused import:
app.utils.utils.extract_from_url
(F401)
6-6
:⚠️ Potential issueRemove unused import
extract_from_url
.The
extract_from_url
function is imported but not used in this file. Removing unused imports helps maintain a clean and efficient codebase.Apply this diff to remove the unused import:
-from app.utils.utils import clean_response, extract_from_url, safe_regex +from app.utils.utils import clean_response, safe_regexLikely invalid or redundant comment.
🧰 Tools
🪛 Ruff
6-6:
app.utils.utils.extract_from_url
imported but unusedRemove unused import:
app.utils.utils.extract_from_url
(F401)
club_id: str = None | ||
URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate club_id
during initialization.
The club_id
attribute defaults to None
, but it's essential for formatting the URL in __post_init__
. If club_id
is None
, formatting the URL will raise a TypeError
.
Consider adding a validation check in __post_init__
to ensure club_id
is provided:
def __post_init__(self) -> None:
if not self.club_id:
raise ValueError("The 'club_id' must be provided and cannot be None.")
self.URL = self.URL.format(club_id=self.club_id)
# ... rest of the method
Alternatively, make club_id
a required field without a default value:
club_id: str
def __parse_club_staffs(self) -> list[dict]: | ||
""" | ||
Parse staff information from the webpage and return a list of dictionaries, each representing a staff member. | ||
|
||
Returns: | ||
list[dict]: A list of staff information dictionaries. | ||
""" | ||
staffs_ids = [ | ||
safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID) | ||
] | ||
staffs_names = self.page.xpath(Clubs.Staff.NAME) | ||
staffs_jobs = self.page.xpath(Clubs.Staff.JOB) | ||
staffs_ages = self.page.xpath(Clubs.Staff.AGE) | ||
staffs_nationalities = self.page.xpath(Clubs.Staff.NATIONALITIES) | ||
staffs_appointed = self.page.xpath(Clubs.Staff.APPOINTED) | ||
staffs_contracts = self.page.xpath(Clubs.Staff.CONTRACT) | ||
staffs_last_club = self.page.xpath(Clubs.Staff.LAST_CLUB) | ||
|
||
return [ | ||
{ | ||
"id": idx, | ||
"name": name, | ||
"job": job, | ||
"age": age, | ||
"nationality": nationality, | ||
"appointed": appointed, | ||
"contract": contract, | ||
"lastClub": last_club, | ||
} | ||
for idx, name, job, age, nationality, appointed, contract, last_club in zip( # noqa: E501 | ||
staffs_ids, | ||
staffs_names, | ||
staffs_jobs, | ||
staffs_ages, | ||
staffs_nationalities, | ||
staffs_appointed, | ||
staffs_contracts, | ||
staffs_last_club, | ||
) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure extracted lists have equal lengths before zipping.
When zipping the extracted staff data lists, if the lists have unequal lengths, it could lead to data misalignment or runtime errors.
Consider adding a check to verify that all lists are of the same length before proceeding:
def __parse_club_staffs(self) -> list[dict]:
# ... existing code ...
list_lengths = [
len(staffs_ids),
len(staffs_names),
len(staffs_jobs),
len(staffs_ages),
len(staffs_nationalities),
len(staffs_appointed),
len(staffs_contracts),
len(staffs_last_club),
]
if len(set(list_lengths)) != 1:
raise ValueError("Extracted staff data lists have inconsistent lengths.")
# ... rest of the method
This check will ensure data integrity and prevent potential runtime errors.
def get_club_staffs(self) -> dict: | ||
""" | ||
Retrieve and parse staff information for the specified football club. | ||
|
||
Returns: | ||
dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when | ||
the data was last updated. | ||
""" | ||
self.response["id"] = self.club_id | ||
self.response["staffs"] = self.__parse_club_staffs() | ||
self.response["updatedAt"] = datetime.now() | ||
|
||
return clean_response(self.response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use timezone-aware datetime for updatedAt
.
Using timezone-aware timestamps can prevent potential issues related to timezone discrepancies. It's recommended to use UTC for consistent time representation across different systems.
Apply this diff to make updatedAt
timezone-aware:
-from datetime import datetime
+from datetime import datetime, timezone
def get_club_staffs(self) -> dict:
# ... existing code ...
- self.response["updatedAt"] = datetime.now()
+ self.response["updatedAt"] = datetime.now(timezone.utc)
# ... rest of the method
This change ensures that the timestamp is consistently represented in UTC, regardless of the system's local time settings.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_club_staffs(self) -> dict: | |
""" | |
Retrieve and parse staff information for the specified football club. | |
Returns: | |
dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when | |
the data was last updated. | |
""" | |
self.response["id"] = self.club_id | |
self.response["staffs"] = self.__parse_club_staffs() | |
self.response["updatedAt"] = datetime.now() | |
return clean_response(self.response) | |
from datetime import datetime, timezone | |
def get_club_staffs(self) -> dict: | |
""" | |
Retrieve and parse staff information for the specified football club. | |
Returns: | |
dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when | |
the data was last updated. | |
""" | |
self.response["id"] = self.club_id | |
self.response["staffs"] = self.__parse_club_staffs() | |
self.response["updatedAt"] = datetime.now(timezone.utc) | |
return clean_response(self.response) |
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
These changes enhance user experience by providing more detailed information about clubs and their staff.