Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to API #76

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Update to API #76

wants to merge 8 commits into from

Conversation

Krayir5
Copy link

@Krayir5 Krayir5 commented Oct 6, 2024

Summary by CodeRabbit

  • New Features

    • Added a new endpoint to retrieve staff information for specific clubs.
    • Enhanced club profile data with additional fields: position and league tenure.
    • Introduced a class for structured retrieval of club staff data.
  • Improvements

    • Expanded data extraction capabilities for clubs and competitions, including detailed staff and competition attributes.
  • Bug Fixes

    • Updated regular expression patterns for improved data matching.

These changes enhance user experience by providing more detailed information about clubs and their staff.

Copy link

coderabbitai bot commented Oct 6, 2024

Walkthrough

The changes introduce a new GET endpoint in app/api/endpoints/clubs.py for retrieving staff information associated with a specific club. The TransfermarktClubStaffs class is added in app/services/clubs/staff.py to handle the parsing of staff data from the Transfermarkt website. Additionally, the TransfermarktClubProfile class in app/services/clubs/profile.py is updated to include new fields in the club profile response. Several modifications are made to XPath expressions in app/utils/xpath.py and regex patterns in app/utils/regex.py to support these new functionalities.

Changes

File Change Summary
app/api/endpoints/clubs.py Added new GET route /{club_id}/staffs and method get_club_staffs(club_id: str) -> dict.
app/services/clubs/profile.py Updated get_club_profile method to include new fields: position and inLeagueSince, and set updatedAt to current datetime.
app/services/clubs/staff.py Introduced TransfermarktClubStaffs class with methods for parsing staff data and retrieving it.
app/utils/regex.py Updated REGEX_COUNTRY_ID to match multiple digits; added new REGEX_STAFF_ID.
app/utils/xpath.py Added Staff class with new attributes for staff data extraction; updated existing attributes in Players and Competitions classes.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant StaffService
    participant Transfermarkt

    User->>API: GET /{club_id}/staffs
    API->>StaffService: get_club_staffs(club_id)
    StaffService->>Transfermarkt: Request staff data
    Transfermarkt-->>StaffService: Return staff data
    StaffService-->>API: Return staff information
    API-->>User: Return staff information
Loading

🐇 "In the world of clubs, where staff do play,
New routes and data brighten the day.
With regex and XPath, we gather the lore,
For every club's staff, there's so much in store!
Hopping with joy, we cheer and we sing,
For the magic of data that these changes bring!" 🐇


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Outside diff range and nitpick comments (4)
app/utils/regex.py (1)

7-7: LGTM: New regex pattern for staff ID extraction.

The new REGEX_STAFF_ID pattern correctly extracts numeric staff IDs from URL paths. However, there are two minor suggestions for improvement:

  1. Consider using double quotes for consistency with other regex patterns in this file.
  2. Add a newline at the end of the file.

Here's a suggested modification:

-REGEX_STAFF_ID: str = r'/small/(\d+)-'
+REGEX_STAFF_ID: str = r"/small/(\d+)-"
+

This change addresses both the quote style and the missing newline.

🧰 Tools
🪛 Ruff

7-7: Single quotes found but double quotes preferred

Replace single quotes with double quotes

(Q000)


7-7: No newline at end of file

Add trailing newline

(W292)

app/api/endpoints/clubs.py (2)

Line range hint 1-7: Update imports to support robust implementation

To support the suggested improvements for the new endpoint, consider updating the imports section as follows:

from typing import Optional

from fastapi import APIRouter, HTTPException

from app.services.clubs.players import TransfermarktClubPlayers
from app.services.clubs.profile import TransfermarktClubProfile
from app.services.clubs.search import TransfermarktClubSearch
from app.services.clubs.staff import TransfermarktClubStaffs  # New import
from app.models.staff import StaffResponse  # New import (create this Pydantic model)

These additional imports will support the use of a dedicated staff class, error handling, and a specific response model for the new endpoint.


Line range hint 1-37: Consider overall API improvements

The new endpoint integrates well with the existing structure. However, there are some overall improvements to consider for the entire API:

  1. Implement consistent error handling across all endpoints. This includes handling potential exceptions and returning appropriate HTTP status codes.

  2. Introduce Pydantic models for the responses of all endpoints. This will improve type hinting, validation, and API documentation.

  3. Consider implementing authentication or rate limiting to protect the API from abuse.

Here's a high-level suggestion for implementing these improvements:

  1. Create a base exception handler that can be used across all endpoints.
  2. Define Pydantic models for each endpoint's response.
  3. Implement an authentication middleware using FastAPI's built-in security utilities.

Example of a base exception handler:

from fastapi import HTTPException
from starlette.requests import Request
from starlette.responses import JSONResponse

@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
    return JSONResponse(
        status_code=500,
        content={"message": "An unexpected error occurred"},
    )

This approach will enhance the robustness and consistency of your API.

app/services/clubs/staff.py (1)

48-57: Ensure consistent naming conventions for dictionary keys.

In the staff dictionary, the key "lastClub" is in camelCase, whereas other keys are in snake_case. For consistency, consider using the same naming convention for all keys.

Option 1: Convert all keys to snake_case:

             "age": age,
             "nationality": nationality,
             "appointed": appointed,
             "contract": contract,
-            "lastClub": last_club,
+            "last_club": last_club,

Option 2: Convert all keys to camelCase:

-            "id": idx,
+            "staffId": idx,
             "name": name,
             "job": job,
             "age": age,
             "nationality": nationality,
             "appointed": appointed,
             "contract": contract,
             "lastClub": last_club,

Choose the convention that aligns with the rest of your codebase.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 41a0d4b and e74c196.

📒 Files selected for processing (5)
  • app/api/endpoints/clubs.py (1 hunks)
  • app/services/clubs/profile.py (1 hunks)
  • app/services/clubs/staff.py (1 hunks)
  • app/utils/regex.py (1 hunks)
  • app/utils/xpath.py (5 hunks)
🧰 Additional context used
🪛 Ruff
app/services/clubs/staff.py

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

app/utils/regex.py

7-7: Single quotes found but double quotes preferred

Replace single quotes with double quotes

(Q000)


7-7: No newline at end of file

Add trailing newline

(W292)

app/utils/xpath.py

99-99: Line too long (137 > 120)

(E501)


138-138: Line too long (126 > 120)

(E501)


139-139: Line too long (126 > 120)

(E501)


187-187: Line too long (128 > 120)

(E501)


188-188: Line too long (126 > 120)

(E501)


189-189: Line too long (128 > 120)

(E501)


190-190: Line too long (151 > 120)

(E501)


192-192: Line too long (122 > 120)

(E501)


193-193: Line too long (127 > 120)

(E501)


196-196: Line too long (132 > 120)

(E501)


197-197: Line too long (123 > 120)

(E501)


219-219: No newline at end of file

Add trailing newline

(W292)

🔇 Additional comments (7)
app/utils/regex.py (2)

5-5: LGTM: Improved flexibility for country ID matching.

The update to REGEX_COUNTRY_ID now allows matching one or more digits instead of a single digit. This change improves the flexibility of the pattern and can accommodate a wider range of country IDs.


Line range hint 1-7: Overall assessment: Improved regex patterns with minor style suggestions.

The changes to this file enhance the flexibility of country ID matching and introduce a new pattern for staff ID extraction. These modifications align well with the PR objectives of updating the API. The suggested minor style improvements will maintain consistency within the file.

🧰 Tools
🪛 Ruff

7-7: Single quotes found but double quotes preferred

Replace single quotes with double quotes

(Q000)


7-7: No newline at end of file

Add trailing newline

(W292)

app/services/clubs/profile.py (2)

89-89: 🛠️ Refactor suggestion

Consider using UTC time for updatedAt

The change to set updatedAt to the current time is good for tracking when the profile was last updated. However, datetime.now() returns the local time, which may cause inconsistencies in a distributed system or when dealing with different time zones.

Consider using datetime.utcnow() instead to ensure consistency:

-        self.response["updatedAt"] = datetime.now()
+        self.response["updatedAt"] = datetime.utcnow()

This change will ensure that the timestamp is always in UTC, regardless of the server's local time zone.

Likely invalid or redundant comment.


86-87: LGTM! New league fields added consistently.

The new fields "position" and "inLeagueSince" have been added to the self.response["league"] dictionary in a manner consistent with the existing code structure. The use of self.get_text_by_xpath() with constants from Clubs.Profile aligns with the coding style of the rest of the method.

To ensure these new fields are properly utilized, let's verify their usage in the codebase:

app/services/clubs/staff.py (3)

79-81: Ensure self.response is initialized before use.

The self.response dictionary is being updated, but there is no initialization within this class. Verify that self.response is initialized in the TransfermarktBase class.

If not initialized in the base class, you can initialize it in __post_init__:

 def __post_init__(self) -> None:
     self.URL = self.URL.format(club_id=self.club_id)
+    self.response = {}
     self.page = self.request_url_page()

36-46: Verify that all extracted lists have the same length before zipping.

When creating the staff list, zip() will truncate to the shortest input iterable. If any list is shorter due to missing or extra data, some staff members' information might be incomplete or omitted. Consider adding a check to ensure all lists have the same length.

Add a length check before zipping:

list_lengths = list(map(len, [
    staffs_ids,
    staffs_names,
    staffs_jobs,
    staffs_ages,
    staffs_nationalities,
    staffs_appointed,
    staffs_contracts,
    staffs_last_club,
]))
if len(set(list_lengths)) != 1:
    raise ValueError(f"Data lists have inconsistent lengths: {list_lengths}")

40-40: ⚠️ Potential issue

Use the correct XPath for staff job titles.

The variable staffs_jobs is currently using Clubs.Players.PAGE_INFOS, which may not be appropriate for staff data. Ensure that the correct XPath from Clubs.Staff is used to extract staff job titles accurately.

Consider updating the XPath as follows:

-staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
+staffs_jobs = self.page.xpath(Clubs.Staff.JOB_TITLE)

Run the following script to verify the available XPath expressions for staff:

✅ Verification successful

Use the correct XPath for staff job titles.

The proposed change to use Clubs.Staff.JOB is appropriate and ensures that the correct XPath is used for extracting staff job titles.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: List available XPaths under Clubs.Staff
grep -A 5 'class Staff' app/utils/xpath.py

Length of output: 543

Comment on lines +33 to +37
@router.get("/{club_id}/staffs")
def get_club_staffs(club_id: str) -> dict:
tfmkt = TransfermarktClubPlayers(club_id=club_id)
club_staffs = tfmkt.get_club_staffs()
return club_staffs
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider using a dedicated TransfermarktClubStaffs class

The new endpoint for retrieving club staffs is a good addition and follows the existing pattern. However, there are a few points to consider:

  1. The endpoint is using TransfermarktClubPlayers to get staff information. It would be more appropriate to use a dedicated TransfermarktClubStaffs class for better separation of concerns.

  2. There's no error handling or input validation for the club_id. Consider adding try-except blocks and validating the input to ensure robustness.

  3. The return type is a generic dict. For better type hinting and API documentation, consider creating a Pydantic model for the staff data structure.

Here's a suggested refactoring:

from fastapi import APIRouter, HTTPException
from app.services.clubs.staff import TransfermarktClubStaffs
from app.models.staff import StaffResponse  # Create this Pydantic model

@router.get("/{club_id}/staffs", response_model=StaffResponse)
def get_club_staffs(club_id: str) -> StaffResponse:
    try:
        tfmkt = TransfermarktClubStaffs(club_id=club_id)
        club_staffs = tfmkt.get_club_staffs()
        return StaffResponse(**club_staffs)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail="An unexpected error occurred")

This refactoring introduces error handling, uses a dedicated staff class, and improves type hinting with a Pydantic model.


from app.services.base import TransfermarktBase
from app.utils.regex import REGEX_STAFF_ID
from app.utils.utils import clean_response, extract_from_url, safe_regex
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove unused import extract_from_url.

The extract_from_url function is imported but not used anywhere in the code. Removing unused imports helps keep the codebase clean and maintainable.

Apply this diff to remove the unused import:

-from app.utils.utils import clean_response, extract_from_url, safe_regex
+from app.utils.utils import clean_response, safe_regex
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from app.utils.utils import clean_response, extract_from_url, safe_regex
from app.utils.utils import clean_response, safe_regex
🧰 Tools
🪛 Ruff

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

app/services/clubs/staff.py Outdated Show resolved Hide resolved
"""
self.response["id"] = self.club_id
self.response["staffs"] = self.__parse_club_staffs()
self.response["updatedAt"] = datetime.now()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use timezone-aware datetime for updatedAt.

Storing timezone-aware timestamps can prevent potential issues related to timezone discrepancies. It's recommended to use UTC for consistent time representation.

Apply this diff to make updatedAt timezone-aware:

-from datetime import datetime
+from datetime import datetime, timezone

 self.response["updatedAt"] = datetime.now()
+self.response["updatedAt"] = datetime.now(timezone.utc)

Committable suggestion was skipped due to low confidence.

Comment on lines +20 to +22
club_id: str = None
URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Validate club_id during initialization.

The club_id attribute defaults to None, but it is essential for formatting the URL in __post_init__. If club_id is None, formatting the URL will raise a TypeError. Consider adding a validation check to ensure club_id is provided.

Apply this diff to add validation:

 def __post_init__(self) -> None:
+    if not self.club_id:
+        raise ValueError("The 'club_id' must be provided and cannot be None.")
     self.URL = self.URL.format(club_id=self.club_id)
     self.page = self.request_url_page()

Alternatively, make club_id a required field without a default value:

-    club_id: str = None
+    club_id: str
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
club_id: str = None
URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"
club_id: str
URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"
def __post_init__(self) -> None:
if not self.club_id:
raise ValueError("The 'club_id' must be provided and cannot be None.")
self.URL = self.URL.format(club_id=self.club_id)
self.page = self.request_url_page()

CLUB_NAME = "//header//h1//text()"
ID = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@rowspan='2']/img/@src"
NAME = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/a/text()"
JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Line exceeds maximum allowed length (137 > 120 characters).

Line 99 is too long, exceeding the project's maximum line length of 120 characters. This can affect code readability and maintainability.

Consider splitting the XPath expression into multiple lines for better readability:

- JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
+ JOB = (
+     "//div[@class='large-12 columns']//table[@class='inline-table']"
+     "//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
+ )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
JOB = (
"//div[@class='large-12 columns']//table[@class='inline-table']"
"//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
)
🧰 Tools
🪛 Ruff

99-99: Line too long (137 > 120)

(E501)

@@ -124,6 +135,8 @@ class Profile:
LEAGUE_COUNTRY_ID = "//div[@class='data-header__club-info']//img[contains(@class, 'flaggenrahmen')]//@data-src"
LEAGUE_COUNTRY_NAME = "//div[@class='data-header__club-info']//img[contains(@class, 'flaggenrahmen')]//@title"
LEAGUE_TIER = "//div[@class='data-header__club-info']//strong//text()//following::span[1]/a/text()[2]"
LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Line exceeds maximum allowed length (126 > 120 characters).

Line 138 exceeds the maximum line length of 120 characters, which may hinder code readability.

Consider reformatting the line:

- LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
+ LEAGUE_POSITION = (
+     "//div[@class='data-header__club-info']//following::span[3]//text()"
+     "//following::span[1]/a/text()[1]"
+ )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
LEAGUE_POSITION = (
"//div[@class='data-header__club-info']//following::span[3]//text()"
"//following::span[1]/a/text()[1]"
)
🧰 Tools
🪛 Ruff

138-138: Line too long (126 > 120)

(E501)

Comment on lines +192 to +193
NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()"
NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Lines exceed maximum allowed length in Competitions.Profile class.

  • Line 192: 122 characters
  • Line 193: 127 characters

Consider breaking these lines to enhance readability. For instance, line 193 can be refactored:

- NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()"
+ NUMBER_OF_FOREIGNERS = (
+     "//div[@class='data-header__info-box']//div[@class='data-header__details']"
+     "//li[3]/span/a/text()"
+ )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()"
NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()"
NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()"
NUMBER_OF_FOREIGNERS = (
"//div[@class='data-header__info-box']//div[@class='data-header__details']"
"//li[3]/span/a/text()"
)
🧰 Tools
🪛 Ruff

192-192: Line too long (122 > 120)

(E501)


193-193: Line too long (127 > 120)

(E501)

Comment on lines +196 to +197
AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()"
AVERAGE_AGE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[2]/span/text()"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Lines exceed maximum allowed length in Competitions.Profile class.

  • Line 196: 132 characters
  • Line 197: 123 characters

Please refactor these lines for better readability. Example for line 196:

- AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()"
+ AVERAGE_MARKET_VALUE = (
+     "//div[@class='data-header__info-box']//div[@class='data-header__details']"
+     "//ul[2]//li[1]/span/text()"
+ )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()"
AVERAGE_AGE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[2]/span/text()"
AVERAGE_MARKET_VALUE = (
"//div[@class='data-header__info-box']//div[@class='data-header__details']"
"//ul[2]//li[1]/span/text()"
)
AVERAGE_AGE = (
"//div[@class='data-header__info-box']//div[@class='data-header__details']"
"//ul[2]//li[2]/span/text()"
)
🧰 Tools
🪛 Ruff

196-196: Line too long (132 > 120)

(E501)


197-197: Line too long (123 > 120)

(E501)

Comment on lines +187 to +190
REIGNING_CHAMPION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
RECORD_HOLDING_CHAMPION = "//div[@class='data-header__club-info']//following::span[4]//text()//following::a/text()[1]"
UEFA_COEFFICIENT_POSITION = "//div[@class='data-header__club-info']//following::span[5]//text()//following::a/text()[1]"
UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Lines exceed maximum allowed length in Competitions.Profile class.

Lines 187 to 190 exceed the maximum line length of 120 characters:

  • Line 187: 128 characters
  • Line 188: 126 characters
  • Line 189: 128 characters
  • Line 190: 151 characters

Please split these XPath expressions into multiple lines. For example, line 190 can be refactored as:

- UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]"
+ UEFA_COEFFICIENT_POINT = (
+     "//div[@class='data-header__club-info']//following::span[5]//text()"
+     "//following::span[1]//following::span[1]/text()[1]"
+ )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
REIGNING_CHAMPION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
RECORD_HOLDING_CHAMPION = "//div[@class='data-header__club-info']//following::span[4]//text()//following::a/text()[1]"
UEFA_COEFFICIENT_POSITION = "//div[@class='data-header__club-info']//following::span[5]//text()//following::a/text()[1]"
UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]"
REIGNING_CHAMPION = (
"//div[@class='data-header__club-info']//following::span[3]//text()"
"//following::span[1]/a/text()[1]"
)
RECORD_HOLDING_CHAMPION = (
"//div[@class='data-header__club-info']//following::span[4]//text()"
"//following::a/text()[1]"
)
UEFA_COEFFICIENT_POSITION = (
"//div[@class='data-header__club-info']//following::span[5]//text()"
"//following::a/text()[1]"
)
UEFA_COEFFICIENT_POINT = (
"//div[@class='data-header__club-info']//following::span[5]//text()"
"//following::span[1]//following::span[1]/text()[1]"
)
🧰 Tools
🪛 Ruff

187-187: Line too long (128 > 120)

(E501)


188-188: Line too long (126 > 120)

(E501)


189-189: Line too long (128 > 120)

(E501)


190-190: Line too long (151 > 120)

(E501)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
app/services/clubs/staff.py (1)

56-56: Inconsistent naming convention for lastClub

The key "lastClub" uses camelCase, whereas the other keys use snake_case. For consistency, consider renaming "lastClub" to "last_club".

Apply this diff to make the naming consistent:

-                    "lastClub": last_club,
+                    "last_club": last_club,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e74c196 and facc94e.

📒 Files selected for processing (1)
  • app/services/clubs/staff.py (1 hunks)
🧰 Additional context used
🪛 Ruff
app/services/clubs/staff.py

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

🔇 Additional comments (1)
app/services/clubs/staff.py (1)

6-6: Remove unused import extract_from_url.

The extract_from_url function is imported but not used anywhere in the code. Removing unused imports helps keep the codebase clean and maintainable.

🧰 Tools
🪛 Ruff

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

Comment on lines 36 to 46
staffs_ids = [
safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID)
]
staffs_names = self.page.xpath(Clubs.Staff.NAME)
staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
staffs_ages = self.page.xpath(Clubs.Staff.AGE)
staffs_nationalities = self.page.xpath(Clubs.Staff.NATIONALITIES)
staffs_appointed = self.page.xpath(Clubs.Staff.APPOINTED)
staffs_contracts = self.page.xpath(Clubs.Staff.CONTRACT)
staffs_last_club = self.page.xpath(Clubs.Staff.LAST_CLUB)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure extracted lists have equal lengths before zipping

When zipping the extracted staff data lists, if the lists have unequal lengths, it could lead to data misalignment or runtime errors. Consider adding a check to verify that all lists are of the same length before proceeding.

Apply this diff to add a length check:

# Add this check before zipping the lists
list_lengths = [
    len(staffs_ids),
    len(staffs_names),
    len(staffs_jobs),
    len(staffs_ages),
    len(staffs_nationalities),
    len(staffs_appointed),
    len(staffs_contracts),
    len(staffs_last_club),
]
if len(set(list_lengths)) != 1:
    raise ValueError("Extracted staff data lists have inconsistent lengths.")

Also applies to: 58-68

safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID)
]
staffs_names = self.page.xpath(Clubs.Staff.NAME)
staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential incorrect XPath reference: Clubs.Players.PAGE_INFOS

In the context of parsing club staff information, staffs_jobs is assigned using Clubs.Players.PAGE_INFOS. This might be incorrect. Consider using Clubs.Staff.JOB or the appropriate attribute under Clubs.Staff to ensure accurate data extraction.

Apply this diff to correct the XPath reference:

-staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
+staffs_jobs = self.page.xpath(Clubs.Staff.JOB)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
staffs_jobs = self.page.xpath(Clubs.Staff.JOB)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between facc94e and beeeade.

📒 Files selected for processing (1)
  • app/services/clubs/staff.py (1 hunks)
🧰 Additional context used
🪛 Ruff
app/services/clubs/staff.py

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

🔇 Additional comments (2)
app/services/clubs/staff.py (2)

1-82: Overall, well-structured implementation with minor improvements suggested.

The TransfermarktClubStaffs class provides a solid foundation for retrieving and parsing football club staff information. The suggested improvements (removing unused imports, validating club_id, ensuring data integrity in parsing, and using timezone-aware timestamps) will enhance the robustness and reliability of the code.

Great job on implementing this new feature! The code is well-organized and follows good practices for web scraping and data parsing.

🧰 Tools
🪛 Ruff

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)


6-6: ⚠️ Potential issue

Remove unused import extract_from_url.

The extract_from_url function is imported but not used in this file. Removing unused imports helps maintain a clean and efficient codebase.

Apply this diff to remove the unused import:

-from app.utils.utils import clean_response, extract_from_url, safe_regex
+from app.utils.utils import clean_response, safe_regex

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

Comment on lines +20 to +21
club_id: str = None
URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Validate club_id during initialization.

The club_id attribute defaults to None, but it's essential for formatting the URL in __post_init__. If club_id is None, formatting the URL will raise a TypeError.

Consider adding a validation check in __post_init__ to ensure club_id is provided:

def __post_init__(self) -> None:
    if not self.club_id:
        raise ValueError("The 'club_id' must be provided and cannot be None.")
    self.URL = self.URL.format(club_id=self.club_id)
    # ... rest of the method

Alternatively, make club_id a required field without a default value:

club_id: str

Comment on lines +29 to +68
def __parse_club_staffs(self) -> list[dict]:
"""
Parse staff information from the webpage and return a list of dictionaries, each representing a staff member.

Returns:
list[dict]: A list of staff information dictionaries.
"""
staffs_ids = [
safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID)
]
staffs_names = self.page.xpath(Clubs.Staff.NAME)
staffs_jobs = self.page.xpath(Clubs.Staff.JOB)
staffs_ages = self.page.xpath(Clubs.Staff.AGE)
staffs_nationalities = self.page.xpath(Clubs.Staff.NATIONALITIES)
staffs_appointed = self.page.xpath(Clubs.Staff.APPOINTED)
staffs_contracts = self.page.xpath(Clubs.Staff.CONTRACT)
staffs_last_club = self.page.xpath(Clubs.Staff.LAST_CLUB)

return [
{
"id": idx,
"name": name,
"job": job,
"age": age,
"nationality": nationality,
"appointed": appointed,
"contract": contract,
"lastClub": last_club,
}
for idx, name, job, age, nationality, appointed, contract, last_club in zip( # noqa: E501
staffs_ids,
staffs_names,
staffs_jobs,
staffs_ages,
staffs_nationalities,
staffs_appointed,
staffs_contracts,
staffs_last_club,
)
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure extracted lists have equal lengths before zipping.

When zipping the extracted staff data lists, if the lists have unequal lengths, it could lead to data misalignment or runtime errors.

Consider adding a check to verify that all lists are of the same length before proceeding:

def __parse_club_staffs(self) -> list[dict]:
    # ... existing code ...

    list_lengths = [
        len(staffs_ids),
        len(staffs_names),
        len(staffs_jobs),
        len(staffs_ages),
        len(staffs_nationalities),
        len(staffs_appointed),
        len(staffs_contracts),
        len(staffs_last_club),
    ]
    if len(set(list_lengths)) != 1:
        raise ValueError("Extracted staff data lists have inconsistent lengths.")

    # ... rest of the method

This check will ensure data integrity and prevent potential runtime errors.

Comment on lines +70 to +82
def get_club_staffs(self) -> dict:
"""
Retrieve and parse staff information for the specified football club.

Returns:
dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when
the data was last updated.
"""
self.response["id"] = self.club_id
self.response["staffs"] = self.__parse_club_staffs()
self.response["updatedAt"] = datetime.now()

return clean_response(self.response)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use timezone-aware datetime for updatedAt.

Using timezone-aware timestamps can prevent potential issues related to timezone discrepancies. It's recommended to use UTC for consistent time representation across different systems.

Apply this diff to make updatedAt timezone-aware:

-from datetime import datetime
+from datetime import datetime, timezone

 def get_club_staffs(self) -> dict:
     # ... existing code ...
-    self.response["updatedAt"] = datetime.now()
+    self.response["updatedAt"] = datetime.now(timezone.utc)
     # ... rest of the method

This change ensures that the timestamp is consistently represented in UTC, regardless of the system's local time settings.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_club_staffs(self) -> dict:
"""
Retrieve and parse staff information for the specified football club.
Returns:
dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when
the data was last updated.
"""
self.response["id"] = self.club_id
self.response["staffs"] = self.__parse_club_staffs()
self.response["updatedAt"] = datetime.now()
return clean_response(self.response)
from datetime import datetime, timezone
def get_club_staffs(self) -> dict:
"""
Retrieve and parse staff information for the specified football club.
Returns:
dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when
the data was last updated.
"""
self.response["id"] = self.club_id
self.response["staffs"] = self.__parse_club_staffs()
self.response["updatedAt"] = datetime.now(timezone.utc)
return clean_response(self.response)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant