Update to API #76

Krayir5 · 2024-10-06T12:52:35Z

Summary by CodeRabbit

New Features
- Added a new endpoint to retrieve staff information for specific clubs.
- Enhanced club profile data with additional fields: position and league tenure.
- Introduced a class for structured retrieval of club staff data.
Improvements
- Expanded data extraction capabilities for clubs and competitions, including detailed staff and competition attributes.
Bug Fixes
- Updated regular expression patterns for improved data matching.

These changes enhance user experience by providing more detailed information about clubs and their staff.

coderabbitai · 2024-10-06T12:52:42Z

Walkthrough

The changes introduce a new GET endpoint in app/api/endpoints/clubs.py for retrieving staff information associated with a specific club. The TransfermarktClubStaffs class is added in app/services/clubs/staff.py to handle the parsing of staff data from the Transfermarkt website. Additionally, the TransfermarktClubProfile class in app/services/clubs/profile.py is updated to include new fields in the club profile response. Several modifications are made to XPath expressions in app/utils/xpath.py and regex patterns in app/utils/regex.py to support these new functionalities.

Changes

File	Change Summary
app/api/endpoints/clubs.py	Added new GET route `/{club_id}/staffs` and method `get_club_staffs(club_id: str) -> dict`.
app/services/clubs/profile.py	Updated `get_club_profile` method to include new fields: `position` and `inLeagueSince`, and set `updatedAt` to current datetime.
app/services/clubs/staff.py	Introduced `TransfermarktClubStaffs` class with methods for parsing staff data and retrieving it.
app/utils/regex.py	Updated `REGEX_COUNTRY_ID` to match multiple digits; added new `REGEX_STAFF_ID`.
app/utils/xpath.py	Added `Staff` class with new attributes for staff data extraction; updated existing attributes in `Players` and `Competitions` classes.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant StaffService
    participant Transfermarkt

    User->>API: GET /{club_id}/staffs
    API->>StaffService: get_club_staffs(club_id)
    StaffService->>Transfermarkt: Request staff data
    Transfermarkt-->>StaffService: Return staff data
    StaffService-->>API: Return staff information
    API-->>User: Return staff information

🐇 "In the world of clubs, where staff do play,
New routes and data brighten the day.
With regex and XPath, we gather the lore,
For every club's staff, there's so much in store!
Hopping with joy, we cheer and we sing,
For the magic of data that these changes bring!" 🐇

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 13

🧹 Outside diff range and nitpick comments (4)

app/utils/regex.py (1)
7-7: LGTM: New regex pattern for staff ID extraction.

The new REGEX_STAFF_ID pattern correctly extracts numeric staff IDs from URL paths. However, there are two minor suggestions for improvement:

Consider using double quotes for consistency with other regex patterns in this file.

Add a newline at the end of the file.

Here's a suggested modification:
-REGEX_STAFF_ID: str = r'/small/(\d+)-'
+REGEX_STAFF_ID: str = r"/small/(\d+)-"
+
This change addresses both the quote style and the missing newline.

🧰 Tools

🪛 Ruff

7-7: Single quotes found but double quotes preferred

Replace single quotes with double quotes

(Q000)

7-7: No newline at end of file

Add trailing newline

(W292)
app/api/endpoints/clubs.py (2)
Line range hint 1-7: Update imports to support robust implementation

To support the suggested improvements for the new endpoint, consider updating the imports section as follows:
from typing import Optional

from fastapi import APIRouter, HTTPException

from app.services.clubs.players import TransfermarktClubPlayers
from app.services.clubs.profile import TransfermarktClubProfile
from app.services.clubs.search import TransfermarktClubSearch
from app.services.clubs.staff import TransfermarktClubStaffs  # New import
from app.models.staff import StaffResponse  # New import (create this Pydantic model)
These additional imports will support the use of a dedicated staff class, error handling, and a specific response model for the new endpoint.

Line range hint 1-37: Consider overall API improvements

The new endpoint integrates well with the existing structure. However, there are some overall improvements to consider for the entire API:

Implement consistent error handling across all endpoints. This includes handling potential exceptions and returning appropriate HTTP status codes.

Introduce Pydantic models for the responses of all endpoints. This will improve type hinting, validation, and API documentation.

Consider implementing authentication or rate limiting to protect the API from abuse.

Here's a high-level suggestion for implementing these improvements:

Create a base exception handler that can be used across all endpoints.

Define Pydantic models for each endpoint's response.

Implement an authentication middleware using FastAPI's built-in security utilities.

Example of a base exception handler:
from fastapi import HTTPException
from starlette.requests import Request
from starlette.responses import JSONResponse

@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
    return JSONResponse(
        status_code=500,
        content={"message": "An unexpected error occurred"},
    )
This approach will enhance the robustness and consistency of your API.
app/services/clubs/staff.py (1)
48-57: Ensure consistent naming conventions for dictionary keys.

In the staff dictionary, the key "lastClub" is in camelCase, whereas other keys are in snake_case. For consistency, consider using the same naming convention for all keys.

Option 1: Convert all keys to snake_case:
             "age": age,
             "nationality": nationality,
             "appointed": appointed,
             "contract": contract,
-            "lastClub": last_club,
+            "last_club": last_club,
Option 2: Convert all keys to camelCase:
-            "id": idx,
+            "staffId": idx,
             "name": name,
             "job": job,
             "age": age,
             "nationality": nationality,
             "appointed": appointed,
             "contract": contract,
             "lastClub": last_club,
Choose the convention that aligns with the rest of your codebase.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 41a0d4b and e74c196.

📒 Files selected for processing (5)

app/api/endpoints/clubs.py (1 hunks)
app/services/clubs/profile.py (1 hunks)
app/services/clubs/staff.py (1 hunks)
app/utils/regex.py (1 hunks)
app/utils/xpath.py (5 hunks)

🧰 Additional context used

🪛 Ruff

app/services/clubs/staff.py

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

app/utils/regex.py

7-7: Single quotes found but double quotes preferred

Replace single quotes with double quotes

(Q000)

7-7: No newline at end of file

Add trailing newline

(W292)

app/utils/xpath.py

99-99: Line too long (137 > 120)

(E501)

138-138: Line too long (126 > 120)

(E501)

139-139: Line too long (126 > 120)

(E501)

187-187: Line too long (128 > 120)

(E501)

188-188: Line too long (126 > 120)

(E501)

189-189: Line too long (128 > 120)

(E501)

190-190: Line too long (151 > 120)

(E501)

192-192: Line too long (122 > 120)

(E501)

193-193: Line too long (127 > 120)

(E501)

196-196: Line too long (132 > 120)

(E501)

197-197: Line too long (123 > 120)

(E501)

219-219: No newline at end of file

Add trailing newline

(W292)

🔇 Additional comments (7)

app/utils/regex.py (2)

5-5: LGTM: Improved flexibility for country ID matching.

The update to REGEX_COUNTRY_ID now allows matching one or more digits instead of a single digit. This change improves the flexibility of the pattern and can accommodate a wider range of country IDs.

Line range hint 1-7: Overall assessment: Improved regex patterns with minor style suggestions.

The changes to this file enhance the flexibility of country ID matching and introduce a new pattern for staff ID extraction. These modifications align well with the PR objectives of updating the API. The suggested minor style improvements will maintain consistency within the file.

🧰 Tools

🪛 Ruff

7-7: Single quotes found but double quotes preferred

Replace single quotes with double quotes

(Q000)

7-7: No newline at end of file

Add trailing newline

(W292)
app/services/clubs/profile.py (2)
89-89: 🛠️ Refactor suggestion

Consider using UTC time for updatedAt

The change to set updatedAt to the current time is good for tracking when the profile was last updated. However, datetime.now() returns the local time, which may cause inconsistencies in a distributed system or when dealing with different time zones.

Consider using datetime.utcnow() instead to ensure consistency:
-        self.response["updatedAt"] = datetime.now()
+        self.response["updatedAt"] = datetime.utcnow()
This change will ensure that the timestamp is always in UTC, regardless of the server's local time zone.

Likely invalid or redundant comment.

86-87: LGTM! New league fields added consistently.

The new fields "position" and "inLeagueSince" have been added to the self.response["league"] dictionary in a manner consistent with the existing code structure. The use of self.get_text_by_xpath() with constants from Clubs.Profile aligns with the coding style of the rest of the method.

To ensure these new fields are properly utilized, let's verify their usage in the codebase:
app/services/clubs/staff.py (3)
79-81: Ensure self.response is initialized before use.

The self.response dictionary is being updated, but there is no initialization within this class. Verify that self.response is initialized in the TransfermarktBase class.

If not initialized in the base class, you can initialize it in __post_init__:
 def __post_init__(self) -> None:
     self.URL = self.URL.format(club_id=self.club_id)
+    self.response = {}
     self.page = self.request_url_page()
36-46: Verify that all extracted lists have the same length before zipping.

When creating the staff list, zip() will truncate to the shortest input iterable. If any list is shorter due to missing or extra data, some staff members' information might be incomplete or omitted. Consider adding a check to ensure all lists have the same length.

Add a length check before zipping:
list_lengths = list(map(len, [
    staffs_ids,
    staffs_names,
    staffs_jobs,
    staffs_ages,
    staffs_nationalities,
    staffs_appointed,
    staffs_contracts,
    staffs_last_club,
]))
if len(set(list_lengths)) != 1:
    raise ValueError(f"Data lists have inconsistent lengths: {list_lengths}")
40-40: ⚠️ Potential issue

Use the correct XPath for staff job titles.

The variable staffs_jobs is currently using Clubs.Players.PAGE_INFOS, which may not be appropriate for staff data. Ensure that the correct XPath from Clubs.Staff is used to extract staff job titles accurately.

Consider updating the XPath as follows:
-staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
+staffs_jobs = self.page.xpath(Clubs.Staff.JOB_TITLE)
Run the following script to verify the available XPath expressions for staff:

✅ Verification successful

Use the correct XPath for staff job titles.

The proposed change to use Clubs.Staff.JOB is appropriate and ensures that the correct XPath is used for extracting staff job titles.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: List available XPaths under Clubs.Staff
grep -A 5 'class Staff' app/utils/xpath.py
Length of output: 543

coderabbitai · 2024-10-06T12:58:01Z

app/api/endpoints/clubs.py

+@router.get("/{club_id}/staffs")
+def get_club_staffs(club_id: str) -> dict:
+    tfmkt = TransfermarktClubPlayers(club_id=club_id)
+    club_staffs = tfmkt.get_club_staffs()
+    return club_staffs


🛠️ Refactor suggestion

Consider using a dedicated TransfermarktClubStaffs class

The new endpoint for retrieving club staffs is a good addition and follows the existing pattern. However, there are a few points to consider:

The endpoint is using TransfermarktClubPlayers to get staff information. It would be more appropriate to use a dedicated TransfermarktClubStaffs class for better separation of concerns.

There's no error handling or input validation for the club_id. Consider adding try-except blocks and validating the input to ensure robustness.

The return type is a generic dict. For better type hinting and API documentation, consider creating a Pydantic model for the staff data structure.

Here's a suggested refactoring:

from fastapi import APIRouter, HTTPException from app.services.clubs.staff import TransfermarktClubStaffs from app.models.staff import StaffResponse # Create this Pydantic model @router.get("/{club_id}/staffs", response_model=StaffResponse) def get_club_staffs(club_id: str) -> StaffResponse: try: tfmkt = TransfermarktClubStaffs(club_id=club_id) club_staffs = tfmkt.get_club_staffs() return StaffResponse(**club_staffs) except ValueError as e: raise HTTPException(status_code=400, detail=str(e)) except Exception as e: raise HTTPException(status_code=500, detail="An unexpected error occurred")

This refactoring introduces error handling, uses a dedicated staff class, and improves type hinting with a Pydantic model.

coderabbitai · 2024-10-06T12:58:01Z

app/services/clubs/staff.py

+
+from app.services.base import TransfermarktBase
+from app.utils.regex import REGEX_STAFF_ID
+from app.utils.utils import clean_response, extract_from_url, safe_regex


⚠️ Potential issue

Remove unused import extract_from_url.

The extract_from_url function is imported but not used anywhere in the code. Removing unused imports helps keep the codebase clean and maintainable.

Apply this diff to remove the unused import:

-from app.utils.utils import clean_response, extract_from_url, safe_regex +from app.utils.utils import clean_response, safe_regex

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from app.utils.utils import clean_response, extract_from_url, safe_regex

from app.utils.utils import clean_response, safe_regex

🧰 Tools

🪛 Ruff

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

app/services/clubs/staff.py

coderabbitai · 2024-10-06T12:58:01Z

app/services/clubs/staff.py

+        """
+        self.response["id"] = self.club_id
+        self.response["staffs"] = self.__parse_club_staffs()
+        self.response["updatedAt"] = datetime.now()


⚠️ Potential issue

Use timezone-aware datetime for updatedAt.

Storing timezone-aware timestamps can prevent potential issues related to timezone discrepancies. It's recommended to use UTC for consistent time representation.

Apply this diff to make updatedAt timezone-aware:

-from datetime import datetime +from datetime import datetime, timezone self.response["updatedAt"] = datetime.now() +self.response["updatedAt"] = datetime.now(timezone.utc)

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-10-06T12:58:01Z

app/services/clubs/staff.py

+    club_id: str = None
+    URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"
+


⚠️ Potential issue

Validate club_id during initialization.

The club_id attribute defaults to None, but it is essential for formatting the URL in __post_init__. If club_id is None, formatting the URL will raise a TypeError. Consider adding a validation check to ensure club_id is provided.

Apply this diff to add validation:

def __post_init__(self) -> None: + if not self.club_id: + raise ValueError("The 'club_id' must be provided and cannot be None.") self.URL = self.URL.format(club_id=self.club_id) self.page = self.request_url_page()

Alternatively, make club_id a required field without a default value:

- club_id: str = None + club_id: str

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

club_id: str = None

URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"

club_id: str

URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"

def __post_init__(self) -> None:

if not self.club_id:

raise ValueError("The 'club_id' must be provided and cannot be None.")

self.URL = self.URL.format(club_id=self.club_id)

self.page = self.request_url_page()

coderabbitai · 2024-10-06T12:58:02Z

app/utils/xpath.py

+        CLUB_NAME = "//header//h1//text()"
+        ID = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@rowspan='2']/img/@src"
+        NAME = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/a/text()"
+        JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"


⚠️ Potential issue

Line exceeds maximum allowed length (137 > 120 characters).

Line 99 is too long, exceeding the project's maximum line length of 120 characters. This can affect code readability and maintainability.

Consider splitting the XPath expression into multiple lines for better readability:

- JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()" + JOB = ( + "//div[@class='large-12 columns']//table[@class='inline-table']" + "//td[@class='hauptlink']/following-sibling::tr[1]/td/text()" + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"

JOB = (

"//div[@class='large-12 columns']//table[@class='inline-table']"

"//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"

)

🧰 Tools

🪛 Ruff

99-99: Line too long (137 > 120)

(E501)

coderabbitai · 2024-10-06T12:58:02Z

app/utils/xpath.py

@@ -124,6 +135,8 @@ class Profile:
        LEAGUE_COUNTRY_ID = "//div[@class='data-header__club-info']//img[contains(@class, 'flaggenrahmen')]//@data-src"
        LEAGUE_COUNTRY_NAME = "//div[@class='data-header__club-info']//img[contains(@class, 'flaggenrahmen')]//@title"
        LEAGUE_TIER = "//div[@class='data-header__club-info']//strong//text()//following::span[1]/a/text()[2]"
+        LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"


⚠️ Potential issue

Line exceeds maximum allowed length (126 > 120 characters).

Line 138 exceeds the maximum line length of 120 characters, which may hinder code readability.

Consider reformatting the line:

- LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]" + LEAGUE_POSITION = ( + "//div[@class='data-header__club-info']//following::span[3]//text()" + "//following::span[1]/a/text()[1]" + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"

LEAGUE_POSITION = (

"//div[@class='data-header__club-info']//following::span[3]//text()"

"//following::span[1]/a/text()[1]"

)

🧰 Tools

🪛 Ruff

138-138: Line too long (126 > 120)

(E501)

coderabbitai · 2024-10-06T12:58:02Z

app/utils/xpath.py

+        NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()"
+        NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()"


⚠️ Potential issue

Lines exceed maximum allowed length in Competitions.Profile class.

Line 192: 122 characters

Line 193: 127 characters

Consider breaking these lines to enhance readability. For instance, line 193 can be refactored:

- NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()" + NUMBER_OF_FOREIGNERS = ( + "//div[@class='data-header__info-box']//div[@class='data-header__details']" + "//li[3]/span/a/text()" + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()"

NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()"

NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()"

NUMBER_OF_FOREIGNERS = (

"//div[@class='data-header__info-box']//div[@class='data-header__details']"

"//li[3]/span/a/text()"

)

🧰 Tools

🪛 Ruff

192-192: Line too long (122 > 120)

(E501)

193-193: Line too long (127 > 120)

(E501)

coderabbitai · 2024-10-06T12:58:02Z

app/utils/xpath.py

+        AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()"
+        AVERAGE_AGE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[2]/span/text()"


⚠️ Potential issue

Lines exceed maximum allowed length in Competitions.Profile class.

Line 196: 132 characters

Line 197: 123 characters

Please refactor these lines for better readability. Example for line 196:

- AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()" + AVERAGE_MARKET_VALUE = ( + "//div[@class='data-header__info-box']//div[@class='data-header__details']" + "//ul[2]//li[1]/span/text()" + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()"

AVERAGE_AGE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[2]/span/text()"

AVERAGE_MARKET_VALUE = (

"//div[@class='data-header__info-box']//div[@class='data-header__details']"

"//ul[2]//li[1]/span/text()"

)

AVERAGE_AGE = (

"//div[@class='data-header__info-box']//div[@class='data-header__details']"

"//ul[2]//li[2]/span/text()"

)

🧰 Tools

🪛 Ruff

196-196: Line too long (132 > 120)

(E501)

197-197: Line too long (123 > 120)

(E501)

coderabbitai · 2024-10-06T12:58:02Z

app/utils/xpath.py

+        REIGNING_CHAMPION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
+        RECORD_HOLDING_CHAMPION = "//div[@class='data-header__club-info']//following::span[4]//text()//following::a/text()[1]"
+        UEFA_COEFFICIENT_POSITION = "//div[@class='data-header__club-info']//following::span[5]//text()//following::a/text()[1]"
+        UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]"


⚠️ Potential issue

Lines exceed maximum allowed length in Competitions.Profile class.

Lines 187 to 190 exceed the maximum line length of 120 characters:

Line 187: 128 characters

Line 188: 126 characters

Line 189: 128 characters

Line 190: 151 characters

Please split these XPath expressions into multiple lines. For example, line 190 can be refactored as:

- UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]" + UEFA_COEFFICIENT_POINT = ( + "//div[@class='data-header__club-info']//following::span[5]//text()" + "//following::span[1]//following::span[1]/text()[1]" + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

REIGNING_CHAMPION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"

RECORD_HOLDING_CHAMPION = "//div[@class='data-header__club-info']//following::span[4]//text()//following::a/text()[1]"

UEFA_COEFFICIENT_POSITION = "//div[@class='data-header__club-info']//following::span[5]//text()//following::a/text()[1]"

UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]"

REIGNING_CHAMPION = (

"//div[@class='data-header__club-info']//following::span[3]//text()"

"//following::span[1]/a/text()[1]"

)

RECORD_HOLDING_CHAMPION = (

"//div[@class='data-header__club-info']//following::span[4]//text()"

"//following::a/text()[1]"

)

UEFA_COEFFICIENT_POSITION = (

"//div[@class='data-header__club-info']//following::span[5]//text()"

"//following::a/text()[1]"

)

UEFA_COEFFICIENT_POINT = (

"//div[@class='data-header__club-info']//following::span[5]//text()"

"//following::span[1]//following::span[1]/text()[1]"

)

🧰 Tools

🪛 Ruff

187-187: Line too long (128 > 120)

(E501)

188-188: Line too long (126 > 120)

(E501)

189-189: Line too long (128 > 120)

(E501)

190-190: Line too long (151 > 120)

(E501)

coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)

app/services/clubs/staff.py (1)
56-56: Inconsistent naming convention for lastClub

The key "lastClub" uses camelCase, whereas the other keys use snake_case. For consistency, consider renaming "lastClub" to "last_club".

Apply this diff to make the naming consistent:
-                    "lastClub": last_club,
+                    "last_club": last_club,

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e74c196 and facc94e.

📒 Files selected for processing (1)

app/services/clubs/staff.py (1 hunks)

🧰 Additional context used

🪛 Ruff

app/services/clubs/staff.py

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

🔇 Additional comments (1)

app/services/clubs/staff.py (1)

6-6: Remove unused import extract_from_url.

The extract_from_url function is imported but not used anywhere in the code. Removing unused imports helps keep the codebase clean and maintainable.

🧰 Tools

🪛 Ruff

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

coderabbitai · 2024-10-06T14:19:46Z

app/services/clubs/staff.py

+        staffs_ids = [
+            safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID)
+        ]
+        staffs_names = self.page.xpath(Clubs.Staff.NAME)
+        staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
+        staffs_ages = self.page.xpath(Clubs.Staff.AGE)
+        staffs_nationalities = self.page.xpath(Clubs.Staff.NATIONALITIES)
+        staffs_appointed = self.page.xpath(Clubs.Staff.APPOINTED)
+        staffs_contracts = self.page.xpath(Clubs.Staff.CONTRACT)
+        staffs_last_club = self.page.xpath(Clubs.Staff.LAST_CLUB)
+


⚠️ Potential issue

Ensure extracted lists have equal lengths before zipping

When zipping the extracted staff data lists, if the lists have unequal lengths, it could lead to data misalignment or runtime errors. Consider adding a check to verify that all lists are of the same length before proceeding.

Apply this diff to add a length check:

# Add this check before zipping the lists list_lengths = [ len(staffs_ids), len(staffs_names), len(staffs_jobs), len(staffs_ages), len(staffs_nationalities), len(staffs_appointed), len(staffs_contracts), len(staffs_last_club), ] if len(set(list_lengths)) != 1: raise ValueError("Extracted staff data lists have inconsistent lengths.")

Also applies to: 58-68

coderabbitai · 2024-10-06T14:19:46Z

app/services/clubs/staff.py

+            safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID)
+        ]
+        staffs_names = self.page.xpath(Clubs.Staff.NAME)
+        staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)


⚠️ Potential issue

Potential incorrect XPath reference: Clubs.Players.PAGE_INFOS

In the context of parsing club staff information, staffs_jobs is assigned using Clubs.Players.PAGE_INFOS. This might be incorrect. Consider using Clubs.Staff.JOB or the appropriate attribute under Clubs.Staff to ensure accurate data extraction.

Apply this diff to correct the XPath reference:

-staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS) +staffs_jobs = self.page.xpath(Clubs.Staff.JOB)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)

staffs_jobs = self.page.xpath(Clubs.Staff.JOB)

coderabbitai

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between facc94e and beeeade.

📒 Files selected for processing (1)

app/services/clubs/staff.py (1 hunks)

🧰 Additional context used

🪛 Ruff

app/services/clubs/staff.py

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

🔇 Additional comments (2)

app/services/clubs/staff.py (2)
1-82: Overall, well-structured implementation with minor improvements suggested.

The TransfermarktClubStaffs class provides a solid foundation for retrieving and parsing football club staff information. The suggested improvements (removing unused imports, validating club_id, ensuring data integrity in parsing, and using timezone-aware timestamps) will enhance the robustness and reliability of the code.

Great job on implementing this new feature! The code is well-organized and follows good practices for web scraping and data parsing.

🧰 Tools

🪛 Ruff

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

6-6: ⚠️ Potential issue

Remove unused import extract_from_url.

The extract_from_url function is imported but not used in this file. Removing unused imports helps maintain a clean and efficient codebase.

Apply this diff to remove the unused import:
-from app.utils.utils import clean_response, extract_from_url, safe_regex
+from app.utils.utils import clean_response, safe_regex
Likely invalid or redundant comment.

🧰 Tools

🪛 Ruff

6-6: app.utils.utils.extract_from_url imported but unused

Remove unused import: app.utils.utils.extract_from_url

(F401)

coderabbitai · 2024-10-06T16:38:56Z

app/services/clubs/staff.py

+    club_id: str = None
+    URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"


⚠️ Potential issue

Validate club_id during initialization.

The club_id attribute defaults to None, but it's essential for formatting the URL in __post_init__. If club_id is None, formatting the URL will raise a TypeError.

Consider adding a validation check in __post_init__ to ensure club_id is provided:

def __post_init__(self) -> None: if not self.club_id: raise ValueError("The 'club_id' must be provided and cannot be None.") self.URL = self.URL.format(club_id=self.club_id) # ... rest of the method

Alternatively, make club_id a required field without a default value:

club_id: str

coderabbitai · 2024-10-06T16:38:56Z

app/services/clubs/staff.py

+    def __parse_club_staffs(self) -> list[dict]:
+        """
+        Parse staff information from the webpage and return a list of dictionaries, each representing a staff member.
+
+        Returns:
+            list[dict]: A list of staff information dictionaries.
+        """
+        staffs_ids = [
+            safe_regex(ids, REGEX_STAFF_ID, "id") for ids in self.get_list_by_xpath(Clubs.Staff.ID)
+        ]
+        staffs_names = self.page.xpath(Clubs.Staff.NAME)
+        staffs_jobs = self.page.xpath(Clubs.Staff.JOB)
+        staffs_ages = self.page.xpath(Clubs.Staff.AGE)
+        staffs_nationalities = self.page.xpath(Clubs.Staff.NATIONALITIES)
+        staffs_appointed = self.page.xpath(Clubs.Staff.APPOINTED)
+        staffs_contracts = self.page.xpath(Clubs.Staff.CONTRACT)
+        staffs_last_club = self.page.xpath(Clubs.Staff.LAST_CLUB)
+
+        return [
+            {
+                "id": idx,
+                "name": name,
+                "job": job,
+                "age": age,
+                "nationality": nationality,
+                "appointed": appointed,
+                "contract": contract,
+                "lastClub": last_club,
+            }
+            for idx, name, job, age, nationality, appointed, contract, last_club in zip(  # noqa: E501
+                staffs_ids,
+                staffs_names,
+                staffs_jobs,
+                staffs_ages,
+                staffs_nationalities,
+                staffs_appointed,
+                staffs_contracts,
+                staffs_last_club,
+            )
+        ]


⚠️ Potential issue

Ensure extracted lists have equal lengths before zipping.

When zipping the extracted staff data lists, if the lists have unequal lengths, it could lead to data misalignment or runtime errors.

Consider adding a check to verify that all lists are of the same length before proceeding:

def __parse_club_staffs(self) -> list[dict]: # ... existing code ... list_lengths = [ len(staffs_ids), len(staffs_names), len(staffs_jobs), len(staffs_ages), len(staffs_nationalities), len(staffs_appointed), len(staffs_contracts), len(staffs_last_club), ] if len(set(list_lengths)) != 1: raise ValueError("Extracted staff data lists have inconsistent lengths.") # ... rest of the method

This check will ensure data integrity and prevent potential runtime errors.

coderabbitai · 2024-10-06T16:38:56Z

app/services/clubs/staff.py

+    def get_club_staffs(self) -> dict:
+        """
+        Retrieve and parse staff information for the specified football club.
+
+        Returns:
+            dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when
+                  the data was last updated.
+        """
+        self.response["id"] = self.club_id
+        self.response["staffs"] = self.__parse_club_staffs()
+        self.response["updatedAt"] = datetime.now()
+
+        return clean_response(self.response)


⚠️ Potential issue

Use timezone-aware datetime for updatedAt.

Using timezone-aware timestamps can prevent potential issues related to timezone discrepancies. It's recommended to use UTC for consistent time representation across different systems.

Apply this diff to make updatedAt timezone-aware:

-from datetime import datetime +from datetime import datetime, timezone def get_club_staffs(self) -> dict: # ... existing code ... - self.response["updatedAt"] = datetime.now() + self.response["updatedAt"] = datetime.now(timezone.utc) # ... rest of the method

This change ensures that the timestamp is consistently represented in UTC, regardless of the system's local time settings.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_club_staffs(self) -> dict:

"""

Retrieve and parse staff information for the specified football club.

Returns:

dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when

the data was last updated.

"""

self.response["id"] = self.club_id

self.response["staffs"] = self.__parse_club_staffs()

self.response["updatedAt"] = datetime.now()

return clean_response(self.response)

from datetime import datetime, timezone

def get_club_staffs(self) -> dict:

"""

Retrieve and parse staff information for the specified football club.

Returns:

dict: A dictionary containing the club's unique identifier, staff information, and the timestamp of when

the data was last updated.

"""

self.response["id"] = self.club_id

self.response["staffs"] = self.__parse_club_staffs()

self.response["updatedAt"] = datetime.now(timezone.utc)

return clean_response(self.response)

Krayir5 added 6 commits September 26, 2024 19:57

Update regex.py

82da295

Update xpath.py

df51884

Update profile.py

dea5afe

Add files via upload

f0cf824

Staff Update

6ff24c8

Staff Update

e74c196

Krayir5 mentioned this pull request Oct 6, 2024

Small changes to ClubInfo #75

Open

coderabbitai bot reviewed Oct 6, 2024

View reviewed changes

Update staff.py

facc94e

coderabbitai bot reviewed Oct 6, 2024

View reviewed changes

Update staff.py

beeeade

coderabbitai bot reviewed Oct 6, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update to API #76

Update to API #76

Krayir5 commented Oct 6, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 6, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot left a comment

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot left a comment

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

coderabbitai bot Oct 6, 2024

	from app.utils.utils import clean_response, extract_from_url, safe_regex
	from app.utils.utils import clean_response, safe_regex

		club_id: str = None
		URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"

-    club_id: str = None
-    URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"
+    club_id: str
+    URL: str = "https://www.transfermarkt.us/-/mitarbeiter/verein/{club_id}/plus/1"
+    def __post_init__(self) -> None:
+        if not self.club_id:
+            raise ValueError("The 'club_id' must be provided and cannot be None.")
+        self.URL = self.URL.format(club_id=self.club_id)
+        self.page = self.request_url_page()

-        JOB = "//div[@class='large-12 columns']//table[@class='inline-table']//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
+        JOB = (
+            "//div[@class='large-12 columns']//table[@class='inline-table']"
+            "//td[@class='hauptlink']/following-sibling::tr[1]/td/text()"
+        )

-        LEAGUE_POSITION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
+        LEAGUE_POSITION = (
+            "//div[@class='data-header__club-info']//following::span[3]//text()"
+            "//following::span[1]/a/text()[1]"
+        )

		NUMBER_OF_PLAYERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[2]/span/text()"
		NUMBER_OF_FOREIGNERS = "//div[@class='data-header__info-box']//div[@class='data-header__details']//li[3]/span/a/text()"

		AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()"
		AVERAGE_AGE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[2]/span/text()"

-        AVERAGE_MARKET_VALUE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[1]/span/text()"
-        AVERAGE_AGE = "//div[@class='data-header__info-box']//div[@class='data-header__details']//ul[2]//li[2]/span/text()"
+        AVERAGE_MARKET_VALUE = (
+            "//div[@class='data-header__info-box']//div[@class='data-header__details']"
+            "//ul[2]//li[1]/span/text()"
+        )
+        AVERAGE_AGE = (
+            "//div[@class='data-header__info-box']//div[@class='data-header__details']"
+            "//ul[2]//li[2]/span/text()"
+        )

-        REIGNING_CHAMPION = "//div[@class='data-header__club-info']//following::span[3]//text()//following::span[1]/a/text()[1]"
-        RECORD_HOLDING_CHAMPION = "//div[@class='data-header__club-info']//following::span[4]//text()//following::a/text()[1]"
-        UEFA_COEFFICIENT_POSITION = "//div[@class='data-header__club-info']//following::span[5]//text()//following::a/text()[1]"
-        UEFA_COEFFICIENT_POINT = "//div[@class='data-header__club-info']//following::span[5]//text()//following::span[1]//following::span[1]/text()[1]"
+        REIGNING_CHAMPION = (
+            "//div[@class='data-header__club-info']//following::span[3]//text()"
+            "//following::span[1]/a/text()[1]"
+        )
+        RECORD_HOLDING_CHAMPION = (
+            "//div[@class='data-header__club-info']//following::span[4]//text()"
+            "//following::a/text()[1]"
+        )
+        UEFA_COEFFICIENT_POSITION = (
+            "//div[@class='data-header__club-info']//following::span[5]//text()"
+            "//following::a/text()[1]"
+        )
+        UEFA_COEFFICIENT_POINT = (
+            "//div[@class='data-header__club-info']//following::span[5]//text()"
+            "//following::span[1]//following::span[1]/text()[1]"
+        )

	staffs_jobs = self.page.xpath(Clubs.Players.PAGE_INFOS)
	staffs_jobs = self.page.xpath(Clubs.Staff.JOB)

Update to API #76

Are you sure you want to change the base?

Update to API #76

Conversation

Krayir5 commented Oct 6, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Oct 6, 2024 • edited Loading

Walkthrough

Changes

Sequence Diagram(s)

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 6, 2024

Choose a reason for hiding this comment

Krayir5 commented Oct 6, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 6, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)