Add ability to parse stations
interlark committed Mar 14, 2024
1 parent 948249b commit ecc2752
Showing 9 changed files with 85 additions and 33 deletions.
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,12 @@

 ## [Unreleased]

+## [1.2.1] - 14-03-2024
+### Added
+- Added support for parsing stops (stations). Fix [issue](https://github.com/interlark/parser-2gis/issues/52).
+- The URL generator now adds alphabetical sorting to the URL to avoid repeated search results when navigating between pages.
+- Updated the list of rubrics.
+
 ## [1.2.0] - 08-02-2024
 ### Added
 - Minor bugfix of the server response schema.
Expand Down Expand Up @@ -88,7 +94,8 @@
- Первый релиз.


[Невошедшее]: https://github.com/interlark/parser-2gis/compare/v1.2.0...HEAD
[Невошедшее]: https://github.com/interlark/parser-2gis/compare/v1.2.1...HEAD
[1.2.1]: https://github.com/interlark/parser-2gis/compare/v1.2.0...v1.2.1
[1.2.0]: https://github.com/interlark/parser-2gis/compare/v1.1.2...v1.2.0
[1.1.2]: https://github.com/interlark/parser-2gis/compare/v1.1.1...v1.1.2
[1.1.1]: https://github.com/interlark/parser-2gis/compare/v1.1.0...v1.1.1
Expand Down
9 changes: 5 additions & 4 deletions README.md
@@ -19,7 +19,7 @@
 ## ℹ️ Description

 A parser for automatically collecting a database of addresses and contacts of businesses operating in
-Russia <img width="18px" src="https://user-images.githubusercontent.com/20641837/183511175-3d47f0f0-4e3f-45d2-8495-95d0612a8a8c.svg"/>, Kazakhstan <img width="18px" src="https://user-images.githubusercontent.com/20641837/183511625-20420aef-59c3-426d-a112-654d2caf0dda.svg"/>, Ukraine <img width="18px" src="https://user-images.githubusercontent.com/20641837/183511753-267f65c2-6cd1-41e4-aa02-8895d7ad7013.svg"/>, Belarus <img width="18px" src="https://user-images.githubusercontent.com/20641837/183511940-ce088ad1-d97f-4fa1-849a-9b887ad481c5.svg"/>,
+Russia <img width="18px" src="https://user-images.githubusercontent.com/20641837/183511175-3d47f0f0-4e3f-45d2-8495-95d0612a8a8c.svg"/>, Kazakhstan <img width="18px" src="https://user-images.githubusercontent.com/20641837/183511625-20420aef-59c3-426d-a112-654d2caf0dda.svg"/>, Belarus <img width="18px" src="https://user-images.githubusercontent.com/20641837/183511940-ce088ad1-d97f-4fa1-849a-9b887ad481c5.svg"/>,
 Azerbaijan <img width="18px" src="https://user-images.githubusercontent.com/20641837/183512176-1f6795a1-ceac-4865-a29f-b5720ce5115e.svg"/>, Kyrgyzstan <img width="18px" src="https://user-images.githubusercontent.com/20641837/183512234-286ca403-5194-4a6d-a59e-59201140078a.svg"/>, Uzbekistan <img width="18px" src="https://user-images.githubusercontent.com/20641837/183512333-7ec1f36d-07fe-450d-b6f1-eed59a3b69c8.svg"/>, Czechia <img width="18px" src="https://user-images.githubusercontent.com/20641837/183512458-5a5d9531-a8f0-4624-99da-7069cde84926.svg"/>, Egypt <img width="18px" src="https://user-images.githubusercontent.com/20641837/183512581-71fa2106-8cc1-43cc-a680-b3ff420acb8a.svg"/>, Italy <img width="18px" src="https://user-images.githubusercontent.com/20641837/183512763-0b438e5b-3ff0-4717-a826-0baac9207167.svg"/>, Saudi Arabia <img width="18px" src="https://user-images.githubusercontent.com/20641837/183512980-427a985a-df1b-42c8-90bb-2c61692b6654.svg"/>, Cyprus <img width="18px" src="https://user-images.githubusercontent.com/20641837/183513128-4367d2b1-feb9-4efe-bc57-73a15d178ef2.svg"/>, the United Arab Emirates <img width="18px" src="https://user-images.githubusercontent.com/20641837/183513374-9afef8c7-923e-4a18-9cd8-c69645b99377.svg"/>, Chile <img width="18px" src="https://user-images.githubusercontent.com/20641837/183513576-7209ce90-a04a-4258-9832-ef210198c3c4.svg"/>, Qatar <img width="18px" src="https://user-images.githubusercontent.com/20641837/183513757-143ee2bf-b66c-4766-bbe1-db896a33eac1.svg"/>, Oman <img width="18px" src="https://user-images.githubusercontent.com/20641837/183513865-27509b74-b08f-4d92-b83b-a0d3aaabe155.svg"/>, Bahrain <img width="18px" src="https://user-images.githubusercontent.com/20641837/183514076-3b6c9496-7c95-4452-8ee1-8723d98f876d.svg"/>, Kuwait <img width="18px" src="https://user-images.githubusercontent.com/20641837/183514240-7eff8632-5cd2-46ac-bed4-e483bb2df5f0.svg"/>.

 ## ✨ Features
Expand Down Expand Up @@ -47,6 +47,7 @@
## 📖 Документация
Описание работы доступно на [вики](https://github.com/interlark/parser-2gis/wiki).

## 👍 Поддержать
Поблагодарить и поддержать разработку <a href="https://qiwi.com/n/INTERLARK" target="_blank"><img alt="QIWI Donate" src="https://user-images.githubusercontent.com/20641837/195283860-ad41d5f7-c2f4-4960-8454-586a6271db10.png" width="100" height="45"></a>

## 👍 Поддержать проект
<a href="https://yoomoney.ru/to/4100118362270186" target="_blank">
<img alt="Yoomoney Donate" src="https://github.com/interlark/parser-2gis/assets/20641837/e875e948-0d69-4ed5-804c-8a1736ab0c9d" width="150">
</a>
11 changes: 6 additions & 5 deletions parser_2gis/common.py
@@ -56,13 +56,14 @@ def inner(*args, timeout=timeout, finished=finished,
             call_time = time.time()
             while True:
                 ret = func(*args, **kwargs)
-                if not timeout or finished(ret):
+                if finished(ret):
                     return ret

-                if time.time() - call_time > timeout:
-                    if throw_exception:
-                        raise TimeoutError(func)
-                    return ret
+                if timeout is not None:
+                    if time.time() - call_time > timeout:
+                        if throw_exception:
+                            raise TimeoutError(func)
+                        return ret

                 time.sleep(poll_interval)
         return inner
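The effect of this hunk is easier to see in a self-contained form. The following is a minimal sketch, not the project's actual code: only the loop body follows the diff, while the decorator name `wait_until_finished`, its default values, and the sample function are assumptions made for illustration. Before the change, a falsy `timeout` returned after the first call; after it, `timeout=None` keeps polling until `finished(ret)` holds.

```python
import functools
import time


def wait_until_finished(timeout=None, finished=bool,
                        throw_exception=True, poll_interval=0.01):
    """Hypothetical decorator factory; only the polling loop follows the diff."""
    def outer(func):
        @functools.wraps(func)
        def inner(*args, timeout=timeout, finished=finished,
                  throw_exception=throw_exception,
                  poll_interval=poll_interval, **kwargs):
            call_time = time.time()
            while True:
                ret = func(*args, **kwargs)
                if finished(ret):
                    return ret

                # A timeout of None means "keep polling until finished".
                if timeout is not None:
                    if time.time() - call_time > timeout:
                        if throw_exception:
                            raise TimeoutError(func)
                        return ret

                time.sleep(poll_interval)
        return inner
    return outer


attempts = []


@wait_until_finished(timeout=None)
def ready_on_third_call():
    # Illustrative function: reports success only on its third invocation.
    attempts.append(1)
    return len(attempts) >= 3
```

With `timeout=None`, calling `ready_on_third_call()` polls three times and then returns; a function that never finishes would instead need a numeric `timeout` to terminate.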
74 changes: 55 additions & 19 deletions parser_2gis/data/rubrics.json
@@ -931,6 +931,7 @@
 "809",
 "631",
 "774",
+"51414",
 "309",
 "52495",
 "110423",
@@ -946,7 +947,6 @@
 "764",
 "53139",
 "384",
-"51414",
 "259"
 ]
 },
Expand Down Expand Up @@ -977,14 +977,15 @@
"111005",
"547",
"110387",
"24472",
"111540",
"269",
"270",
"112677",
"19487",
"70348",
"19601",
"24472",
"112877",
"111539",
"112545",
"25108",
@@ -995,6 +996,7 @@
 "272",
 "16615",
 "23494",
+"112878",
 "52681"
 ]
 },
Expand Down Expand Up @@ -1132,6 +1134,7 @@
"308",
"112527",
"157",
"112876",
"53624",
"50414",
"312",
Expand Down Expand Up @@ -1180,6 +1183,7 @@
"1015",
"52973",
"110519",
"16022",
"321",
"334",
"110506",
@@ -1198,7 +1202,6 @@
 "8367",
 "1123",
 "5602",
-"16022",
 "54145",
 "110493",
 "51324",
Expand Down Expand Up @@ -1614,6 +1617,7 @@
"406",
"7900",
"56428",
"112879",
"110365",
"10792",
"22829",
Expand Down Expand Up @@ -4189,7 +4193,7 @@
"416": {
"isRussian": true,
"isNonRussian": true,
"label": "Автовокзалы / Автостанции",
"label": "Автовокзалы",
"code": "416",
"parentCode": "22191",
"children": []
Expand Down Expand Up @@ -6175,10 +6179,10 @@
"children": [
"697",
"110651",
"19661",
"53989",
"52958",
"51368",
"19661",
"54419",
"380",
"52255",
Expand Down Expand Up @@ -9022,7 +9026,7 @@
"16022": {
"isRussian": true,
"isNonRussian": true,
"label": "Услуги гравировки",
"label": "Гравировка",
"code": "16022",
"parentCode": "59",
"children": []
Expand Down Expand Up @@ -9634,13 +9638,13 @@
"children": [
"51646",
"112548",
"110332",
"110427",
"71232",
"68951",
"110377",
"111582",
"51221",
"110332",
"51008",
"112684",
"110490",
Expand Down Expand Up @@ -9749,7 +9753,7 @@
"19661": {
"isRussian": true,
"isNonRussian": true,
"label": "Гостиницы для животных",
"label": "Зоогостиницы",
"code": "19661",
"parentCode": "749",
"children": []
Expand Down Expand Up @@ -10044,6 +10048,7 @@
"110384",
"112490",
"171",
"24504",
"112489",
"110300",
"112464",
@@ -10052,7 +10057,6 @@
 "7332",
 "9281",
 "174",
-"24504",
 "112651",
 "55875",
 "15707",
Expand Down Expand Up @@ -10240,15 +10244,15 @@
"24472": {
"isRussian": true,
"isNonRussian": true,
"label": "Лыжные базы / Горнолыжные комплексы",
"label": "Горнолыжные комплексы",
"code": "24472",
"parentCode": "50",
"children": []
},
"24504": {
"isRussian": true,
"isNonRussian": true,
"label": "Продажа лотерейных билетов",
"label": "Лотерейные билеты",
"code": "24504",
"parentCode": "22215",
"children": []
Expand Down Expand Up @@ -11275,7 +11279,7 @@
"51414": {
"isRussian": true,
"isNonRussian": true,
"label": "Товары для подводного плавания",
"label": "Подводное снаряжение",
"code": "51414",
"parentCode": "48",
"children": []
Expand Down Expand Up @@ -11434,7 +11438,7 @@
},
"52248": {
"isRussian": true,
"isNonRussian": true,
"isNonRussian": false,
"label": "Рюмочные",
"code": "52248",
"parentCode": "25",
Expand Down Expand Up @@ -11499,7 +11503,7 @@
"52959": {
"isRussian": true,
"isNonRussian": true,
"label": "Киоски / магазины по продаже печатной продукции",
"label": "Продажа печатной продукции",
"code": "52959",
"parentCode": "47",
"children": []
Expand Down Expand Up @@ -12259,7 +12263,7 @@
},
"56581": {
"isRussian": true,
"isNonRussian": true,
"isNonRussian": false,
"label": "Избирательные участки",
"code": "56581",
"parentCode": "44",
Expand Down Expand Up @@ -12926,7 +12930,7 @@
},
"103240": {
"isRussian": true,
"isNonRussian": false,
"isNonRussian": true,
"label": "Услуга распила мяса",
"code": "103240",
"parentCode": "70",
Expand Down Expand Up @@ -13143,7 +13147,7 @@
"110320": {
"isRussian": true,
"isNonRussian": true,
"label": "Станции для зарядки электротранспорта",
"label": "Станции зарядки электромобилей",
"code": "110320",
"parentCode": "77",
"children": []
Expand Down Expand Up @@ -13223,7 +13227,7 @@
"110332": {
"isRussian": true,
"isNonRussian": true,
"label": "Занятия по аквааэробике",
"label": "Аквааэробика",
"code": "110332",
"parentCode": "19519",
"children": []
Expand Down Expand Up @@ -15366,7 +15370,7 @@
},
"112614": {
"isRussian": true,
"isNonRussian": false,
"isNonRussian": true,
"label": "Дарксторы",
"code": "112614",
"parentCode": "73",
Expand Down Expand Up @@ -16124,6 +16128,38 @@
"parentCode": "31",
"children": []
},
"112876": {
"isRussian": true,
"isNonRussian": false,
"label": "Станции зарядки электросамокатов",
"code": "112876",
"parentCode": "56",
"children": []
},
"112877": {
"isRussian": true,
"isNonRussian": false,
"label": "Лыжные базы",
"code": "112877",
"parentCode": "50",
"children": []
},
"112878": {
"isRussian": true,
"isNonRussian": false,
"label": "Тюбинговые трассы",
"code": "112878",
"parentCode": "50",
"children": []
},
"112879": {
"isRussian": true,
"isNonRussian": false,
"label": "Услуги русификации",
"code": "112879",
"parentCode": "77",
"children": []
},
"-1": {
"isRussian": true,
"isNonRussian": true,
Expand Down
4 changes: 3 additions & 1 deletion parser_2gis/gui/urls_generator.py
@@ -39,7 +39,7 @@ def gui_urls_generator() -> list[str]:
         ae='Объединенные Арабские Эмираты', iq='Ирак',
         az='Азербайджан', bh='Бахрейн', by='Беларусь', cl='Чили', cy='Кипр', cz='Чехия',
         eg='Египт', it='Италия', kg='Киргизия', kw='Кувейт', kz='Казахстан', om='Оман',
-        qa='Катар', ru='Россия', sa='Саудовская Аравия', ua='Украина', uz='Узбекистан')
+        qa='Катар', ru='Россия', sa='Саудовская Аравия', uz='Узбекистан')

     country_name_to_code = {v: k for k, v in country_code_to_name.items()}

Expand Down Expand Up @@ -182,6 +182,8 @@ def get_selected_urls(query: str) -> list[str]:
if rubric:
rest_url += f'/rubricId/{rubric["code"]}'

rest_url += '/filters/sort=name'

url = base_url + rest_url
urls.append(url)

Expand Down
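The URL change can be sketched in isolation as follows. This is an illustrative sketch only: the helper name `build_search_url`, the `/search/<query>` prefix, and the example base URL are assumptions, since the diff shows only the `/rubricId/...` and `/filters/sort=name` suffixes being appended to `rest_url`.

```python
from urllib.parse import quote


def build_search_url(base_url, query, rubric=None):
    """Hypothetical helper; only the two suffixes below come from the diff."""
    # Assumed prefix: the part of the path before the rubric is not shown in the diff.
    rest_url = f'/search/{quote(query, safe="")}'

    if rubric:
        rest_url += f'/rubricId/{rubric["code"]}'

    # Added in this commit: sort results by name, which per the changelog
    # avoids repeated search results when moving between pages.
    rest_url += '/filters/sort=name'

    return base_url + rest_url
```

For example, `build_search_url('https://example.com/moscow', 'cafe', {'code': '161'})` yields a URL ending in `/rubricId/161/filters/sort=name`.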
2 changes: 1 addition & 1 deletion parser_2gis/parser/parsers/main.py
@@ -60,7 +60,7 @@ def _get_links(self) -> list[DOMNode]:
         """Extracts specific DOM node links from current DOM snapshot."""
         def valid_link(node: DOMNode) -> bool:
             if node.local_name == 'a' and 'href' in node.attributes:
-                link_match = re.match(r'.*/firm/.*\?stat=(?P<data>[a-zA-Z0-9%]+)', node.attributes['href'])
+                link_match = re.match(r'.*/(firm|station)/.*\?stat=(?P<data>[a-zA-Z0-9%]+)', node.attributes['href'])
                 if link_match:
                     try:
                         base64.b64decode(urllib.parse.unquote(link_match.group('data')))
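The widened pattern can be exercised on its own. The sketch below reuses the regular expression and the `base64`/`urllib.parse.unquote` check from the diff; the function name `is_catalog_link`, the sample hrefs, and the choice to catch `binascii.Error` are assumptions, because the hunk is cut off before the original `except` clause.

```python
import base64
import binascii
import re
import urllib.parse

# Pattern copied from the new line in the diff: accepts /firm/ and /station/ links.
LINK_RE = re.compile(r'.*/(firm|station)/.*\?stat=(?P<data>[a-zA-Z0-9%]+)')


def is_catalog_link(href: str) -> bool:
    """Hypothetical stand-alone version of the link check."""
    link_match = LINK_RE.match(href)
    if not link_match:
        return False
    try:
        # The `stat` parameter is expected to be URL-quoted base64.
        base64.b64decode(urllib.parse.unquote(link_match.group('data')))
        return True
    except binascii.Error:
        # Assumed handling: treat undecodable payloads as invalid links.
        return False
```

With this check, an href such as `/station/456?stat=<payload>` is accepted alongside `/firm/...` links, while other path segments are still rejected.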
2 changes: 1 addition & 1 deletion parser_2gis/version.py
@@ -1,4 +1,4 @@
 """Version info."""

-version = '1.2.0'
+version = '1.2.1'
 config_version = '0.1'
5 changes: 5 additions & 0 deletions parser_2gis/writer/writers/csv_writer.py
@@ -23,6 +23,7 @@ def _type_names(self) -> dict[str, str]:
             'street': 'Улица',
             'road': 'Дорога',
             'crossroad': 'Перекрёсток',
+            'station': 'Остановка',
         }

     @property
Expand Down Expand Up @@ -59,6 +60,7 @@ def _data_mapping(self) -> dict[str, Any]:
'point_lat': 'Широта',
'point_lon': 'Долгота',
'url': '2GIS URL',
'type': 'Тип',
}
}

Expand Down Expand Up @@ -202,6 +204,9 @@ def _extract_raw(self, catalog_doc: Any) -> dict[str, Any]:
elif catalog_item.type in self._type_names:
data['name'] = self._type_names[catalog_item.type]

# Type
data['type'] = catalog_item.type

# Address
data['address'] = catalog_item.address_name

Expand Down
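Taken together, the three `csv_writer.py` hunks give a station a fallback name and export the raw item type. A rough sketch of that behaviour, under stated assumptions: the catalog item is modelled as a plain dict, the function `extract_raw` and the leading `if` branch are hypothetical, and `TYPE_NAMES` lists only the entries visible in the diff.

```python
# Entries visible in the diff; the real mapping may contain more.
TYPE_NAMES = {
    'street': 'Улица',
    'road': 'Дорога',
    'crossroad': 'Перекрёсток',
    'station': 'Остановка',  # added in this commit
}


def extract_raw(catalog_item: dict) -> dict:
    """Hypothetical simplification of the writer's extraction step."""
    data = {}

    # Assumed branch: prefer the item's own name when it has one.
    if catalog_item.get('name'):
        data['name'] = catalog_item['name']
    elif catalog_item['type'] in TYPE_NAMES:
        data['name'] = TYPE_NAMES[catalog_item['type']]

    # Type (new in this commit: the raw type is exported as its own field)
    data['type'] = catalog_item['type']

    # Address
    data['address'] = catalog_item.get('address_name')

    return data
```

An unnamed item of type `station` would then be written with the name `Остановка` and `type` set to `station`.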