importer

Import target catalog dumps into a SQL database.
base_dump_extractor

Base class for catalog dump extraction.

class soweego.importer.base_dump_extractor.BaseDumpExtractor
Method definitions to download catalog dumps, extract data, and populate a database instance.

extract_and_populate(dump_file_paths, resolve)
Extract relevant data and populate SQLAlchemy ORM entities accordingly. The entities are then persisted to a database instance.
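To make this contract concrete, here is a minimal sketch of an abstract base class with one concrete subclass. It assumes extract_and_populate is the method each catalog-specific extractor must implement, and it swaps SQLAlchemy for the standard-library sqlite3 module so the example runs on its own. ToyBaseDumpExtractor, ToyNamesExtractor, the one-record-per-line dump format, and the people table are all hypothetical and are not part of soweego.

```python
import abc
import sqlite3
from typing import List


class ToyBaseDumpExtractor(abc.ABC):
    """Hypothetical stand-in for BaseDumpExtractor (not soweego's code)."""

    @abc.abstractmethod
    def extract_and_populate(self, dump_file_paths: List[str], resolve: bool) -> None:
        """Extract data from the given dump files and persist it."""


class ToyNamesExtractor(ToyBaseDumpExtractor):
    """Hypothetical extractor for a dump with one 'identifier<TAB>name' record per line."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection
        self.connection.execute(
            'CREATE TABLE IF NOT EXISTS people (catalog_id TEXT PRIMARY KEY, name TEXT)'
        )

    def extract_and_populate(self, dump_file_paths: List[str], resolve: bool) -> None:
        # `resolve` is accepted to honor the interface; this toy dump has no URLs.
        rows = []
        for path in dump_file_paths:
            with open(path, encoding='utf-8') as dump:
                for line in dump:
                    line = line.rstrip('\n')
                    if not line:
                        continue  # skip blank lines
                    catalog_id, name = line.split('\t', 1)
                    rows.append((catalog_id, name))
        # Persist all rows in one transaction, similar to a bulk ORM insert.
        with self.connection:
            self.connection.executemany('INSERT INTO people VALUES (?, ?)', rows)
```

Instantiating ToyBaseDumpExtractor directly raises TypeError because its abstract method has no implementation; only concrete subclasses can be created.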
discogs_dump_extractor

Discogs dump extractor.

class soweego.importer.discogs_dump_extractor.DiscogsDumpExtractor
Download Discogs dumps, extract data, and populate a database instance.

extract_and_populate(dump_file_paths, resolve)
Extract relevant data from the artists (people) and masters (works) Discogs dumps, preprocess them, populate SQLAlchemy ORM entities, and persist them to a database instance. See discogs_entity for the ORM definitions.
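Discogs distributes its dumps as gzip-compressed XML files that are too large to load into memory in one go. The following is a minimal sketch of streaming the artists dump with xml.etree.ElementTree.iterparse, assuming each record is an artist element with id and name child elements. The function name iter_discogs_artists and the returned dictionary shape are illustrative, not soweego's implementation.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional


def iter_discogs_artists(dump_path: str) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one {'id', 'name'} dict per <artist> element of a gzipped XML dump."""
    with gzip.open(dump_path, 'rb') as dump:
        # 'end' events fire once an element and all of its children are parsed.
        for _, element in ET.iterparse(dump, events=('end',)):
            if element.tag != 'artist':
                continue
            yield {
                # findtext('name') only looks at direct children, so nested
                # <name> elements (e.g. inside <aliases>) are ignored.
                'id': element.findtext('id'),
                'name': element.findtext('name'),
            }
            # Drop the artist's children and text to keep memory usage low.
            element.clear()
```

Adapting this to the masters dump would mean matching a different element tag and reading different fields.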
imdb_dump_extractor

IMDb dump extractor.

class soweego.importer.imdb_dump_extractor.IMDbDumpExtractor
Download IMDb dumps, extract data, and populate a database instance.

extract_and_populate(dump_file_paths, resolve)
Extract relevant data from the name (people) and title (works) IMDb dumps, preprocess them, populate SQLAlchemy ORM entities, and persist them to a database instance. See imdb_entity for the ORM definitions.
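IMDb's dataset files (for instance name.basics.tsv.gz for people) are gzip-compressed, tab-separated files with a header row, where \N marks a missing value. Below is a minimal sketch of reading such a file with the standard csv module and mapping \N to None. The function name iter_imdb_rows is hypothetical, and the preprocessing soweego actually applies afterwards is not shown.

```python
import csv
import gzip
from typing import Dict, Iterator, Optional

IMDB_NULL = '\\N'  # IMDb's placeholder for a missing value


def iter_imdb_rows(dump_path: str) -> Iterator[Dict[str, Optional[str]]]:
    """Yield each row of a gzipped IMDb TSV dump as a dict, with nulls as None."""
    with gzip.open(dump_path, 'rt', encoding='utf-8', newline='') as dump:
        # Fields are not quoted, so disable quote handling; backslashes stay literal.
        reader = csv.DictReader(dump, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {key: (None if value == IMDB_NULL else value) for key, value in row.items()}
```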
musicbrainz_dump_extractor

MusicBrainz dump extractor.

class soweego.importer.musicbrainz_dump_extractor.MusicBrainzDumpExtractor
Download MusicBrainz dumps, extract data, and populate a database instance.

extract_and_populate(dump_file_paths, resolve)
Extract relevant data from the artist (people) and release group (works) MusicBrainz dumps, preprocess them, populate SQLAlchemy ORM entities, and persist them to a database instance. See musicbrainz_entity for the ORM definitions.
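MusicBrainz database dumps contain one file per table, written by PostgreSQL's COPY command. Assuming the default COPY text format (tab-separated fields, no header row, \N for NULL, backslash escapes inside values), a minimal sketch of turning such lines into dictionaries could look like this. The function iter_copy_rows is hypothetical, and only a subset of the COPY escapes is handled, as the comment notes.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional

COPY_NULL = '\\N'
# Subset of COPY text-format escapes; octal/hex and \b, \f, \v are not handled.
_ESCAPES = {'t': '\t', 'n': '\n', 'r': '\r', '\\': '\\'}
_ESCAPE_RE = re.compile(r'\\(.)')


def _unescape(field: str) -> str:
    # Replace each backslash sequence with the character it stands for.
    return _ESCAPE_RE.sub(lambda m: _ESCAPES.get(m.group(1), m.group(1)), field)


def iter_copy_rows(lines: Iterable[str], columns: List[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per headerless, tab-separated COPY text line."""
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        # Real tabs separate fields; tabs inside values arrive escaped as \t.
        fields = line.split('\t')
        if len(fields) != len(columns):
            raise ValueError(f'expected {len(columns)} fields, got {len(fields)}')
        yield {
            name: None if value == COPY_NULL else _unescape(value)
            for name, value in zip(columns, fields)
        }
```

Because COPY text output carries no header row, columns must list every column of the table in the dump's order; the sketch raises ValueError on a field-count mismatch rather than guessing.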
importer

Download, extract, and import a supported catalog.

class soweego.importer.importer.Importer
Handle a catalog dump: check its freshness and dispatch the appropriate extractor.

refresh_dump(output_folder, extractor, resolve)
Download the latest dump if needed, then call the corresponding extractor.

Parameters
- output_folder (str) – a path where the downloaded dumps will be stored
- extractor (BaseDumpExtractor) – BaseDumpExtractor implementation to process the dump
- resolve (bool) – whether to resolve URLs found in catalog dumps or not
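The freshness check that refresh_dump performs can be sketched as follows, assuming the remote dump's last-modified time is already known and compared against the local file's modification time. The helpers is_stale and refresh_dump_sketch, and the dump_url, remote_last_modified, and download parameters, are hypothetical additions for illustration; they do not reflect the real refresh_dump signature, which takes only output_folder, extractor, and resolve.

```python
import os
from datetime import datetime, timezone
from typing import Callable


def is_stale(local_path: str, remote_last_modified: datetime) -> bool:
    """True if the local dump is missing or older than the remote one.

    remote_last_modified must be timezone-aware.
    """
    if not os.path.isfile(local_path):
        return True
    local_mtime = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    return local_mtime < remote_last_modified


def refresh_dump_sketch(
    output_folder: str,
    extractor,  # any object with extract_and_populate(dump_file_paths, resolve)
    resolve: bool,
    dump_url: str,
    remote_last_modified: datetime,
    download: Callable[[str, str], None],
) -> None:
    """Hypothetical flow: download only when stale, then run the extractor."""
    os.makedirs(output_folder, exist_ok=True)
    # Name the local file after the last URL path segment (assumes no query string).
    local_path = os.path.join(output_folder, os.path.basename(dump_url))
    if is_stale(local_path, remote_last_modified):
        download(dump_url, local_path)
    extractor.extract_and_populate([local_path], resolve)
```

Injecting download as a callable keeps the sketch free of network code and makes the freshness logic easy to exercise with a fake downloader.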