Merge pull request #73 from chronicle-app/use-chronicle-core
Use new Chronicle Schema work from chronicle-core
hyfen authored Apr 26, 2024
2 parents 6127eec + 4973981 commit 83d057f
Showing 98 changed files with 1,603 additions and 1,261 deletions.
46 changes: 2 additions & 44 deletions .rubocop.yml
@@ -1,44 +1,2 @@
-AllCops:
-  EnabledByDefault: true
-  TargetRubyVersion: 2.7
-
-Style/FrozenStringLiteralComment:
-  SafeAutoCorrect: true
-
-Style/StringLiterals:
-  Enabled: false
-
-Layout/MultilineAssignmentLayout:
-  Enabled: false
-
-Layout/MultilineMethodCallIndentation:
-  EnforcedStyle: indented
-
-Layout/RedundantLineBreak:
-  Enabled: false
-
-Style/MethodCallWithArgsParentheses:
-  Enabled: false
-
-Style/MethodCalledOnDoEndBlock:
-  Exclude:
-    - 'spec/**/*'
-
-Style/OpenStructUse:
-  Enabled: false
-
-Style/Copyright:
-  Enabled: false
-
-Style/MissingElse:
-  Enabled: false
-
-Style/SymbolArray:
-  EnforcedStyle: brackets
-
-Style/WordArray:
-  EnforcedStyle: brackets
-
-Lint/ConstantResolution:
-  Enabled: false
-
+inherit_gem:
+  chronicle-core: .rubocop.yml
4 changes: 2 additions & 2 deletions Gemfile
@@ -1,6 +1,6 @@
-source "https://rubygems.org"
+source 'https://rubygems.org'

-git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
+git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }

 # Specify your gem's dependencies in chronicle-etl.gemspec
 gemspec
6 changes: 3 additions & 3 deletions Guardfile
@@ -1,7 +1,7 @@
-guard :rspec, cmd: "bundle exec rspec" do
-  require "guard/rspec/dsl"
+guard :rspec, cmd: 'bundle exec rspec' do
+  require 'guard/rspec/dsl'

   watch(%r{^spec/.+_spec\.rb$})
   watch(%r{^lib/(.+)\.rb$}) { |m| "spec/#{m[1]}_spec.rb" }
-  watch('spec/spec_helper.rb') { "spec" }
+  watch('spec/spec_helper.rb') { 'spec' }
 end
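The second `watch` rule in this Guardfile uses its capture group to map a changed library file to a spec path. The standalone sketch below reproduces that mapping outside Guard. `spec_for` and `LIB_PATTERN` are hypothetical names. The sketch only computes the path the rule yields and does not check that the spec file exists.

```ruby
# Hypothetical constant mirroring the Guardfile rule
#   watch(%r{^lib/(.+)\.rb$}) { |m| "spec/#{m[1]}_spec.rb" }
LIB_PATTERN = %r{^lib/(.+)\.rb$}

# Returns the spec path the rule would produce for a changed file,
# or nil when the path is not a Ruby file under lib/.
def spec_for(path)
  match = LIB_PATTERN.match(path)
  return nil unless match

  "spec/#{match[1]}_spec.rb"
end

puts spec_for('lib/chronicle/etl/runner.rb') # => spec/chronicle/etl/runner_spec.rb
p spec_for('README.md')                      # => nil
```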
54 changes: 20 additions & 34 deletions README.md
@@ -6,14 +6,14 @@

 Are you trying to archive your digital history or incorporate it into your own projects? You’ve probably discovered how frustrating it is to get machine-readable access to your own data. While [building a memex](https://hyfen.net/memex/), I learned first-hand what great efforts must be made before you can begin using the data in interesting ways.

-If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing takeout data, this tool is for you! (_If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues)._)
+If you don’t want to spend all your time writing scrapers, reverse-engineering APIs, or parsing export data, this tool is for you! (_If you do enjoy these things, please see the [open issues](https://github.com/chronicle-app/chronicle-etl/issues)._)

 **`chronicle-etl` is a CLI tool that gives you a unified interface to your personal data.** It uses the ETL pattern to _extract_ data from a source (e.g. your local browser history, a directory of images, goodreads.com reading history), _transform_ it (into a given schema), and _load_ it to a destination (e.g. a CSV file, JSON, external API).

 ## What does `chronicle-etl` give you?

 - **A CLI tool for working with personal data**. You can monitor progress of exports, manipulate the output, set up recurring jobs, manage credentials, and more.
-- **Plugins for many third-party providers** (see [list](#available-plugins-and-connectors)). This plugin system allows you to access data from dozens of third-party services, all accessible through a common CLI interface.
+- **Plugins for many third-party sources** (see [list](#available-plugins-and-connectors)). This plugin system allows you to access data from dozens of third-party services, all accessible through a common CLI interface.
 - **A common, opinionated schema**: You can normalize different datasets into a single schema so that, for example, all your iMessages and emails are represented in a common schema. (Don’t want to use this schema? `chronicle-etl` always allows you to fall back on working with the raw extraction data.)

 ## Chronicle-ETL in action
@@ -58,10 +58,10 @@ $ chronicle-etl --extractor csv --input data.csv --loader table

 # Show available plugins and install one
 $ chronicle-etl plugins:list
-$ chronicle-etl plugins:install shell
+$ chronicle-etl plugins:install imessage

-# Retrieve shell commands run in the last 5 hours
-$ chronicle-etl -e shell --since 5h
+# Retrieve imessage messages from the last 5 hours
+$ chronicle-etl -e imessage --since 5h

 # Get email senders from an .mbox email archive file
 $ chronicle-etl --extractor email:mbox -i sample-email-archive.mbox -t email --fields actor.slug
@@ -80,12 +80,16 @@ Options:
 [--extractor-opts=key:value] # Extractor options
 -t, [--transformer=NAME] # Transformer class. Default: null
 [--transformer-opts=key:value] # Transformer options
--l, [--loader=NAME] # Loader class. Default: table
+-l, [--loader=NAME] # Loader class. Default: json
 [--loader-opts=key:value] # Loader options
 -i, [--input=FILENAME] # Input filename or directory
 [--since=DATE] # Load records SINCE this date (or fuzzy time duration)
 [--until=DATE] # Load records UNTIL this date (or fuzzy time duration)
 [--limit=N] # Only extract the first LIMIT records
+[--schema=SCHEMA_NAME] # Which Schema to transform
+# Possible values: chronicle, activitystream, schemaorg, chronobase
+[--format=SCHEMA_NAME] # How to serialize results
+# Possible values: jsonapi, jsonld
 -o, [--output=OUTPUT] # Output filename
 [--fields=field1 field2 ...] # Output only these fields
 [--header-row], [--no-header-row] # Output the header row of tabular output
@@ -119,7 +123,7 @@ $ chronicle-etl jobs:list

 ## Connectors and plugins

-Connectors let you work with different data formats or third-party providers.
+Connectors let you work with different data formats or third-party sources.

 ### Built-in Connectors

@@ -139,13 +143,16 @@ $ chronicle-etl connectors:list
#### Transformers

- [`null`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/transformers/null_transformer.rb) - (default) Don’t do anything and pass on raw extraction data
- [`sampler`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/transformers/sampler_transformer.rb) - Sample `percent` records from the extraction
- [`sort`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/transformers/sampler_transformer.rb) - sort extracted results by `key` and `direction`


#### Loaders

- [`table`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/table_loader.rb) - (default) Output an ascii table of records. Useful for exploring data.
- [`json`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/json_loader.rb) - (default) Load records serialized as JSON
- [`table`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/table_loader.rb) - Output an ascii table of records. Useful for exploring data.
- [`csv`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/extractors/csv_extractor.rb) - Load records to CSV
- [`json`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/json_loader.rb) - Load records serialized as JSON
- [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Serialize records with [JSONAPI](https://jsonapi.org/) and send to a REST API
- [`rest`](https://github.com/chronicle-app/chronicle-etl/blob/main/lib/chronicle/etl/loaders/rest_loader.rb) - Send JSON to a REST API

### Chronicle Plugins for third-party services

@@ -161,8 +168,8 @@ $ chronicle-etl plugins:list
 $ chronicle-etl plugins:install NAME

 # Use a plugin
-$ chronicle-etl plugins:install shell
-$ chronicle-etl --extractor shell:history --limit 10
+$ chronicle-etl plugins:install imessage
+$ chronicle-etl --extractor imessage --limit 10

 # Uninstall a plugin
 $ chronicle-etl plugins:uninstall NAME
@@ -219,28 +226,7 @@ If you want to work together on a connector, please [get in touch](#get-in-touch
 #### Sample custom Extractor class

 ```ruby
-module Chronicle
-  module FooService
-    class FooExtractor < Chronicle::ETL::Extractor
-      register_connector do |r|
-        r.identifier = 'foo'
-        r.description = 'from foo.com'
-      end
-
-      setting :access_token, required: true
-
-      def prepare
-        @records = # load from somewhere
-      end
-
-      def extract
-        @records.each do |record|
-          yield Chronicle::ETL::Extraction.new(data: row.to_h)
-        end
-      end
-    end
-  end
-end
+# TODO
 ```

 ## Secrets Management
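The README text in this diff describes the ETL pattern: _extract_ records from a source, _transform_ them, and _load_ them to a destination. Within that pattern, `--limit` restricts extraction and the `sort` transformer orders results by `key` and `direction`. The following plain-Ruby sketch shows that flow. It is not chronicle-etl's runner or connector API. `extract`, `sort_transform`, and `json_lines_load` are hypothetical names, and the `:asc`/`:desc` direction values are assumptions.

```ruby
require 'json'

# Hypothetical extractor: yields raw records one at a time.
# Without a block it returns an Enumerator, so callers can stop early.
def extract(rows)
  return enum_for(:extract, rows) unless block_given?

  rows.each { |row| yield row }
end

# Hypothetical transformer, loosely modelled on the `sort` transformer:
# orders records by `key`, ascending unless `direction` is :desc (assumed values).
def sort_transform(records, key:, direction: :asc)
  sorted = records.sort_by { |record| record[key] }
  direction == :desc ? sorted.reverse : sorted
end

# Hypothetical loader: writes each record to `io` as one line of JSON.
def json_lines_load(records, io)
  records.each { |record| io.puts(JSON.generate(record)) }
end

rows = [
  { 'title' => 'b', 'count' => 2 },
  { 'title' => 'a', 'count' => 5 },
  { 'title' => 'c', 'count' => 1 }
]

limited = extract(rows).first(2) # like --limit 2: stop after two extracted records
sorted = sort_transform(limited, key: 'count', direction: :desc)
json_lines_load(sorted, $stdout)
# {"title":"a","count":5}
# {"title":"b","count":2}
```

Without a block, `extract` returns an `Enumerator`. Taking the first N records therefore stops iterating the source early instead of reading all of it, which is one reasonable way to honour a limit.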
4 changes: 2 additions & 2 deletions Rakefile
@@ -1,5 +1,5 @@
-require "bundler/gem_tasks"
-require "rspec/core/rake_task"
+require 'bundler/gem_tasks'
+require 'rspec/core/rake_task'
 RSpec::Core::RakeTask.new(:spec)

 require 'yard'
9 changes: 4 additions & 5 deletions bin/console
@@ -1,26 +1,25 @@
 #!/usr/bin/env ruby

-require "bundler/setup"
-require "chronicle/etl"
+require 'bundler/setup'
+require 'chronicle/etl'

 # You can add fixtures and/or initialization code here to make experimenting
 # with your gem easier. You can also use a different console, if you like.

 # (If you use this, don't forget to add pry to your Gemfile!)
-require "pry"
+require 'pry'
 Pry.start

 def reload!(print = true)
   puts 'Reloading ...' if print
   # Main project directory.
   root_dir = File.expand_path('..', __dir__)
   # Directories within the project that should be reloaded.
-  reload_dirs = %w{lib}
+  reload_dirs = %w[lib]
   # Loop through and reload every file in all relevant project directories.
   reload_dirs.each do |dir|
     Dir.glob("#{root_dir}/#{dir}/**/*.rb").each { |f| load(f) }
   end
   # Return true when complete.
   true
 end
-
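`reload!` in `bin/console` calls `load` on every file under `lib/` rather than `require`. The difference matters: `require` evaluates a file only once and returns `false` on later calls, while `load` re-evaluates it every time. The self-contained check below shows this with a throwaway temp file. The file name and the `$eval_count` global are illustrative only.

```ruby
require 'fileutils'
require 'tmpdir'

# Write a throwaway file that increments a global counter each time it is evaluated.
dir = Dir.mktmpdir
path = File.join(dir, 'counter.rb')
File.write(path, "$eval_count = ($eval_count || 0) + 1\n")

first_require = require(path)  # true: evaluated and recorded in $LOADED_FEATURES
second_require = require(path) # false: already loaded, so not evaluated again
load(path)                     # load evaluates the file again regardless

p [first_require, second_require, $eval_count] # => [true, false, 2]

FileUtils.remove_entry(dir)
```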
102 changes: 51 additions & 51 deletions chronicle-etl.gemspec
@@ -1,73 +1,73 @@
# frozen_string_literal: true

lib = File.expand_path("../lib", __FILE__)
lib = File.expand_path('lib', __dir__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require "chronicle/etl/version"
require 'chronicle/etl/version'

Gem::Specification.new do |spec|
spec.name = "chronicle-etl"
spec.name = 'chronicle-etl'
spec.version = Chronicle::ETL::VERSION
spec.authors = ["Andrew Louis"]
spec.email = ["[email protected]"]
spec.authors = ['Andrew Louis']
spec.email = ['[email protected]']

spec.summary = "ETL tool for personal data"
spec.description = "Chronicle-ETL allows you to extract personal data from a variety of services, transformer it, and load it."
spec.homepage = "https://github.com/chronicle-app"
spec.license = "MIT"
spec.summary = 'ETL tool for personal data'
spec.description = 'Chronicle-ETL allows you to extract personal data from a variety of services, transformer it, and load it.'
spec.homepage = 'https://github.com/chronicle-app'
spec.license = 'MIT'

# Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
# to allow pushing to a single host or delete this section to allow pushing to any host.
if spec.respond_to?(:metadata)
spec.metadata['allowed_push_host'] = "https://rubygems.org"
spec.metadata['allowed_push_host'] = 'https://rubygems.org'

spec.metadata["homepage_uri"] = spec.homepage
spec.metadata["source_code_uri"] = "https://github.com/chronicle-app/chronicle-etl"
spec.metadata["changelog_uri"] = "https://github.com/chronicle-app/chronicle-etl/releases"
spec.metadata['homepage_uri'] = spec.homepage
spec.metadata['source_code_uri'] = 'https://github.com/chronicle-app/chronicle-etl'
spec.metadata['changelog_uri'] = 'https://github.com/chronicle-app/chronicle-etl/releases'
else
raise "RubyGems 2.0 or newer is required to protect against " \
"public gem pushes."
raise 'RubyGems 2.0 or newer is required to protect against ' \
'public gem pushes.'
end

# Specify which files should be added to the gem when it is released.
# The `git ls-files -z` loads the files in the RubyGem that have been added into git.
spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
spec.files = Dir.chdir(File.expand_path(__dir__)) do
`git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
end
spec.bindir = "exe"
spec.bindir = 'exe'
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
spec.require_paths = ["lib"]
spec.required_ruby_version = ">= 2.7"
spec.require_paths = ['lib']
spec.required_ruby_version = '>= 3.1'
spec.metadata['rubygems_mfa_required'] = 'true'

spec.add_dependency "activesupport", "~> 7.0"
spec.add_dependency "chronicle-core", "~> 0.2.2"
spec.add_dependency "chronic_duration", "~> 0.10.6"
spec.add_dependency "colorize", "~> 0.8.1"
spec.add_dependency "gems", ">= 1"
spec.add_dependency "launchy"
spec.add_dependency "marcel", "~> 1.0.2"
spec.add_dependency "mini_exiftool", "~> 2.10"
spec.add_dependency "nokogiri", "~> 1.13"
spec.add_dependency "omniauth", "~> 2"
spec.add_dependency "sequel", "~> 5.35"
spec.add_dependency "sinatra", "~> 2"
spec.add_dependency "sqlite3", "~> 1.4"
spec.add_dependency "thor", "~> 1.2"
spec.add_dependency "thor-hollaback", "~> 0.2"
spec.add_dependency "tty-progressbar", "~> 0.17"
spec.add_dependency "tty-prompt", "~> 0.23"
spec.add_dependency "tty-spinner"
spec.add_dependency "tty-table", "~> 0.11"
spec.add_dependency "xdg", ">= 4.0"
spec.add_dependency 'activesupport', '~> 7.0'
spec.add_dependency 'chronic_duration', '~> 0.10.6'
spec.add_dependency 'chronicle-core', '~> 0.3'
spec.add_dependency 'colorize', '~> 0.8.1'
spec.add_dependency 'gems', '>= 1'
spec.add_dependency 'launchy'
spec.add_dependency 'marcel', '~> 1.0.2'
spec.add_dependency 'omniauth', '~> 2'
spec.add_dependency 'sequel', '~> 5.35'
spec.add_dependency 'sinatra', '~> 2'
spec.add_dependency 'sqlite3', '~> 1.4'
spec.add_dependency 'thor', '~> 1.2'
spec.add_dependency 'thor-hollaback', '~> 0.2'
spec.add_dependency 'tty-progressbar', '~> 0.17'
spec.add_dependency 'tty-prompt', '~> 0.23'
spec.add_dependency 'tty-spinner'
spec.add_dependency 'tty-table', '~> 0.12'
spec.add_dependency 'xdg', '>= 4.0'

spec.add_development_dependency "bundler", "~> 2.1"
spec.add_development_dependency "fakefs", "~> 1.4"
spec.add_development_dependency "guard-rspec", "~> 4.7.3"
spec.add_development_dependency "pry-byebug", "~> 3.9"
spec.add_development_dependency "rake", "~> 13.0"
spec.add_development_dependency "rspec", "~> 3.9"
spec.add_development_dependency "rubocop", "~> 1.25.1"
spec.add_development_dependency "simplecov", "~> 0.21"
spec.add_development_dependency "vcr", "~> 6.1"
spec.add_development_dependency "webmock", "~> 3"
spec.add_development_dependency "webrick", "~> 1.7"
spec.add_development_dependency "yard", "~> 0.9.7"
spec.add_development_dependency 'bundler', '~> 2.1'
spec.add_development_dependency 'fakefs', '~> 1.4'
spec.add_development_dependency 'guard-rspec', '~> 4.7.3'
spec.add_development_dependency 'pry-byebug', '~> 3.9'
spec.add_development_dependency 'rake', '~> 13.0'
spec.add_development_dependency 'rspec', '~> 3.9'
spec.add_development_dependency 'rubocop', '~> 1.57'
spec.add_development_dependency 'simplecov', '~> 0.21'
spec.add_development_dependency 'vcr', '~> 6.1'
spec.add_development_dependency 'webmock', '~> 3'
spec.add_development_dependency 'webrick', '~> 1.7'
spec.add_development_dependency 'yard', '~> 0.9.7'
end
2 changes: 1 addition & 1 deletion exe/chronicle-etl
@@ -1,5 +1,5 @@
 #!/usr/bin/env ruby

-require "chronicle/etl/cli"
+require 'chronicle/etl/cli'

 Chronicle::ETL::CLI::Main.start(ARGV)
5 changes: 4 additions & 1 deletion lib/chronicle/etl.rb
@@ -1,11 +1,15 @@
 # frozen_string_literal: true

+require 'chronicle/schema'
+require 'chronicle/models/base'
+
 require_relative 'etl/registry/registry'
 require_relative 'etl/authorizer'
 require_relative 'etl/config'
 require_relative 'etl/configurable'
 require_relative 'etl/exceptions'
 require_relative 'etl/extraction'
+require_relative 'etl/record'
 require_relative 'etl/job_definition'
 require_relative 'etl/job_log'
 require_relative 'etl/job_logger'
@@ -14,7 +18,6 @@
 require_relative 'etl/runner'
 require_relative 'etl/secrets'
 require_relative 'etl/utils/binary_attachments'
-require_relative 'etl/utils/text_recognition'
 require_relative 'etl/utils/progress_bar'
 require_relative 'etl/version'
