Fix latlon #72

Open
wants to merge 2 commits into
base: main
25 changes: 12 additions & 13 deletions app/models/occurrence_record.rb
@@ -28,15 +28,18 @@ def species_path
BASIS_OF_RECORD=%w(MACHINE_OBSERVATION HUMAN_OBSERVATION MATERIAL_SAMPLE PRESERVED_SPECIMEN)

validates_each :lat, :lon do |record, attr, value|
- record.errors.add attr, "nil or 0" unless value.present? && OccurrenceRecord.float_rounded(value) != 0
+ record.errors.add attr, "nil or 0" unless value.present? && value.to_f.round != 0
end
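
The two versions of the validation expression in this hunk accept different ranges of values: `round(2)` keeps two decimals, while `round` with no argument rounds to the nearest integer. The sketch below compares them on a few sample coordinates. The method names `passes_old_check?` and `passes_new_check?` are hypothetical, and `present?` is approximated with a plain nil/empty check so the snippet runs without Rails.

```ruby
# Hypothetical stand-ins for the old and new predicates in this hunk.
# `present?` is approximated with a nil/empty-string check (assumption).
def passes_old_check?(value)
  !value.nil? && value.to_s != '' && value.to_f.round(2) != 0
end

def passes_new_check?(value)
  !value.nil? && value.to_s != '' && value.to_f.round != 0
end

# Print how each predicate treats a handful of sample coordinates.
[0.0, 0.004, 0.3, -0.49, 0.5, 12.34].each do |v|
  puts format('%7.3f  old=%-5s new=%s', v, passes_old_check?(v), passes_new_check?(v))
end
```

Under these assumptions, values such as `0.3` and `-0.49` pass the old check but fail the new one, so any coordinate within half a degree of the equator or the prime meridian would be reported as "nil or 0". If that is not intended, it may be worth confirming before merging.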

validates_inclusion_of :basis_of_record, in: BASIS_OF_RECORD
validates_format_of :taxon_kingdom, :taxon_phylum, :taxon_class, :taxon_order, :taxon_family, :taxon_genus, :taxon_species, :taxon_subspecies, without: /\d/

def self.from_str(str)
#FIXME: dangerous: should be using CSV reader/writer
- OccurrenceRecord.new(Hash[HEADERS.zip(str.chomp.split("\t", -1))])
+ o = OccurrenceRecord.new(Hash[HEADERS.zip(str.chomp.split("\t", -1))])
+ o.lat = self.format_decimal(o.lat)
+ o.lon = self.format_decimal(o.lon)
+ o
end
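
The FIXME in `from_str` suggests moving to a CSV reader. A minimal sketch of what that could look like with Ruby's standard `csv` library follows. `EXAMPLE_HEADERS` and `row_from_tsv_line` are hypothetical names, since the real `HEADERS` constant is not shown in this diff.

```ruby
require 'csv'

# Hypothetical column list; the project's real HEADERS is not visible here.
EXAMPLE_HEADERS = %w[taxon_species lat lon basis_of_record].freeze

# Parse one tab-separated line into a header => value hash.
# CSV.parse_line returns nil for an empty line, hence the fallback to [].
def row_from_tsv_line(line)
  fields = CSV.parse_line(line.chomp, col_sep: "\t") || []
  EXAMPLE_HEADERS.zip(fields).to_h
end

p row_from_tsv_line("Puma concolor\t12.345\t-45.678\tHUMAN_OBSERVATION\n")
```

Two behavioral differences from `split("\t", -1)` would need checking against real data: empty fields come back as `nil` rather than `""`, and a stray double quote inside a field raises `CSV::MalformedCSVError` under the default options.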

# read the file line by line, processing
@@ -110,7 +113,7 @@ def self.filter(records)
# etc.

# 1. are geographic coordinates the same?
- if ! same(records, [:lng_rounded, :lng_rounded])
+ if !same(records, [:lat, :lon])
# keep all records and flag with g
flag(records, 'g')
elsif ! same(records, :basis_of_record)
@@ -158,7 +161,7 @@ def self.each_occurrence_slice_grouped_by_path(occurrences_tsv_file)
# end

def duplicate?(other)
- if ! (lng_rounded == other.lng_rounded && lat_rounded == other.lat_rounded)
+ if ! (lon == other.lon && lat == other.lat)
false
elsif taxon_species != other.taxon_species
true
@@ -169,16 +172,12 @@ def duplicate?(other)
end
end

- def self.float_rounded(value)
- value.to_f.round(2)
- end
-
- def lat_rounded
- self.class.float_rounded(lat)
- end
+ def self.format_decimal(num)
+ int, dec = num.to_s.split('.')
+ dec = '00' if dec.nil?
+ dec = "#{dec}0" if dec.length == 1

- def lng_rounded
- self.class.float_rounded(lon)
+ "#{int}.#{dec.chars[0, 2].join}".to_f
end
end
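
To make the behavior of `format_decimal` concrete, here is a standalone copy of the method body from this diff, run on a few inputs. The sample values are chosen for illustration only. The method truncates to two decimals through string manipulation, where the removed `float_rounded` rounded.

```ruby
# Standalone copy of format_decimal from this diff, for experimentation.
def format_decimal(num)
  int, dec = num.to_s.split('.')
  dec = '00' if dec.nil?
  dec = "#{dec}0" if dec.length == 1

  "#{int}.#{dec.chars[0, 2].join}".to_f
end

p format_decimal('12.349')  # => 12.34 (truncated)
p 12.349.round(2)           # => 12.35 (what float_rounded returned)
p format_decimal('-0.129')  # => -0.12 (truncation moves toward zero)
p format_decimal(nil)       # => 0.0
p format_decimal(0.00001)   # => 1.0, because 0.00001.to_s is "1.0e-05"
```

The last case may deserve attention: any value whose string form uses scientific notation is split at the mantissa's decimal point, so a very small coordinate would come back as roughly 1.0. Whether such inputs can occur depends on how `lat` and `lon` arrive, which this diff does not show.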

15 changes: 15 additions & 0 deletions docs/PIPELINE.md
@@ -14,6 +14,21 @@ of this project.
Other schedulers need to replicate this pattern to make use of environment variables
to change locations of data files.

## Workflow

### Pipeline Overview

![alt text needed](imgs/Figure1.png "Figure 1: Pipeline Overview")

### GBIF & GenBank Quality Control

![alt text needed](imgs/Figure2.png "Figure 2: GBIF & GenBank Quality Control")

### Bold Quality Control

![alt text needed](imgs/Figure3.png "Figure 3: Bold Quality Control")


## Prepare Binaries

You'll need a host of binaries for aligning sequences. You can see the [Dockerfile](../Dockerfile)
Binary file added docs/imgs/Figure1.png
Binary file added docs/imgs/Figure2.png
Binary file added docs/imgs/Figure3.png