fix: build histogram x-axis as intervals (a, b] (#4) #5

luigidcsoares · 2024-04-17T05:44:39Z

Closes #4

This PR changes how histograms are constructed: (i) the keys are now labeled with intervals, and (ii) bins are computed with math.ceil instead of round. With ceil, every value maps to the right endpoint b of its interval (a, b], so reconstructing the interval becomes trivial.
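To make the difference between the two binning rules concrete, here is a minimal sketch. It is not the code from this PR: the helper names `interval_label` and `build_histogram`, the unit-width bins over (0, 100], and the folding of a value of 0 into the first bin are all assumptions made for illustration.

```python
import math
from collections import Counter


def interval_label(percent):
    """Return the label '(a, b]' of the unit-width bin that contains percent.

    Hypothetical helper: ceil maps the value to the right endpoint b.
    Assumption: a value of 0 is folded into the first bin, (0, 1].
    """
    upper = max(math.ceil(percent), 1)
    return f"({upper - 1}, {upper}]"


def build_histogram(percents):
    """Map every bin (0, 1] ... (99, 100] to the share of values it holds."""
    counts = Counter(interval_label(p) for p in percents)
    total = len(percents)
    labels = [f"({b - 1}, {b}]" for b in range(1, 101)]
    return {label: counts.get(label, 0) / total for label in labels}


# round() sends values from two different unit intervals to the same integer,
# so a rounded label cannot be read back as a single interval ...
same_rounded_label = round(49.6) == round(50.4)

# ... whereas ceil keeps them in separate, well-defined intervals.
labels_with_ceil = (interval_label(49.6), interval_label(50.4))
```

For instance, ten values made of six at 100/3, two at 50, and two at 100 give shares of 0.6, 0.2, and 0.2 in the bins (33, 34], (49, 50], and (99, 100].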

luigidcsoares · 2024-04-24T07:55:40Z

The following example illustrates the new histogram format. I'm using the synthetic dataset from the single-dataset example. First, set up and run the analyses (change the path in sys.path.append as appropriate):

import pandas as pd
import sys

sys.path.append("bvm-library")
from bvmlib.bvm import BVM

dataset = pd.DataFrame({
    "id": [i for i in range(1, 11)],
    "age": [25, 25, 25, 25, 25, 49, 49, 49, 49, 60],
    "gender": ["F", "F", "F", "M", "M", "F", "F", "F", "M", "M"],
    "grade": ["A", "A", "C", "B", "B", "C", "C", "E", "D", "D"],
    "disability": [False, True, True, True, False, True, True, False, False, False]
})

bvm = BVM(dataset)
bvm.qids(["age", "gender"])
bvm.sensitive(["grade", "disability"])

results = bvm.assess()

The histogram for re-identification:

results["re_id"].loc[0, "Histogram"]

Output

{'(0, 1]': 0.0,
 '(1, 2]': 0.0,
 '(2, 3]': 0.0,
 '(3, 4]': 0.0,
 '(4, 5]': 0.0,
 '(5, 6]': 0.0,
 '(6, 7]': 0.0,
 '(7, 8]': 0.0,
 '(8, 9]': 0.0,
 '(9, 10]': 0.0,
 '(10, 11]': 0.0,
 '(11, 12]': 0.0,
 '(12, 13]': 0.0,
 '(13, 14]': 0.0,
 '(14, 15]': 0.0,
 '(15, 16]': 0.0,
 '(16, 17]': 0.0,
 '(17, 18]': 0.0,
 '(18, 19]': 0.0,
 '(19, 20]': 0.0,
 '(20, 21]': 0.0,
 '(21, 22]': 0.0,
 '(22, 23]': 0.0,
 '(23, 24]': 0.0,
 '(24, 25]': 0.0,
 '(25, 26]': 0.0,
 '(26, 27]': 0.0,
 '(27, 28]': 0.0,
 '(28, 29]': 0.0,
 '(29, 30]': 0.0,
 '(30, 31]': 0.0,
 '(31, 32]': 0.0,
 '(32, 33]': 0.0,
 '(33, 34]': 0.6,
 '(34, 35]': 0.0,
 '(35, 36]': 0.0,
 '(36, 37]': 0.0,
 '(37, 38]': 0.0,
 '(38, 39]': 0.0,
 '(39, 40]': 0.0,
 '(40, 41]': 0.0,
 '(41, 42]': 0.0,
 '(42, 43]': 0.0,
 '(43, 44]': 0.0,
 '(44, 45]': 0.0,
 '(45, 46]': 0.0,
 '(46, 47]': 0.0,
 '(47, 48]': 0.0,
 '(48, 49]': 0.0,
 '(49, 50]': 0.2,
 '(50, 51]': 0.0,
 '(51, 52]': 0.0,
 '(52, 53]': 0.0,
 '(53, 54]': 0.0,
 '(54, 55]': 0.0,
 '(55, 56]': 0.0,
 '(56, 57]': 0.0,
 '(57, 58]': 0.0,
 '(58, 59]': 0.0,
 '(59, 60]': 0.0,
 '(60, 61]': 0.0,
 '(61, 62]': 0.0,
 '(62, 63]': 0.0,
 '(63, 64]': 0.0,
 '(64, 65]': 0.0,
 '(65, 66]': 0.0,
 '(66, 67]': 0.0,
 '(67, 68]': 0.0,
 '(68, 69]': 0.0,
 '(69, 70]': 0.0,
 '(70, 71]': 0.0,
 '(71, 72]': 0.0,
 '(72, 73]': 0.0,
 '(73, 74]': 0.0,
 '(74, 75]': 0.0,
 '(75, 76]': 0.0,
 '(76, 77]': 0.0,
 '(77, 78]': 0.0,
 '(78, 79]': 0.0,
 '(79, 80]': 0.0,
 '(80, 81]': 0.0,
 '(81, 82]': 0.0,
 '(82, 83]': 0.0,
 '(83, 84]': 0.0,
 '(84, 85]': 0.0,
 '(85, 86]': 0.0,
 '(86, 87]': 0.0,
 '(87, 88]': 0.0,
 '(88, 89]': 0.0,
 '(89, 90]': 0.0,
 '(90, 91]': 0.0,
 '(91, 92]': 0.0,
 '(92, 93]': 0.0,
 '(93, 94]': 0.0,
 '(94, 95]': 0.0,
 '(95, 96]': 0.0,
 '(96, 97]': 0.0,
 '(97, 98]': 0.0,
 '(98, 99]': 0.0,
 '(99, 100]': 0.2}

The histogram for attribute inference when the sensitive attribute is grade:

results["att_inf"].loc[0, "Histogram"]

Output

{'(0, 1]': 0.0,
 '(1, 2]': 0.0,
 '(2, 3]': 0.0,
 '(3, 4]': 0.0,
 '(4, 5]': 0.0,
 '(5, 6]': 0.0,
 '(6, 7]': 0.0,
 '(7, 8]': 0.0,
 '(8, 9]': 0.0,
 '(9, 10]': 0.0,
 '(10, 11]': 0.0,
 '(11, 12]': 0.0,
 '(12, 13]': 0.0,
 '(13, 14]': 0.0,
 '(14, 15]': 0.0,
 '(15, 16]': 0.0,
 '(16, 17]': 0.0,
 '(17, 18]': 0.0,
 '(18, 19]': 0.0,
 '(19, 20]': 0.0,
 '(20, 21]': 0.0,
 '(21, 22]': 0.0,
 '(22, 23]': 0.0,
 '(23, 24]': 0.0,
 '(24, 25]': 0.0,
 '(25, 26]': 0.0,
 '(26, 27]': 0.0,
 '(27, 28]': 0.0,
 '(28, 29]': 0.0,
 '(29, 30]': 0.0,
 '(30, 31]': 0.0,
 '(31, 32]': 0.0,
 '(32, 33]': 0.0,
 '(33, 34]': 0.0,
 '(34, 35]': 0.0,
 '(35, 36]': 0.0,
 '(36, 37]': 0.0,
 '(37, 38]': 0.0,
 '(38, 39]': 0.0,
 '(39, 40]': 0.0,
 '(40, 41]': 0.0,
 '(41, 42]': 0.0,
 '(42, 43]': 0.0,
 '(43, 44]': 0.0,
 '(44, 45]': 0.0,
 '(45, 46]': 0.0,
 '(46, 47]': 0.0,
 '(47, 48]': 0.0,
 '(48, 49]': 0.0,
 '(49, 50]': 0.0,
 '(50, 51]': 0.0,
 '(51, 52]': 0.0,
 '(52, 53]': 0.0,
 '(53, 54]': 0.0,
 '(54, 55]': 0.0,
 '(55, 56]': 0.0,
 '(56, 57]': 0.0,
 '(57, 58]': 0.0,
 '(58, 59]': 0.0,
 '(59, 60]': 0.0,
 '(60, 61]': 0.0,
 '(61, 62]': 0.0,
 '(62, 63]': 0.0,
 '(63, 64]': 0.0,
 '(64, 65]': 0.0,
 '(65, 66]': 0.0,
 '(66, 67]': 0.6,
 '(67, 68]': 0.0,
 '(68, 69]': 0.0,
 '(69, 70]': 0.0,
 '(70, 71]': 0.0,
 '(71, 72]': 0.0,
 '(72, 73]': 0.0,
 '(73, 74]': 0.0,
 '(74, 75]': 0.0,
 '(75, 76]': 0.0,
 '(76, 77]': 0.0,
 '(77, 78]': 0.0,
 '(78, 79]': 0.0,
 '(79, 80]': 0.0,
 '(80, 81]': 0.0,
 '(81, 82]': 0.0,
 '(82, 83]': 0.0,
 '(83, 84]': 0.0,
 '(84, 85]': 0.0,
 '(85, 86]': 0.0,
 '(86, 87]': 0.0,
 '(87, 88]': 0.0,
 '(88, 89]': 0.0,
 '(89, 90]': 0.0,
 '(90, 91]': 0.0,
 '(91, 92]': 0.0,
 '(92, 93]': 0.0,
 '(93, 94]': 0.0,
 '(94, 95]': 0.0,
 '(95, 96]': 0.0,
 '(96, 97]': 0.0,
 '(97, 98]': 0.0,
 '(98, 99]': 0.0,
 '(99, 100]': 0.4}

The histogram for attribute inference when the sensitive attribute is disability:

results["att_inf"].loc[1, "Histogram"]

Output

{'(0, 1]': 0.0,
 '(1, 2]': 0.0,
 '(2, 3]': 0.0,
 '(3, 4]': 0.0,
 '(4, 5]': 0.0,
 '(5, 6]': 0.0,
 '(6, 7]': 0.0,
 '(7, 8]': 0.0,
 '(8, 9]': 0.0,
 '(9, 10]': 0.0,
 '(10, 11]': 0.0,
 '(11, 12]': 0.0,
 '(12, 13]': 0.0,
 '(13, 14]': 0.0,
 '(14, 15]': 0.0,
 '(15, 16]': 0.0,
 '(16, 17]': 0.0,
 '(17, 18]': 0.0,
 '(18, 19]': 0.0,
 '(19, 20]': 0.0,
 '(20, 21]': 0.0,
 '(21, 22]': 0.0,
 '(22, 23]': 0.0,
 '(23, 24]': 0.0,
 '(24, 25]': 0.0,
 '(25, 26]': 0.0,
 '(26, 27]': 0.0,
 '(27, 28]': 0.0,
 '(28, 29]': 0.0,
 '(29, 30]': 0.0,
 '(30, 31]': 0.0,
 '(31, 32]': 0.0,
 '(32, 33]': 0.0,
 '(33, 34]': 0.0,
 '(34, 35]': 0.0,
 '(35, 36]': 0.0,
 '(36, 37]': 0.0,
 '(37, 38]': 0.0,
 '(38, 39]': 0.0,
 '(39, 40]': 0.0,
 '(40, 41]': 0.0,
 '(41, 42]': 0.0,
 '(42, 43]': 0.0,
 '(43, 44]': 0.0,
 '(44, 45]': 0.0,
 '(45, 46]': 0.0,
 '(46, 47]': 0.0,
 '(47, 48]': 0.0,
 '(48, 49]': 0.0,
 '(49, 50]': 0.2,
 '(50, 51]': 0.0,
 '(51, 52]': 0.0,
 '(52, 53]': 0.0,
 '(53, 54]': 0.0,
 '(54, 55]': 0.0,
 '(55, 56]': 0.0,
 '(56, 57]': 0.0,
 '(57, 58]': 0.0,
 '(58, 59]': 0.0,
 '(59, 60]': 0.0,
 '(60, 61]': 0.0,
 '(61, 62]': 0.0,
 '(62, 63]': 0.0,
 '(63, 64]': 0.0,
 '(64, 65]': 0.0,
 '(65, 66]': 0.0,
 '(66, 67]': 0.6,
 '(67, 68]': 0.0,
 '(68, 69]': 0.0,
 '(69, 70]': 0.0,
 '(70, 71]': 0.0,
 '(71, 72]': 0.0,
 '(72, 73]': 0.0,
 '(73, 74]': 0.0,
 '(74, 75]': 0.0,
 '(75, 76]': 0.0,
 '(76, 77]': 0.0,
 '(77, 78]': 0.0,
 '(78, 79]': 0.0,
 '(79, 80]': 0.0,
 '(80, 81]': 0.0,
 '(81, 82]': 0.0,
 '(82, 83]': 0.0,
 '(83, 84]': 0.0,
 '(84, 85]': 0.0,
 '(85, 86]': 0.0,
 '(86, 87]': 0.0,
 '(87, 88]': 0.0,
 '(88, 89]': 0.0,
 '(89, 90]': 0.0,
 '(90, 91]': 0.0,
 '(91, 92]': 0.0,
 '(92, 93]': 0.0,
 '(93, 94]': 0.0,
 '(94, 95]': 0.0,
 '(95, 96]': 0.0,
 '(96, 97]': 0.0,
 '(97, 98]': 0.0,
 '(98, 99]': 0.0,
 '(99, 100]': 0.2}
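Since most bins in these histograms are empty, it can help to look only at the non-zero ones and to turn the string labels back into numbers. The sketch below assumes only that a histogram is a plain dict from '(a, b]' labels to floats, as in the outputs above; the helpers `nonzero_bins` and `parse_interval` are hypothetical and not part of bvmlib.

```python
def nonzero_bins(histogram):
    """Keep only the bins that hold a non-zero share of the records."""
    return {label: share for label, share in histogram.items() if share > 0}


def parse_interval(label):
    """Turn a label such as '(33, 34]' back into the pair (33, 34)."""
    lower, upper = label.strip("(]").split(",")
    return int(lower), int(upper)


# A shortened stand-in for one of the histograms shown above.
example = {"(32, 33]": 0.0, "(33, 34]": 0.6, "(49, 50]": 0.2, "(99, 100]": 0.2}

for label, share in nonzero_bins(example).items():
    lower, upper = parse_interval(label)
    print(f"{share:.0%} of records fall in ({lower}, {upper}]")
```

With the real results, the same helpers could be applied to results["re_id"].loc[0, "Histogram"].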

replace round with math.ceil and change hist labels to intervals

966e1bb

luigidcsoares requested a review from nunesgh April 17, 2024 05:44

luigidcsoares self-assigned this Apr 17, 2024

luigidcsoares added 2 commits April 17, 2024 17:48

change label repr to str to make half-closed interval explicit

a340993

update single-dataset example

d844e41

luigidcsoares mentioned this pull request Apr 18, 2024

Histograms: Fix rounding and improve format #4

Open

luigidcsoares marked this pull request as ready for review April 18, 2024 05:17
