Idea #1

Open
MrCyjaneK opened this issue Aug 9, 2021 · 67 comments
Labels
enhancement New feature or request

Comments

@MrCyjaneK

MrCyjaneK commented Aug 9, 2021

https://github.com/Miaosi001/JW-Library-macOS/blob/3f39b4a386ba00f52432607a745d7f0b9dcb9db1/JWLibrary/Utility/JWPubManager.swift#L45-51

Here you can get the latest catalog version: https://app.jw-cdn.org/catalogs/publications/v4/manifest.json

and with that version go to https://app.jw-cdn.org/catalogs/publications/v4/ + version + /catalog.db.gz
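
A minimal Ruby sketch of building that URL. The manifest layout is not described in this thread, so the `"current"` key below is an assumption, and `catalog_url` and `catalog_version` are hypothetical helper names used only for illustration.

```ruby
require "json"

CATALOG_BASE = "https://app.jw-cdn.org/catalogs/publications/v4/"

# Builds the catalog download URL for a given catalog version string.
def catalog_url(version)
  "#{CATALOG_BASE}#{version}/catalog.db.gz"
end

# Reads the version out of the manifest body.
# ASSUMPTION: the manifest is a JSON object and the version sits under "current".
def catalog_version(manifest_json, key: "current")
  JSON.parse(manifest_json).fetch(key)
end

# Made-up sample body, not real manifest data.
manifest = '{"current": "example-version"}'
puts catalog_url(catalog_version(manifest))
```

If the real manifest uses a different key, only the `key:` argument needs to change.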

Also, I've seen the JWPUB manager. Have you figured out how to read JWPUB files? (I'm not into Swift, and I don't even own an 🍎 device.)

I spent a couple of days trying to get content out of the jwpub, with no results: MrCyjaneK/jwapi#1

@darioragusa
Owner

Hi, thanks for the help.

I have not yet managed to find a way to read JWPub files. Even if I change the words in the db they don't change in the app. I noticed that if I change a word it does not change in the text but only in the text search function. I have no clue how it works.

@MrCyjaneK
Author

ah well. So I'm back to reverse engineering it again..

@darioragusa
Owner

darioragusa commented Aug 10, 2021

Me too. I started again this morning. This time I noticed some things that I hadn't noticed before.

This is where I am now:
image

UPDATE
I think I could manage to extract the text. But only the text... I can't find any info regarding punctuation and size.

@MrCyjaneK
Author

OH! If you can get the text that would be a huiegrnfdjksxfnwliekfncl (random letters of joy) help for me!

@MrCyjaneK
Author

I'm soo happy that somebody figured out how to use it :D <3

@darioragusa
Owner

Well yes, but I still believe that it is not the correct way to do it

@MrCyjaneK
Author

MrCyjaneK commented Aug 11, 2021

I'm actually out of luck and have no idea what to do now, so I just hope that you will figure out some method for it.

@darioragusa
Owner

darioragusa commented Aug 11, 2021

I think everything is in the 'Content' field of the 'Document' table, but it is somehow encrypted and we cannot read it. That leaves two options:

  1. Extract the text from the search table and guess the capitalization and punctuation;
  2. Scrape the text from the site.

I would say the first is not really a good option. The second is not great either, but I think it's the only way we have.

@MrCyjaneK
Author

How would you do the first one? I believe the text is in fact stored in that table and that this is the correct way to extract the content. With the leftover bytes we could possibly figure out which ones stand for images, punctuation, etc.

@darioragusa
Owner

darioragusa commented Aug 11, 2021

Starting from the 'TextUnitIndices' column, I take all the words whose value starts with '80' (hex). Then, in the 'PositionalList' column, the first word of the document is the one that starts with '80', the second is the one with '81', and so on. After 'FF' it starts again with '00 81', '01 81', etc. After 'FF 81' I think there is '00 82' (but I have not yet tried). In the 'PositionalListIndex' column the first value (e.g. '85') indicates that the word is present 5 times, so I can decrement that value until it reaches '80' and then stop checking that word for the current document (maybe I remove the '80' at the end of the cycle). Finally, in the 'TextUnitIndices' column I remove the '80' and, for all the rows, decrease the next value by 1 (e.g. '81' -> '80', '85' -> '84', etc.). Then I start the loop again for the next document.

@MrCyjaneK
Author

MrCyjaneK commented Aug 11, 2021

Ooookay! I'll try to write a parser for that! Thanks for help :D

@darioragusa
Owner

You're welcome. Let me know if you find a way to get the punctuation

@MrCyjaneK
Author

I will!

@MrCyjaneK
Author

image
Words like 'god', which are probably on every page except the table of contents, have a TextUnitIndices row of length 18 (in a publication with 18 topics plus a table of contents, which I believe is handled differently from the rest).

I'm actually curious what those lss, lsr, sqr etc.. are, maybe formatting?

@MrCyjaneK
Author

image
yay! I think we have a bit of it here 🥳

@MrCyjaneK
Author

image

Sounds right, at least at the beginning

@MrCyjaneK
Author

> After 'FF' it starts again with '00 81', '01 81', etc. After 'FF 81' I think there is '00 82' (but I have not yet tried).

right. That's why it's wrong

@MrCyjaneK
Author

So, I'm out of luck today. It still produces wrong output. I'll start from scratch in the morning

@MrCyjaneK
Author

UH.. do you have some example implementation..?

@darioragusa
Owner

Nope, I'll start making one tomorrow

@darioragusa
Owner

I said tomorrow, here it is 1:10 am so technically it is the next day. Anyway I believe it does not work, it returns repeated words with and without accent

@MrCyjaneK
Author

> image
>
> Sounds right, at least at the beginning

Here it returned the correct thing for the first page, and then it just skipped the words that had already been used. I believe that it is the correct way to go.

@darioragusa
Owner

> After 'FF 81'

I was wrong. For some reason it stops at '7F 81' and restarts from '00 82'
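
One way to read this sequence ('80' to 'FF', then '00 81' to '7F 81', then '00 82') is as a variable-length counter: bytes without the high bit carry 7 bits each, least significant group first, and the byte with the high bit set ends the value. The Ruby sketch below decodes positions under that assumption. It is an interpretation of the observed values, not a confirmed description of the format, and `decode_position` and `parse_hex_bytes` are hypothetical names.

```ruby
# Turns a string like "00 81" into an array of byte values.
def parse_hex_bytes(text)
  text.split.map { |pair| pair.to_i(16) }
end

# Decodes one position counter into a zero-based word index.
# ASSUMPTION: each byte contributes its low 7 bits, least significant
# group first, and the byte with the high bit (0x80) set is the last one.
# Values longer than two bytes were not observed in the thread.
def decode_position(bytes)
  value = 0
  shift = 0
  bytes.each do |byte|
    value |= (byte & 0x7F) << shift
    shift += 7
    return value if (byte & 0x80) != 0
  end
  raise ArgumentError, "no terminating byte (high bit set) found"
end

["80", "81", "FF", "00 81", "7F 81", "00 82"].each do |text|
  puts "#{text} -> #{decode_position(parse_hex_bytes(text))}"
end
```

Under this reading '80' is the first word (index 0), 'FF' is index 127, '00 81' is index 128, '7F 81' is index 255, and '00 82' is index 256, which matches the restart described above.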

@darioragusa
Owner

I did it! Or at least the first and the last words seem to be in place

@darioragusa
Owner

Registrazione.schermo.2021-08-12.alle.15.25.46-Wi-Fi.High.mp4

🥳🥳🥳

@MrCyjaneK
Author

YAYYYYYY

@MrCyjaneK
Author

You are a genius!

@MrCyjaneK
Author

I'm looking forward to seeing this code :D, you saved my project :D Thank youuu

@darioragusa
Owner

darioragusa commented Aug 12, 2021

Here it is:
JWPubExtractor.swift

@MrCyjaneK
Author

Thanks!

@darioragusa
Owner

I think Content is encrypted with AES, maybe AES-256

@MrCyjaneK
Author

🦆 so no luck for us. But if it were AES-encrypted, then the content shouldn't change, it should just break

@darioragusa
Owner

darioragusa commented Aug 15, 2021

It's made of pieces of 16 bytes. You can duplicate these pieces, but if you change one byte it doesn't work

@MrCyjaneK
Author

oh

@darioragusa
Owner

By now the only thing that remains is scraping from the site

darioragusa unpinned this issue Aug 29, 2021
@ghost

ghost commented Mar 27, 2022

> I think Content is encrypted with AES, maybe AES-256

This is the algorithm:

  1. Determine the publication card hash
    1. Query the SQLite Publication table
    2. Create a list with the MepsLanguageIndex, Symbol, Year fields
    3. If the IssueTagNumber field is not zero, add it to the end of the list
    4. Join the list with underscores to one string, for example for w_S_202206.jwpub, this would be 1_w22_2022_20220600
    5. Calculate the SHA 256 hash of that string
    6. Calculate the bitwise XOR with 11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7
      CyberChef example 1
  2. Decrypt the text
    1. Query a row from the Document, BibleChapter or BibleVerse table
    2. Read the encoded Content field
    3. Run AES-128-CBC, use the first 16 bytes of the hash as AES Key, and the last 16 bytes as Initialization Vector (IV)
    4. Run Zlib Inflate
      CyberChef example 2
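
The steps above can be sketched in Ruby roughly as follows. This is a minimal illustration, not a reference implementation: the Publication fields are passed in as plain arguments instead of being queried from SQLite, and `publication_card_hash` and `decrypt_content` are hypothetical names. The fixed XOR value is the one quoted in step 1.6.

```ruby
require "digest"
require "openssl"
require "zlib"

# Fixed value from step 1.6, converted from hex to 32 raw bytes.
XOR_MASK = ["11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7"].pack("H*")

# Step 1: returns the publication card hash as 32 raw bytes.
def publication_card_hash(meps_language_index:, symbol:, year:, issue_tag_number: 0)
  fields = [meps_language_index, symbol, year]
  # IssueTagNumber is only appended when it is not zero.
  fields << issue_tag_number unless issue_tag_number.to_i.zero?
  digest = Digest::SHA256.digest(fields.join("_"))
  digest.bytes.zip(XOR_MASK.bytes).map { |a, b| a ^ b }.pack("C*")
end

# Step 2: AES-128-CBC decrypt (key = first 16 bytes, IV = last 16 bytes),
# then zlib inflate. `content` is the raw Content blob of one row.
def decrypt_content(content, card_hash)
  cipher = OpenSSL::Cipher.new("aes-128-cbc")
  cipher.decrypt
  cipher.key = card_hash.byteslice(0, 16)
  cipher.iv = card_hash.byteslice(16, 16)
  Zlib::Inflate.inflate(cipher.update(content) + cipher.final)
end

# Example from step 1.4: the joined string is 1_w22_2022_20220600.
card_hash = publication_card_hash(
  meps_language_index: 1, symbol: "w22", year: 2022, issue_tag_number: 20220600
)
puts card_hash.unpack1("H*")
```

For this example the printed hash should be the value used later in this thread, 909fd5b41ddd8a75ac39c69604828a7d3bc2c616d0ca2cff6dc4c0d7263a2327. The string returned by `decrypt_content` is binary, so it may need `force_encoding` before being treated as text.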

@MrCyjaneK
Author

@Bedan1 you are a hero :0

@darioragusa
Owner

WOW, thanks a lot @Bedan1

@ghost

ghost commented May 22, 2022

Figuring out the encryption of content in the .jwpub file format is reverse engineering. Certain jurisdictions allow you to reverse engineer file formats for the purpose of interoperability. However, it might not be allowed to go the other way by encrypting custom content, it really depends on your use case. Please read the JW Library app terms, see the license agreement paragraph 3 'restrictions on use'. I'm not a lawyer.

@darioragusa
Owner

I mean, if they wanted to allow you to edit their publications, they wouldn't put so much effort into encrypting them

@Chiriat

Chiriat commented Nov 6, 2022

Hello,

What does this string refer to? 11cbb5587 ...

How did you calculate the second string in the Cyber Chef 2 table? 3bc2c616d0ca2cff6dc4c0d7263a2327

Thank you

@darioragusa
Owner

I don't know where the first hash comes from. The string 3bc2... is the second half of the result of the first CyberChef example, as described in this step:

> Run AES-128-CBC, use the first 16 bytes of the hash as AES Key, and the last 16 bytes as Initialization Vector (IV)

@Chiriat

Chiriat commented Nov 6, 2022

OK, thanks.

Have you also tried it with publications in other languages?

@darioragusa
Owner

I have only tried it in Italian, and it works

@mjacobus

mjacobus commented Jan 3, 2023

> • Join the list with underscores to one string, for example for w_S_202206.jwpub, this would be 1_w22_2022_20220600
> • Calculate the SHA 256 hash of that string
> • Calculate the bitwise XOR with 11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7

Where is this value coming from? 11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7

SHA256 1_w22_2022_20220600 is 815460ec63ef0e18e01fbf9a67bf28f4cdad2392faf074a38c78fafe6ccf8e80, so I am confused.

@darioragusa
Owner

I don't know where it comes from. It's probably a fixed value extracted from the app. The next thing you need to do is to get a third hash by doing a bitwise XOR between the two hashes.
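
As a worked example with the two values from this thread, the XOR can be checked directly on the hex strings. This is only a sketch, and `xor_hex` is a hypothetical helper name.

```ruby
# XORs two hex strings of equal length and returns the result as hex.
def xor_hex(a, b)
  raise ArgumentError, "length mismatch" unless a.length == b.length
  # Treat both strings as big integers, XOR them, and restore leading zeros.
  (a.to_i(16) ^ b.to_i(16)).to_s(16).rjust(a.length, "0")
end

sha   = "815460ec63ef0e18e01fbf9a67bf28f4cdad2392faf074a38c78fafe6ccf8e80"
fixed = "11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7"

third = xor_hex(sha, fixed)
puts third          # 909fd5b41ddd8a75ac39c69604828a7d3bc2c616d0ca2cff6dc4c0d7263a2327
puts third[0, 32]   # first 16 bytes as hex: the AES key
puts third[32, 32]  # last 16 bytes as hex: the IV
```

The two halves are the 909f... and 3bc2... strings that appear elsewhere in this thread.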

@mjacobus

mjacobus commented Jan 3, 2023

In this example, it looks like the key and IV are 32 bytes each. But when I try to decrypt that message in Ruby, I get an error saying the key and the IV should be 16 bytes.

# frozen_string_literal: true

RSpec.describe Jwpub::EncryptionHelper do
  subject(:helper) { described_class.new(card_hash: card_hash) }
  let(:input) do
    "f6afc0113fdb368018fa3ba0d5062eeaf4be75acd642d734a467c693c8221c6738830c1444025ded6e6f4fb60cf82c70ae2a693a2a876495ee0c1e590154728349320b59f640074c0833617dac297fcfb556888a083902dfaa82d3f6f526a22652de9aba3f5864d3de9430e67d86eb740be33c26ac1ec2ff4840a584db7a2c23b7779caed3adfaf4932aeba23805158364d20a4b10e7687b5870c433eff996fbc0b94ddc4d124b0465655125ddcf728f3ea81099b6d4c5b395733e996c614c8210816d47735b228c541bb4ba10d00e85dfee89f7f3944ccf996abf02dcd2e7d32e9540f7e5d0532169ace398991b6c33a9fd7abe3232a58820afa41f97f156ab8a1131322928d03a9c89184a3919a474c6b351c8086490ac4d62a27056ac37c651ff61ba12bfd4a794bd163e6c724abf6f311759420189c763ddbc0d1346cd5d52b2b711a850f45b719942342044ad5ab302f5dd31bcf9a270f38a9f6be1acd9dc31a73b15c0e4706f2a8c4176ab7858284322507343213651da199889532c2b942a9094c05e9f68dc3a16e4553518108343896bdae06e5fb21d6334cb986ef39d7d5347a405e7c930f3a5e7e5ec51f909893db85dfef53a3780aed78cd64c3a317c5c9090f62b3a0ed26b80eac17312a3091304121bc5378e570d925c2702f06f7ebe5604df44f244b09261892a14922c9110ae22cad3856c704549f2f055f4b5284040c99cb8df6cd87d96052b3a2ccb1ba5ec15329f6e8e6c668c29225b1c17f1a5a7c3fd4bef84b362b2d9874e2122453c5200791d91abfc354c911be1686af6b3a2f20ce7630ef4b32caa1c72e0678e470196563e6e581e3cb0094e8f21b2e51efca3e47dd117f34afeba6b682ccadf8d3d6f1905e7217bde5c157e8b2a19f2aabda0e378fd8072804ef5ba7fc1439856ac45db6506590d024f79b64695ebba627e6a7f993c6e2f747add42f29420ec3796bc5e93d3700584738d26ba7853e5e6832e4c494350b918c19fe252f5"  # rubocop:disable Layout/LineLength
  end
  let(:key) { "909fd5b41ddd8a75ac39c69604828a7d" }
  let(:iv) { "3bc2c616d0ca2cff6dc4c0d7263a2327" }
  let(:card_hash) { "909fd5b41ddd8a75ac39c69604828a7d3bc2c616d0ca2cff6dc4c0d7263a2327" }

  describe "when only card hash is passed as an argument" do
    subject(:helper) { described_class.new(card_hash: card_hash) }

    it "sets key" do
      expect(helper.key).to eq(key)
    end

    it "sets iv" do
      expect(helper.iv).to eq(iv)
    end
  end

  describe "when key and iv are passed as arguments" do
    subject(:helper) { described_class.new(key: key, iv: iv) }

    it "sets key" do
      expect(helper.key).to eq(key)
    end

    it "sets iv" do
      expect(helper.iv).to eq(iv)
    end
  end

  describe "#decrypt" do
    # This fails with
    # Failure/Error: cipher.key = key
    #
    # ArgumentError:
    #   key must be 16 bytes
    it "decrypts the message" do
      message = helper.decrypt(input)

      expect(message).to include("<strong>No se los pierda en JW Library y en jw.org</strong></h1>")
    end
  end
end

# frozen_string_literal: true

require "openssl"
require "zlib" # needed for Zlib::Inflate below
require "hex_string"

module Jwpub
  # encryption interface
  class EncryptionHelper
    attr_reader :key
    attr_reader :iv

    def initialize(card_hash: nil, key: nil, iv: nil)
      @card_hash = card_hash
      @key = key || card_hash.chars.first(32).join
      @iv = iv || card_hash.chars.last(32).join
    end


    def decrypt(cipher_string)
      cipher = OpenSSL::Cipher.new("aes-128-cbc")
      cipher.decrypt
      cipher.key = key
      cipher.iv = iv
      inflate(cipher.update(cipher_string) + cipher.final)
    end

    private

    def inflate(string)
      zstream = Zlib::Inflate.new
      buf = zstream.inflate(string)
      zstream.finish
      zstream.close
      buf
    end
  end
end

Pulling my hair out.

@darioragusa
Owner

> let(:key) { "909fd5b41ddd8a75ac39c69604828a7d" }
> let(:iv) { "3bc2c616d0ca2cff6dc4c0d7263a2327" }
> let(:card_hash) { "909fd5b41ddd8a75ac39c69604828a7d3bc2c616d0ca2cff6dc4c0d7263a2327" }

I guess that's because you used them as strings (same for the input). Those are hex values so you should get the raw value first. For example the raw value of 909fd5b41ddd8a75ac39c69604828a7d is 16 bytes
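
A short Ruby sketch of that conversion, using `Array#pack` from the standard library. `hex_to_raw` is a hypothetical helper name, and the same conversion would apply to the hex input before it is passed to `cipher.update`.

```ruby
require "openssl"

# Converts a hex string into the raw bytes it represents.
def hex_to_raw(hex)
  [hex].pack("H*")
end

key = hex_to_raw("909fd5b41ddd8a75ac39c69604828a7d")
iv  = hex_to_raw("3bc2c616d0ca2cff6dc4c0d7263a2327")

puts key.bytesize # 16, while the hex string itself is 32 characters long
puts iv.bytesize  # 16

cipher = OpenSSL::Cipher.new("aes-128-cbc")
cipher.decrypt
cipher.key = key # 16 raw bytes, so no "key must be 16 bytes" error
cipher.iv = iv
```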

@mjacobus

mjacobus commented Jan 4, 2023

Thank you @darioragusa !

I was able to read the contents by converting the input to raw value (I spent so much time trying to figure that out).

What I still could not figure out is where 11cbb5587e32846d4c26790c633da289f66fe5842a3a585ce1bc3a294af5ada7 comes from. I thought that was the manifest hash, but it is not. It is a fixed value that can be used in all publications.

Could that have been leaked somehow, or extracted from a mobile app package?

@darioragusa
Owner

Yes, it's a fixed value for all publications. I don't know, but I guess it was probably extracted from somewhere too.
