Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query a parquet file 4 times slower than clickhouse local #115

Open
l1t1 opened this issue Sep 30, 2023 · 8 comments
Open

query a parquet file 4 times slower than clickhouse local #115

l1t1 opened this issue Sep 30, 2023 · 8 comments
Labels
Arrow Apache Arrow support enhancement New feature or request help wanted Extra attention is needed

Comments

@l1t1
Copy link

l1t1 commented Sep 30, 2023

(you don't have to strictly follow this form)

Describe the situation
SELECT avg(i) FROM file('/data/t.parquet') group by round(log10(i));
chdb costs 400s, clickhouse local costs 100s
How to reproduce

  • Which ClickHouse server version to use 23.6
  • Which interface to use, if matters CLI.py
  • Non-default settings, if any
  • CREATE TABLE statements for all tables involved
    select number::int i FROM numbers_mt(1,1000000000)t into outfile '/data/t.parquet';
  • Sample data for all these tables, use clickhouse-obfuscator if necessary
  • Queries to run that lead to slow performance
    SELECT avg(i) FROM file('/data/t.parquet') group by round(log10(i));

Expected performance
What are your performance expectation, why do you think they are realistic? Has it been working faster in older ClickHouse releases? Is it working faster in some specific other system?
I hope chdb runs as fast as clickhouse local.
Additional context
Add any other context about the problem here.
btw
select number::int i FROM numbers_mt(1,1000000000)t into outfile '/data/t.parquet';
chdb runs as fast as clickhouse local

@l1t1
Copy link
Author

l1t1 commented Oct 1, 2023

a 1/10 size parquet has the same problem

root@localhost:/data/data/com.termux/files/home# time python3  -m chdb "SELECT avg(i) FROM file('/data/t2.parquet') group by round(log10(i)); " Pretty
<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
┏━━━━━━━━━━━━┓
┃     avg(i) ┃
┡━━━━━━━━━━━━┩
│          2 │
├────────────┤
│   17392527 │
├────────────┤
│    17392.5 │
├────────────┤
│     1739.5 │
├────────────┤
│     173925 │
├────────────┤
│       17.5 │
├────────────┤
│        174 │
├────────────┤
│ 65811388.5 │
├────────────┤
│  1739252.5 │
└────────────┘

real    0m38.475s
user    0m37.700s
sys     0m17.630s


root@localhost:/data/data/com.termux/files/home# time ./ch239 --local -q "SELECT avg(i) FROM file('/data/t2.parquet') group by round(log10(i));"
<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
Lowered mark cache size to 2.79 GiB because the system has limited RAM
2
17392527
17392.5
1739.5
173925
17.5
174
65811388.5
1739252.5

real    0m13.121s
user    0m24.100s
sys     0m6.240s

@lmangani
Copy link
Contributor

lmangani commented Oct 1, 2023

Thanks for the report @l1t1 we appreciate it! This definitely helps refine our scope. The next chdb versions should improve any gaps in performance, although clickhouse-local and chdb are built slightly different (jemalloc for one) so details about the execution context are extremely important. I cannot reproduce this performance gap on my system for instance.

<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled. This is quite odd. Could you tell us more about the running environment?

@lmangani lmangani added enhancement New feature or request help wanted Extra attention is needed labels Oct 1, 2023
@l1t1
Copy link
Author

l1t1 commented Oct 1, 2023

not only query file, query numbers_mt() has the same question. the max_threads values are the same.


root@localhost:/data/data/com.termux/files/home# time ./ck235 --local -q "SELECT avg(number) FROM numbers_mt(1,100000000) group by round(log10(number));"
<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
2
17392527
17392.5
1739.5
173925
17.5
174
65811388.5
1739252.5

real    0m1.535s
user    0m7.530s
sys     0m0.200s
root@localhost:/data/data/com.termux/files/home# time python3  -m chdb "SELECT avg(number) FROM numbers_mt(1,100000000) group by round(log10(number))" Pretty
<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
┏━━━━━━━━━━━━━┓
┃ avg(number) ┃
┡━━━━━━━━━━━━━┩
│           2 │
├─────────────┤
│    17392527 │
├─────────────┤
│     17392.5 │
├─────────────┤
│      1739.5 │
├─────────────┤
│      173925 │
├─────────────┤
│        17.5 │
├─────────────┤
│         174 │
├─────────────┤
│  65811388.5 │
├─────────────┤
│   1739252.5 │
└─────────────┘

real    0m3.695s
user    0m8.820s
sys     0m0.460s

root@localhost:/data/data/com.termux/files/home# time python3  -m chdb "SELECT getSetting('max_threads')" Pretty                                        <jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ getSetting('max_threads') ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│                         8 │
└───────────────────────────┘

real    0m2.337s
user    0m1.500s
sys     0m0.380s
root@localhost:/data/data/com.termux/files/home# time ./ck235 --local -q "SELECT getSetting('max_threads');"                                            <jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
8

real    0m0.609s
user    0m0.390s
sys     0m0.110s

@l1t1
Copy link
Author

l1t1 commented Oct 1, 2023

I run them on termux + proot-distro ubuntu 22.04
a phone has 8-core CPU and 6GB memory


Termux Variables:
TERMUX_APK_RELEASE=GITHUB
TERMUX_APP_PACKAGE_MANAGER=apt
TERMUX_APP_PID=3239
TERMUX_IS_DEBUGGABLE_BUILD=1
TERMUX_MAIN_PACKAGE_FORMAT=debian
TERMUX_VERSION=0.118.0
Packages CPU architecture:
aarch64
Subscribed repositories:
# sources.list
deb https://mirrors.tuna.tsinghua.edu.cn/termux/apt/termux-main stable main
# tur-repo (sources.list.d/tur.list)
deb https://tur.kcubeterm.com tur-packages tur tur-on-device tur-continuous
Updatable packages:
apt/stable 2.7.6 aarch64 [upgradable from: 2.7.3]
brotli/stable 1.1.0 aarch64 [upgradable from: 1.0.9-1]
command-not-found/stable 2.3.0-3 aarch64 [upgradable from: 2.3.0-1]
coreutils/stable 9.4 aarch64 [upgradable from: 9.3]
curl/stable 8.3.0 aarch64 [upgradable from: 8.2.1]
dos2unix/stable 7.5.1 aarch64 [upgradable from: 7.5.0]
libarchive/stable 3.7.2 aarch64 [upgradable from: 3.7.1]
libcurl/stable 8.3.0 aarch64 [upgradable from: 8.2.1]
libimagequant/stable 4.2.1 aarch64 [upgradable from: 4.2.0]
libsqlite/stable 3.43.1 aarch64 [upgradable from: 3.42.0]
libunbound/stable 1.18.0 aarch64 [upgradable from: 1.17.1-2]
libwebp/stable 1.3.2 aarch64 [upgradable from: 1.3.1-2]
openssl/stable 1:3.1.3 aarch64 [upgradable from: 1:3.1.2]
python-numpy/stable 1.26.0 aarch64 [upgradable from: 1.25.2]
python-pandas/tur-packages 2.1.1 aarch64 [upgradable from: 2.1.0]
python-pillow/stable 10.0.1 aarch64 [upgradable from: 10.0.0]
rust/stable 1.72.1 aarch64 [upgradable from: 1.70.0-1]
unbound/stable 1.18.0 aarch64 [upgradable from: 1.17.1-2]
vim-runtime/stable 9.0.1900 all [upgradable from: 9.0.1800]
vim/stable 9.0.1900 aarch64 [upgradable from: 9.0.1800]
termux-tools version:
1.39.0
Android version:
8.1.0
Kernel build information:
Linux localhost 4.4.78-perf-g6520db4 #1 SMP PREEMPT Sat Dec 8 00:50:29 CST 2018 aarch64 Android
Device manufacturer:
Xiaomi
Device model:
Mi Note 3
LD Variables:
LD_LIBRARY_PATH=
LD_PRELOAD=/data/data/com.termux/files/usr/lib/libtermux-exec.so

@l1t1
Copy link
Author

l1t1 commented Oct 1, 2023

another phone didn't show jmalloc message, has the same question too

root@localhost:/data/data/com.termux/files/home# time ./clickhouse --local -q "SELECT avg(number) FROM numbers_mt(1,100000000) group by round(log10(number));"
2
17392527
17392.5
1739.5
173925
17.5
174
65811388.5
1739252.5

real    0m0.617s
user    0m3.563s
sys     0m0.145s
root@localhost:/data/data/com.termux/files/home# time python3  -m chdb "SELECT avg(number) FROM numbers_mt(1,100000000) group by round(log10(number))" Pretty
┏━━━━━━━━━━━━━┓
┃ avg(number) ┃
┡━━━━━━━━━━━━━┩
│           2 │
├─────────────┤
│    17392527 │
├─────────────┤
│     17392.5 │
├─────────────┤
│      1739.5 │
├─────────────┤
│      173925 │
├─────────────┤
│        17.5 │
├─────────────┤
│         174 │
├─────────────┤
│  65811388.5 │
├─────────────┤
│   1739252.5 │
└─────────────┘

real    0m1.688s
user    0m3.928s
sys     0m0.208s

the cpu info

~ $ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              ARM
  Model name:           Cortex-A55
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           r1p0
    CPU(s) scaling MHz: 72%
    CPU max MHz:        1863.0000
    CPU min MHz:        554.0000
    BogoMIPS:           3.84
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrc
                        pc dcpop asimddp
  Model name:           -
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          2
    Stepping:           0x3
    CPU(s) scaling MHz: 65%
    CPU max MHz:        2861.0000
    CPU min MHz:        826.0000
    BogoMIPS:           3.84
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrc
                        pc dcpop asimddp

@l1t1
Copy link
Author

l1t1 commented Oct 2, 2023

maybe it only occurs on a slow CPU, at faster phone, when increasing the data size, chdb runs as fast as clickhouse local.

root@localhost:/data/data/com.termux/files/home# time ./clickhouse --local -q "SELECT avg(number) FROM numbers_mt(1,1000000000) group by round(log10(number));"
2
658113883.5
17392527
17392.5
1739.5
173925
17.5
174
173925271.5
1739252.5

real    0m4.716s
user    0m35.153s
sys     0m0.238s
root@localhost:/data/data/com.termux/files/home# time ./clickhouse --local -q "SELECT avg(number) FROM numbers_mt(1,2000000000) group by round(log10(number));"
2
1158113883.5
17392527
17392.5
1739.5
173925
17.5
174
173925271.5
1739252.5

real    0m9.257s
user    1m10.508s
sys     0m0.407s

root@localhost:~# time python3  -m chdb "SELECT avg(number) FROM numbers_mt(1,1000000000) group by round(log10(number))" Pretty
┏━━━━━━━━━━━━━┓
┃ avg(number) ┃
┡━━━━━━━━━━━━━┩
│           2 │
├─────────────┤
│ 658113883.5 │
├─────────────┤
│    17392527 │
├─────────────┤
│     17392.5 │
├─────────────┤
│      1739.5 │
├─────────────┤
│      173925 │
├─────────────┤
│        17.5 │
├─────────────┤
│         174 │
├─────────────┤
│ 173925271.5 │
├─────────────┤
│   1739252.5 │
└─────────────┘

real    0m5.027s
user    0m33.919s
sys     0m0.405s
root@localhost:~# time python3  -m chdb "SELECT avg(number) FROM numbers_mt(1,2000000000) group by round(log10(number))" Pretty
┏━━━━━━━━━━━━━━┓
┃  avg(number) ┃
┡━━━━━━━━━━━━━━┩
│            2 │
├──────────────┤
│ 1158113883.5 │
├──────────────┤
│     17392527 │
├──────────────┤
│      17392.5 │
├──────────────┤
│       1739.5 │
├──────────────┤
│       173925 │
├──────────────┤
│         17.5 │
├──────────────┤
│          174 │
├──────────────┤
│  173925271.5 │
├──────────────┤
│    1739252.5 │
└──────────────┘

real    0m9.718s
user    1m8.479s
sys     0m0.670s

@lmangani
Copy link
Contributor

lmangani commented Oct 2, 2023

Thanks for the @l1t1 this is interesting and most likely means the performance issue is related to jemalloc use in chdb or even with build options for certain processors or instruction sets. Could we compare the two CPU features by any chance?

@l1t1
Copy link
Author

l1t1 commented Oct 2, 2023

the slow CPU mentions is
Qualcomm Snapdragon 660 Processor

~ $ free
               total        used        free      shared  buff/cache   available
Mem:         5849420     2074308      424304       28024     3350808     3515508
Swap:        2621436      327476     2293960
~ $ lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 8
  On-line CPU(s) list:  0-7
Vendor ID:              Qualcomm
  Model name:           Kryo-V2
    Model:              4
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           0xa
    CPU(s) scaling MHz: 76%
    CPU max MHz:        1843.2000
    CPU min MHz:        633.6000
    BogoMIPS:           38.40
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
  Model name:           Falkor-V1/Kryo
    Model:              2
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          1
    Stepping:           0xa
    CPU(s) scaling MHz: 50%
    CPU max MHz:        2208.0000
    CPU min MHz:        1113.6000
    BogoMIPS:           38.40
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
Caches (sum of all):
  L1d:                  384 KiB (8 instances)
  L1i:                  384 KiB (8 instances)
  L2:                   2 MiB (2 instances)

when increasing data size(not the parquet file size), the gap is smaller.

root@localhost:/data/data/com.termux/files/home# time ./ck235 --local -q "SELECT avg(number) FROM numbers_mt(1,400000000) group by round(log10(number));"
<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
2
358113883.5
17392527
17392.5
1739.5
173925
17.5
174
173925271.5
1739252.5

real    0m4.258s
user    0m28.420s
sys     0m0.460s
root@localhost:/data/data/com.termux/files/home# time python3  -m chdb "SELECT avg(number) FROM numbers_mt(1,400000000) group by round(log10(number))" Pretty
<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
┏━━━━━━━━━━━━━┓
┃ avg(number) ┃
┡━━━━━━━━━━━━━┩
│           2 │
├─────────────┤
│ 358113883.5 │
├─────────────┤
│    17392527 │
├─────────────┤
│     17392.5 │
├─────────────┤
│      1739.5 │
├─────────────┤
│      173925 │
├─────────────┤
│        17.5 │
├─────────────┤
│         174 │
├─────────────┤
│ 173925271.5 │
├─────────────┤
│   1739252.5 │
└─────────────┘

real    0m6.457s
user    0m30.310s
sys     0m0.690s
root@localhost:/data/data/com.termux/files/home#
root@localhost:/data/data/com.termux/files/home# time python3  -m chdb "SELECT avg(number) FROM numbers_mt(1,1000000000) group by round(log10(number))" Pretty
<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
┏━━━━━━━━━━━━━┓                                                                                                                                         ┃ avg(number) ┃
┡━━━━━━━━━━━━━┩
│           2 │
├─────────────┤
│ 658113883.5 │
├─────────────┤
│    17392527 │                                                                                                                                         ├─────────────┤
│     17392.5 │
├─────────────┤
│      1739.5 │
├─────────────┤
│      173925 │
├─────────────┤
│        17.5 │
├─────────────┤
│         174 │
├─────────────┤
│ 173925271.5 │
├─────────────┤
│   1739252.5 │
└─────────────┘

real    0m12.387s
user    1m13.140s
sys     0m1.580s
root@localhost:/data/data/com.termux/files/home# time ./ck235 --local -q "SELECT avg(number) FROM numbers_mt(1,1000000000) group by round(log10(number));"
<jemalloc>: Number of CPUs detected is not deterministic. Per-CPU arena disabled.
2
658113883.5
17392527
17392.5
1739.5
173925
17.5
174
173925271.5
1739252.5

real    0m9.979s
user    1m11.570s
sys     0m0.590s

@auxten auxten added the Arrow Apache Arrow support label Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow Apache Arrow support enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants