[SS] Discussion thread for "3.1 Structured Streaming: State Store Internals" #33

lw-lin · 2017-01-01T07:37:11Z

To paste code, please copy the following and modify it:

public static final thisIsJavaCode;

val thisIsScalaCode

Thanks!

junhero · 2017-02-16T07:32:31Z

@lw-lin
For UV-style workloads that compute count distinct, the StateStore approach won't work, will it?

lw-lin · 2017-02-19T06:23:56Z

@junhero

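It depends on the size of the data set. If the data set is very small, for example when the user id space is small, the StateStore works fine. If the user id space is large but the number of distinct user ids per day is small, the StateStore is also fine. But if the user id space is large and there are also many distinct user ids per day, the StateStore runs into trouble. In that case, consider other approaches such as HyperLogLog.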
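
To make the trade-off concrete, here is a minimal sketch in plain Scala with no Spark dependency. An exact distinct count has to keep every distinct id, so its state grows with the number of distinct users, while a HyperLogLog sketch keeps a fixed number of small registers and returns an estimate. The names `HllSketch` and `DistinctCountDemo` are made up for illustration, and this is a textbook-style HyperLogLog, not the implementation Spark uses. Spark SQL also ships its own `approx_count_distinct` function.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Illustrative HyperLogLog: p index bits give m = 2^p one-byte registers,
// so the memory used is fixed no matter how many ids are added.
final class HllSketch(p: Int = 12) {
  require(p >= 7 && p <= 16, "p must be in [7, 16]")
  private val m = 1 << p
  private val registers = new Array[Byte](m)

  def add(id: String): Unit = {
    val h = MurmurHash3.stringHash(id)   // 32-bit hash of the id
    val idx = h >>> (32 - p)             // top p bits select a register
    val rest = h << p                    // remaining bits, left-aligned
    // Position of the first 1-bit in the remaining bits, capped when they are all zero.
    val rank = math.min(Integer.numberOfLeadingZeros(rest) + 1, 32 - p + 1)
    if (rank > registers(idx)) registers(idx) = rank.toByte
  }

  def estimate: Double = {
    val alpha = 0.7213 / (1.0 + 1.079 / m)   // bias constant, valid for m >= 128
    val sum = registers.map(r => math.pow(2.0, -r.toDouble)).sum
    val raw = alpha * m * m / sum
    val zeros = registers.count(_ == 0)
    // Small-range correction: fall back to linear counting while registers are still empty.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }

  def sizeInBytes: Int = m
}

object DistinctCountDemo {
  // Feeds every id twice and returns (exact distinct count, HyperLogLog estimate).
  def run(distinctUsers: Int): (Int, Double) = {
    val exact = mutable.HashSet.empty[String]   // grows with every new id
    val sketch = new HllSketch(p = 12)          // stays at 4096 bytes
    for (_ <- 1 to 2; i <- 0 until distinctUsers) {
      val id = s"user-$i"
      exact += id
      sketch.add(id)
    }
    (exact.size, sketch.estimate)
  }

  def main(args: Array[String]): Unit = {
    val (exact, approx) = run(100000)
    println(s"exact=$exact approx=${approx.round}")
  }
}
```

With `p = 12` the sketch keeps 4096 registers, which corresponds to a typical relative error of roughly 1.04 / sqrt(4096), about 1.6%. Whether that error is acceptable for a UV metric is a product decision, not a technical one.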

junhero · 2017-02-20T10:04:19Z

Thanks

KevinZwx · 2017-08-30T02:49:45Z

Hello, could you tell me what exactly is stored in the StateStore? In statefulOperators I see put operations on the state like the following:

val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
...
while (iter.hasNext) {
  val row = iter.next().asInstanceOf[UnsafeRow]
  val key = getKey(row)
  store.put(key, row)
  numUpdatedStateRows += 1
}

lw-lin · 2017-08-31T03:21:09Z

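@KevinZwx They are UnsafeRow: both the key and the value are UnsafeRow. In the SparkSQL module, UnsafeRow plays roughly the role that Object plays in Java. An UnsafeRow holds the concrete data of the various types (numeric values, strings, etc.).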
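
As a rough illustration of that key/value shape, the sketch below mimics the put loop from the question in plain Scala. `Row`, `ToyStateStore`, and `StatePutDemo` are hypothetical stand-ins invented for this example and are not Spark classes. The real store holds binary `UnsafeRow`s and is versioned, which this toy ignores.

```scala
import scala.collection.mutable

// Hypothetical stand-in for UnsafeRow: an ordered sequence of column values.
final case class Row(values: Vector[Any])

// Toy keyed store: key and value are both rows, and a put for an existing key overwrites it.
final class ToyStateStore {
  private val data = mutable.LinkedHashMap.empty[Row, Row]
  def put(key: Row, value: Row): Unit = data.update(key, value)
  def get(key: Row): Option[Row] = data.get(key)
  def size: Int = data.size
}

object StatePutDemo {
  // Plays the role of getKey: project the key columns out of a full row.
  def keyProjection(keyOrdinals: Seq[Int])(row: Row): Row =
    Row(keyOrdinals.map(i => row.values(i)).toVector)

  // Same shape as the loop in the question: one put per input row.
  def updateState(store: ToyStateStore, rows: Iterator[Row], getKey: Row => Row): Long = {
    var numUpdatedStateRows = 0L
    while (rows.hasNext) {
      val row = rows.next()
      store.put(getKey(row), row)
      numUpdatedStateRows += 1
    }
    numUpdatedStateRows
  }

  def main(args: Array[String]): Unit = {
    val store = new ToyStateStore
    val getKey: Row => Row = keyProjection(Seq(0))
    val batch = Iterator(Row(Vector("a", 1L)), Row(Vector("b", 1L)), Row(Vector("a", 2L)))
    val updated = updateState(store, batch, getKey)
    println(s"updated=$updated keys=${store.size} a=${store.get(Row(Vector("a")))}")
  }
}
```

Note that the value stored under a key is the whole row, key columns included, which matches `store.put(key, row)` in the snippet from the question.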

KevinZwx · 2017-08-31T07:28:30Z

OK, thanks

LinMingQiang · 2019-08-12T06:19:09Z

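Hello, I have a question: when state is updated, does every batch have to pull the corresponding StateStore from HDFS, and then write it back to HDFS after the update?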

lecssmi · 2020-03-10T07:42:08Z

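A question that may not strictly be about state: in Structured Streaming, when two streams are joined over a fairly large time range, say several hours, will the buffered data spill to disk if it does not fit in memory?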
