cpufreq: stats: Print error messages if table initialization fails

Failing to initialize the cpufreq_stats table effectively disables
cpufreq_stats. Such failures should not be silent. Print error
messages when cpufreq_stats fails to create the stats table.

Change-Id: I71cc0dd8262c7c6946e169f148ae39bd8f213a96
Signed-off-by: Junjie Wu <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
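
The cpufreq_stats diff itself is not among the files shown below, so as
a rough sketch of the intent (the allocation site and message text here
are illustrative, not taken from the actual patch):

	stat = kzalloc(sizeof(*stat), GFP_KERNEL);
	if (!stat) {
		/* Don't fail silently: stats stay disabled from here on. */
		pr_err("cpufreq stats: failed to allocate stats table for cpu%d\n",
		       cpu);
		return -ENOMEM;
	}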

cpufreq: stats: Don't update cpufreq_stats_table if it's NULL

When cpufreq_stats_table is not initialized for policy->last_cpu in the
CPUFREQ_UPDATE_POLICY_CPU callback, no updates are necessary. The
current implementation dereferences the table unconditionally, causing
a crash if the table is NULL.

Return directly in cpufreq_stats_update_policy_cpu() if
cpufreq_stats_table of policy->last_cpu is NULL.

Change-Id: Ic9ef8120557702791ba5b873532e47cc0a5dc027
Signed-off-by: Junjie Wu <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
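
A minimal sketch of the guard described above (the function and the
per-CPU table exist under these names in cpufreq_stats.c; the rest of
the body is abbreviated):

	static void cpufreq_stats_update_policy_cpu(struct cpufreq_policy *policy)
	{
		struct cpufreq_stats *stat = per_cpu(cpufreq_stats_table,
						     policy->last_cpu);

		/* No table was ever created for last_cpu: nothing to move. */
		if (!stat)
			return;

		/* ... existing table-migration logic follows ... */
	}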

cpufreq: Protect against hotplug in cpufreq_register_driver()

cpufreq_register_driver() could race with CPU hotplug during
bootup. Since hotplug notification is not yet registered while
subsys_interface_register() is executing, cpufreq's view of the online
CPUs can become stale before it registers for hotplug notification.

Register for hotplug notification first and protect
subsys_interface_register() against hotplug using
get/put_online_cpus().

Change-Id: I26b2908f1d167c2becc4e8664c357bb7c6162406
Signed-off-by: Junjie Wu <[email protected]>
Signed-off-by: franciscofranco <[email protected]>

cpufreq: Return directly in __cpufreq_get if policy is NULL

__cpufreq_get() checks whether policy->cur matches the frequency
returned by the cpufreq device driver and acts accordingly. However,
policy can be NULL if the CPU is being hotplugged, and the current
implementation crashes when trying to dereference the NULL policy.

Return the frequency provided by the cpufreq device driver directly if
the policy is not available.

Change-Id: I1f2ba4028c259392bf730db37dbec2d8c5ae3fa4
Signed-off-by: Junjie Wu <[email protected]>
Signed-off-by: franciscofranco <[email protected]>

cpufreq: Add if cpu is online check in show

Make sure CPU is online before proceeding with any "show"
ops. Without this check, the show can race with hotplug
and try to show the details of a stale or non-existent
policy.

CRs-Fixed: 689522
Change-Id: Ie791c73cb281bcfc4d722f7c8c10eee07cb11f2e
Signed-off-by: Maria Yu <[email protected]>
Signed-off-by: franciscofranco <[email protected]>

cpufreq: governor: Be friendly towards latency-sensitive bursty workloads

Cpufreq governors like the ondemand governor calculate the load on the CPU
periodically by employing deferrable timers. A deferrable timer won't fire
if the CPU is completely idle (and there are no other timers to be run), in
order to avoid unnecessary wakeups and thus save CPU power.

However, the load calculation logic is agnostic to all this, and this can
lead to the problem described below.

Time (ms)          CPU 1

100                Task-A running

110                Governor's timer fires, finds load as 100% in the last
                   10ms interval and increases the CPU frequency.

110.5              Task-A running

120                Governor's timer fires, finds load as 100% in the last
                   10ms interval and increases the CPU frequency.

125                Task-A went to sleep. With nothing else to do, CPU 1
                   went completely idle.

200                Task-A woke up and started running again.

200.5              Governor's deferred timer (which was originally programmed
                   to fire at time 130) fires now. It calculates load for the
                   time period 120 to 200.5, and finds the load is almost zero.
                   Hence it decreases the CPU frequency to the minimum.

210                Governor's timer fires, finds load as 100% in the last
                   10ms interval and increases the CPU frequency.

So, after the workload woke up and started running, the frequency was
suddenly dropped to the absolute minimum, and after that there was an
unnecessary delay of 10ms (the sampling period) before the CPU frequency
was raised back to a reasonable value. This pattern repeats on every
wake-up-from-cpu-idle for that workload, which can be quite undesirable
for latency- or response-time sensitive bursty workloads. So we need to
fix the governor's logic to detect such wake-up-from-cpu-idle scenarios
and start the workload at a reasonably high CPU frequency.

One extreme solution would be to fake a load of 100% in such scenarios. But
that might lead to undesirable side-effects such as frequency spikes (which
might also need voltage changes) especially if the previous frequency happened
to be very low.

We just want to avoid the stupidity of dropping down the frequency to a minimum
and then enduring a needless (and long) delay before ramping it up back again.
So, let us simply carry forward the previous load - that is, let us just pretend
that the 'load' for the current time-window is the same as the load for the
previous window. That way, the frequency and voltage will continue to be set
to whatever values they were set at previously. This means that bursty workloads
will get a chance to influence the CPU frequency at which they wake up from
cpu-idle, based on their past execution history. Thus, they might be able to
avoid suffering from slow wakeups and long response-times.

However, we should take care not to over-do this. (In the original
commit message this was illustrated with ASCII timelines, where #
represented busy and . represented idle.) Such a "copy previous load"
logic benefits a bursty workload that sleeps briefly between busy
periods, but it is detrimental in a mostly idle system, because it
retains the high frequency (copied from the previous interval): the
workload has finished, and the remaining tasks have busy periods
smaller than the sampling interval, which causes the timer to always
get deferred. The copy-previous-load logic would then copy the initial
high load to subsequent idle periods over and over again, keeping the
frequency high unnecessarily.

So, we modify this copy-previous-load logic such that it is used only once
upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the
previous load won't get blindly copied over; cpufreq will freshly evaluate the
load in the second idle interval, thus ensuring that the system comes back to
its normal state.

[ The right way to solve this whole problem is to teach the CPU frequency
governors to also track load on a per-task basis, not just a per-CPU basis,
and then use both the data sources intelligently to set the appropriate
frequency on the CPUs. But that involves redesigning the cpufreq subsystem,
so this patch should make the situation bearable until then. ]
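
As a concrete sketch, the carry-forward logic lands in the common
governor's per-CPU load calculation roughly as follows (field names
follow the commit discussion; the surrounding function is abbreviated):

	unsigned int load;

	if (unlikely(wall_time > (2 * sampling_rate) &&
		     j_cdbs->copy_prev_load)) {
		/* Woke from a long idle period: reuse the previous
		 * window's load instead of computing a near-zero one. */
		load = j_cdbs->prev_load;
		/* Use the saved value only once, so two consecutive
		 * long-idle windows don't both inherit a stale load. */
		j_cdbs->copy_prev_load = false;
	} else {
		load = 100 * (wall_time - idle_time) / wall_time;
		j_cdbs->prev_load = load;
		j_cdbs->copy_prev_load = true;
	}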

Experimental results:
+-------------------+

I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in
between its execution such that its total utilization can be a user-defined
value, say 10% or 20% (the higher the specified utilization, the fewer the
injected sleeps). This ebizzy was run with a single thread, pinned to CPU 8.

Behavior observed with tracing (sample taken from 40% utilization runs):
------------------------------------------------------------------------

Without patch:
~~~~~~~~~~~~~~
kworker/8:2-12137  416.335742: cpu_frequency: state=2061000 cpu_id=8
kworker/8:2-12137  416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40753  416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137  416.345744: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-12137  416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40753  416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
<snip>  ---------------------------------------------------------------------  <snip>
      <...>-40753  416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
     <idle>-0      416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
      <...>-40753  416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137  416.505739: cpu_frequency: state=2061000 cpu_id=8
kworker/8:2-12137  416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40753  416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137  416.515742: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-12137  416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy

Observation: Ebizzy went idle at 416.402202, and started running again at
416.502130. But cpufreq noticed the long idle period and dropped the frequency
at 416.505739, only to increase it back again at 416.515742 after realizing
that the workload is in fact CPU bound. Thus ebizzy needlessly ran at the
lowest frequency for almost 13 milliseconds (almost 1 full sample period),
and this pattern repeats on every sleep-wakeup. This can hurt
latency-sensitive workloads quite a lot.

With patch:
~~~~~~~~~~~

kworker/8:2-29802  464.832535: cpu_frequency: state=2061000 cpu_id=8
<snip>  ---------------------------------------------------------------------  <snip>
kworker/8:2-29802  464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802  464.972536: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-29802  464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
<snip>  ---------------------------------------------------------------------  <snip>
kworker/8:2-29802  465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802  465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
     <idle>-0      465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
      <...>-40738  465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802  465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      <...>-40738  465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2

Observation: Ebizzy went idle at 465.035797, and started running again at
465.240178. Since ebizzy was the only real workload running on this CPU,
cpufreq retained the frequency at 4.1 GHz throughout its run, no matter how
many times ebizzy slept and woke up in between. Thus ebizzy got the 10ms
worth of 4.1 GHz benefit on every sleep-wakeup (compared with the run
without the patch), and this boost gave a modest improvement in total
throughput, as shown below.

Sleeping-ebizzy records-per-second:
-----------------------------------

Utilization  Without patch  With patch  Difference (Absolute and % values)
    10%         274767        277046        +  2279 (+0.829%)
    20%         543429        553484        + 10055 (+1.850%)
    40%        1090744       1107959        + 17215 (+1.578%)
    60%        1634908       1662018        + 27110 (+1.658%)

Even a rudimentary, only approximately latency-sensitive workload such as
sleeping-ebizzy showed a consistent, noticeable performance improvement with
this patch, so workloads that are truly latency-sensitive should benefit
quite a bit. Moreover, this is an overall win-win, since the patch does not
hurt power savings at all: it does not reduce idle time or idle residency,
and the high frequency of the CPU when it enters cpu-idle does not affect
the power savings of deep idle states.

Signed-off-by: Srivatsa S. Bhat <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: franciscofranco <[email protected]>

cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled

When a CPU is hot removed we'll cancel all the delayed work items via
gov_cancel_work(). Sometimes the delayed work function determines that
it should adjust the delay for all other CPUs that the policy is
managing. If this scenario occurs, the canceling CPU will cancel its own
work but will queue up the other CPUs' works to run.

Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double
queueing) tried to fix this, but reading governor_enabled is not
protected by cpufreq_governor_lock, so even though od_dbs_timer() checks
governor_enabled before gov_queue_work(), the scenario below may still
occur:

 CPU0                                        CPU1
 ----                                        ----
 cpu_down()
  ...                                        <work runs>
  __cpufreq_remove_dev()                     od_dbs_timer()
   __cpufreq_governor()                       policy->governor_enabled
    policy->governor_enabled = false;
    cpufreq_governor_dbs()
     case CPUFREQ_GOV_STOP:
      gov_cancel_work(dbs_data, policy);
       cpu0 work is canceled
        timer is canceled
        cpu1 work is canceled
        <waits for cpu1>
                                              gov_queue_work(*, *, true);
                                               cpu0 work queued
                                               cpu1 work queued
                                               cpu2 work queued
                                               ...
        cpu1 work is canceled
        cpu2 work is canceled
        ...

At the end of the GOV_STOP case cpu0 still has a work queued to
run although the code is expecting all of the works to be
canceled. __cpufreq_remove_dev() will then proceed to
re-initialize all the other CPUs works except for the CPU that is
going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
will trample over the queued work and debugobjects will spit out
a warning:

WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14
Modules linked in:
CPU: 1 PID: 1205 Comm: sh Tainted: G        W    3.10.0 #200
[<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14)
[<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68)
[<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40)
[<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc)
[<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0)
[<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104)
[<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c)
[<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0)
[<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380)
[<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c)
[<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84)
[<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48)
[<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258)
[<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c)
[<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74)
[<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24)
[<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180)
[<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184)
[<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68)
[<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48)

In gov_queue_work(), take cpufreq_governor_lock before checking
governor_enabled and release it after __gov_queue_work(). This
guarantees that governor_enabled cannot change while gov_queue_work()
runs.

Signed-off-by: Jane Li <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Reviewed-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
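
A sketch of gov_queue_work() with the locking described above (this is
why the diff below drops 'static' from cpufreq_governor_lock; the exact
function layout is an assumption):

	void gov_queue_work(struct dbs_data *dbs_data,
			    struct cpufreq_policy *policy,
			    unsigned int delay, bool all_cpus)
	{
		int i;

		mutex_lock(&cpufreq_governor_lock);
		if (!policy->governor_enabled)
			goto out_unlock;

		if (!all_cpus) {
			__gov_queue_work(smp_processor_id(), dbs_data, delay);
		} else {
			for_each_cpu(i, policy->cpus)
				__gov_queue_work(i, dbs_data, delay);
		}

	out_unlock:
		mutex_unlock(&cpufreq_governor_lock);
	}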

cpufreq: remove race while accessing cur_policy

cur_policy is accessed while handling the CPUFREQ_GOV_START,
CPUFREQ_GOV_STOP and CPUFREQ_GOV_LIMITS events without taking the
same mutex, dbs_data->mutex, which leads to a race and data corruption
when running a continuous suspend/resume test. This is seen with the
ondemand governor in a suspend/resume test using rtcwake.

 Unable to handle kernel NULL pointer dereference at virtual address 00000028
 pgd = ed610000
 [00000028] *pgd=adf11831, *pte=00000000, *ppte=00000000
 Internal error: Oops: 17 [#1] PREEMPT SMP ARM
 Modules linked in: nvhost_vi
 CPU: 1 PID: 3243 Comm: rtcwake Not tainted 3.10.24-gf5cf9e5 #1
 task: ee708040 ti: ed61c000 task.ti: ed61c000
 PC is at cpufreq_governor_dbs+0x400/0x634
 LR is at cpufreq_governor_dbs+0x3f8/0x634
 pc : [<c05652b8>] lr : [<c05652b0>] psr: 600f0013
 sp : ed61dcb0 ip : 000493e0 fp : c1cc14f0
 r10: 00000000 r9 : 00000000 r8 : 00000000
 r7 : eb725280 r6 : c1cc1560 r5 : eb575200 r4 : ebad7740
 r3 : ee708040 r2 : ed61dca8 r1 : 001ebd24 r0 : 00000000
 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
 Control: 10c5387d Table: ad61006a DAC: 00000015
 [<c05652b8>] (cpufreq_governor_dbs+0x400/0x634) from [<c055f700>] (__cpufreq_governor+0x98/0x1b4)
 [<c055f700>] (__cpufreq_governor+0x98/0x1b4) from [<c0560770>] (__cpufreq_set_policy+0x250/0x320)
 [<c0560770>] (__cpufreq_set_policy+0x250/0x320) from [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168)
 [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168) from [<c0561ed0>] (cpu_freq_notify+0x68/0xdc)
 [<c0561ed0>] (cpu_freq_notify+0x68/0xdc) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c)
 [<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68)
 [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28)
 [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310)
 [<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310) from [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70)
 [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70) from [<c004b4b8>] (tegra_pm_notify+0x114/0x134)
 [<c004b4b8>] (tegra_pm_notify+0x114/0x134) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c)
 [<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68)
 [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28)
 [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34)
 [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34) from [<c00ad38c>] (enter_state+0xec/0x128)
 [<c00ad38c>] (enter_state+0xec/0x128) from [<c00ad400>] (pm_suspend+0x38/0xa4)
 [<c00ad400>] (pm_suspend+0x38/0xa4) from [<c00ac114>] (state_store+0x70/0xc0)
 [<c00ac114>] (state_store+0x70/0xc0) from [<c027b1e8>] (kobj_attr_store+0x14/0x20)
 [<c027b1e8>] (kobj_attr_store+0x14/0x20) from [<c019cd9c>] (sysfs_write_file+0x104/0x184)
 [<c019cd9c>] (sysfs_write_file+0x104/0x184) from [<c0143038>] (vfs_write+0xd0/0x19c)
 [<c0143038>] (vfs_write+0xd0/0x19c) from [<c0143414>] (SyS_write+0x4c/0x78)
 [<c0143414>] (SyS_write+0x4c/0x78) from [<c000f080>] (ret_fast_syscall+0x0/0x30)
 Code: e1a00006 eb084346 e59b0020 e5951024 (e5903028)
 ---[ end trace 0488523c8f6b0f9d ]---

Signed-off-by: Bibek Basu <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Cc: 3.11+ <[email protected]> # 3.11+
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
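
A sketch of the fix in the CPUFREQ_GOV_LIMITS path of
cpufreq_governor_dbs() (the mutex comes from the commit text; the
clamping body is abbreviated):

	case CPUFREQ_GOV_LIMITS:
		mutex_lock(&dbs_data->mutex);
		/* GOV_STOP may already have cleared cur_policy. */
		if (!cpu_cdbs->cur_policy) {
			mutex_unlock(&dbs_data->mutex);
			break;
		}
		mutex_lock(&cpu_cdbs->timer_mutex);
		/* ... clamp cur_policy->cur to the new min/max ... */
		mutex_unlock(&cpu_cdbs->timer_mutex);
		mutex_unlock(&dbs_data->mutex);
		break;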

cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info'

'copy_prev_load' was recently added by commit: 18b46ab (cpufreq: governor: Be
friendly towards latency-sensitive bursty workloads).

It actually is a bit redundant, as we also have 'prev_load', which can store
any integer value and can be used instead of 'copy_prev_load' by setting it
to zero.

True load can also turn out to be zero during long idle intervals (and hence the
actual value of 'prev_load' and the overloaded value can clash). However this is
not a problem because, if the true load was really zero in the previous
interval, it makes sense to evaluate the load afresh for the current interval
rather than copying the previous load.

So, drop 'copy_prev_load' and use 'prev_load' instead.

Update the comments as well to make them clearer.

There is another change here which was probably missed by Srivatsa in the
last round of updates: the unlikely() in the 'if' statement covered only
half of the condition, whereas the whole condition should come under it.

This also silences a checkpatch complaint (with the --strict option):

CHECK: Alignment should match open parenthesis
+		if (unlikely(wall_time > (2 * sampling_rate) &&
+						j_cdbs->prev_load)) {

Signed-off-by: Viresh Kumar <[email protected]>
Reviewed-by: Srivatsa S. Bhat <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
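
With 'copy_prev_load' gone, the earlier sketch reduces to the following
(zero doubles as the "don't reuse" flag):

	if (unlikely(wall_time > (2 * sampling_rate) &&
		     j_cdbs->prev_load)) {
		load = j_cdbs->prev_load;
		/* Zero it so the value is reused at most once per wakeup. */
		j_cdbs->prev_load = 0;
	} else {
		load = 100 * (wall_time - idle_time) / wall_time;
		j_cdbs->prev_load = load;
	}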

cpufreq: Break out early when frequency equals target_freq

Many drivers keep the frequencies in their frequency table in ascending
or descending order. When the governor tries to change to policy->min
or policy->max, respectively, cpufreq_frequency_table_target() can then
return on the first iteration, saving some iteration cycles.

So, break out early when a frequency in the cpufreq_frequency_table
equals the target one.

Testing this during a kernel compilation with the ondemand governor
and a frequency table in ascending order, cpufreq_frequency_table_target()
returned early on the first iteration about 30% of the times it was
called.

Signed-off-by: Stratos Karafotis <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
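
A sketch of the early break inside cpufreq_frequency_table_target()'s
scan loop (the 'index' field is the 3.x-era name; the relation handling
is elided):

	for (i = 0; (freq = table[i].frequency) != CPUFREQ_TABLE_END; i++) {
		if (freq == CPUFREQ_ENTRY_INVALID)
			continue;
		if (freq < policy->min || freq > policy->max)
			continue;
		/* Exact hit: no later entry can match better. */
		if (freq == target_freq) {
			optimal.index = i;
			break;
		}
		/* ... CPUFREQ_RELATION_L / CPUFREQ_RELATION_H cases ... */
	}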

Revert "cpufreq: make the "scaling_cur_freq" sysfs entry pollable"

This reverts commit ee974af.

Junjie says:
    sysfs_notify of scaling_cur_freq is no longer used by userspace
    components. Since sysfs_notify contends on a single mutex, it could
    add long delay to CPU frequency scaling.

    Remove the sysfs_notify() to speed up CPU frequency scaling.

Signed-off-by: franciscofranco <[email protected]>

qcom-cpufreq: Restore CPU frequency during resume

qcom-cpufreq blocks CPU frequency change requests during suspend, because
its dependencies might be suspended. A frequency change request would thus
fail silently, and the CPU clock won't change until the first frequency
update requested after the system comes out of suspend. This creates a
window during which the thermal driver cannot perform frequency mitigation,
even though policy->min/max have been correctly updated.

Check each online CPU's policy during resume to correct any frequency
violation as soon as possible.

Change-Id: I3be79cf91e7d5e361314020c9806b770823c0b72
Signed-off-by: Junjie Wu <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
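
A sketch of the resume-side check (the notifier name is illustrative;
cpufreq_update_policy() is the real entry point that re-applies
policy->min/max):

	static int msm_cpufreq_pm_event(struct notifier_block *nb,
					unsigned long event, void *data)
	{
		unsigned int cpu;

		switch (event) {
		case PM_POST_SUSPEND:
			/* Dependencies are awake again: correct any frequency
			 * that drifted outside policy->min/max while change
			 * requests were being blocked. */
			for_each_online_cpu(cpu)
				cpufreq_update_policy(cpu);
			break;
		}
		return NOTIFY_DONE;
	}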

cpufreq: Check current frequency in device driver

__cpufreq_driver_target() checks whether policy->cur is the same as
target_freq without holding any lock. This function is used by governors
to set the CPU frequency directly, and a governor calling it can't hold
any cpufreq framework locks, due to deadlock possibility.

However, this results in a race condition in which one thread can see a
stale policy->cur while another thread is changing the CPU frequency.

Thread A: The governor calls __cpufreq_driver_target() and starts
increasing the frequency, but hasn't sent out the CPUFREQ_POSTCHANGE
notification yet.
Thread B: Some other driver (thermal mitigation, say) starts limiting the
frequency using cpufreq_update_policy(). All limits are applied to
policy->min/max, and the final policy->max happens to be the same as
policy->cur, so __cpufreq_driver_target() simply returns 0.
Thread A: The governor finishes scaling; policy->cur now violates
policy->max and can stay that way until the next CPU frequency scaling
happens.

Shifting the responsibility of comparing policy->cur and target_freq to
the cpufreq device driver resolves the race, as long as the device
driver holds a common mutex.

Change-Id: I6f943228e793a4a4300c58b3ae0143e09ed01d7d
Signed-off-by: Junjie Wu <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
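
A sketch of the driver-side check (the lock and helper names are
illustrative; the point is that policy->cur is compared only under the
mutex that serializes every frequency change):

	static int msm_cpufreq_target(struct cpufreq_policy *policy,
				      unsigned int target_freq,
				      unsigned int relation)
	{
		int ret = 0;

		mutex_lock(&driver_lock);

		/* policy->cur cannot go stale while driver_lock is held. */
		if (policy->cur == target_freq)
			goto out;

		ret = set_cpu_freq(policy, target_freq);
	out:
		mutex_unlock(&driver_lock);
		return ret;
	}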

IKSWL-14278 cpufreq: set policy with user_policy data

When we update/set a cpufreq policy, we should seed the new policy with
the user_policy data, and the cpufreq notifier will then limit the
min/max freq accordingly.

This is the right way to update the cpufreq policy, but we miss it in
store_scaling_governor(), which causes setting the governor to fail if
policy.max is less than user_policy.min.

Change-Id: Ie21593e146b8d06fa017c87434e85ff33456c20f
Signed-off-by: Lianwei Wang <[email protected]>
Reviewed-on: http://gerrit.mot.com/746985
SLTApproved: Slta Waiver <[email protected]>
SME-Granted: SME Approvals Granted
Tested-by: Jira Key <[email protected]>
Reviewed-by: Ke Lu <[email protected]>
Reviewed-by: Ravi Chebolu <[email protected]>
Reviewed-by: Christopher Fries <[email protected]>
Submit-Approved: Jira Key <[email protected]>
Signed-off-by: franciscofranco <[email protected]>

cpufreq: Use correct locking for cpufreq_cpu_data

Use write lock when updating cpufreq_cpu_data,
and read lock when getting the policy pointer.

CRs-Fixed: 689522
Change-Id: I454f0d575157b3411d369e04097386f50aeaaa1c
Signed-off-by: Maria Yu <[email protected]>
Signed-off-by: Francisco Franco <[email protected]>

cpufreq: Disable light-weight init/teardown during suspend/resume

Light-weight init/teardown was introduced to preserve file permissions
and reduce suspend/resume latency. However, it doesn't work correctly if
multiple CPUs controlled by the same policy can all go offline.

Suspend removes, and resume adds back, non-boot CPUs in the same order.
Say CPU2 and CPU3 are both online when suspend starts. CPU2 goes offline
first, handing the policy and sysfs over to CPU3. Then CPU3 goes
offline. Due to the light-weight teardown, CPU3 still owns the policy
and sysfs files.

When CPU2 comes online after resume, it calls update_policy_cpu() to
take back the policy ownership, but sysfs is not touched. Now CPU2 is
online, with an active policy, but no sysfs folder. In addition, when
CPU3 comes online, it will attempt to create a symlink while owning the
real sysfs folder.

To really fix this issue, special handling for sysfs and symlinks is
required during suspend and resume. This requires non-trivial refactoring
of certain functions.

Temporarily disable light-weight init/teardown until a solution is found.

Change-Id: I485483244954129fa405bc5ef1a5e0da5c05a7d5
Signed-off-by: Junjie Wu <[email protected]>
Signed-off-by: Francisco Franco <[email protected]>

cpufreq: Save state for entire cluster when the last CPU goes offline

When the last CPU within a cluster goes offline, its cpufreq limits and
current governor are saved, so that the cluster can start with the same
state (cpufreq min/max limits and scaling governor) when it comes back
online later. However, if a different CPU from the same cluster comes
online afterwards, it will restore the state that *it* saved when it
went offline, which can differ from the state of the CPU offlined last.
This can leave the system in an incorrect (i.e. not the latest) state,
since the rest of the CPUs in the cluster brought back online later
will be associated with the CPU that came online first. To prevent such
a scenario, save the state of all CPUs in a cluster when the last CPU
goes offline (since the last CPU holds the latest updates), so that the
state can be properly restored by whichever CPU comes online first in
that cluster.

Change-Id: I67cd6bb219b7cc4fd18507ffb9b43ca37dcf0ae7
Signed-off-by: Rohit Gupta <[email protected]>
Signed-off-by: Francisco Franco <[email protected]>

cpufreq: force all cores to use the same governor

If BCL is enabled, then under certain conditions cpu2 and cpu3 are
pegged to max freq when they come back online (if they've been offlined
by BCL). If this happens during boot, cpu2 and cpu3 lose their policy
governor reference (I don't know why) and simply fall back to the
default performance governor.

Signed-off-by: Francisco Franco <[email protected]>

Nexus 6: user voltage control

Signed-off-by: flar2 <[email protected]>
Signed-off-by: franciscofranco <[email protected]>
Junjie Wu authored and franciscofranco committed Oct 6, 2016
1 parent 3472dd8 commit 3f5f78a
Showing 8 changed files with 323 additions and 45 deletions.
6 changes: 6 additions & 0 deletions arch/arm/mach-msm/Kconfig
@@ -768,6 +768,12 @@ config MSM_DEVFREQ_CPUBW
agnostic interface to so that some of the devfreq governors can be
shared across SoCs.

config MSM_CPU_VOLTAGE_CONTROL
bool "Userspace CPU Voltage Control"
default y
help
This enables userspace CPU Voltage Control.

config MSM_AVS_HW
bool "Enable Adaptive Voltage Scaling (AVS)"
default n
71 changes: 71 additions & 0 deletions drivers/clk/qcom/clock-krait-8974.c
@@ -28,11 +28,14 @@
#include <soc/qcom/clock-local2.h>
#include <soc/qcom/clock-krait.h>
#include <mach/mmi_soc_info.h>
#include <linux/cpufreq.h>

#include <asm/cputype.h>

#include "clock.h"



/* Clock inputs coming into Krait subsystem */
DEFINE_FIXED_DIV_CLK(hfpll_src_clk, 1, NULL);
DEFINE_FIXED_DIV_CLK(acpu_aux_clk, 2, NULL);
@@ -693,6 +696,74 @@ module_param_string(table_name, table_name, sizeof(table_name), S_IRUGO);
static unsigned int pvs_config_ver;
module_param(pvs_config_ver, uint, S_IRUGO);

#ifdef CONFIG_MSM_CPU_VOLTAGE_CONTROL
#define CPU_VDD_MAX 1200
#define CPU_VDD_MIN 600

extern int use_for_scaling(unsigned int freq);
static unsigned int cnt;

ssize_t show_UV_mV_table(struct cpufreq_policy *policy,
char *buf)
{
int i, freq, len = 0;
unsigned int cpu = 0;
unsigned int num_levels = cpu_clk[cpu]->vdd_class->num_levels;

if (!buf)
return -EINVAL;

for (i = 0; i < num_levels; i++) {
freq = use_for_scaling(cpu_clk[cpu]->fmax[i] / 1000);
if (freq < 0)
continue;

len += sprintf(buf + len, "%dmhz: %u mV\n", freq / 1000,
cpu_clk[cpu]->vdd_class->vdd_uv[i] / 1000);
}

return len;
}

ssize_t store_UV_mV_table(struct cpufreq_policy *policy,
char *buf, size_t count)
{
int i, j;
int ret = 0;
unsigned int val, cpu = 0;
unsigned int num_levels = cpu_clk[cpu]->vdd_class->num_levels;
char size_cur[4];

if (cnt) {
cnt = 0;
return -EINVAL;
}

for (i = 0; i < num_levels; i++) {
if (use_for_scaling(cpu_clk[cpu]->fmax[i] / 1000) < 0)
continue;

ret = sscanf(buf, "%u", &val);
if (!ret)
return -EINVAL;

if (val > CPU_VDD_MAX)
val = CPU_VDD_MAX;
else if (val < CPU_VDD_MIN)
val = CPU_VDD_MIN;

for (j = 0; j < NR_CPUS; j++)
cpu_clk[j]->vdd_class->vdd_uv[i] = val * 1000;

ret = sscanf(buf, "%s", size_cur);
cnt = strlen(size_cur);
buf += cnt + 1;
}

return ret;
}
#endif

static int clock_krait_8974_driver_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
117 changes: 82 additions & 35 deletions drivers/cpufreq/cpufreq.c
@@ -39,7 +39,7 @@ static struct cpufreq_driver *cpufreq_driver;
static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback);
static DEFINE_RWLOCK(cpufreq_driver_lock);
-static DEFINE_MUTEX(cpufreq_governor_lock);
+DEFINE_MUTEX(cpufreq_governor_lock);
static LIST_HEAD(cpufreq_policy_list);

#ifdef CONFIG_HOTPLUG_CPU
@@ -305,10 +305,8 @@ static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
trace_cpu_frequency(freqs->new, freqs->cpu);
srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
CPUFREQ_POSTCHANGE, freqs);
-		if (likely(policy) && likely(policy->cpu == freqs->cpu)) {
+		if (likely(policy) && likely(policy->cpu == freqs->cpu))
			policy->cur = freqs->new;
-			sysfs_notify(&policy->kobj, NULL, "scaling_cur_freq");
-		}
break;
}
}
@@ -504,6 +502,9 @@ static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
&new_policy.governor))
return -EINVAL;

new_policy.min = new_policy.user_policy.min;
new_policy.max = new_policy.user_policy.max;

ret = cpufreq_set_policy(policy, &new_policy);

policy->user_policy.policy = policy->policy;
@@ -625,6 +626,14 @@ static ssize_t show_bios_limit(struct cpufreq_policy *policy, char *buf)
return sprintf(buf, "%u\n", policy->cpuinfo.max_freq);
}

#ifdef CONFIG_MSM_CPU_VOLTAGE_CONTROL
extern ssize_t show_UV_mV_table(struct cpufreq_policy *policy,
char *buf);

extern ssize_t store_UV_mV_table(struct cpufreq_policy *policy,
const char *buf, size_t count);
#endif

cpufreq_freq_attr_ro_perm(cpuinfo_cur_freq, 0400);
cpufreq_freq_attr_ro(cpuinfo_min_freq);
cpufreq_freq_attr_ro(cpuinfo_max_freq);
@@ -639,6 +648,9 @@ cpufreq_freq_attr_rw(scaling_min_freq);
cpufreq_freq_attr_rw(scaling_max_freq);
cpufreq_freq_attr_rw(scaling_governor);
cpufreq_freq_attr_rw(scaling_setspeed);
#ifdef CONFIG_MSM_CPU_VOLTAGE_CONTROL
cpufreq_freq_attr_rw(UV_mV_table);
#endif

static struct attribute *default_attrs[] = {
&cpuinfo_min_freq.attr,
@@ -652,6 +664,9 @@ static struct attribute *default_attrs[] = {
&scaling_driver.attr,
&scaling_available_governors.attr,
&scaling_setspeed.attr,
#ifdef CONFIG_MSM_CPU_VOLTAGE_CONTROL
&UV_mV_table.attr,
#endif
NULL
};

@@ -662,10 +677,15 @@ static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
{
struct cpufreq_policy *policy = to_policy(kobj);
struct freq_attr *fattr = to_attr(attr);
-	ssize_t ret;
+	ssize_t ret = -EINVAL;
+
+	get_online_cpus();
+
+	if (!cpu_online(policy->cpu))
+		goto unlock;

	if (!down_read_trylock(&cpufreq_rwsem))
-		return -EINVAL;
+		goto unlock;

down_read(&policy->rwsem);

@@ -676,7 +696,8 @@ static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)

up_read(&policy->rwsem);
up_read(&cpufreq_rwsem);

+unlock:
+	put_online_cpus();
return ret;
}

@@ -1200,6 +1221,27 @@ static int cpufreq_nominate_new_policy_cpu(struct cpufreq_policy *policy,
return cpu_dev->id;
}

#ifdef CONFIG_HOTPLUG_CPU
static void update_related_cpus(struct cpufreq_policy *policy)
{
unsigned int j;

for_each_cpu(j, policy->related_cpus) {
if (!cpufreq_driver->setpolicy)
strlcpy(per_cpu(cpufreq_policy_save, j).gov,
policy->governor->name, CPUFREQ_NAME_LEN);
per_cpu(cpufreq_policy_save, j).min = policy->user_policy.min;
per_cpu(cpufreq_policy_save, j).max = policy->user_policy.max;
pr_debug("Saving CPU%d user policy min %d and max %d\n",
j, policy->user_policy.min, policy->user_policy.max);
}
}
#else
static void update_related_cpus(struct cpufreq_policy *policy)
{
}
#endif

static int __cpufreq_remove_dev_prepare(struct device *dev,
struct subsys_interface *sif,
bool frozen)
@@ -1235,20 +1277,13 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
}
}

-#ifdef CONFIG_HOTPLUG_CPU
-	if (!cpufreq_driver->setpolicy)
-		strlcpy(per_cpu(cpufreq_policy_save, cpu).gov,
-			policy->governor->name, CPUFREQ_NAME_LEN);
-	per_cpu(cpufreq_policy_save, cpu).min = policy->user_policy.min;
-	per_cpu(cpufreq_policy_save, cpu).max = policy->user_policy.max;
-	pr_debug("Saving CPU%d user policy min %d and max %d\n",
-		 cpu, policy->user_policy.min, policy->user_policy.max);
-#endif

down_read(&policy->rwsem);
cpus = cpumask_weight(policy->cpus);
up_read(&policy->rwsem);

+	if (cpus == 1)
+		update_related_cpus(policy);

if (cpu != policy->cpu) {
sysfs_remove_link(&dev->kobj, "cpufreq");
} else if (cpus > 1) {
@@ -1275,10 +1310,10 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
unsigned long flags;
struct cpufreq_policy *policy;

-	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
	policy = per_cpu(cpufreq_cpu_data, cpu);
	per_cpu(cpufreq_cpu_data, cpu) = NULL;
-	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);

if (!policy) {
pr_debug("%s: No cpu_data found\n", __func__);
@@ -1444,14 +1479,22 @@ EXPORT_SYMBOL(cpufreq_quick_get_max);

static unsigned int __cpufreq_get(unsigned int cpu)
{
-	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
+	struct cpufreq_policy *policy;
	unsigned int ret_freq = 0;
+	unsigned long flags;

	if (!cpufreq_driver->get)
		return ret_freq;

+	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	policy = per_cpu(cpufreq_cpu_data, cpu);
+	read_unlock_irqrestore(&cpufreq_driver_lock, flags);

	ret_freq = cpufreq_driver->get(cpu);

+	if (!policy)
+		return ret_freq;

if (ret_freq && policy->cur &&
!(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) {
/* verify no discrepancy between actual and
@@ -1473,12 +1516,17 @@ static unsigned int __cpufreq_get(unsigned int cpu)
*/
unsigned int cpufreq_get(unsigned int cpu)
{
-	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
+	struct cpufreq_policy *policy;
	unsigned int ret_freq = 0;
+	unsigned long flags;

	if (cpufreq_disabled() || !cpufreq_driver)
		return -ENOENT;

+	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	policy = per_cpu(cpufreq_cpu_data, cpu);
+	read_unlock_irqrestore(&cpufreq_driver_lock, flags);

BUG_ON(!policy);

if (!down_read_trylock(&cpufreq_rwsem))
@@ -1696,15 +1744,6 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
policy->cpu, target_freq, relation, old_target_freq);

-	/*
-	 * This might look like a redundant call as we are checking it again
-	 * after finding index. But it is left intentionally for cases where
-	 * exactly same freq is called again and so we can save on few function
-	 * calls.
-	 */
-	if (target_freq == policy->cur)
-		return 0;

if (cpufreq_driver->target)
retval = cpufreq_driver->target(policy, target_freq, relation);
else if (cpufreq_driver->target_index) {
@@ -1956,6 +1995,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
struct cpufreq_policy *new_policy)
{
int ret = 0, failed = 1;
struct cpufreq_policy *cpu0_policy = NULL;

pr_debug("setting new policy for CPU %u: %u - %u kHz\n", new_policy->cpu,
new_policy->min, new_policy->max);
@@ -1997,6 +2037,9 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
policy->max = new_policy->max;
trace_cpu_frequency_limits(policy->max, policy->min, policy->cpu);

if (new_policy->cpu)
cpu0_policy = cpufreq_cpu_get(0);

pr_debug("new min and max freqs are %u - %u kHz\n",
policy->min, policy->max);

@@ -2021,7 +2064,10 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
}

/* start new governor */
-	policy->governor = new_policy->governor;
+	if (new_policy->cpu && cpu0_policy)
+		policy->governor = cpu0_policy->governor;
+	else
+		policy->governor = new_policy->governor;
if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
if (!__cpufreq_governor(policy, CPUFREQ_GOV_START)) {
failed = 0;
@@ -2120,9 +2166,6 @@ static int cpufreq_cpu_callback(struct notifier_block *nfb,
dev = get_cpu_device(cpu);
if (dev) {

-		if (action & CPU_TASKS_FROZEN)
-			frozen = true;

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
__cpufreq_add_dev(dev, NULL, frozen);
@@ -2189,7 +2232,11 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
cpufreq_driver = driver_data;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);

+	register_hotcpu_notifier(&cpufreq_cpu_notifier);
+
+	get_online_cpus();
	ret = subsys_interface_register(&cpufreq_interface);
+	put_online_cpus();
if (ret)
goto err_null_driver;

@@ -2212,13 +2259,13 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data)
}
}

-	register_hotcpu_notifier(&cpufreq_cpu_notifier);
pr_debug("driver %s up and running\n", driver_data->name);

return 0;
err_if_unreg:
subsys_interface_unregister(&cpufreq_interface);
err_null_driver:
+	unregister_hotcpu_notifier(&cpufreq_cpu_notifier);
write_lock_irqsave(&cpufreq_driver_lock, flags);
cpufreq_driver = NULL;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
