Thursday, April 16, 2009

Suspending Slackware on a Dell Inspiron 1501

Update: The conclusion below is not true! It appears to be a kernel regression. I will post more information once the problem has been identified.

Many users say that a Linux box should be shut down as rarely as possible, because the kernel caches disk contents in RAM, so an application starts much faster the second time it is launched. The longer the machine runs, the larger and more relevant the cache becomes, and the faster your applications launch. Shut the computer down and the whole cache is gone. But I have to pay my electricity bill! I want to save power when I am not using the box. Unfortunately, hibernation won't help, because (I think) the system frees "useless" pages before writing the image to disk, to keep the image small and the load time short. Suspend, on the other hand, keeps the cache in RAM while consuming much less energy. Other people have tested this.
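To see whether the cache really survives, one can read the "Cached:" field of /proc/meminfo before suspending and again after resuming. This is a minimal sketch, assuming a Linux /proc filesystem; the helper name cached_kb is hypothetical, and the file path can be overridden so the function can be tried on a saved copy.

```shell
#!/bin/sh
# Print the size of the kernel page cache ("Cached:" in /proc/meminfo), in kB.
# cached_kb is a hypothetical helper; an optional argument replaces the path.
cached_kb()
{
        # Anchor on line start so "SwapCached:" is not matched.
        awk '/^Cached:/ { print $2 }' "${1:-/proc/meminfo}"
}

echo "Page cache: $(cached_kb) kB"
```

If the number printed after resuming is close to the one printed before suspending, the cached data is still in RAM.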

But disaster struck after resuming from suspend: the system became totally unstable. I got something like this in dmesg:

[ 7962.125970] BUG: unable to handle kernel paging request at f76f5004
[ 7962.125984] IP: [] ext3_check_dir_entry+0x19/0x140
[ 7962.126000] *pde = 00007067 *pte = 77520002 
[ 7962.126010] Oops: 0000 [#1] SMP 
[ 7962.126017] last sysfs file: /sys/power/state
[ 7962.126024] Modules linked in: radeon drm vboxnetflt vboxdrv snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 nls_cp936 vfat fat ext4 jbd2 crc16 fuse dell_laptop dcdbas b43 mac80211 joydev ricoh_mmc sdhci_pci cfg80211 led_class shpchp ohci_hcd sg input_polldev ati_agp agpgart video output
[ 7962.126075] 
[ 7962.126082] Pid: 6509, comm: pm-powersave Not tainted (2.6.29.1-slk-based-2 #4) Inspiron 1501 
[ 7962.126089] EIP: 0060:[] EFLAGS: 00010292 CPU: 1
[ 7962.126097] EIP is at ext3_check_dir_entry+0x19/0x140
[ 7962.126102] EAX: c0463684 EBX: f76f5000 ECX: f76f5000 EDX: f6c5a478
[ 7962.126108] ESI: f6846a00 EDI: 00002000 EBP: f6c5a478 ESP: f6785cec
[ 7962.126113]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 7962.126120] Process pm-powersave (pid: 6509, ti=f6784000 task=f7031090 task.ti=f6784000)
[ 7962.126126] Stack:
[ 7962.126129]  f7781000 d9e5ad68 f6c35040 c01e6c4d 00000000 00000000 f6785dec f6785dec
[ 7962.126141]  f6785e80 c0463684 00000000 f76f5000 f6846a00 00002000 f6e246c0 c01e86f3
[ 7962.126153]  f6e246c0 00002000 f66c2cc0 f6ba1400 00000000 c01e5a07 0000000b d9c5fbb8
[ 7962.126166] Call Trace:
[ 7962.126171]  [] dx_probe+0x8d/0x350
[ 7962.126180]  [] ext3_find_entry+0x403/0x650
[ 7962.126189]  [] ext3_truncate+0x317/0x8e0
[ 7962.126197]  [] __ext3_get_inode_loc+0xda/0x2f0
[ 7962.126205]  [] __wake_up+0x3e/0x60
[ 7962.126215]  [] ext3_lookup+0x46/0xf0
[ 7962.126222]  [] d_alloc+0xf7/0x170
[ 7962.126233]  [] do_lookup+0x1ba/0x1e0
[ 7962.126240]  [] __link_path_walk+0x675/0xd90
[ 7962.126247]  [] generic_file_aio_read+0x2fa/0x6d0
[ 7962.126257]  [] do_page_fault+0x28c/0x6a0
[ 7962.126266]  [] path_walk+0x54/0xb0
[ 7962.126273]  [] do_path_lookup+0xb0/0x160
[ 7962.126280]  [] getname+0x96/0xd0
[ 7962.126287]  [] user_path_at+0x5a/0x90
[ 7962.126295]  [] vfs_stat_fd+0x22/0x60
[ 7962.126305]  [] sys_stat64+0xf/0x30
[ 7962.126313]  [] syscall_call+0x7/0xb
[ 7962.126321] Code: ff ff 83 c4 0c 5b c3 90 90 90 90 90 90 90 90 90 90 90 83 ec 3c 89 6c 24 38 89 5c 24 2c 89 d5 89 74 24 30 89 7c 24 34 89 44 24 24 <0f> b7 41 04 3d ff ff 00 00 74 74 83 f8 0b 89 c6 0f 8f c1 00 00 
[ 7962.126379] EIP: [] ext3_check_dir_entry+0x19/0x140 SS:ESP 0068:f6785cec
[ 7962.126390] ---[ end trace a30eca9c02841218 ]---

Some processes were killed for reasons I don't understand, in the worst case even syslogd! Firefox wouldn't launch, su didn't work, and clicking icons on the panel froze Plasma...
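A quick way to check, right after resuming, whether the kernel logged an oops like the one above is to filter the kernel log for its marker lines ("BUG:", "Oops:", "---[ end trace"). This is a sketch; find_oops is a hypothetical helper that reads the log on standard input.

```shell
#!/bin/sh
# Print kernel log lines that mark a crash (oops); the log is read on stdin.
# find_oops is a hypothetical helper, not part of any package.
find_oops()
{
        grep -E 'BUG: |Oops: |---\[ end trace'
}

# Example: run right after resuming (dmesg may need root on some systems).
dmesg | find_oops || echo "no oops found"
```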

After some digging, I found these lines in /usr/doc/pm-utils-1.2.5/README.SLACKWARE:

==============================================================================

KNOWN ISSUES

****  If you encounter either of these, mail rworkman@slackware.com ****

If your alsa drivers don't correctly save and restore state across a sleep /
resume cycle (due to a buggy driver), then you will need to add the drivers
to a custom file named /etc/pm/config.d/defaults (create the file if it does
not exist already) in a variable named "SUSPEND_MODULES" - see the file at
/usr/lib/pm-utils/defaults for proper format.

The /usr/lib/pm-utils/sleep.d/90clock does not run by default.  It added
over a second to suspend, and the vast majority of hardware does not need it
to keep the clocks in sync.  If you need this hook, you can set the
NEED_CLOCK_SYNC environment variable in a custom /etc/pm/config.d/defaults
file.

==============================================================================

So I set NEED_CLOCK_SYNC=1 in /etc/pm/config.d/defaults, suspended, went to sleep, and resumed. Everything worked! Angels sing, and light suddenly fills the dorm ;)
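Setting the variable can also be scripted. This is a minimal sketch, assuming the custom defaults file is a shell fragment in the same format as /usr/lib/pm-utils/defaults; enable_clock_sync is a hypothetical helper name. Because 90clock only tests that NEED_CLOCK_SYNC is non-empty, any value enables the hook; 1 is used here.

```shell
#!/bin/sh
# Add NEED_CLOCK_SYNC=1 to the pm-utils custom defaults file, once.
# enable_clock_sync is hypothetical; the path argument defaults to the real file.
enable_clock_sync()
{
        conf="${1:-/etc/pm/config.d/defaults}"
        # Create /etc/pm/config.d if it does not exist yet.
        mkdir -p "$(dirname "$conf")" || return 1
        # Append the setting only if no NEED_CLOCK_SYNC line exists yet.
        if ! grep -q '^NEED_CLOCK_SYNC=' "$conf" 2>/dev/null; then
                echo 'NEED_CLOCK_SYNC=1' >> "$conf"
        fi
}
```

After defining the function, run enable_clock_sync as root with no argument to update /etc/pm/config.d/defaults; running it a second time leaves the file unchanged.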

Here is what /usr/lib/pm-utils/sleep.d/90clock actually does:

#!/bin/sh
# Synchronize system time with hardware time.
# TODO: Do modern kernels handle this correctly?  If so, we should detect that
#       and skip this hook.

. "${PM_FUNCTIONS}"

suspend_clock()
{
        /sbin/hwclock --systohc >/dev/null 2>&1 0<&1
}

resume_clock()
{
        /sbin/hwclock --hctosys >/dev/null 2>&1 0<&1
}

[ "$NEED_CLOCK_SYNC" ] || exit $NA

case "$1" in
        hibernate|suspend) suspend_clock ;;
        thaw|resume) resume_clock ;;
        *) exit $NA ;;
esac
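To check which hwclock call the hook makes at each step without touching the hardware clock, its decision logic can be replayed as a dry run. This is a sketch, not part of pm-utils: clock_action is a hypothetical helper that prints the command instead of running it, and prints "skip" wherever the real hook exits with $NA.

```shell
#!/bin/sh
# Dry run of the 90clock logic: print the hwclock command the hook would run
# for a pm-utils action, without running it. clock_action is hypothetical.
clock_action()
{
        # Mirrors: [ "$NEED_CLOCK_SYNC" ] || exit $NA
        if [ -z "$NEED_CLOCK_SYNC" ]; then
                echo "skip"
                return 0
        fi
        case "$1" in
                hibernate|suspend) echo "/sbin/hwclock --systohc" ;;
                thaw|resume) echo "/sbin/hwclock --hctosys" ;;
                *) echo "skip" ;;
        esac
}

for action in suspend resume hibernate thaw; do
        printf '%s: %s\n' "$action" "$(NEED_CLOCK_SYNC=1 clock_action "$action")"
done
```

Running it prints "suspend: /sbin/hwclock --systohc" and "resume: /sbin/hwclock --hctosys", followed by the same pair for hibernate and thaw: the system time is saved to the hardware clock on the way down and read back on the way up.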


I also started a thread on LinuxQuestions.org (LQ) to discuss this issue.