I'm using Ubuntu 20.04, with i3 as the window manager but the gnome subsystem installed.
I have a problem that at some random interval (hours or days) after a reboot, any attempt to open thunar or nautilus, or any attempt by an application (such as Firefox) to open a file dialog either takes a minute or more to open, or just times out forever and requires me to kill the application.
How can I find out which application, file, extension or driver is behind this delay?
I've tried the following:
- Running as root - the applications are fast as normal.
- Disable tracker (as per here)
- libgoa seems up to date (as per this)
- Tried the pulseaudio hack here
- Running nautilus with strace; there's a 30-second gap with the below lines, but I'm not sure how to interpret it (fd=24 is some file descriptor?)
journalctlshows the below when (e.g.) nautilus finally works
I have several machines with very similar setups, only one has this behaviour. I can't figure out what application or library is the culprit.
Output of strace -t -o /tmp/nautilus nautilus:
20:15:32 write(24, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:32 poll([{fd=24, events=POLLIN}], 1, 0) = 1 ([{fd=24, revents=POLLIN}])
20:15:32 read(24, "\1\0\0\0\0\0\0\0", 16) = 8
20:15:32 write(24, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:32 futex(0x56241fa9c880, FUTEX_WAKE_PRIVATE, 2147483647) = 0
20:15:32 close(24) = 0
20:15:32 eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 24
20:15:32 write(24, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:32 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:32 futex(0x56241ef26dc0, FUTEX_WAKE_PRIVATE, 1) = 1
20:15:32 futex(0x56241ef26910, FUTEX_WAKE_PRIVATE, 1) = 1
20:15:32 futex(0x56241ef1c038, FUTEX_WAKE_PRIVATE, 1) = 1
20:15:32 poll([{fd=24, events=POLLIN}], 1, 25000) = 1 ([{fd=24, revents=POLLIN}])
20:15:32 read(24, "\1\0\0\0\0\0\0\0", 16) = 8
20:15:32 poll([{fd=24, events=POLLIN}], 1, 25000) = 0 (Timeout)
20:15:57 write(24, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:57 futex(0x56241fa9c880, FUTEX_WAKE_PRIVATE, 2147483647) = 0
20:15:57 close(24) = 0
From the journal:
Apr 20 20:32:35 aubrey dbus-daemon[1180]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.134190' (uid=1000 pid=1450204 comm="nautilus " label="unconfined")
Apr 20 20:32:35 aubrey systemd[1]: Starting Hostname Service...
Apr 20 20:32:35 aubrey dbus-daemon[1180]: [system] Successfully activated service 'org.freedesktop.hostname1'
Apr 20 20:32:35 aubrey systemd[1]: Started Hostname Service.
Apr 20 20:32:35 aubrey nautilus[1450204]: Called "net usershare info" but it failed: Failed to execute child process “net” (No such file or directory)
Apr 20 20:32:45 aubrey ddclient[1450366]: WARNING: file /var/cache/ddclient/ddclient.cache, line 3: Invalid Value for keyword 'ip' =
Edit:
The above trace is just the output of strace -t -o /tmp/nautilus nautilus - a very verbose log, but with tail -f I'm able to inspect it at the point where it hangs, which is where I see an entry such as 20:15:32 poll([{fd=7, events=POLLIN}], 1, 25000) = 0 (Timeout) followed by 30+ seconds of nothing (then it starts and is very verbose again).
The fd reported by lsof, if I'm reading it right, looks like:
nautilus 1719429 coljac 7u a_inode 0,13 0 17491 [eventfd]
The top entries from strace -c nautilus are as follows (yes, poll is number one):
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
22.70 0.011333 2 3988 poll
16.55 0.008265 1 5018 4490 stat
9.31 0.004647 2 1966 131 futex
7.73 0.003859 1 2119 1385 recvmsg
7.33 0.003660 1 3612 1441 openat
4.76 0.002375 2 1038 436 access
4.29 0.002140 1 1424 read
3.60 0.001799 1 1356 105 readlink
3.50 0.001748 0 1868 mmap
3.30 0.001646 8 188 getdents64
3.14 0.001568 0 2232 close
2.38 0.001188 0 1758 fstat
2.14 0.001067 0 1150 munmap
1.99 0.000993 2 337 writev
1.67 0.000833 92 9 clone
1.57 0.000782 2 367 write
1.47 0.000733 3 190 mprotect
Output from lsof | wc -l: 1340334
Edit 2:
Currently, thunar won't open at all - it fails with the message Failed to register: Timeout was reached. The output from strace -c thunar is as follows:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
50.53 0.002814 3 882 792 openat
28.21 0.001571 4 354 mmap
4.79 0.000267 2 96 read
3.75 0.000209 2 90 fstat
3.54 0.000197 2 96 close
3.48 0.000194 1 109 mprotect
1.87 0.000104 5 18 poll
1.45 0.000081 0 165 1 futex
1.31 0.000073 2 28 write
...
If I run strace -t thunar I also see this in the output before it hangs:
futex(0x564445b9ff70, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=7, events=POLLIN}], 1, 25000) = 1 ([{fd=7, revents=POLLIN}])
read(7, "\1\0\0\0\0\0\0\0", 16) = 8
poll([{fd=7, events=POLLIN}], 1, 25000
lsof -p <pid> returns the following:
thunar 2036483 coljac 7u a_inode 0,13 0 17491 [eventfd]
Some more outout from strace thunar, which fails to open, is:
20:29:32 read(7, "\1\0\0\0\0\0\0\0", 16) = 8
20:29:32 poll([{fd=7, events=POLLIN}], 1, 25000) = 0 (Timeout)
20:29:58 write(7, "\1\0\0\0\0\0\0\0", 8) = 8
20:29:58 futex(0x5566ad906ba0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
20:29:58 close(7) = 0
20:29:58 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
20:29:58 futex(0x5566ad8e1f70, FUTEX_WAKE_PRIVATE, 1) = 1
20:29:58 futex(0x5566ad8e1b90, FUTEX_WAKE_PRIVATE, 1) = 1
20:29:58 futex(0x5566ad8d9058, FUTEX_WAKE_PRIVATE, 1) = 1
20:29:58 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/charset.alias", O_RDONLY) = -1 ENOENT (No such file or directory)
20:29:58 write(2, "Failed to register: Timeout was "..., 40) = 40
20:29:58 poll([{fd=4, events=POLLIN}], 1, 0) = 1 ([{fd=4, revents=POLLIN}])
20:29:58 read(4, "\1\0\0\0\0\0\0\0", 16) = 8
20:29:58 futex(0x7f0dccfb72a0, FUTEX_WAKE_PRIVATE, 1) = 1
20:29:58 poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
Further strace outputs are here on pastebin.