24

I'm using Ubuntu 20.04, with i3 as the window manager but the gnome subsystem installed.

I have a problem that at some random interval (hours or days) after a reboot, any attempt to open thunar or nautilus, or any attempt by an application (such as Firefox) to open a file dialog either takes a minute or more to open, or just times out forever and requires me to kill the application.

How can I find out which application, file, extension or driver is behind this delay?

I've tried the following:

  • Running as root - the applications are fast as normal.
  • Disable tracker (as per here)
  • libgoa seems up to date (as per this)
  • Tried the pulseaudio hack here
  • Running nautilus with strace; there's a 30-second gap with the below lines, but I'm not sure how to interpret it (fd=24 is some file descriptor?)
  • journalctl shows the below when (e.g.) nautilus finally works

I have several machines with very similar setups, only one has this behaviour. I can't figure out what application or library is the culprit.

Output of strace -t -o /tmp/nautilus nautilus:

20:15:32 write(24, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:32 poll([{fd=24, events=POLLIN}], 1, 0) = 1 ([{fd=24, revents=POLLIN}])
20:15:32 read(24, "\1\0\0\0\0\0\0\0", 16) = 8
20:15:32 write(24, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:32 futex(0x56241fa9c880, FUTEX_WAKE_PRIVATE, 2147483647) = 0
20:15:32 close(24)                      = 0
20:15:32 eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 24
20:15:32 write(24, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:32 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:32 futex(0x56241ef26dc0, FUTEX_WAKE_PRIVATE, 1) = 1
20:15:32 futex(0x56241ef26910, FUTEX_WAKE_PRIVATE, 1) = 1
20:15:32 futex(0x56241ef1c038, FUTEX_WAKE_PRIVATE, 1) = 1
20:15:32 poll([{fd=24, events=POLLIN}], 1, 25000) = 1 ([{fd=24, revents=POLLIN}])
20:15:32 read(24, "\1\0\0\0\0\0\0\0", 16) = 8
20:15:32 poll([{fd=24, events=POLLIN}], 1, 25000) = 0 (Timeout)
20:15:57 write(24, "\1\0\0\0\0\0\0\0", 8) = 8
20:15:57 futex(0x56241fa9c880, FUTEX_WAKE_PRIVATE, 2147483647) = 0
20:15:57 close(24)                      = 0

From the journal:

Apr 20 20:32:35 aubrey dbus-daemon[1180]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.134190' (uid=1000 pid=1450204 comm="nautilus " label="unconfined")
Apr 20 20:32:35 aubrey systemd[1]: Starting Hostname Service...
Apr 20 20:32:35 aubrey dbus-daemon[1180]: [system] Successfully activated service 'org.freedesktop.hostname1'
Apr 20 20:32:35 aubrey systemd[1]: Started Hostname Service.
Apr 20 20:32:35 aubrey nautilus[1450204]: Called "net usershare info" but it failed: Failed to execute child process “net” (No such file or directory)
Apr 20 20:32:45 aubrey ddclient[1450366]: WARNING:  file /var/cache/ddclient/ddclient.cache, line 3: Invalid Value for keyword 'ip' =

Edit:

The above trace is just the output of strace -t -o /tmp/nautilus nautilus - a very verbose log, but with tail -f I'm able to inspect it at the point where it hangs, which is where I see an entry such as 20:15:32 poll([{fd=7, events=POLLIN}], 1, 25000) = 0 (Timeout) followed by 30+ seconds of nothing (then it starts and is very verbose again).

The fd reported by lsof, if I'm reading it right, looks like:

nautilus 1719429 coljac 7u a_inode 0,13 0 17491 [eventfd]

The top entries from strace -c nautilus are as follows (yes, poll is number one):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 22.70    0.011333           2      3988           poll
 16.55    0.008265           1      5018      4490 stat
  9.31    0.004647           2      1966       131 futex
  7.73    0.003859           1      2119      1385 recvmsg
  7.33    0.003660           1      3612      1441 openat
  4.76    0.002375           2      1038       436 access
  4.29    0.002140           1      1424           read
  3.60    0.001799           1      1356       105 readlink
  3.50    0.001748           0      1868           mmap
  3.30    0.001646           8       188           getdents64
  3.14    0.001568           0      2232           close
  2.38    0.001188           0      1758           fstat
  2.14    0.001067           0      1150           munmap
  1.99    0.000993           2       337           writev
  1.67    0.000833          92         9           clone
  1.57    0.000782           2       367           write
  1.47    0.000733           3       190           mprotect

Output from lsof | wc -l: 1340334

Edit 2:

Currently, thunar won't open at all - it fails with the message Failed to register: Timeout was reached. The output from strace -c thunar is as follows:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.53    0.002814           3       882       792 openat
 28.21    0.001571           4       354           mmap
  4.79    0.000267           2        96           read
  3.75    0.000209           2        90           fstat
  3.54    0.000197           2        96           close
  3.48    0.000194           1       109           mprotect
  1.87    0.000104           5        18           poll
  1.45    0.000081           0       165         1 futex
  1.31    0.000073           2        28           write
...

If I run strace -t thunar I also see this in the output before it hangs:

futex(0x564445b9ff70, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=7, events=POLLIN}], 1, 25000) = 1 ([{fd=7, revents=POLLIN}])
read(7, "\1\0\0\0\0\0\0\0", 16)         = 8
poll([{fd=7, events=POLLIN}], 1, 25000

lsof -p <pid> returns the following:

thunar  2036483 coljac    7u  a_inode               0,13        0    17491 [eventfd]

Some more outout from strace thunar, which fails to open, is:

20:29:32 read(7, "\1\0\0\0\0\0\0\0", 16) = 8
20:29:32 poll([{fd=7, events=POLLIN}], 1, 25000) = 0 (Timeout)
20:29:58 write(7, "\1\0\0\0\0\0\0\0", 8) = 8
20:29:58 futex(0x5566ad906ba0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
20:29:58 close(7)                       = 0
20:29:58 write(6, "\1\0\0\0\0\0\0\0", 8) = 8
20:29:58 futex(0x5566ad8e1f70, FUTEX_WAKE_PRIVATE, 1) = 1
20:29:58 futex(0x5566ad8e1b90, FUTEX_WAKE_PRIVATE, 1) = 1
20:29:58 futex(0x5566ad8d9058, FUTEX_WAKE_PRIVATE, 1) = 1
20:29:58 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/charset.alias", O_RDONLY) = -1 ENOENT (No such file or directory)
20:29:58 write(2, "Failed to register: Timeout was "..., 40) = 40
20:29:58 poll([{fd=4, events=POLLIN}], 1, 0) = 1 ([{fd=4, revents=POLLIN}])
20:29:58 read(4, "\1\0\0\0\0\0\0\0", 16) = 8
20:29:58 futex(0x7f0dccfb72a0, FUTEX_WAKE_PRIVATE, 1) = 1
20:29:58 poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)

Further strace outputs are here on pastebin.

  • Also as a new user (also slow): here
  • As root (not slow): here
Coljac
  • 642
  • 2
  • 6
  • 23

6 Answers6

33

I had the same problem for years because I have uptimes of at least a month quite often. After a while, all programs using the system-supplied file open dialog will seemingly hang or wait for many minutes.

You can run the following command to kill gvfsd-trash, after which all programs hanging and waiting for the file dialog should immediately resume working.

pkill gvfsd-trash

Another thing to note is that gvfsd-trash gets restarted after a while, probably by caja or any other file browser or by the system file open dialog itself. This is not the first problem I had with gvfsd-trash and I begin to bear a grudge, however, I don't want to uninstall gvfsd, which I need for mounting USB thumb drives and my smartphone over MTP. So, I opted for the brute-force solution and made only the gvfsd-trash binary inaccessible, e.g., by renaming it:

sudo mv /usr/libexec/gvfsd-trash{,.bak}

It might be in another location on different systems, try asking your packet manager, e.g., by calling:

dpkg -L gvfs-daemons | grep trash

Beware of the implications. Without gvfsd-trash you won't be able to access the special trash:/// URI path via your file browser! Personally, I never used that anyway. You can also access deleted files manually in the .Trash folders, which exist for each mount point and in your home, e.g.:

  • ~/.local/share/Trash
  • /media/mounted-external-harddrive/.Trash

It looks like the problem is a combination of gvfsd-trash and D-Bus. There seems to be a fitting issue issue in the gvfs bugtracker, which has already been fixed but has not been rolled out to my system, yet.

For years, I used a different solution, namely starting each program in a new DBus session by prefixing the most important programs like Firefox and text editors with dbus-launch. However, that comes with its own shares of problems because each session will spawn at least five gvfsd processes and possible others and the D-Bus session will not close by closing the program opened in it and the number of D-Bus sessions in total is limited, so after a while, you won't be able to start programs.

mxmlnkn
  • 1,470
  • 1
  • 12
  • 10
3

Diagnosis

I guess you are showing an excerpt from you strace log, pointing to where the lag occurs. It would be useful if you posted the full strace command you used.

Actions that may lead to the culprit:

  1. While an straced nautilus is running, in another terminal use

    $ pidof nautilus
    $ lsof -p <pidno>
    

    where <pidno> is the PID returned by the previous command. That lets you inspect which file descriptors (answer to your question: Yes) the strace log refers to (source). You might want to pipe the output with less since it is typically long.

  2. Option -c can further help identifying bottleneck processes (source), yours is probably poll. From the man strace page:

    -c
    --summary-only
                Count time, calls, and errors for each system call and report a summary on program exit, suppressing the  regular
                output.  This attempts to show system time (CPU time spent running in the kernel) independent of wall clock time.
                If -c is used with -f, only aggregate totals for all traced processes are kept.
    

    This is an example of what the output looks like

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     18,29    0,009128           4      2058           mmap
     15,19    0,007580           7       989         1 poll
     11,96    0,005967           2      2465       318 openat
      6,76    0,003374           1      2187           close
    ...
    
  3. Check other instances of delayed nautilus and thunar launching, to confirm a similar pattern.

  4. Apply the above, and any further diagnostic commands, to a comparison with what a root instance (which you mention it works fine..., I guess you meant ... at times where with a normal user the delays already appeared) produces.

  5. Consider the problem might be related to availability of file descriptors, or the like, see this.

  6. Please post the complete output of

    $ bash -c 'time strace -fc nautilus'
    $ bash -c 'time strace -fc thunar'
    $ bash -c 'strace -fc time nautilus'
    $ bash -c 'strace -fc time thunar'
    
  7. Please mention whether you moved your home dir, if you had a different Ubuntu version and updated, etc., as it may be relevant. Also, try creating a temporary user to see if the problem shows up. This would be an additional proof to comparing with root.

    EDIT a: The comparison between what you get with users <you>, tester and root, for bash -c 'time strace -fc nautilus', shows the following essential difference in terms of what could be spending the ~30sec at launching:

    <you>:

    (org.gnome.Nautilus:2641915): Tracker-WARNING **: 18:30:21.999: Falling back 
    to bus backend, the direct backend failed to initialize: Could not find datab
    ase file:'/home/coljac/.cache/tracker/meta.db'.
    

    tester:

    (org.gnome.Nautilus:2976790): dbind-WARNING **: 20:45:38.814: Couldn't register with accessibility bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
    strace: Process 2976919 attached
    strace: Process 2976920 attached
    

    ** (org.gnome.Nautilus:2976790): WARNING **: 20:46:04.102: Unable to get contents of the bookmarks file: Error opening file /home/tester/.gtk-bookmarks: No such file or directory

    root:

    ** (org.gnome.Nautilus:2980894): WARNING **: 21:00:42.191: Unable to get contents of the bookmarks file: Error opening file /root/.gtk-bookmarks: No such file or directory
    

    ** (org.gnome.Nautilus:2980894): WARNING **: 21:00:42.191: Unable to get contents of the bookmarks file: Error opening file /root/.gtk-bookmarks: No such file or directory strace: Process 2980907 attached Nautilus-Share-Message: 21:00:42.241: Called "net usershare info" but it failed: Failed to execute child process “net” (No such file or directory)

    So, tester vs. root shows a warning that could help with diagnosis. The strange thing is that the same warning does not show for <you>.

Solution

I would wait until you post more info after the diagnosis suggested. In the meantime, I give a (nuke?) workaround. I am assuming your "several machines with very similar setups" are a sort of clones (at least in the beginning) of each other. If so, and you don't find a solution (after the diagnosis), and this bothers you enough, you might try cloning again from one of your working systems, with clonezilla, e.g.

Related

  1. https://www.linuxquestions.org/questions/linux-server-73/strace-question-poll-taking-a-long-time-on-an-open-command-939714/
  2. Nautilus does not start: .cache/tracker/meta.db not found
3

Thanks for the answers and clues here.

I believe I solved this by just changing

AutoMount=true to AutoMount=false in /usr/share/gvfs/mounts/trash.mount

I logged out and logged in and it was still ok. Had also uninstalled following packages

xdg-desktop-portal 
xdg-desktop-portal-gtk

as suggested in this Reddit post.

Zanna
  • 72,312
Krish
  • 31
1

There are a couple of items that might cause Nautilus to slow down. I've seen this happen on notebooks that are used primarily for artwork and contained — quite literally — millions of thumbnail files sitting in ~/.cache/thumbnails. Scrubbing the directory had an immediate impact on the speed of launching Nautilus (not to mention freeing up quite a bit of storage space).

Another item to look at would be anything in ~/.local/share/nautilus that might be causing an issue, such as looking up a network share that is no longer available.

As a last-ditch effort, if you have a ~/.cache/dconf/user file, you can see if there's a configuration problem in the dconf settings for your user account. The simplest way to test this would be to rename ~/.cache/dconf/user to something like ~/.cache/dconf/user-old and then restart the shell by pressing Alt+F2 to bring up a command input, type r, then hit Enter. If the problem persists, you can return the renamed file to its proper place.

Hopefully something here will help you out.

matigo
  • 24,752
  • 7
  • 50
  • 79
1

Following mxmlnkn excellent answer, I've implemented a simple oneshot systemd service that kills gvfdsd-trash and prevents it from ever running again. This is useful when an unattended upgrade reinstall /usr/libexec/gvfsd-trash..

/usr/local/bin/clean-gvfsd-trash :

$ sudo tee /usr/local/bin/clean-gvfsd-trash << EOF > /dev/null
#!/bin/bash
pkill gvfsd-trash
path=$(dpkg -L gvfs-daemons | grep gvfsd-trash|xargs realpath|uniq)
mv $path{,.bak} || true
EOF

$ sudo chmod u+x /usr/local/bin/clean-gvfsd-trash

/lib/systemd/system/clean-gvfsd-trash.service :

$ sudo tee /lib/systemd/system/clean-gvfsd-trash.service << EOF > /dev/null
[Unit]
Description=kill gvfsd-trash and prevent it from ever running again
Documentation=https://askubuntu.com/questions/1341909/file-browser-and-file-dialogs-take-a-long-time-to-open-or-fail-to-open-in-all-ap

[Service] ExecStart=/usr/local/bin/clean-gvfsd-trash Type=oneshot RemainAfterExit=no

[Install] WantedBy=multi-user.target EOF

Service activation:

$ sudo systemctl daemon-reload
$ sudo systemctl enable clean-gvfsd-trash.service
$ sudo systemctl start clean-gvfsd-trash.service
Nicolas
  • 111
0

I was seeing the same symptoms after an upgrade to 24.04 - launching the file manager and certain applications took about 26 seconds in almost all cases. This seemed to be related to some kind of DBUS timeout, as described here.

I tried solutions based on disabling or removing gvfsd, but these had no effect.

I found this in my journalctl output:

Apr 18 10:17:17 timdesk dbus-daemon[3347]: [session uid=1000 pid=3347] Activating service name='org.gnome.Nautilus' requested by ':1.37' (uid=1000 pid=3692 comm="/usr/bin/gnome-shell" label="unconfined")
Apr 18 10:17:17 timdesk nautilus[13787]: Connecting to org.freedesktop.Tracker3.Miner.Files
Apr 18 10:17:17 timdesk dbus-daemon[3347]: [session uid=1000 pid=3347] Successfully activated service 'org.gnome.Nautilus'
Apr 18 10:17:17 timdesk dbus-daemon[3347]: [session uid=1000 pid=3347] Activating via systemd: service name='org.freedesktop.portal.Desktop' unit='xdg-desktop-portal.service' requested by ':1.181' (uid=1000 pid=13787 comm="/usr/bin/nautilus --gapplication-service" label="nautilus (unconfined)")
Apr 18 10:17:17 timdesk systemd[3318]: Starting xdg-desktop-portal.service - Portal service...
Apr 18 10:17:17 timdesk dbus-daemon[3347]: [session uid=1000 pid=3347] Activating via systemd: service name='org.freedesktop.impl.portal.desktop.gnome' unit='xdg-desktop-portal-gnome.service' requested by ':1.182' (uid=1000 pid=13808 comm="/usr/libexec/xdg-desktop-portal" label="unconfined")
Apr 18 10:17:17 timdesk systemd[3318]: graphical-session.target - Current graphical user session is inactive.
Apr 18 10:17:17 timdesk systemd[3318]: Dependency failed for xdg-desktop-portal-gnome.service - Portal service (GNOME implementation).
Apr 18 10:17:17 timdesk systemd[3318]: xdg-desktop-portal-gnome.service: Job xdg-desktop-portal-gnome.service/start failed with result 'dependency'.

Which led to this thread: https://bbs.archlinux.org/viewtopic.php?id=285590

After removing xdg-desktop-portal-gnome.service, my launch times for the file manager, document viewer, image viewer and other apps all returned to normal (less than 1 second to launch).

sudo apt remove xdg-desktop-portal-gnome
Tim Bird
  • 111