
I recently installed Ubuntu 16.10, and since then the system keeps rebooting by itself. The output of last | grep "Oct 31" is:

aegefel  tty7         :0               Mon Oct 31 15:15    gone - no logout
reboot   system boot  4.8.0-26-generic Mon Oct 31 15:14   still running
aegefel  tty7         :0               Mon Oct 31 15:02 - down   (00:04)
reboot   system boot  4.8.0-26-generic Mon Oct 31 15:02 - 15:06  (00:04)
aegefel  tty7         :0               Mon Oct 31 14:33 - crash  (00:28)
reboot   system boot  4.8.0-26-generic Mon Oct 31 14:33 - 15:06  (00:33)
aegefel  tty7         :0               Mon Oct 31 14:12 - crash  (00:20)
reboot   system boot  4.8.0-26-generic Mon Oct 31 14:12 - 15:06  (00:54)
aegefel  tty7         :0               Mon Oct 31 13:08 - crash  (01:04)
reboot   system boot  4.8.0-26-generic Mon Oct 31 13:08 - 15:06  (01:58)

This leads me to believe the reboots are caused by a crash.

I don't know what causes this, but it happened when I tried to watch a movie and when I ran a backup.
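As a rough way to quantify this, here is a minimal sketch that counts the user sessions that last marks as ended by a crash. The function name and the SAMPLE text are only illustrative (the sample is copied from the output above); in practice you would feed it the real output of last.

```python
SAMPLE = """\
aegefel  tty7         :0               Mon Oct 31 15:15    gone - no logout
reboot   system boot  4.8.0-26-generic Mon Oct 31 15:14   still running
aegefel  tty7         :0               Mon Oct 31 15:02 - down   (00:04)
reboot   system boot  4.8.0-26-generic Mon Oct 31 15:02 - 15:06  (00:04)
aegefel  tty7         :0               Mon Oct 31 14:33 - crash  (00:28)
reboot   system boot  4.8.0-26-generic Mon Oct 31 14:33 - 15:06  (00:33)
aegefel  tty7         :0               Mon Oct 31 14:12 - crash  (00:20)
reboot   system boot  4.8.0-26-generic Mon Oct 31 14:12 - 15:06  (00:54)
aegefel  tty7         :0               Mon Oct 31 13:08 - crash  (01:04)
reboot   system boot  4.8.0-26-generic Mon Oct 31 13:08 - 15:06  (01:58)
"""


def count_crashed_sessions(last_output):
    """Count user sessions that `last` reports as ended by a crash."""
    crashed = 0
    for line in last_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "reboot" pseudo-user entries.
        if not fields or fields[0] == "reboot":
            continue
        # A crashed session ends with "... - crash  (HH:MM)".
        if "crash" in fields:
            crashed += 1
    return crashed


if __name__ == "__main__":
    print(count_crashed_sessions(SAMPLE))  # prints 3
```

A session marked "down" was a clean shutdown, so it is deliberately not counted.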

How should I proceed?

EDIT 1

The command more /var/log/syslog* gives me:

Nov  6 18:18:17 aegefel-Akoya-E6424-MD99850 gnome-terminal-[2674]: Allocating size to GtkBox 0x55558d2b47b0 without calling gtk_widget_get_preferred_width/height(). How does the code know the size to allocate?
Nov  6 18:18:17 aegefel-Akoya-E6424-MD99850 gnome-terminal-[2674]: Allocating size to GtkBox 0x55558d2b47b0 without calling gtk_widget_get_preferred_width/height(). How does the code know the size to allocate?
Nov  6 18:18:31 aegefel-Akoya-E6424-MD99850 gnome-terminal-[2674]: Allocating size to GtkBox 0x55558d2b4120 without calling gtk_widget_get_preferred_width/height(). How does the code know the size to allocate?
Nov  6 18:18:31 aegefel-Akoya-E6424-MD99850 gnome-terminal-[2674]: Allocating size to GtkBox 0x55558d2b4120 without calling gtk_widget_get_preferred_width/height(). How does the code know the size to allocate?
Nov  6 18:18:36 aegefel-Akoya-E6424-MD99850 systemd[1]: Starting Stop ureadahead data collection...
Nov  6 18:18:36 aegefel-Akoya-E6424-MD99850 systemd[1]: Started Stop ureadahead data collection.

After that, nothing was logged for almost a minute, so I suppose the PC rebooted.

The command ls -alt /var/crash gives me for today:

total 21672
drwxrwsrwt  2 root     whoopsie     4096 Nov  6 14:26 .
-rwxrwxrwx  1 root     whoopsie        0 Nov  6 14:26 .lock

EDIT 2

This happens only when my CPU usage is at 40% - 50% or more (my CPU is an Intel Core i5 6267U 2.9GHz).

EDIT 3

The command sensors gives me the following:

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +37.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:         +34.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:         +36.0°C  (high = +100.0°C, crit = +100.0°C)

acpitz-virtual-0
Adapter: Virtual device
temp1:        +38.0°C  (crit = +98.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +35.0°C  

The high temperature is equal to the critical one. Maybe my laptop simply overheats and the fan doesn't have time to lower the temperature. I tried to lower the high threshold, but this automatically lowers the critical one as well (the critical threshold must be equal to the high one).
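To see how close each reading is to its critical threshold, a small parser can help. This is only a sketch: it assumes the plain-text layout of the sensors output shown above, the function name is made up for illustration, and the SAMPLE values are the idle readings from this post, so a capture taken under load would be more telling.

```python
import re

SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +37.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:         +34.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:         +36.0°C  (high = +100.0°C, crit = +100.0°C)

acpitz-virtual-0
Adapter: Virtual device
temp1:        +38.0°C  (crit = +98.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +35.0°C
"""

# "<label>: <temp>°C", optionally followed by "... crit = <temp>°C".
READING_RE = re.compile(
    r"^(?P<label>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)°C"
    r"(?:.*?crit = \+?(?P<crit>-?\d+(?:\.\d+)?)°C)?"
)


def headroom_to_critical(sensors_output):
    """Map "chip/label" to degrees Celsius left before the crit threshold."""
    margins = {}
    chip = None
    for raw in sensors_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" not in line:
            chip = line  # chip header such as "coretemp-isa-0000"
            continue
        match = READING_RE.match(line)
        # Readings without a crit value (e.g. the PCH sensor) are skipped.
        if match and match.group("crit") is not None:
            key = f"{chip}/{match.group('label').strip()}"
            margins[key] = float(match.group("crit")) - float(match.group("temp"))
    return margins


if __name__ == "__main__":
    for name, margin in headroom_to_critical(SAMPLE).items():
        print(f"{name}: {margin:.1f} degrees below crit")
```

Keys include the chip name because labels such as temp1 repeat across chips.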

EDIT 4

Here you have

And here are the crashes from November 20

EDIT 5

After some tests, I think the problem is GPU overheating. In fact, my laptop reboots only when I try to watch a movie, when I tested some free games, or when I used Unreal Engine 4. The reason my PC didn't reboot with Blender is that Blender uses the CPU (not the GPU) by default. I have an Intel Iris Graphics 550 (Skylake GT3e). Any ideas?

Aegefel

2 Answers


If you are truly concerned about rebooting due to kernel panics, as the title of your post suggests, you can check the file /etc/sysctl.conf for a directive similar to kernel.panic = n, where n is the number of seconds to wait before rebooting in the event of a kernel panic. Research indicates that the system is not supposed to reboot by default.
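If you'd rather check this programmatically, here is a minimal sketch that scans sysctl.conf-style text for kernel.panic. The function name is hypothetical, and it only handles the dotted key = value form; settings in other files under /etc/sysctl.d are not considered.

```python
def panic_timeout(sysctl_conf_text):
    """Return the last kernel.panic value set in the given text, or None."""
    value = None
    for raw in sysctl_conf_text.splitlines():
        line = raw.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments.
        if not line or line[0] in "#;":
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.panic":
            value = int(val.strip())  # a later entry overrides an earlier one
    return value


if __name__ == "__main__":
    with open("/etc/sysctl.conf") as conf:
        print(panic_timeout(conf.read()))
```

A result of None means the file does not set the directive at all. The value currently in effect can be read from /proc/sys/kernel/panic, where 0 means the kernel does not reboot automatically after a panic.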

If instead, as I suspect, you are more concerned with determining the root cause of these reboots (some hardware-related failure, in my opinion), you'll want to review the machine check events to determine what hardware is malfunctioning. If you don't have the file /var/log/mcelog, you may need to install the mcelog package by enabling the Universe repository (if it is not already enabled in your sources) and running sudo apt install mcelog. From then on, these events will be logged to /var/log/mcelog.

For clarity, here's an excerpt from man mcelog:

X86 CPUs report errors detected by the CPU as machine check events (MCEs). These can be data corruption detected in the CPU caches, in main memory by an integrated memory controller, data transfer errors on the front side bus or CPU interconnect or other internal errors. Possible causes can be cosmic radiation, instable power supplies, cooling problems, broken hardware, or bad luck.

Most errors can be corrected by the CPU by internal error correction mechanisms. Uncorrected errors cause machine check exceptions which may panic the machine.

More information on the mcelog file format can be found at http://mcelog.org/logfile.html

Linux systems don't typically reboot on a kernel panic by default, so you may wish to check the file /etc/sysctl.conf mentioned previously.

Sources:

http://www.techrepublic.com/blog/linux-and-open-source/auto-reboot-linux-after-a-kernel-panic/

http://packages.ubuntu.com

"mce: [Hardware Error]: Machine check events logged" appears in syslog. What should I do?

http://mcelog.org/logfile.html

Based on your mcelog, CPUs 1 and 3 in your system are overheating, throttling down, cooling off, and throttling back up (all of this is by design, to protect the CPU from overheating). The root cause could be poorly applied thermal compound between the CPU and heatsink, a loose heatsink, blocked vents, or overly dusty or failing cooling equipment (the fan?). Another (unlikely) possibility is a failure in the thermal detection capabilities of the CPU.
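As a cross-check that does not depend on mcelog, the kernel also keeps per-CPU throttle counters in sysfs on Intel systems. The sketch below is an alternative to reading the log, not what the diagnosis above was based on. It assumes the thermal_throttle directory is present under /sys/devices/system/cpu (it may be missing on some hardware or kernels), and the function name is made up for illustration.

```python
from pathlib import Path


def core_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Map CPU names (e.g. "cpu1") to their core thermal throttle counts."""
    counts = {}
    pattern = "cpu[0-9]*/thermal_throttle/core_throttle_count"
    for path in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = path.parent.parent.name  # ".../cpu1/thermal_throttle/<file>"
        counts[cpu_name] = int(path.read_text().strip())
    return counts


if __name__ == "__main__":
    for cpu_name, count in core_throttle_counts().items():
        print(f"{cpu_name}: throttled {count} times since boot")
```

Counts that keep climbing while you play a video or game would support the overheating explanation; an empty result only means the counters are not exposed on that system.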

Elder Geek

The title of this topic is not clear.

Anyway, if you need help investigating your system crash and the previous comments were not useful, try these:

  1. Increase kernel log verbosity.
  2. Stop the kernel from automatically rebooting on a crash/panic.
  3. Try to log in to your system remotely (e.g. via ssh) and check the logs.
  4. As @user.dz stated, use e.g. memtest86+ from http://www.memtest.org/ to thoroughly check your RAM.
  5. Because you said this happens only when your CPU usage is at 40% - 50% or more, could it be a PSU issue? I mean that your system may require more power than the PSU can supply.
d a i s y