Finally, you have to arrange for your system to start smartd as part of the system start-up process. This part depends on which distribution you use. On my Gentoo boxes, I use:

Now that the suite is installed and the daemon is running, we should find the occasional log entry in our syslog.
Here's what I see when I restart smartd:

Dec 1 localhost smartd[]: smartd received signal Terminated
Dec 1 localhost smartd[]: smartd is exiting exit status 0
Dec 1 localhost smartd[]: smartd version 5.
Adding to "monitor" list.

This is what a successful smartd start looks like. As long as the daemon is running, we'll see log entries indicating whether the health status of my drive changes. But we don't always read our logs. That's OK: the smartmontools suite has a command-line tool that you can use interactively to find out how healthy your drives are.
For example, we can use smartctl to find out what type of drive we have:

Of course, most of this isn't really interesting and only serves to expose just how old my system really is. In the future, I'll strip out the heading information for further output. But we can also use smartctl to ask the drive about its general state of health:

What it's telling you is that the IDE drive controller has detected problems. I got a passing result on a drive I knew to be bad.
However, this command doesn't immediately return any results. It simply tells you to come back later and ask for the results:

Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Mon Dec 1

Here is where you see that the drive in my workstation is doing just fine. An offline uncorrectable sector is a disk sector which was not readable during an off-line scan or a self-test.
This is important to know, because if you have data stored in this disk sector, and you need to read it, the read will fail. Please see the previous '-C' option for more details. To disable any of the 3 reports, set the corresponding limit to 0. Trailing zero arguments may be omitted.
By default, all temperature reports are disabled ('-W 0'). This can be changed to Attribute 9 or by the drive database or by the '-v' directive, see below. The arguments to this Directive are exclusive, so that only the final Directive given is used. The valid values are:

none - Assume that the device firmware obeys the ATA specifications. This is the default, unless the device has presets for '-F' in the device database.
Enabling this option tells smartd to evaluate these quantities in byte-reversed order. Some signs that your disk needs this option are (1) no self-test log printed, even though you have run self-tests; (2) very large numbers of ATA errors reported in the ATA error log; (3) strange and impossible values for the ATA error log timestamps.

Enabling this option tells smartd to evaluate this quantity in byte-reversed order. If this directive is specified, smartd will not skip the next scheduled self-test (see Directive '-s' above) in this case. Note that an explicit '-F' Directive will over-ride any preset values for '-F' (see the '-P' option below).
This Directive may appear multiple times. Valid arguments to this Directive are:

9,minutes - Raw Attribute number 9 is power-on time in minutes. Here X is hours, and Y is minutes in the range 0 to 59 inclusive. Y is always printed with two digits, for example '06' or '31' or '00'.

Here X is hours, Y is minutes in the range 0 to 59 inclusive, and Z is seconds in the range 0 to 59 inclusive. Y and Z are always printed with two digits, for example '06' or '31' or '00'.

This format is used by some Samsung disks. The first is the number of load cycles. The second is the number of unload cycles. The difference between these two values is the number of times that the drive was unexpectedly powered off (also called an emergency unload). As a rule of thumb, the mechanical stress created by one emergency unload is equivalent to that created by one hundred normal unloads.
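The power-on time formatting and the emergency-unload arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not smartd's own code: the function names are hypothetical, and the exact output shapes 'Xh+Ym' and 'Xh+Ym+Zs' are assumptions based on the description of X, Y and Z above.

```python
def format_power_on_minutes(raw_minutes):
    """Render a raw minute counter as hours plus two-digit minutes.

    The 'Xh+Ym' layout is an assumption made for this sketch.
    """
    hours, minutes = divmod(raw_minutes, 60)
    return f"{hours}h+{minutes:02d}m"


def format_power_on_seconds(raw_seconds):
    """Render a raw second counter as hours, two-digit minutes and seconds.

    The 'Xh+Ym+Zs' layout is an assumption made for this sketch.
    """
    hours, remainder = divmod(raw_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h+{minutes:02d}m+{seconds:02d}s"


def emergency_unloads(load_cycles, unload_cycles):
    """Loads that were never matched by a regular unload."""
    return load_cycles - unload_cycles


def equivalent_normal_unloads(load_cycles, unload_cycles, weight=100):
    """Apply the rule of thumb: one emergency unload ~ `weight` normal ones."""
    emergencies = emergency_unloads(load_cycles, unload_cycles)
    return unload_cycles + weight * emergencies


if __name__ == "__main__":
    print(format_power_on_minutes(366))
    print(format_power_on_seconds(3661))
    print(equivalent_normal_unloads(1000, 990))
```

The weight of 100 comes straight from the rule of thumb quoted above; treat the result as a rough wear indicator, nothing more.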
This is primarily useful for the -P presets Directive. This may be useful for decoding the meaning of the Raw value. The form with a specific Attribute number in front of ',raw8' only prints the Raw value for that Attribute in this form.

N,raw16 - Print the Raw value of Attribute N as three 16-bit unsigned base-10 integers. The form with a specific Attribute number in front of ',raw16' only prints the Raw value for that Attribute in this form.

N,raw48 - Print the Raw value of Attribute N as a 48-bit unsigned base-10 integer. The form with a specific Attribute number in front of ',raw48' only prints the Raw value for that Attribute in this form.

The valid arguments to this Directive are:

use - use any presets that are available for this drive.

Note that -a is the default for ATA devices. If none of these other Directives is given, then -a is assumed.

# - Comment: ignore the remainder of the line.

If you are not sure which Directives to use, I suggest experimenting for a few minutes with smartctl to see what SMART functionality your disk(s) support(s).
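To make the raw8, raw16 and raw48 forms above more concrete, here is a minimal Python sketch that shows one 6-byte Raw value in all three views. The function name is hypothetical. The sketch assumes that the Raw value is stored least-significant byte first and that the parts are listed most-significant first; both are assumptions for illustration and should be checked against the smartmontools documentation.

```python
def raw_views(raw_bytes):
    """Return the raw8, raw16 and raw48 views of a 6-byte Raw value.

    Assumption: raw_bytes is little-endian (least-significant byte first),
    and the individual parts are listed most-significant first.
    """
    if len(raw_bytes) != 6:
        raise ValueError("a Raw value is expected to be exactly 6 bytes")

    # Six 8-bit unsigned integers, most-significant byte first.
    raw8 = list(reversed(raw_bytes))

    # Three 16-bit unsigned integers, most-significant word first.
    raw16 = [int.from_bytes(raw_bytes[i:i + 2], "little") for i in (4, 2, 0)]

    # One 48-bit unsigned integer.
    raw48 = int.from_bytes(raw_bytes, "little")

    return {"raw8": raw8, "raw16": raw16, "raw48": raw48}


if __name__ == "__main__":
    sample = bytes([0x10, 0x27, 0x00, 0x00, 0x00, 0x00])
    for name, value in raw_views(sample).items():
        print(name, value)
```

Looking at the same bytes in several views can help when a vendor packs more than one counter into a single Raw value.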
If you do not like voluminous syslog messages, a good choice of smartd configuration file Directives might be: -H -l selftest -l error -f. If you want more frequent information, use: -a. It will send one email warning per device for any problems that are found. It warns all users about a disk problem, waits 30 seconds, and then powers down the machine. The remainder is flushed. The way in which the Raw values are printed, and the names under which the Attributes are reported, is governed by the various '-v Num,Description' Directives described previously.
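As a small illustration of the two Directive sets suggested above, the following Python sketch assembles configuration lines for a list of devices. The helper name and the device paths are hypothetical placeholders; only the Directive strings are taken from the text, and the output should be reviewed before it goes anywhere near a real smartd configuration file.

```python
# Directive sets quoted in the text above.
QUIET_DIRECTIVES = ["-H", "-l", "selftest", "-l", "error", "-f"]
VERBOSE_DIRECTIVES = ["-a"]


def conf_line(device, directives):
    """Build one configuration line: device path followed by Directives."""
    return " ".join([device, *directives])


if __name__ == "__main__":
    # '/dev/hda' and '/dev/hdb' are placeholder device names.
    print(conf_line("/dev/hda", QUIET_DIRECTIVES))
    print(conf_line("/dev/hdb", VERBOSE_DIRECTIVES))
```

Keeping the Directive sets in named lists makes it easy to switch a device from the quiet set to the more talkative '-a' later on.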
Please see the smartctl manual page for further explanation of the differences between Normalized and Raw Attribute values. Use the smartctl utility to investigate. On Cygwin and Windows, the log messages are written to the event log or to a file. Upon startup, the smartd service changes the working directory to its own installation path. If smartd.
The debug mode ('-d', '-q onecheck') does not work if smartd is running as a service. The service can be controlled as usual with Windows commands 'net' or 'sc' ('net start smartd', 'net stop smartd'). Pausing the service ('net pause smartd') sets the interval between disk checks ('-i N') to infinite. Continuing the paused service ('net continue smartd') resets the interval and rereads the configuration file immediately (like SIGHUP). Continuing a still running service ('net continue smartd' without preceding 'net pause smartd') does not reread configuration but checks disks immediately (like SIGUSR1).
Due to a bug in the tzset(3) function of many unix standard C libraries, the time-zone stamps of smartd might not change. The work-around fails if the time-zone is set using the 'TZ' variable (or a file that it points to). Please report this problem to smartmontools-support lists. This should never happen.
It must be due to either a coding or compiler bug. Please report such failures to smartmontools-support lists. The exit status is then plus the signal number. It extends these to cover ATA-5 disks. Runs smartd in "debug" mode. Sets the interval between disk checks to N seconds, where N is a decimal integer. Modify the script that starts smartd to include the smartd command-line argument '-l local3'. Do not fork into background; this is useful when executed from modern init methods like initng, minit or supervise.
Specifies when, if ever, smartd should exit. Intended primarily to help smartmontools developers understand the behavior of smartmontools on non-conforming or poorly-conforming hardware. Cygwin and Windows only: Enables smartd to run as a Windows service.
There should be one device listed per line, although you may have lines that are entirely comments or white space. Any text following a hash sign '#' and up to the end of the line is taken to be a comment, and ignored.

Note: a line whose first character is a hash sign '#' is treated as a white-space blank line, not as a non-existent line, and will end a continuation line. Specifies the type of the device. This 'nocheck' Directive is used to prevent a disk from being spun-up when it is periodically polled by smartd. Enables or disables Attribute Autosave when smartd starts up and has no further effect.
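The comment and continuation rules described above can be sketched as a small Python pre-processor. This is a simplified illustration and not smartd's own parser: the function name is hypothetical, and the continuation character is assumed to be a backslash. A line that starts with a hash sign is reduced to a blank line, which ends any pending continuation.

```python
def logical_lines(text, continuation="\\"):
    """Strip comments and join continued lines of a smartd-style config.

    Assumption: `continuation` (a backslash by default) marks a continued
    line when it is the last non-white, non-comment character.
    """
    result = []
    pending = []
    for raw in text.splitlines():
        # Everything from the first hash sign to end of line is a comment.
        body = raw.split("#", 1)[0].rstrip()
        if body.endswith(continuation):
            pending.append(body[:-len(continuation)].strip())
            continue
        # A comment-only line counts as blank and ends a continuation.
        pending.append(body.strip())
        joined = " ".join(part for part in pending if part)
        pending = []
        if joined:
            result.append(joined)
    # Flush a continuation left open at end of file.
    joined = " ".join(part for part in pending if part)
    if joined:
        result.append(joined)
    return result


if __name__ == "__main__":
    sample = "\n".join([
        "/dev/hda -H \\",
        "  -l error # log ATA errors",
        "# a comment-only line",
        "/dev/hdb -a \\",
        "# this comment ends the continuation",
        "-m admin",
    ])
    for line in logical_lines(sample):
        print(line)
```

In the sample, the device names and the '-m admin' fragment are placeholders. The last three lines show why a comment between a continued line and its intended continuation silently splits the entry in two.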
These Directives modify the behavior of the smartd email warnings enabled with the '-m' email Directive described above.
The environment variables exported by smartd are:

The possible values that it takes and their meanings are:

For example:

Check for 'failure' of any Usage Attributes.
Report anytime that a Prefail Attribute has changed its value since the last check, 30 minutes ago. Report anytime that a Usage Attribute has changed its value since the last check, 30 minutes ago. Equivalent to turning on the two previous flags '-p' and '-u'. Ignore device Attribute ID when tracking changes in the Attribute values. When tracking, report the Raw value of Attribute ID along with its normally reported Normalized value. When tracking, report whenever the Raw value of Attribute ID changes.
Report if the current temperature has changed by at least DIFF degrees since the last report. To track temperature changes of at least 2 degrees, use: -W 2.
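The temperature reporting rule can be illustrated with a simplified Python model. This is a sketch of the idea, not smartd's implementation: the function name is hypothetical, and the two extra limits named `info` and `crit` are assumptions standing in for the other two reports mentioned earlier. As in the text, a limit of 0 disables the corresponding report.

```python
def temperature_reports(readings, diff=0, info=0, crit=0):
    """Return the messages a simplified temperature tracker would emit.

    diff: report a change of at least this many degrees since last report.
    info, crit: hypothetical informational and critical limits.
    A value of 0 disables the corresponding report.
    """
    messages = []
    last_reported = None
    for temp in readings:
        if crit and temp >= crit:
            messages.append(f"critical: {temp} reached limit {crit}")
        elif info and temp >= info:
            messages.append(f"info: {temp} reached limit {info}")
        if diff:
            if last_reported is None:
                # The first reading only sets the baseline.
                last_reported = temp
            elif abs(temp - last_reported) >= diff:
                messages.append(f"changed: {last_reported} -> {temp}")
                last_reported = temp
    return messages


if __name__ == "__main__":
    # Mirrors '-W 2': track changes of at least 2 degrees.
    for message in temperature_reports([30, 31, 32, 35], diff=2):
        print(message)
```

With `diff=2`, the step from 30 to 31 stays silent, while the steps to 32 and then to 35 are each reported because they differ by at least 2 degrees from the last reported value.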
Modifies the labeling for Attribute N, for disks which use non-standard Attribute definitions.

Specifies whether smartd should use any preset options that are available for this drive.

Equivalent to turning on all of the following Directives: '-H' to check the SMART health status, '-f' to report failures of Usage (rather than Prefail) Attributes, '-t' to track changes in both Prefailure and Usage Attributes, '-l selftest' to report increases in the number of Self-Test Log errors, '-l error' to report increases in the number of ATA errors, '-C' to report nonzero values of the current pending sector count, and '-U' to report nonzero values of the offline pending sector count.
Continuation character: if this is the last non-white or non-comment character on a line, then the following line is a continuation of the current one.

Config file does not exist (only returned in conjunction with the '-c' option).

A compile time constant of smartd was too small.
But the really concerning thing was that I should have noticed the consequences in some logs. However, since I became an almost full-time employee, I have been too lazy to work regularly with specific filter options of systemd's "journalctl" command, so I missed the relevant messages within the flood of messages in the journal.
Time to reactivate rsyslog to get specific information into specific files - see below. The attentive admin thus gets a clear hint that his intention to watch disk health variables is not working as expected - even if the "smartd. I changed the line breaks a bit to stress the last statement.
To make things a bit more interesting, let us look at a different system, MySYS, which uses a combination of 3ware-controlled hard-disk arrays and mdadm-controlled SSD arrays. Off-topic: The server host discussed above does not require a particularly high disk throughput. Storage capacity in TB is more important. Too much money for too low a performance. The general impact on an i7 or i9 on the overall performance is negligible in my experience.
Especially when you have a lot of fast RAM. In addition mdadm gives you much! But this is another story. For disks attached to a 3ware controller we need a special form of the directives in the "smartd. You may find something directly suitable for your purposes there. What about devices which are members of mdadm-controlled SW-RAID arrays? Without any special options! Actually, folks familiar with "mdadm" would have expected this for very basic reasons. But this is yet another story. We just accept the fact as one example of the power of Linux.
Disclaimer: Never copy the statements above or the smartctl commands discussed later without a thorough study of the literature and documentation on SMART, smartctl and smartd. You have to find out about the correct settings for your system configuration on your own.
Some options may even lead to data loss on older Samsung disks.