What’s the best way to monitor and log which processes are responsible for high system load throughout the day? Tools like top and htop only provide immediate values, but I’m looking for a solution that offers historical data to identify the main culprits over time.

@sysadmin

#sysadmin #linux #server

1 point

I like Zabbix. It can monitor whatever I like, using SNMP, IPMI, REST APIs or its own agent.

I have a team member who insists on using Netdata, but outside of the nice dashboard it doesn’t provide anything. It is local only, and setting up alarms is a pain. And tbh it nags more than Canonical’s stuff.

16 points

Netdata is excellent, simple and I believe FOSS. Just install locally and it should start logging pretty much everything.

6 points

Clicked the link, started reading … closed the window when I read “Netdata also incorporates A.I. insights for all monitored data”.

9 points

Eesh. Yeah, that’s a nope from me, dawg.

Actually, it’s all self-hosted. Granted, I haven’t looked at the code in detail, but building NNs to help efficiently detect and capture stuff is actually a very appropriate use of ML. This project looks kinda cool.

1 point

Machine Learning might be marketed as “all fine and dandy”, but I’m not planning on running a monitor training system loose on my production server under any circumstances.

Not to mention that for it to be useful I’d have to give it at least a year of logs, which is both impossible and pointless: the system running a year ago is not remotely the same as the one running today. Even if not a single piece of our own code had changed (which of course it did), the OS, applications and processes have been continually updated by system updates and security patches.

So, no.

3 points

This limited-scope, ML-trained analysis is actually where “AI” excels, e.g. “computer vision” in specific medical scenarios.

1 point

If the training data is available, yes. In this case, no chance.

3 points

I run this in a Docker container on my home network without connecting it to their cloud platform (despite their - increasingly strident, it feels - “encouragements” to do so). It’s very powerful, and the majority of low level configuration is done via text files. But 99% of it is automatic.

The UI is unique. It’s a single, long and scrollable page, which may be an issue for some.

There are other tools out there, too. I previously used one that integrates Grafana, Prometheus and Node Exporter, which is more complex to set up and configure.

7 points

atop should be available in your package manager and run as a daemon. It stores the history in /var/
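
For anyone curious what this kind of history logging boils down to, here is a minimal Python sketch. It is not how atop itself works, just an illustration under a few assumptions: a Linux `/proc` filesystem, CPU time as the only metric, and made-up helper names (`parse_stat`, `snapshot`, `top_consumers`, `log_top`). Keep in mind that load average on Linux also counts processes stuck in uninterruptible I/O wait, so CPU ticks alone won't always reveal the culprit.

```python
import os
import time


def parse_stat(text):
    """Return (command, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first '(' and the last ')'.
    start = text.index("(")
    end = text.rindex(")")
    comm = text[start + 1:end]
    fields = text[end + 1:].split()
    # fields[0] is the process state (field 3 in proc(5)); utime and stime
    # are fields 14 and 15, which puts them at indexes 11 and 12 here.
    return comm, int(fields[11]) + int(fields[12])


def snapshot(proc_root="/proc"):
    """Map pid -> (command, cpu_ticks) for every process readable right now."""
    result = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                result[int(name)] = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited in the meantime, or entry is unreadable
    return result


def top_consumers(before, after, count=5):
    """Rank processes by the CPU ticks they used between two snapshots."""
    usage = []
    for pid, (comm, ticks) in after.items():
        if pid in before and before[pid][0] == comm:
            delta = ticks - before[pid][1]
        else:
            delta = ticks  # new (or reused) pid: count all ticks used so far
        if delta > 0:
            usage.append((delta, pid, comm))
    usage.sort(reverse=True)
    return usage[:count]


def log_top(path, interval=60, samples=None, count=5):
    """Append the top CPU consumers of each interval to a log file.

    Runs forever when samples is None.
    """
    before = snapshot()
    taken = 0
    while samples is None or taken < samples:
        time.sleep(interval)
        after = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(path, "a") as log:
            for delta, pid, comm in top_consumers(before, after, count):
                log.write(f"{stamp} pid={pid} comm={comm} cpu_ticks={delta}\n")
        before = after
        taken += 1
```

Calling something like `log_top("/tmp/cpu-history.log", interval=60)` (the path is only a placeholder) would append one batch of lines per minute, which you can grep later for the busy periods. A purpose-built tool such as atop or sar records far more than this and is the better choice on a real server.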

7 points

I like to use atop as the first step during an investigation: https://www.atoptool.nl/

5 points

In my time we used sar. I feel old when reading about all your new tools I never heard of.

