HANA News Blog

HANA systems: Linux swappiness

Jens Gleichmann • March 24, 2024

Page Priority


Since SLES15 SP4 and RHEL8 (and in other Linux distributions as well), the reclaim behavior has changed (details)

Before: vm.swappiness Default 60 [Range: 0 - 100]

After: vm.swappiness Default 60 [Range: 0 - 200]
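To see which value is active on a host, the setting can be read from procfs. The following is a minimal sketch, assuming a Linux system where the value is exposed at /proc/sys/vm/swappiness; the function names are illustrative and not part of any SAP or SUSE tooling.

```python
from pathlib import Path

# Standard procfs location of the setting on Linux.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(raw: str, max_value: int = 200) -> int:
    """Parse the text read from procfs and check it against the valid range.

    max_value defaults to 200 (new range); pass 100 to check the old range.
    """
    value = int(raw.strip())
    if not 0 <= value <= max_value:
        raise ValueError(f"vm.swappiness {value} is outside 0-{max_value}")
    return value


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Return the value currently active on this host."""
    return parse_swappiness(path.read_text())
```

Calling read_swappiness() on the HANA host returns the active value as an integer; parse_swappiness() with max_value=100 can be used to spot values that would only be valid in the new range.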


2 types of pages:

  • anonymous page: dynamic runtime data such as stack and heap
  • FS (Filesystem) page: Payload such as application data, shared libs etc.



With a value of 100, anonymous pages and FS pages are weighted equally. With 0, as before, swapping is effectively deactivated: the kernel does not start swapping until free and file-backed pages fall below the high watermark of a zone. The higher the value, the more reclaim pressure is put on anonymous pages relative to FS pages. There will always be "cold" pages that were loaded into memory once, and it makes sense to swap those out early, before memory actually runs short.
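The weighting can be illustrated with a small calculation. This is a simplified sketch, assuming the static part of the kernel's balancing is "swappiness" for anonymous pages against "200 - swappiness" for FS pages; the real reclaim code additionally factors in the observed cost of reclaiming each page type, which is left out here.

```python
from typing import Tuple


def reclaim_weights(swappiness: int) -> Tuple[int, int]:
    """Return (anon_weight, file_weight) for a value in the 0-200 range.

    Simplification: only the static weighting is modeled, not the
    dynamic cost tracking the kernel applies on top of it.
    """
    if not 0 <= swappiness <= 200:
        raise ValueError("swappiness must be between 0 and 200")
    return swappiness, 200 - swappiness


for value in (0, 10, 60, 100, 200):
    anon, file_ = reclaim_weights(value)
    print(f"swappiness={value:3d}  anon:file = {anon}:{file_}")
```

With the default of 60 the ratio is 60:140 in favor of reclaiming FS pages, and only at 100 do both page types carry the same weight.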


Details


Previously (before SLES11 SP3), anonymous pages were not included in this behavior. After that, a 1:1 cost weighting (file-cached pages : anonymous pages) was introduced. The assumption was always that file pages should be scanned and evicted more frequently than anonymous pages, because swapping anonymous pages was perceived as quite expensive and therefore disruptive. Over time it became clear that in some situations it makes sense to change this behavior, so the mechanism was made more granular.


Summary


At the end of the day, who would have thought, the new algorithm is more effective and more sensible, but it also leads to more paging activity. That does not matter, because the pages being swapped out really are "cold". So nobody needs to worry if they notice increased swap activity with the same workload and sizing. There are also new metrics in vmstat that can be used to monitor this behavior.
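The counters can be read from /proc/vmstat with a few lines of code. This is a sketch only: the post does not name the metrics, so the counter names below (the long-standing pswpin/pswpout plus the per-type pgscan/pgsteal counters found on newer kernels) are an assumption and should be checked against what your kernel actually exposes.

```python
from pathlib import Path
from typing import Dict, Iterable

# Assumed counter names; verify them with "cat /proc/vmstat" on your system.
SWAP_COUNTERS = (
    "pswpin", "pswpout",
    "pgscan_anon", "pgscan_file",
    "pgsteal_anon", "pgsteal_file",
)


def parse_vmstat(text: str, wanted: Iterable[str] = SWAP_COUNTERS) -> Dict[str, int]:
    """Extract selected 'name value' pairs from /proc/vmstat-style text."""
    wanted_set = set(wanted)
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted_set:
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path: str = "/proc/vmstat") -> Dict[str, int]:
    """Read the selected counters from the running system."""
    return parse_vmstat(Path(path).read_text())
```

The values are cumulative since boot, so two readings taken some time apart are needed to see the current activity. Counters missing on a given kernel are simply absent from the result.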


Recommendation


So if there is a lot of I/O in the system and the proportion of FS pages is low, increasing the vm.swappiness parameter can have a positive effect on performance. I therefore recommend not setting the parameter to 0 or 1 on SLES15 SP4+. Tests have shown that a value of 10 works best with HANA and its associated workload, although a higher value can also be useful (10 - 45 in my tests), depending on which third-party tools also run on the system (AV, backup, monitoring, etc.). The right value can only be determined with tests and longer-term analysis.
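The recommendation can be turned into a simple check. The sketch below encodes only what is stated above (avoid 0 and 1, 10 - 45 was the tested range); the function names and messages are illustrative, and sysctl_line() merely renders the "key = value" form used in sysctl configuration files.

```python
def review_swappiness(value: int) -> str:
    """Classify a vm.swappiness value against the guidance in this post."""
    if not 0 <= value <= 200:
        raise ValueError("swappiness must be between 0 and 200")
    if value in (0, 1):
        return "not recommended on SLES15 SP4+"
    if 10 <= value <= 45:
        return "within the range tested with HANA"
    return "outside the tested range, verify with your own tests"


def sysctl_line(value: int) -> str:
    """Render the setting as a line for a sysctl configuration file."""
    return f"vm.swappiness = {value}"


print(sysctl_line(10), "->", review_swappiness(10))
```

Whatever value is chosen, it still has to be validated against the actual workload and the third-party tools running on the host.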


Conclusion


However, the question of the right monitoring remains open. In the past, an alert was raised as soon as swap space was used, because this was assumed to indicate a memory bottleneck. Everyone now has to ask and answer this question for their own environment: Which metrics have been used for this so far? At what threshold do I alert based on the new metrics? Can my monitoring tool read these new metrics? Do I have to build a custom solution? All of this depends on the monitoring solution currently in place.
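One possible direction for a custom solution is to alert on swap activity over time instead of on swap usage. The sketch below is an assumption, not an established rule: it takes two samples of the cumulative pswpin/pswpout counters and flags a sustained swap-in rate, on the reasoning that cold pages being written out once are harmless while pages being read back in were needed after all. The threshold is hypothetical and must be derived from your own baseline.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Sample:
    timestamp: float  # seconds, e.g. from time.time()
    pswpin: int       # cumulative pages swapped in
    pswpout: int      # cumulative pages swapped out


def swap_rates(prev: Sample, curr: Sample) -> Tuple[float, float]:
    """Return (swap-in, swap-out) in pages per second between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    return ((curr.pswpin - prev.pswpin) / elapsed,
            (curr.pswpout - prev.pswpout) / elapsed)


def should_alert(prev: Sample, curr: Sample,
                 max_swapin_per_s: float = 100.0) -> bool:
    """Alert only when pages are swapped back in faster than the threshold.

    max_swapin_per_s is a hypothetical placeholder value.
    """
    swapin_rate, _ = swap_rates(prev, curr)
    return swapin_rate > max_swapin_per_s
```

Whether swap-in rate is the right signal, and at which threshold, depends on the workload and has to be validated over a longer observation period.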

