Monday, February 15, 2010

Memory Management

The memory management subsystem is one of the most important parts of the operating system. Since the early days of computing, there has been a need for more memory than exists physically in a system. Strategies have been developed to overcome this limitation and the most successful of these is virtual memory. Virtual memory makes the system appear to have more memory than it actually has by sharing it between competing processes as they need it.
Virtual memory does more than just make your computer's memory go further. The memory management subsystem provides:

Large Address Spaces
The operating system makes the system appear as if it has a larger amount of memory than it actually has. The virtual memory can be many times larger than the physical memory in the system.
Protection
Each process in the system has its own virtual address space. These virtual address spaces are completely separate from each other and so a process running one application cannot affect another. Also, the hardware virtual memory mechanisms allow areas of memory to be protected against writing. This protects code and data from being overwritten by rogue applications.
Memory Mapping
Memory mapping is used to map image and data files into a process's address space. In memory mapping, the contents of a file are linked directly into the virtual address space of a process.
Fair Physical Memory Allocation
The memory management subsystem allows each running process in the system a fair share of the physical memory of the system.
Shared Virtual Memory
Although virtual memory allows processes to have separate (virtual) address spaces, there are times when you need processes to share memory. For example there could be several processes in the system running the bash command shell. Rather than have several copies of bash, one in each process's virtual address space, it is better to have only one copy in physical memory and all of the processes running bash share it. Dynamic libraries are another common example of executing code shared between several processes.

Shared memory can also be used as an Inter Process Communication (IPC) mechanism, with two or more processes exchanging information via memory common to all of them. Linux supports the Unix™ System V shared memory IPC.

Demand Paging
As there is much less physical memory than virtual memory the operating system must be careful that it does not use the physical memory inefficiently. One way to save physical memory is to only load virtual pages that are currently being used by the executing program. For example, a database program may be run to query a database. In this case not all of the database needs to be loaded into memory, just those data records that are being examined. If the database query is a search query then it does not make sense to load the code from the database program that deals with adding new records. This technique of only loading virtual pages into memory as they are accessed is known as demand paging.
When a process attempts to access a virtual address that is not currently in memory, the processor cannot find a page table entry for the virtual page referenced. For example, if a process has no entry in its page table for virtual page frame number 2, then an attempt to read from an address within virtual page frame number 2 cannot be translated into a physical address. At this point the processor notifies the operating system that a page fault has occurred.
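The little C sketch below illustrates the mechanism: a deliberately tiny, hypothetical page table is consulted to translate a virtual address, and a missing entry produces a page fault. The structures and names (pte_t, PAGE_SIZE, translate) are illustrative assumptions, not the kernel's actual code.

/* Minimal sketch of page-table translation and page faults.
 * Tiny 8-page address space, purely for illustration. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 8

typedef struct {
    bool     present;   /* is the page in physical memory? */
    uint32_t frame;     /* physical page frame number      */
} pte_t;

/* Translate a virtual address; returns false on a "page fault"
 * (no valid page table entry for that virtual page). */
bool translate(const pte_t table[], uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr / PAGE_SIZE;   /* virtual page number */
    uint32_t offset = vaddr % PAGE_SIZE;

    if (vpn >= NUM_PAGES || !table[vpn].present)
        return false;                      /* fault: the OS must act */

    *paddr = table[vpn].frame * PAGE_SIZE + offset;
    return true;
}

int main(void)
{
    pte_t table[NUM_PAGES] = {0};
    table[0] = (pte_t){ .present = true, .frame = 5 };
    /* virtual page 2 deliberately has no entry, as in the text */

    uint32_t paddr;
    if (translate(table, 0x0123, &paddr))
        printf("virtual 0x0123 -> physical 0x%x\n", (unsigned)paddr);

    if (!translate(table, 2 * PAGE_SIZE + 0x10, &paddr))
        printf("page fault on virtual page 2: notify the OS\n");
    return 0;
}

Compiled with any C99 compiler, this prints a successful translation for page 0 and a fault for page 2.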

Swapping
If a process needs to bring a virtual page into physical memory and there are no free physical pages available, the operating system must make room for this page by discarding another page from physical memory.
If the page to be discarded from physical memory came from an image or data file and has not been written to then the page does not need to be saved. Instead it can be discarded and if the process needs that page again it can be brought back into memory from the image or data file.
However, if the page has been modified, the operating system must preserve the contents of that page so that it can be accessed at a later time. This type of page is known as a dirty page and when it is removed from memory it is saved in a special sort of file called the swap file. Accesses to the swap file are very slow relative to the speed of the processor and physical memory, and the operating system must juggle the need to write pages to disk with the need to retain them in memory to be used again.
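As a rough illustration of this clean-versus-dirty decision, here is a C sketch; the page structure and the write_to_swap helper are hypothetical stand-ins, not kernel interfaces.

/* Sketch: evicting a page. Clean pages are simply dropped; dirty
 * pages must first be written to the swap file. */
#include <stdio.h>
#include <stdbool.h>

struct page {
    int  frame;        /* physical frame number            */
    bool dirty;        /* modified since it was loaded?    */
    bool file_backed;  /* came from an image or data file? */
};

static void write_to_swap(const struct page *p)
{
    printf("frame %d: dirty, writing to swap file first\n", p->frame);
}

void evict(struct page *p)
{
    if (p->dirty)
        write_to_swap(p);   /* must preserve the contents */
    else
        printf("frame %d: clean, discard; reload from %s if needed\n",
               p->frame, p->file_backed ? "its file" : "swap");
    /* the frame is now free for the incoming page */
}

int main(void)
{
    struct page clean = { .frame = 1, .dirty = false, .file_backed = true };
    struct page dirty = { .frame = 2, .dirty = true,  .file_backed = false };
    evict(&clean);
    evict(&dirty);
    return 0;
}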

Physical and Virtual Addressing Modes

It does not make much sense for the operating system itself to run in virtual memory. This would be a nightmare situation where the operating system must maintain page tables for itself. Most multi-purpose processors support the notion of a physical address mode as well as a virtual address mode. Physical addressing mode requires no page tables and the processor does not attempt to perform any address translations in this mode. The Linux kernel is linked to run in physical address space.
The Alpha AXP processor does not have a special physical addressing mode. Instead, it divides up the memory space into several areas and designates two of them as physically mapped addresses. This kernel address space is known as KSEG address space and it encompasses all addresses upwards from 0xfffffc0000000000. In order to execute from code linked in KSEG (by definition, kernel code) or access data there, the code must be executing in kernel mode. The Linux kernel on Alpha is linked to execute from address 0xfffffc0000310000.

Access Control
The page table entries also contain access control information. As the processor is already using the page table entry to map a process's virtual address to a physical one, it can easily use the access control information to check that the process is not accessing memory in a way that it should not.

Caches

If you were to implement a system using the above theoretical model then it would work, but not particularly efficiently. Both operating system and processor designers try hard to extract more performance from the system. Apart from making the processors, memory and so on faster the best approach is to maintain caches of useful information and data that make some operations faster. Linux uses a number of memory management related caches:
Buffer Cache
The buffer cache contains data buffers that are used by the block device drivers.
These buffers are of fixed sizes (for example 512 bytes) and contain blocks of information that have either been read from a block device or are being written to it. The buffer cache is indexed via the device identifier and the desired block number and is used to quickly find a block of data. Block devices are only ever accessed via the buffer cache. If data can be found in the buffer cache then it does not need to be read from the physical block device, for example a hard disk, and access to it is much faster.
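A minimal C sketch of this indexing idea follows, assuming a toy cache keyed by (device identifier, block number); the real buffer cache uses hash lists, and the names here (bread, struct buffer) are only loosely borrowed.

/* Toy buffer cache: blocks looked up by (device, block number),
 * falling back to a simulated slow device read on a miss. */
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE  512
#define CACHE_SLOTS 4

struct buffer {
    int  dev;                 /* device identifier      */
    long block;               /* block number on device */
    int  valid;
    char data[BLOCK_SIZE];
};

static struct buffer cache[CACHE_SLOTS];

struct buffer *bread(int dev, long block)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].dev == dev && cache[i].block == block) {
            printf("cache hit: dev %d block %ld\n", dev, block);
            return &cache[i];
        }

    /* miss: evict slot 0 for simplicity and "read" from the device */
    printf("cache miss: dev %d block %ld, reading device\n", dev, block);
    cache[0] = (struct buffer){ .dev = dev, .block = block, .valid = 1 };
    memset(cache[0].data, 0, BLOCK_SIZE);
    return &cache[0];
}

int main(void)
{
    bread(3, 42);   /* miss: goes to the device */
    bread(3, 42);   /* hit: served from cache   */
    return 0;
}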
Page Cache
This is used to speed up access to images and data on disk.
It is used to cache the logical contents of a file a page at a time and is accessed via the file and offset within the file. As pages are read into memory from disk, they are cached in the page cache.
Swap Cache
Only modified (or dirty) pages are saved in the swap file.
So long as these pages are not modified after they have been written to the swap file, then the next time the page is swapped out there is no need to write it to the swap file, as the page is already there. Instead the page can simply be discarded.
Hardware Caches
One commonly implemented hardware cache is in the processor; a cache of Page Table Entries. In this case, the processor does not always read the page table directly but instead caches translations for pages as it needs them. These are the Translation Look-aside Buffers and contain cached copies of the page table entries from one or more processes in the system.
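The following toy C model shows what a TLB buys: a small cache of virtual-to-physical translations consulted before any page table walk. The size and structures are invented for illustration.

/* Toy fully associative Translation Look-aside Buffer. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 4

struct tlb_entry {
    bool     valid;
    uint32_t vpn;   /* virtual page number   */
    uint32_t pfn;   /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

bool tlb_lookup(uint32_t vpn, uint32_t *pfn)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;     /* hit: no page table walk needed */
        }
    return false;            /* miss: walk the page table      */
}

void tlb_insert(int slot, uint32_t vpn, uint32_t pfn)
{
    tlb[slot] = (struct tlb_entry){ .valid = true, .vpn = vpn, .pfn = pfn };
}

int main(void)
{
    uint32_t pfn;
    tlb_insert(0, /*vpn=*/7, /*pfn=*/3);
    printf("vpn 7: %s\n", tlb_lookup(7, &pfn) ? "TLB hit" : "TLB miss");
    printf("vpn 9: %s\n", tlb_lookup(9, &pfn) ? "TLB hit" : "TLB miss");
    return 0;
}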
Memory Mapping
When an image is executed, the contents of the executable image must be brought into the process's virtual address space. The same is also true of any shared libraries that the executable image has been linked to use. The executable file is not actually brought into physical memory; instead it is merely linked into the process's virtual memory. Then, as the parts of the program are referenced by the running application, the image is brought into memory from the executable image. This linking of an image into a process's virtual address space is known as memory mapping.

Areas of Virtual Memory
Every process's virtual memory is represented by an mm_struct data structure. This contains information about the image that it is currently executing (for example bash) and also has pointers to a number of vm_area_struct data structures. Each vm_area_struct data structure describes the start and end of the area of virtual memory, the process's access rights to that memory and a set of operations for that memory. These operations are a set of routines that Linux must use when manipulating this area of virtual memory. For example, one of the virtual memory operations performs the correct actions when the process has attempted to access this virtual memory but finds (via a page fault) that the memory is not actually in physical memory. This operation is the nopage operation. The nopage operation is used when Linux demand pages the pages of an executable image into memory.
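A hedged C sketch of this arrangement follows: a vm_area-like structure carries its bounds, flags, and an operations table whose nopage routine is invoked on a fault. Field names only loosely follow the kernel's; the bodies are placeholders.

/* Sketch of the mm_struct / vm_area_struct idea: each area of
 * virtual memory knows its bounds, rights, and its operations. */
#include <stdio.h>

struct vm_area;    /* forward declaration */

struct vm_operations {
    /* called when the area is touched but not in physical memory */
    void (*nopage)(struct vm_area *area, unsigned long address);
};

struct vm_area {
    unsigned long vm_start;
    unsigned long vm_end;
    unsigned int  vm_flags;   /* e.g. read/write bits */
    const struct vm_operations *vm_ops;
};

static void file_nopage(struct vm_area *area, unsigned long address)
{
    (void)area;
    printf("demand-paging address 0x%lx from the mapped file\n", address);
}

static const struct vm_operations file_ops = { .nopage = file_nopage };

int main(void)
{
    struct vm_area code = {
        .vm_start = 0x08048000, .vm_end = 0x0804c000,
        .vm_flags = 0x1 /* read */, .vm_ops = &file_ops,
    };

    unsigned long faulting = 0x08049123;
    if (faulting >= code.vm_start && faulting < code.vm_end)
        code.vm_ops->nopage(&code, faulting);  /* handle the fault */
    return 0;
}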

Demand Paging
Once an executable image has been memory mapped into a process's virtual memory it can start to execute. As only the very start of the image is physically pulled into memory, it will soon access an area of virtual memory that is not yet in physical memory. When a process accesses a virtual address that does not have a valid page table entry, the processor will report a page fault to Linux.
The page fault describes the virtual address where the page fault occurred and the type of memory access that caused it.

Dynamic memory allocation
In computer science, dynamic memory allocation (also known as heap-based memory allocation) is the allocation of memory storage for use in a computer program during the runtime of that program. It can be seen also as a way of distributing ownership of limited memory resources among many pieces of data and code.
Dynamically allocated memory exists until it is released either explicitly by the programmer, or by the garbage collector. This is in contrast to static memory allocation, which has a fixed duration. It is said that an object so allocated has a dynamic lifetime.
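A short C example of this: the buffer obtained from malloc has a dynamic lifetime and exists until it is explicitly released with free (C has no garbage collector).

/* Dynamic (heap) allocation in C. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 5;
    int *values = malloc(n * sizeof *values);   /* dynamic lifetime */
    if (values == NULL)
        return 1;                               /* allocation failed */

    for (size_t i = 0; i < n; i++)
        values[i] = (int)(i * i);
    printf("values[4] = %d\n", values[4]);

    free(values);   /* explicit release by the programmer */
    values = NULL;  /* avoid a dangling pointer           */
    return 0;
}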
Garbage collection
In computer science, garbage collection (GC) is a form of automatic memory management. It is a special case of resource management, in which the limited resource being managed is memory. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.
Memory management unit
A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware component responsible for handling accesses to memory requested by the CPU. Its functions include translation of virtual addresses to physical addresses (i.e., virtual memory management), memory protection, cache control, bus arbitration, and, in simpler computer architectures (especially 8-bit systems), bank switching.
Page table
A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are those unique to the accessing process. Physical addresses are those unique to the CPU, i.e., RAM.
Paging
In computer operating systems there are various ways in which the operating system can store and retrieve data from secondary storage for use in main memory. One such memory management scheme is referred to as paging. In the paging memory-management scheme, the operating system retrieves data from secondary storage in same-size blocks called pages. The main advantage of paging is that it allows the physical address space of a process to be noncontiguous.
Virtual memory

The program thinks it has a large range of contiguous addresses; but in reality the parts it is currently using are scattered around RAM, and the inactive parts are saved in a disk file.
Virtual memory is a computer system technique developed at the University of Manchester, which gives an application program the impression that it has contiguous working memory (an address space), while in fact it may be physically fragmented and may even overflow on to disk storage.
Developed for multitasking kernels, virtual memory provides two primary functions:
Each process has its own address space, so it does not need to be relocated or to use relative addressing.
Each process sees one contiguous block of free memory upon launch. Fragmentation is hidden.


Process Scheduling

PROCESS STATE DIAGRAM

PROCESS SCHEDULING

WHAT IS A PROCESS?

A process can simply be defined as a program in execution; that is, a program currently making use of the processor at any one time. The process state diagram above shows the states a process can be in:
Ready: This is when the process is ready to be run on the processor.
Running: This is when the process is currently making use of the processor.
Blocked: This is when the process is waiting for an input such as user response or data from another process. A process may be in the blocked state if it needs to access a resource.
Other variations of the above named states are:
Ready Suspend: This is when a process is swapped out of memory by the memory management system in order to free memory for other processes.
Blocked Suspend: This is when a process is swapped out of memory after incurring an I/O wait.
Terminated: This is when a process has finished its run.

Summary
  • Only one process at a time is running on the CPU

  • A process gives up the CPU if it starts waiting for an event; otherwise, other processes need fair access

  • The OS schedules which ready process to run next

  • Each process runs for a time slice, or quantum

  • Scheduling algorithms affect performance

SCHEDULING



Scheduling is a key concept in computer multitasking, multiprocessing, and real-time operating system designs. Scheduling refers to the way processes are assigned to run on the available CPUs, since there are typically many more processes running than there are available CPUs. This assignment is carried out by software known as a scheduler or dispatcher.



SCHEDULER



The scheduler is concerned mainly with:
CPU utilization - to keep the CPU as busy as possible.
Throughput - the number of processes that complete their execution per time unit.
Turnaround time - the total time between submission of a process and its completion.
Waiting time - the amount of time a process has been waiting in the ready queue.
Response time - the amount of time from when a request was submitted until the first response is produced.
Fairness - equal CPU time to each thread.



Types of schedulers
Operating systems may feature up to 3 distinct types of schedulers: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler and a short-term scheduler (also known as a dispatcher). The names suggest the relative frequency with which these functions are performed.



1. Long-term Scheduler
The long-term, or admission, scheduler decides which jobs or processes are to be admitted to the ready queue; that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system and the degree of concurrency to be supported at any one time, i.e., whether a large or small number of processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks. Without proper real-time scheduling, modern GUI interfaces would seem sluggish.



2. Mid-term Scheduler
The mid-term scheduler temporarily removes processes from main memory and places them on secondary memory (such as a disk drive), or vice versa. This is commonly referred to as "swapping out" or "swapping in" (also, incorrectly, as "paging out" or "paging in"). The mid-term scheduler may decide to swap out a process which has not been active for some time, a process which has a low priority, a process which is page faulting frequently, or a process which is taking up a large amount of memory, in order to free up main memory for other processes, swapping the process back in later when more memory is available or when the process has been unblocked and is no longer waiting for a resource.



In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the mid-term scheduler may actually perform the role of the long-term scheduler, by treating binaries as "swapped out processes" upon their execution. In this way, when a segment of the binary is required it can be swapped in on demand, or "lazy loaded".

3. Short-term Scheduler

The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) next, following a clock interrupt, an I/O interrupt, an operating system call, or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers: a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as "voluntary" or "co-operative"), in which case the scheduler is unable to "force" processes off the CPU.

Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:

  • Switching context

  • Switching to user mode

  • Jumping to the proper location in the user program to restart that program

The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency.

Scheduling criteria
Different CPU scheduling algorithms have different properties, and the choice of a particular algorithm may favor one class of processes over another. In choosing which algorithm to use in a particular situation, we must consider the properties of the various algorithms. Many criteria have been suggested for comparing CPU scheduling algorithms. Which characteristics are used for comparison can make a substantial difference in which algorithm is judged to be best. The criteria include the following:
1. CPU Utilization. We want to keep the CPU as busy as possible.
2. Throughput. If the CPU is busy executing processes, then work is being done. One measure of work is the number of processes that are completed per time unit, called throughput. For long processes, this rate may be one process per hour; for short transactions, it may be 10 processes per second.
3. Turnaround time. From the point of view of a particular process, the important criterion is how long it takes to execute that process. The interval from the time of submission of a process to the time of completion is the turnaround time. Turnaround time is the sum of the periods spent waiting to get into memory, waiting in the ready queue, executing on the CPU, and doing I/O.
4. Waiting time. The CPU scheduling algorithm does not affect the amount of time during which a process executes or does I/O; it affects only the amount of time that a process spends waiting in the ready queue. Waiting time is the sum of the periods spent waiting in the ready queue.
5. Response time. In an interactive system, turnaround time may not be the best criterion. Often, a process can produce some output fairly early and can continue computing new results while previous results are being output to the user. Thus, another measure is the time from the submission of a request until the first response is produced. This measure, called response time, is the time it takes to start responding, not the time it takes to output the response. The turnaround time is generally limited by the speed of the output device.
It is desirable to maximize CPU utilization and throughput and to minimize turnaround time, waiting time, and response time. In most cases, we optimize the average measure. However, under some circumstances, it is desirable to optimize the minimum or maximum values rather than the average. For example, to guarantee that all users get good service, we may want to minimize the maximum response time. Investigators have suggested that, for interactive systems, it is more important to minimize the variance in the response time than to minimize the average response time. A system with reasonable and predictable response time may be considered more desirable than a system that is faster on the average but is highly variable. However, little work has been done on CPU-scheduling algorithms that minimize variance.
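As a worked example of these criteria, the C program below computes average waiting and turnaround times for three processes under simple first-come, first-served scheduling; the burst times are invented for illustration.

/* Average waiting and turnaround time under FCFS scheduling. */
#include <stdio.h>

int main(void)
{
    int burst[] = { 24, 3, 3 };   /* CPU bursts; all arrive at t=0 */
    int n = 3;
    int wait = 0, turnaround = 0, clock = 0;

    for (int i = 0; i < n; i++) {
        wait       += clock;       /* time spent in the ready queue  */
        clock      += burst[i];    /* process runs to completion     */
        turnaround += clock;       /* submission (t=0) to completion */
    }

    printf("average waiting time:    %.2f\n", (double)wait / n);
    printf("average turnaround time: %.2f\n", (double)turnaround / n);
    return 0;
}

With bursts of 24, 3, and 3 time units, this reports an average waiting time of 17 and an average turnaround time of 27, showing how one long job ahead of short ones inflates both measures.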

CPU-bound
A CPU-bound process spends most of its time doing computation and little I/O.
I/O-bound
An I/O-bound process spends most of its time doing I/O and little computation.
Multilevel scheduling
Processes are classified into different groups:

  • foreground (interactive) vs.
  • background (batch)

Each group has its own ready queue.


Preemptive Vs Nonpreemptive Scheduling
Scheduling algorithms can be divided into two categories with respect to how they deal with clock interrupts.

  • Nonpreemptive Scheduling
    A scheduling discipline is nonpreemptive if, once a process has been given the CPU, the CPU cannot be taken away from that process.
    Following are some characteristics of nonpreemptive scheduling:
    In a nonpreemptive system, short jobs are made to wait by longer jobs, but the overall treatment of all processes is fair.
    In a nonpreemptive system, response times are more predictable because incoming high-priority jobs cannot displace waiting jobs.
    In nonpreemptive scheduling, the scheduler executes jobs in the following two situations.
    When a process switches from running state to the waiting state.
    When a process terminates.

  • Preemptive Scheduling
    A scheduling discipline is preemptive if the CPU can be taken away from a process once it has been given to it.
    The strategy of allowing processes that are logically runnable to be temporarily suspended is called preemptive scheduling, in contrast to the "run to completion" method; a small round-robin simulation of the idea is sketched below.


Thursday, February 4, 2010

Presentation Data (HPFS & Ext3)

HPFS and Ext3
High Performance File System
HPFS or High Performance File System is a file system created specifically for the OS/2 operating system to improve upon the limitations of the FAT file system. It was written by Gordon Letwin and others at Microsoft and added to OS/2 version 1.2, at that time still a joint undertaking of Microsoft and IBM.

The HPFS file system was first introduced with OS/2 1.2 to allow for greater access to the larger hard drives that were then appearing on the market. Additionally, it was necessary for a new file system to extend the naming system, organization, and security for the growing demands of the network server market. HPFS maintains the directory organization of FAT, but adds automatic sorting of the directory based on filenames. Filenames are extended to up to 254 double-byte characters. HPFS also allows a file to be composed of "data" and special attributes to allow for increased flexibility in terms of supporting other naming conventions and security. In addition, the unit of allocation is changed from clusters to physical sectors (512 bytes), which reduces lost disk space.

Under HPFS, directory entries hold more information than under FAT. As well as the attribute file, this includes information about the modification, creation, and access date and times. Instead of pointing to the first cluster of the file, the directory entries under HPFS point to the FNODE. The FNODE can contain the file's data, or pointers that may point to the file's data or to other structures that will eventually point to the file's data.

HPFS attempts to allocate as much of a file in contiguous sectors as possible. This is done in order to increase speed when doing sequential processing of a file.

HPFS organizes a drive into a series of 8 MB bands, and whenever possible a file is contained within one of these bands. Between each of these bands are 2K allocation bitmaps, which keep track of which sectors within a band have and have not been allocated. Banding increases performance because the drive head does not have to return to the logical top (typically cylinder 0) of the disk, but to the nearest band allocation bitmap to determine where a file is to be stored.
Additionally, HPFS includes a couple of unique special data objects:
1. Super Block
The Super Block is located in logical sector 16 and contains a pointer to the FNODE of the root directory. One of the biggest dangers of using HPFS is that if the Super Block is lost or corrupted due to a bad sector, so are the contents of the partition, even if the rest of the drive is fine. It would be possible to recover the data on the drive by copying everything to another drive with a good sector 16 and rebuilding the Super Block. However, this is a very complex task.

2. Spare Block
The Spare Block is located in logical sector 17 and contains a table of "hot fixes" and the Spare Directory Block. Under HPFS, when a bad sector is detected, the "hot fixes" entry is used to logically point to an existing good sector in place of the bad sector. This technique for handling write errors is known as hot fixing. Hot fixing is a technique where, if an error occurs because of a bad sector, the file system moves the information to a different sector and marks the original sector as bad. This is all done transparently to any applications that are performing disk I/O (that is, the application never knows that there were any problems with the hard drive). Using a file system that supports hot fixing will eliminate error messages such as the FAT "Abort, Retry, or Fail?" error message that occurs when a bad sector is encountered.

Note: The version of HPFS that is included with Windows NT does not support hot fixing.

Among its improvements are:
  • support for mixed case file names, in different code pages
  • support for long file names (256 characters as opposed to FAT's 8+3 characters)
  • more efficient use of disk space (files are not stored using multiple-sector clusters but on a per-sector basis)
  • an internal architecture that keeps related items close to each other on the disk volume
  • less fragmentation of data
  • extent-based space allocation
  • separate datestamps for last modification, last access, and creation (as opposed to FAT's one last modification datestamp)
  • a B+ tree structure for directories
  • root directory located at the mid-point, rather than beginning of the disk, for faster average access
  • HPFS also can keep 64 KB of metadata ("extended attributes") per file.

IBM offers two kinds of IFS drivers for this file system:

  • the standard one with a cache limited to 2 MB
  • HPFS386 provided with the server versions of OS/2

Windows Native Support
Windows 95 and its successors, Windows 98 and Windows Me, can read/write HPFS only when mapped via a network share, but cannot read it from a local disk. They listed the NTFS partitions of networked computers as "HPFS", because NTFS and HPFS share the same filesystem identification number in the partition table.

Windows NT 3.1 and 3.5 have native read/write support for local disks and can even be installed onto an HPFS partition. This is because NT was originally going to be a version of OS/2.

Windows NT 3.51 can also read and write from local HPFS formatted drives. However, Microsoft discouraged using HPFS in Windows NT 4 and in subsequent versions. Microsoft even removed the ability of NT 3.51 to format an HPFS file system. Starting with Windows NT 4 the file system driver pinball.sys enabling the read/write access is not included in a default installation anymore. Pinball.sys is included on the installation media for Windows 2000 and can be manually installed and used with some limitations. Later Windows versions do not ship with this driver.

Microsoft retained rights to OS/2 technologies, including the HPFS file system, after they ceased collaboration. Since Windows NT 3.1 was designed for more rigorous (enterprise-class) use than previous versions of Windows, it included support for HPFS (and NTFS) giving it a larger storage capacity than FAT file systems. However, since HPFS lacks a journal, any recovery after an unexpected shutdown or other error state takes progressively longer as the file system grows. A utility such as CHKDSK would need to scan each entry in the file system to ensure no errors are present, a problem which is vastly reduced on NTFS where the journal is simply replayed.

Advantages of HPFS

  • HPFS is best for drives in the 200-400 MB range.
  • Support for long file names up to 256 characters.
  • Upper and lower case: HPFS preserves case, but it is not case sensitive.
  • Native support for extended attributes (EAs): FAT is just too fragile to support them, and the Workplace Shell depends on them heavily.
  • HPFS provides high performance.
  • Much greater integrity: signatures at the beginning of the system structure sectors, and forward and backward links in FNODE trees.
  • Much less fragmentation.

Disadvantages of HPFS

  • Because of the overhead involved in HPFS, it is not a very efficient choice for a volume of under approximately 200 MB. In addition, with volumes larger than about 400 MB, there will be some performance degradation.
  • You cannot set security on HPFS under Windows NT.
  • HPFS is only supported under Windows NT versions 3.1, 3.5, and 3.51. Windows NT 4.0 cannot access HPFS partitions.

Ext3

The ext3 or third extended file system is a journaled file system that is commonly used by the Linux kernel. It is the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extending ext2 in a 1998 paper, "Journaling the Linux ext2fs Filesystem", and later in a February 1999 kernel mailing list posting, and the file system was merged with the mainline Linux kernel in November 2001, from 2.4.15 onward. Its main advantage over ext2 is journaling, which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4.

Journaling results in massively reduced time spent recovering a file system after a crash, and is therefore in high demand in environments where high availability is important, not only to improve recovery times on single machines but also to allow a crashed machine's file system to be recovered on another machine when we have a cluster of nodes with a shared disk.

Advantages
Although its performance (speed) is less attractive than competing Linux file systems such as JFS, ReiserFS and XFS, it has a significant advantage in that it allows in-place upgrades from the ext2 file system without having to back up and restore data. Ext3 also uses less CPU power than ReiserFS and XFS. It is also considered safer than the other Linux file systems due to its relative simplicity and wider testing base.

The ext3 file system adds, over its predecessor:

  • A Journaling file system
  • Online file system growth
  • Htree indexing for larger directories. An HTree is a specialized version of a B-tree (not to be confused with the H tree fractal).

Without these, any ext3 file system is also a valid ext2 file system. This has allowed well-tested and mature file system maintenance utilities for maintaining and repairing ext2 file systems to also be used with ext3 without major changes. The ext2 and ext3 file systems share the same standard set of utilities, e2fsprogs, which includes a fsck tool. The close relationship also makes conversion between the two file systems (both forward to ext3 and backward to ext2) straightforward.

While in some contexts the lack of "modern" file system features such as dynamic inode allocation and extents could be considered a disadvantage, in terms of recoverability this gives ext3 a significant advantage over file systems with those features. The file system metadata is all in fixed, well-known locations, and there is some redundancy inherent in the data structures that may allow ext2 and ext3 to be recoverable in the face of significant data corruption, where tree-based file systems may not be recoverable.

What is a Journaling File system?

A journaling file system keeps a journal or log of the changes that are being made to the file system during disk writing, which can be used to rapidly reconstruct corruptions that may occur due to events such as a system crash or power outage. The level of journaling performed by the file system can be configured to provide a number of levels of logging depending on your needs and performance requirements.


What are the Advantages of a Journaling File system?

There are a number of advantages to using a journaling file system:


Both the size and volume of data stored on disk drives have grown exponentially over the years. The problem with a non-journaled file system is that following a crash the fsck (file system consistency check) utility has to be run. fsck will scan the entire file system, validating all entries and making sure that blocks are allocated and referenced correctly. If it finds a corrupt entry it will attempt to fix the problem. The issues here are two-fold. Firstly, the fsck utility will not always be able to repair damage, and you will end up with data in the lost+found directory. This is data that was being used by an application, but the system no longer knows where it was referenced from. The other problem is the issue of time. It can take a very long time to complete the fsck process on a large file system, leading to unacceptable down time.

A journaled file system records information in a log area on a disk (the journal, which need not be on the same device as the file system) during each write. This is essentially an "intent to commit" data to the file system. The amount of information logged is configurable and ranges from not logging anything, to logging what is known as the "metadata" (i.e., ownership, date stamp information, etc.), to logging the "metadata" and the data blocks that are to be written to the file. Once the log is updated, the system then writes the actual data to the appropriate areas of the file system and marks an entry in the log to say the data is committed.
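The following toy C model sketches this "intent to commit" sequence: log first, write the data, then mark the entry committed, so that recovery can discard anything uncommitted. It is a simplification for illustration, not ext3's actual jbd layer; all names here are invented.

/* Toy write-ahead journal: log intent, write data, mark committed. */
#include <stdio.h>
#include <stdbool.h>

#define JOURNAL_SLOTS 8

struct journal_entry {
    long block;       /* which file system block is affected */
    int  new_value;   /* the data we intend to write         */
    bool committed;
};

static struct journal_entry journal[JOURNAL_SLOTS];
static int fs_blocks[16];   /* stand-in for the real file system */
static int tail;

void journaled_write(long block, int value)
{
    /* 1. log the intent */
    journal[tail] = (struct journal_entry){ block, value, false };
    /* 2. write the actual data */
    fs_blocks[block] = value;
    /* 3. mark the entry committed */
    journal[tail].committed = true;
    tail++;
}

void recover(void)
{
    for (int i = 0; i < tail; i++)
        if (!journal[i].committed)
            printf("block %ld: uncommitted, change discarded\n",
                   journal[i].block);
        else
            printf("block %ld: committed, replay if needed\n",
                   journal[i].block);
}

int main(void)
{
    journaled_write(3, 42);
    journaled_write(5, 7);
    journal[1].committed = false;  /* simulate a crash between steps 2 and 3 */
    recover();
    return 0;
}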

After a crash, the file system can very quickly be brought back on-line using the journal log, reducing what could take minutes using fsck to seconds, with the added advantage that there is considerably less chance of data loss or corruption.


What is a Journal Checkpoint?

When a file is accessed on the filesystem, the last snapshot of that file is read from the disk into memory. The journal log is then consulted to see if any uncommitted changes have been made to the file since the data was last written to the file (essentially looking for an "intention to commit" in the log entry, as described above). At particular points the filesystem will update file data on the disk from the uncommitted log entries and trim those entries from the log. Committing operations from the log and synchronizing the log and its associated filesystem is called a checkpoint.

What are the disadvantages of a Journaled Filesystem?

Nothing in life is free, and ext3 and journaled filesystems are no exception to the rule. The biggest drawback of journaling is in the area of performance, simply because more disk writes are required to store information in the log. In practice, however, unless you are running a system where disk performance is absolutely critical, the performance difference will be negligible.

What Journaling Options are Available with the ext3 filesystem?

The ext3 file system provides three options. These are as follows:

Journal (lowest risk)

Both metadata and file contents are written to the journal before being committed to the main file system. Because the journal is relatively continuous on disk, this can improve performance in some circumstances. In other cases, performance gets worse because the data must be written twice - once to the journal, and once to the main part of the file system.

Ordered (medium risk)

Only metadata is journaled; file contents are not, but it's guaranteed that file contents are written to disk before associated metadata is marked as committed in the journal. This is the default on many Linux distributions. If there is a power outage or kernel panic while a file is being written or appended to, the journal will indicate the new file or appended data has not been "committed", so it will be purged by the cleanup process. (Thus appends and new files have the same level of integrity protection as the "journaled" level.) However, files being overwritten can be corrupted because the original version of the file is not stored. Thus it's possible to end up with a file in an intermediate state between new and old, without enough information to restore either one or the other (the new data never made it to disk completely, and the old data is not stored anywhere). Even worse, the intermediate state might intersperse old and new data, because the order of the write is left up to the disk's hardware. XFS uses this form of journaling.

Writeback (highest risk)

Only metadata is journaled; file contents are not. The contents might be written before or after the journal is updated. As a result, files modified right before a crash can become corrupted. For example, a file being appended to may be marked in the journal as being larger than it actually is, causing garbage at the end. Older versions of files could also appear unexpectedly after a journal recovery. The lack of synchronization between data and journal is faster in many cases. JFS uses this level of journaling, but ensures that any "garbage" due to unwritten data is zeroed out on reboot.

Does the Journal log have to be on the same disk as the file system?

No, the ext3 journal log does not have to be on the same physical device as the file system it is logging. On Red Hat Linux, the journal device can be specified using the journal_device= option with the -journal-options command line argument of the tune2fs utility.

Features of ext3

The ext3 file system is essentially an enhanced version of the ext2 file system. These improvements provide the following advantages:

Availability

After an unexpected power failure or system crash (also called an unclean system shutdown), each mounted ext2 file system on the machine must be checked for consistency by the e2fsck program. This is a time-consuming process that can delay system boot time significantly, especially with large volumes containing a large number of files. During this time, any data on the volumes is unreachable.

The journaling provided by the ext3 file system means that this sort of file system check is no longer necessary after an unclean system shutdown. The only time a consistency check occurs using ext3 is in certain rare hardware failure cases, such as hard drive failures. The time to recover an ext3 file system after an unclean system shutdown does not depend on the size of the file system or the number of files; rather, it depends on the size of the journal used to maintain consistency. The default journal size takes about a second to recover, depending on the speed of the hardware.

Data Integrity

The ext3 file system provides stronger data integrity in the event that an unclean system shutdown occurs. The ext3 file system allows you to choose the type and level of protection that your data receives. By default, Red Hat Linux 8.0 configures ext3 volumes to keep a high level of data consistency with regard to the state of the file system.

Speed

Despite writing some data more than once, ext3 has a higher throughput in most cases than ext2 because ext3's journaling optimizes hard drive head motion. You can choose from three journaling modes to optimize speed, but doing so means trade-offs with regard to data integrity.

Easy Transition

It is easy to change from ext2 to ext3 and gain the benefits of a robust journaling file system without reformatting.

Why ext3?
Ext3 is forward and backward compatible with ext2, allowing users to keep existing file systems while very simply adding journaling capability. Any user who wishes to un-journal a file system can do so easily (not that we expect many to do so...). Furthermore, an ext3 file system can be mounted as ext2 without even removing the journal, as long as a recent version of e2fsprogs (such as the one included in Red Hat Linux 7.2) is installed.
Ext3 benefits from the long history of fixes and enhancements to the ext2 file system, and will continue to do so. This means that ext3 shares ext2's well-known robustness, but also that as new features are added to ext2, they can be carried over to ext3 with little difficulty. When, for example, extended attributes or HTrees are added to ext2, it will be relatively easy to add them to ext3. (The extended attributes feature will enable things like access control lists; HTrees make directory operations extremely fast and highly scalable to very large directories.)
Ext3, like ext2, has a multi-vendor team of developers who develop it and understand it well; its development does not depend on any one person or organization.
Ext3 provides and makes use of a generic journaling layer (jbd) which can be used in other contexts. ext3 can journal not only within the file system, but also to other devices, so as NVRAM devices become available and supported under Linux, ext3 will be able to support them.
Ext3 has multiple journaling modes. It can journal all file data and metadata (data=journal), or it can journal metadata but not file data (data=ordered or data=writeback). When not journaling file data, you can choose to write file system data before metadata (data=ordered; causes all metadata to point to valid data), or not to handle file data specially at all (data=writeback; file system will be consistent, but old data may appear in files after an unclean system shutdown). This gives the administrator the power to make the tradeoff between speed and file data consistency, and to tune speed for specialized usage patterns.
Ext3 has broad cross-platform compatibility, working on 32- and 64- bit architectures, and on both little-endian and big-endian systems. Any system (currently including many Unix clones and variants, BeOS, and Windows) capable of accessing files on an ext2 file system will also be able to access files on an ext3 file system.
Ext3 does not require extensive core kernel changes and requires no new system calls, thus presenting Linus Torvalds no challenges that would effectively prevent him from integrating ext3 into his official Linux kernel releases. Ext3 is already integrated into Alan Cox's -ac kernels, slated for migration to Linus's official kernel soon.
The e2fsck file system recovery program has a long and proven track record of successful data recovery when software or hardware faults corrupt a file system. ext3 uses this same e2fsck code for salvaging the file system after such corruption, and therefore it has the same robustness against catastrophic data loss as ext2 in the presence of data-corruption faults.

Size limits
Ext3 has a maximum size for both individual files and the entire filesystem. For the filesystem as a whole that limit is 2^32 blocks. Both limits are dependent on the block size of the filesystem; the following chart summarizes the limits:

Block size    Max file size    Max filesystem size
1 KB          16 GB            2 TB
2 KB          256 GB           8 TB
4 KB          2 TB             16 TB
8 KB          2 TB             32 TB



Disadvantages
Functionality
Since ext3 aims to be backwards compatible with the earlier ext2, many of the on-disk structures are similar to those of ext2. Because of that, ext3 lacks a number of features of more recent designs, such as extents, dynamic allocation of inodes, and block suballocation. There is a limit of 31,998 subdirectories per directory, stemming from its limit of 32,000 links per inode.

ext3, like most current Linux filesystems, cannot be fsck-ed while the filesystem is mounted for writing. Attempting to check a file system that is already mounted may detect bogus errors where changed data has not reached the disk yet, and corrupt the file system in an attempt to "fix" these errors.

Defragmentation
There is no online ext3 defragmentation tool that works on the filesystem level. An offline ext2 defragmenter, e2defrag, exists but requires that the ext3 filesystem be converted back to ext2 first. But depending on the feature bits turned on in the filesystem, e2defrag may destroy data; it does not know how to treat many of the newer ext3 features.

There are userspace defragmentation tools like Shake and defrag. Shake works by allocating space for the whole file as one operation, which will generally cause the allocator to find contiguous disk space. It also tries to write files used at the same time next to each other. Defrag works by copying each file over itself. However, they only work if the filesystem is reasonably empty. A true defragmentation tool does not exist for ext3.

That being said, as the Linux System Administrator Guide states, "Modern Linux filesystem(s) keep fragmentation at a minimum by keeping all blocks in a file close together, even if they can't be stored in consecutive sectors. Some filesystems, like ext3, effectively allocate the free block that is nearest to other blocks in a file. Therefore it is not necessary to worry about fragmentation in a Linux system."

While ext3 is more resistant to file fragmentation than the FAT filesystem, ext3 filesystems can nonetheless become fragmented over time or with specific usage patterns, like slowly writing large files. Consequently, the successor to the ext3 filesystem, ext4, includes a filesystem defragmentation utility and support for extents (contiguous file regions).

Recovery
The file system design does not support recovery of deleted files. The ext3 driver actively deletes files by wiping their inodes for crash safety reasons, which is why an accidental 'rm -rf ...' may cause permanent data loss.

There are still several techniques, and some commercial software such as UFS Explorer Standard Recovery version 4, for recovering deleted or lost files by means of file system journal analysis; however, they do not guarantee recovery of any specific file.

There is no chance of recovering files after the file system has been formatted.

Compression
Support for transparent compression is available as an unofficial patch for ext3. This patch is a direct port of e2compr and still needs further development; it compiles and boots well with upstream kernels, but journaling is not implemented yet. The current patch is named e3compr.

No checksumming in journal
Ext3 does not do checksumming when writing to the journal. If barrier=1 is not enabled as a mount option (in /etc/fstab), and if the hardware is doing out-of-order write caching, one runs the risk of severe filesystem corruption during a crash.

Consider the following scenario: If hard disk writes are done out-of-order (due to modern hard disks caching writes in order to amortize write speeds), it is likely that one will write a commit block of a transaction before the other relevant blocks are written. If a power failure or unrecoverable crash should occur before the other blocks get written, the system will have to be rebooted. Upon reboot, the file system will replay the log as normal, and replay the "winners" (transactions with a commit block, including the invalid transaction above which happened to be tagged with a valid commit block). The unfinished disk write above will thus proceed, but using corrupt journal data. The file system will thus mistakenly overwrite normal data with corrupt data while replaying the journal. There is a test program available to trigger the problematic behavior. If checksums had been used, where the blocks of the "fake winner" transaction were tagged with a mutual checksum, the file system could have known better and not replayed the corrupt data onto the disk. Journal checksumming has been added to EXT4.
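A small C sketch of the missing safeguard: if the commit record carries a checksum over the transaction's blocks, replay can detect a "fake winner" whose data blocks never reached the disk. The checksum here is deliberately trivial and the scenario is simulated; real designs (such as ext4's journal checksumming) use CRC32 or stronger.

/* Detecting a "fake winner" transaction via a journal checksum. */
#include <stdio.h>
#include <stdint.h>

/* trivial additive checksum, for illustration only */
static uint32_t checksum(const uint32_t *blocks, int n)
{
    uint32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += blocks[i];
    return sum;
}

int main(void)
{
    uint32_t tx_blocks[3] = { 11, 22, 33 };
    uint32_t commit_sum   = checksum(tx_blocks, 3);

    /* crash scenario: the commit block hit the disk out of order,
     * but one data block was never written */
    uint32_t on_disk[3] = { 11, 22, 0 /* never written */ };

    if (checksum(on_disk, 3) == commit_sum)
        printf("replaying transaction\n");
    else
        printf("checksum mismatch: fake winner, transaction discarded\n");
    return 0;
}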

EXT3 distribution

The EXT3 filesystem patch distributions and design papers are available from ftp://ftp.kernel.org/pub/linux/kernel/people/sct/ext3

Alternately, these materials are available from ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/

The EXT3 author and maintainer, Stephen Tweedie, may be reached at sct@redhat.com

Wednesday, February 3, 2010

Windows XP

Windows XP is an operating system produced by Microsoft for use on personal computers, including home and business desktops, laptops, and media centers. It was released in 2001. The name "XP" is short for "eXPerience".
Windows XP is the successor to both Windows 2000 Professional and Windows Me, and is the first consumer-oriented operating system produced by Microsoft to be built on the Windows NT kernel and architecture. Windows XP was first released on October 25, 2001, and over 400 million copies were in use in January 2006, according to an estimate in that month by an IDC analyst. It was succeeded by Windows Vista, which was released to volume license customers on November 8, 2006, and worldwide to the general public on January 30, 2007. Direct OEM and retail sales of Windows XP ceased on June 30, 2008. Microsoft continued to sell XP through their System Builders (smaller OEMs who sell assembled computers) program until January 31, 2009. XP may continue to be available as these sources run through their inventory or by purchasing Windows Vista Ultimate or Business and then downgrading to Windows XP.

FEATURES
Windows XP introduced several new features to the Windows line, including:
  • Faster start-up and hibernation sequences
  • The ability to discard a newer device driver in favor of the previous one (known as driver rollback), should a driver upgrade not produce desirable results
  • A new, arguably more user-friendly interface, including the framework for developing themes for the desktop environment
  • Fast user switching, which allows a user to save the current state and open applications of their desktop and allow another user to log on without losing that information
  • The ClearType font rendering mechanism, which is designed to improve text readability on Liquid Crystal Display (LCD) and similar monitors.

  • Built on the new Windows engine
  • Enhanced device driver verifier
  • Dramatically reduced reboot scenarios
  • Improved code protection
  • Side-by-side DLL support
  • Windows File Protection
  • Windows Installer
  • Enhanced software restriction policies
  • Preemptive multitasking architecture
  • Scalable memory and processor support
  • Encrypting File System (EFS) with multi-user support
  • IP Security (IPSec)
  • Kerberos support
  • Smart card support
  • Internet Explorer Add-on Manager
  • Windows Firewall
  • Windows Security Center
  • Attachment Manager
  • Data Execution Prevention
  • Windows Firewall Exception List
  • Windows Firewall Application and Port Restrictions
  • Fresh visual design

UPDATED FEATURES

1. GDI+ powered graphics architecture

With the introduction of Windows XP, GDI was deprecated in favor of its successor, the C++ based GDI+ subsystem. GDI+ adds anti-aliased 2D graphics, textures, floating point coordinates, gradient shading, more complex path management, intrinsic support for modern graphics-file formats like JPEG and PNG, and support for composition of affine transformations in the 2D view pipeline. GDI+ uses ARGB values to represent color.

2. Start menu and Taskbar

With Windows XP, the taskbar and the Start button have been updated to support Fitts's law. To help the user access a wider range of common destinations more easily from a single location, the Start menu was expanded to two columns; the left column focuses on the user's installed applications, while the right column provides access to the user's documents and system links which were previously located on the desktop. Links to the My Documents, My Pictures and other special folders are brought to the fore. The My Computer and My Network Places (Network Neighborhood in Windows 95 and 98) icons were also moved off the Desktop and into the Start menu, making it easier to access these icons while a number of applications are open.

3. Windows Explorer

There are significant changes made to Windows Explorer in Windows XP, both visually and functionally. Microsoft focused especially on making Windows Explorer more discoverable and task-based, as well as adding a number of features to reflect the growing use of a computer as a “digital hub”.

4. Task pane and navigation pane
The task pane is displayed on the left side of the window instead of the traditional folder tree view when the navigation pane is turned off. It presents the user with a list of common actions and destinations that are relevant to the current directory or file(s) selected. For instance, when in a directory containing mostly pictures, a set of “Picture tasks” is shown, offering the options to display these pictures as a slide show, to print them, or to go online to order prints.

5. SEARCH

Microsoft introduced animated “Search Companions” in an attempt to make searching more engaging and friendly; the default character is a puppy named Rover, with three other characters (Merlin the magician, Earl the surfer, and Courtney) also available. These search companions, powered by Microsoft Agent technology, bear a great deal of similarity to Microsoft Office’s Office Assistants, even incorporating “tricks” and sound effects. However, the Search Companion can be turned off, and the user can revert to using the classic search.