White II FAQ

0.0 AN INTRODUCTION TO WHITE II AND THE SUPPORTED LNS LINUX COMPUTING ENVIRONMENT:

Welcome to the White II FAQ! The intent of this document is to answer the questions most often put to CS staff members about the new Supported Linux Computing Environment, and to provide a place to document those answers for future reference.

White II is a radical departure from tradition at LNS. Originally, Linux use spread throughout the lab at the hands of students, faculty, and staff who wiped Windows to install the Free OS in order to solve a variety of problems as individual as the person performing the installation. Unfortunately, with this rapid growth came the inevitable security problems, account management headaches, and need for centralized backups that accompany any large scale UNIX deployment -- but without the infrastructure planning and staff needed to support every system as the userbase grew. With nearly two hundred Linux desktops, batch hosts, and servers deployed throughout the lab, it is long past time for CS to resolve these fundamental problems, or we will be unable to scale our deployment much further. White II is the culmination of a two year long project to resolve them, first by laying the foundation for a completely new infrastructure based on centralized account management, centralized home-directories served via AFS, centralized email access on all White II workstations, and centralized access to the LNS compute cluster, and then by moving the client workstations over to this new infrastructure. The project is now mostly done, though there are still many problems to resolve and kinks to iron out.

We at CS sincerely hope readers find the information in this document useful. We also hope readers will pose new questions so that the document stays current and relevant. Please do not hesitate to contact CS, with a link to the relevant section for reference, upon noticing a factual error, thinking of a new question, or finding any of the contents of this FAQ confusing. Please direct any such correspondence to the current FAQ maintainer: J. Maynard Gelinas

1.0 GENERAL QUESTIONS ABOUT WHITE II AND THE SWITCHOVER:

1.1 What are the major differences between Green, White I and White II?

Green is an ancient install of Redhat-4.2 or Redhat-5.2 which was common throughout LNS some years ago. It's not supported by CS any longer, though there are a few straggling users on very old hardware which still use Green. CS stopped supporting Green primarily because we can't find and/or provide security updates for all of the software which shipped with that distribution. The original White upgrade from Green was a response to this problem, along with general demand from the user community to migrate to (at the time) a more modern Linux distribution. White I was based on Redhat-6.2, while White II upgrades LNS to Redhat-7.3 -- the most recent stable release of Redhat Linux. Unlike White I, White II also depends on several backend services only available across the network. These services include centralized authentication via Kerberos, a centralized account database via OpenLDAP, centralized home-directories served via AFS, centralized access to our mail server spool directory via NFS, and (soon) transparent access to the LNS batch cluster from every desktop. Such features further the goal of meeting the LNS Computer Users Committee request to better integrate the Linux computing environment throughout our lab, while reducing overhead costs over the long haul. It also means that White II is completely dependent on the network to function properly, though these days one can't do much with a standalone computer anyway.

1.2 What are Kerberos, AFS, and LDAP?

Kerberos FAQ

AFS FAQ

LDAP FAQ

1.3 How are Kerberos, AFS, and LDAP used with White II?

MIT Kerberos 5 is used as the primary authentication method. AFS, a distributed network filesystem, is used to serve home directories, application binaries, and project space. OpenLDAP is used to serve Username to UID/GID mappings, Full Names, and /path/to/shell (it's an /etc/passwd map without any passwords, served across the network).
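As a rough illustration of what such a password-less map entry carries, the sketch below splits one passwd-style record into the fields named above. The sample record (user jdoe, UID/GID 1234, and the home path) is invented for illustration and is not a real LNS account. On a live White II host, `getent passwd username` would typically print a record in this same format, assuming the host resolves accounts through LDAP.

```shell
#!/bin/sh
# Hypothetical passwd-style record: name:password:UID:GID:full name:home:shell
# The password field holds only a placeholder ("x"); authentication is left to Kerberos.
sample_record='jdoe:x:1234:1234:Jane Doe:/afs/lns.mit.edu/user/jdoe:/bin/bash'

# Print the interesting fields of one passwd-style record, one per line.
describe_record() {
    echo "$1" | {
        # Split on ":" into the seven standard fields.
        IFS=: read -r name passwd uid gid fullname home shell
        echo "username:  $name"
        echo "uid/gid:   $uid/$gid"
        echo "full name: $fullname"
        echo "home:      $home"
        echo "shell:     $shell"
    }
}

describe_record "$sample_record"
```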

1.4 Where are the servers located?

The primary Kerberos, AFS database, and OpenLDAP servers are all located in the CS machine room, 24-032b. These are the "top level" servers which hold master copies of each database. Redundant slave servers for these databases are located in each building so they are as close to the local user community as possible. The AFS fileservers are also located in each building, directly connected to each Cisco switch via a 1Gb SX fiber connection. Each fileserver has a hardware SCSI RAID array connected to the server host with about 250GB of disk available. Our mail server is also located in 24-032b. Though it has no secondary slave servers in each building, it does implement disk mirroring for the mail spool and is backed up regularly.

1.5 What happens if a server fails?

If a Kerberos, LDAP, or AFS database server fails, very little happens: the clients just fail over to another slave server and go on their merry way. However, if a fileserver fails, every Read/Write volume served from that fileserver goes offline. Basically, this means that every home-directory volume served from that fileserver will be unavailable until that server comes back online or we restore all the affected volumes from tape to a new fileserver. Should the mail server hardware fail, we have redundant hardware ready to put in place to resume service ASAP, and the mail spool disks are mirrored such that both disks would have to fail at once in order to lose data. In the unlikely event that both spool disks fail at once, we will have to restore to the redundant system from the tape backups CS performs nightly.

1.6 Doesn't this create a single point of failure for every client?

Yes and no. It creates a single point of failure for every client per building, because each building has its own fileserver. However, our backup server is also a redundant fileserver, completely empty and with more than enough disk space to store all the volumes from any one fileserver in production. So, in the worst case scenario where not only a fileserver host -- but all the data stored on the local RAID array itself -- is completely destroyed, we could restore from backups to our redundant fileserver and quickly get all the users' data available online. Data volumes aren't tied to any individual fileserver, so the very act of restoring each volume to any live fileserver makes that volume available to the entire AFS cell instantly. Note that for a RAID array to lose data the system must either experience two disk failures at once, or the host has to damage the filesystem(s) stored on that RAID beyond repair. The first case is extremely rare, the second less so for any one filesystem but extremely rare for all filesystems stored on said RAID. However, CS has planned for quick disaster recovery in the event of total failure of any and all hardware associated with a fileserver.

1.7 Why change at all? Weren't things running just fine before?

No, they weren't. Previously, each machine in LNS handled user authentication locally via /etc/passwd (a security issue in and of itself) with all home-directories stored on each machine's local disk. Backups were handled via dump and piped across the network to our backup server, an Alpha host with a DLT tape drive. Users were often confronted with multiple accounts, each capable of having its own password, multiple home-directories with separate contents stored on individual machines, and even several mail boxes -- one per workstation. Worse, if a user turned his/her computer off for whatever reason (which happened often) CS was unable to back up data stored on that computer, further complicating the logistics of backup/restore operations -- never mind being unable to install critical security updates. CS spent huge amounts of time chasing down offline workstations, managing redundant accounts with all the inevitable forgotten passwords, and running backups for each computer. It only got worse as new White hosts were added to the network. The system didn't -- couldn't -- scale.

In addition, whenever the hardware of any single workstation died, its user was often unable to log in and access his/her files in order to get back to work while waiting for a tape restore operation to another host, which often took days (and significant staff time) to complete. White II allows a user to simply sit down in front of any other White II workstation should any one desktop host fail. It scales to large numbers of client workstations far better than what we had with old Green and White I hosts (Athena Clusters provide good evidence of this), and it has a better backup system to boot. Given the large number of workstations purchased over the last year, CS was forced to migrate to some kind of central authentication and network served home-directory system. White II using Kerberos, AFS and OpenLDAP turned out to be the solution best suited to our situation.

1.8 Couldn't you have used NIS or LDAP with NFS?

Only if you don't care about security. Only if you want your data to go offline every time the NFS server goes down for maintenance. Only if you want network traffic to exceed available bandwidth, even after our switch upgrade. Unfortunately, NFS has its own set of problems which make it unsuitable for serving large amounts of data and large numbers of home-directories across a WAN. Note that we actually are using LDAP to pass out Username to UID/GID mappings, full names, shell location, etc.: a network served "/etc/passwd" map without any encrypted passwords, with authentication handled by Kerberos.

1.9 And AFS is better how?

NFS with NIS or LDAP has none of these features.

1.10 Isn't serving home-directories across a network much slower than local disks?

Yes. AFS mitigates the issue somewhat by caching reads on the local workstation, but AFS writes will always be slower than writing to your local disk. Even an ancient workstation (LNS has many) which lacks IDE DMA will still find network writes to be slower than writing to a local disk. However, given our recent switch upgrade the network is quite modern. Almost every workstation has its own 100Mb Full Duplex connection directly to a Cisco switch, while every fileserver has a 1Gb Full Duplex SX fiber connection to each switch. In practice this means that any arbitrary workstation has a theoretical maximum AFS write transfer rate of ~3MB/sec or so after subtracting layer 2 and 3 network plus AFS protocol overhead. On the fileserver side it's roughly five to seven times that, or between 15MB/sec and 21MB/sec (about 2/3rds the transfer rate of IDE ATA100 with DMA). So, a file server's network pipe should saturate given about five to seven users maxing out their network pipes at the same time, either reading new files (files not locally cached), or committing writes. Even in CTP, where we support about one hundred workstations against a single fileserver, we haven't seen any network bottlenecks across its 1Gb pipe. Should this change, CS will consider channel bonding two 1Gb connections from the switch to the fileserver.
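The saturation estimate in this answer is simple arithmetic, sketched here in shell. The ~3MB/sec per-workstation rate and the five-to-seven multiplier are the figures quoted above; the function and variable names are made up for illustration.

```shell
#!/bin/sh
# Figures quoted in the answer above.
client_rate=3      # MB/sec, approximate per-workstation AFS write rate
low_factor=5       # fileserver rate is "five to seven times" the client rate
high_factor=7

# Print the estimated fileserver throughput range in MB/sec.
# Arguments: client rate, low multiplier, high multiplier.
fileserver_range() {
    rate=$1; low=$2; high=$3
    echo "$((rate * low))-$((rate * high)) MB/sec"
}

fileserver_range "$client_rate" "$low_factor" "$high_factor"
# prints: 15-21 MB/sec
```

Dividing either end of that range by the per-client rate gives back the five to seven simultaneous full-speed clients mentioned in the answer.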

1.11 If I can log into any White II host throughout LNS, does this mean I can run compute intensive jobs on any host I like?

Only if you have the explicit permission of that workstation's primary user. The authentication system doesn't prevent you from misusing another person's workstation (though if it turns into a problem we can implement per user/per host restrictions), but if they find out and complain you'll likely get in trouble. Instead, ASK first and assume that other workstations throughout LNS are available only with the permission of its primary user. If you need a large compute job which requires several systems to finish in a reasonable time frame, why not consider submitting your job(s) to the LNS batch cluster instead?

1.12 I've heard something about "Project Space", what is this and how do I obtain some?

Project space is just an AFS volume assigned to you, or your project leader, for binary application or data storage. It's generally available down /lns/projects/{ctp,ppc,hig,etc} and is particularly convenient for storing your own special shared binaries or static data. Suppose you need to rebuild cernlib or ROOT with special tweaks: CS assigns you a set amount of project space by quota and you build your binaries down this tree. These binaries are now available to every host throughout LNS, including the batch cluster. This is but one example of how project space can be used to further the goals of the various groups while providing a consistent computing environment throughout LNS.

1.13 Why isn't there an F90/F95 compiler on White II?

Because there are no known free F90/F95 compilers which are of production quality -- yet. White II ships with GNU G77, an F77 compiler that works as advertised. And Intel offers an F90 compiler which seems like it may be legal for the LNS community to use for non-commercial work. But we in CS are not lawyers, and we've been bitten by "free for non-commercial use" licenses before. Suggestions welcome.

1.14 I've noticed that the picture on the login box keeps changing. Why?

This is to notify users that their workstation has been updated or upgraded. Look carefully at the corner of each photo and notice a date stamp in tiny print; this is the release date for that version of White. The login splash screen changes with each quarterly update as well. The login box photos are of important physicists throughout history, while the splash screen is an attractive photo reflecting the season of the year.

1.15 What's the difference between an update and an upgrade? And why so often?

CS's experience with Green convinced us that users in general prefer current software on their workstations. To further this goal we have implemented scheduled updates and upgrades of all desktop workstations throughout LNS. An update provides security fixes, bug fixes, and the occasional new feature. An upgrade is a complete wipe and reinstall of the workstation in order to migrate to a new release of the operating system. Scheduled updates happen quarterly. Full upgrades usually happen once every eighteen months or so, after Redhat releases its latest "stable" version. However, users should note that an unscheduled update can happen at any time in the event a critical security update from Redhat is released, or a major security hole is announced publicly. If our systems are vulnerable to remote compromise we must take immediate action. As such, users should be aware that in order to maintain system security across our network they may be asked to log off at any time to facilitate the installation of a security update. Further, should a user walk away from a running session we may log them out in order to perform such an update.

1.16 Why should I have to log out for a system update?

Because sometimes an update only takes effect after the host is rebooted. For example, if CS updates the system kernel or core system libraries like libc, libm, openssl, etc., a reboot is generally required. Or if CS updates XFree86, the graphical environment, this might require that a user log out so we can restart X. We will only do this if it's absolutely necessary.

2.0 SUPPORTED HARDWARE FOR WHITE II:

2.1 What hardware works with White II?

White II is based on Redhat 7.3 and ships with the vendor supplied kernel. As such, just about any hardware listed as "Certified" on the Redhat Hardware Compatibility List will likely function properly on White II. The real questions are what works well, and to what hardware CS should limit its support.

2.2 Why doesn't CS support Dell, IBM, Micron, [insert favorite vendor here]?

Some vendors ship their computers pre-installed with Redhat Linux and provide direct support for that configuration. CS prefers to only deal with vendors who directly support Linux since many large commercial vendors require that the exact Windows installation, as shipped with the product, be installed before they will provide any technical support or an RMA. Most vendors won't even include Windows installation media with newly purchased workstations, they simply supply a "Windows repair" CD which can only be used with that specific workstation. To make matters worse, most large vendors won't even provide specific details of the hardware included with each line of workstations, either as line items in a quote or documented on their web page, preferring instead to distribute the cheapest components on the open market at any one time for any given product line. Given these factors it's nearly impossible for CS to support a wide range of products from large commercial vendors: we don't have the staff time to maintain a database of which "Windows restore" CDs relate to various systems throughout the lab. Nor can we ever know whether a vendor will ship any particular product line with the same hardware configuration at some future point, so we don't know if a product line that works today will work tomorrow. Until the big vendors become more Linux friendly our best recommendation will remain to purchase from small time chop shops who specialize in Linux sales and support.

2.3 Does CS recommend any vendors?

PSSC provides a range of Linux clusters and workstations to the open market. They are but one hardware supplier that LNS uses, SWT being another popular vendor. PCs For Everyone, a local Cambridge vendor also supports Linux, though unfortunately CS has experienced so many problems with this store that we can't recommend them to anyone. Upon calling PSSC ask to speak with "Janice" or "Alex" Lessor. If you tell them you want an LNS "White" box they'll know exactly what you want. At SWT ask for "Martin". CS hasn't purchased directly from SWT before, though some folks in NiG (Nuclear Interactions Group) have expressed satisfaction with their service.

2.4 What specific hardware configurations does CS support?

Most any modern PC should work as long as the ancillary hardware (video, sound, network) included with the computer is on Redhat's HCL. But there are a few "gotchas":

CS has also learned not to provide a complete listing of specific supported hardware because hardware support in the kernel changes too rapidly.

2.5 Why doesn't CS support the use of a SCSI boot disk?

The goal of White is to unify Linux support across all of LNS with a single bootable OS installation image. While every PC motherboard supports booting off of a variety of standard IDE devices, support for booting from a SCSI device is handled by each SCSI chipset slightly differently. To write a functional boot block to a SCSI disk the proper disk geometry must be fed to the boot loader and the SCSI controller must have a loadable module set up in an "initial ramdisk image" (initrd). None of these problems exist with IDE simply because IDE is native to all PCs and as such IDE support is always compiled directly into the kernel. SCSI boot disks complicate the installation process making CS less able to script updates and upgrades across all LNS workstations.

Note that CS does support the use of SCSI disks for data01 storage.

2.6 Why does CS discourage the use of IDE CD burners?

Here are a few good reasons:

  1. IDE CD burners tend to consume huge amounts of CPU while burning a disk. The upshot of this is that it's pretty easy to burn a CD coaster by doing something mildly CPU intensive (such as rendering a complex web site) while burning a CD. SCSI offloads most of this CPU intensive work to the SCSI controller in hardware, making for a more reliable CD burning experience.
  2. Every SCSI CD burner out there will work with a normal SCSI controller. One needn't worry about IDE CD burner support, as long as the SCSI card is supported.
  3. The SCSI emulation subsystem in Linux is a terrible, horrible, gawdawful, hack. To get an IDE CD burner functioning one must install a driver for that specific IDE CD burner, which is then tied to a SCSI emulation system in the kernel. The CD burner then looks like a SCSI device to the end user.
If you are determined to purchase an IDE CD burner please forward the quote to CS so we can verify with the vendor the specific manufacturer and model number of the CD burner they plan to ship with the system and its driver status in the kernel tree. If you do not involve CS in this purchase we may be unable to support that hardware on delivery.

All that said, IDE CD burners sure are cheap! As of this writing many are available in the $60 range, while SCSI burners still cost about $300 a pop. You get what you pay for.

2.7 I need support for some esoteric hardware. What are my chances?

CS can't answer that question until we see the exact name and model number of the hardware you wish to purchase. However, a truism of Linux is that the newer the hardware the less likely it will be properly supported. Linux does a great job at supporting old hardware no one remembers, but bleeding edge PCI cards and devices are usually not so well supported. Please contact CS if you have any questions.

2.8 Will CS vet a quote before I purchase a machine?

Yes! Though we prefer to vet quotes for purchasing blocks of machines, such as five or ten at a time, users may forward quotes to CS at any time if they have a question about hardware or vendor support. We will contact that vendor with any questions we may have and reply back to you with any problems we notice. Note that if you follow through on purchasing a machine which won't support Linux CS will be forced to leave any unsupported hardware as unconfigured (what else can we do?).

3.0 BACKUP POLICIES AND SUGGESTED TIPS:

3.1 What are the new supported backup policies?

Backups are now managed centrally via the AFS backup and restore subsystem. As with our previous backup system, we perform yearly, monthly, weekly, and daily incremental dumps to tape. Our primary backup server is an x86 host running Debian 3.0 Linux with an IBM-LT Tape drive that provides 100GB of uncompressed storage per tape, and a 3Ware IDE RAID controller card supporting eight 160GB IDE disks in a RAID 5 array. The array is built across seven disks with the extra disk held in reserve as a hot swap failover, giving a total of 960GB of usable disk space. That disk space is then split into two segments:

The goal, of course, is speedy recovery in the event of hardware failure. However, it's also common for users to request a restore of archived files and/or directory trees after an accidental deletion. Both services are expected of any backup system.
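The 960GB figure quoted for the backup server's array follows from how RAID 5 works: one disk's worth of space in the array is consumed by parity. A small sketch of that calculation, using the disk counts and sizes from the answer above (the function name is made up for illustration):

```shell
#!/bin/sh
# Usable capacity of a RAID 5 array: one disk's worth of space goes to parity.
# Arguments: number of disks in the array, size of each disk in GB.
raid5_usable_gb() {
    disks=$1; size_gb=$2
    echo $(( (disks - 1) * size_gb ))
}

total_disks=8      # disks attached to the controller
hot_spares=1       # kept out of the array for failover
disk_gb=160

raid5_usable_gb $((total_disks - hot_spares)) "$disk_gb"
# prints: 960
```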

3.2 How is this different from backups under Green and White I?

The AFS backup subsystem can only back up files and directory trees which exist inside an AFS volume; it doesn't perform dumps of local Linux/UNIX filesystems. Under Green and White I, CS performed full and incremental dump operations in /export/home on each live workstation late every evening, the output of which was piped across the network to an Alpha host running Digital UNIX (Tru64) and ultimately to a DLT tape drive connected to said host.

3.3 What happens after I request a restore?

There are two potential scenarios. In the first case a user requests restoration of his/her home-directory from a specific point in time:

In the second case an entire fileserver fails due to a hardware malfunction causing total data loss. In this case CS will:

3.4 How do "clone volumes" relate to the AFS backup subsystem?

As touched upon in Sections "1.9 And AFS is better [...]" and "10.9 Today I accidentally [...]", a clone volume is a complete Read Only copy of the original stored as "volume-name.backup" within AFS. This volume can be mounted within the AFS filesystem tree just like any other, hence ~/.clone. However, unlike an image copy of a volume it doesn't consume twice the disk space. Instead, during the first clone operation, the original is renamed with a .backup stamp while file system metadata is copied over to a new "original" volume as a set of pointers to the relevant portions of the clone. After cloning, new changes to the original volume are appended as incremental deltas to the master copy. The next clone operation merges the two into a single master copy before starting the cloning procedure on the master yet again.

While a volume is being cloned it may defer writes for a short period of time to maintain data consistency between the clone and the master copy, though this usually doesn't last for more than a few seconds. The more changes (additions or deletions) made to a volume since the previous clone operation, the more work must be done to finish a clone, and thus the longer it might take. Clones are performed either individually per volume or en masse on every volume throughout the entire cell. Since mass cloning normally happens at midnight before a backup, this should rarely affect users. The clone provides a consistent Read Only copy of each user's home-directory which the backup system then dumps to tape. As noted previously, one common problem with dump, tar, and cpio on UNIX systems is the possibility (though rare) that a user might commit a write to (change) a file at the exact instant it is being backed up, thus corrupting that file on tape, and potentially the entire backup. Dumping a Read Only clone instead of the original prevents this possibility from occurring.

3.5 Can I transfer my AFS home-directory to a remote AFS site?

Yes. There is special provision within AFS to dump an AFS volume to "plain text" (ASCII) format for transfer from one AFS cell to the next. If you happen to be moving offsite to a remote location where they have an AFS cell set up you can request that we dump your volume to a file for restoration at the remote site. This ASCII format is designed to support the saving of all AFS filesystem ownership and permissions, such as ACLs and group membership. The ASCII format also handles restoration on different architectures with their varying endianness and word length issues, so if your new remote location runs their AFS fileservers on proprietary RISC hardware from Sun, IBM, Compaq (Digital), HP, SGI (whatever) this is the only way to properly restore your volume at the remote site.

3.6 Why doesn't CS back up local workstation disks any longer?

In the past CS only supported backing up the home-directory partition of each workstation. Since all home-directories are now stored in AFS, and since AFS has its own backup and restore subsystem, our need for the old system has vanished. That said, there are also good reasons why we shouldn't use dump under Linux (from Linus's mouth no less), which amount to random backup corruption during a dump operation.

Many users are surprised to learn that CS never backed up data01 directories or email stored on local workstations. The reason for this is simple: lack of available network bandwidth to our old backup server and lack of tape capacity. Many workstations have large data01 disks -- some as large as 40GB or even 60GB -- which simply can't be backed up across the network. Considering that DLT tapes only store 35GB uncompressed (actually between 15GB and 20GB given the tapes we purchase), and the large amount of cheap disk being purchased on local workstations, CS was forced to limit backups to /export/home (which on Green was 300MB and White I 1GB per workstation). So, the upshot is that we don't have the infrastructure to back up such large disks, and we can't scale the infrastructure as we add workstations due to lack of network bandwidth and tape capacity.

3.7 Can users back up their own local files then?

Yes! There are two ways to handle this problem:

3.8 Can you give explicit instructions using tar?

Assuming your workstation is named mycomputer.lns.mit.edu and the secondary storage is on othercomputer.lns.mit.edu here are the steps to follow:

  1. $ cd /export/data01
  2. $ tar -cpvjf /net/othercomputer/data01/mydirectory/mycomputer-data01.tar.bz2 mydirectory
This will create a "full" tar dump of the entire contents of your data01 directory, bz2 compressed (-j selects bzip2 in current GNU tar; some older releases spelled this option -I). To extract this tarfile type:
  1. $ cd /export/data01
  2. $ tar --overwrite -xpvjf /net/othercomputer/data01/mydirectory/mycomputer-data01.tar.bz2
Note that this extraction will overwrite any file(s) in /export/data01/mydirectory, so please be careful. See man tar for complete details on backups, including incremental archives in case you want to keep archives over time.
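For archives over time, one possible approach is GNU tar's --listed-incremental option, sketched below. This is an illustration under assumptions rather than a CS-supported procedure: it assumes GNU tar, writes each incremental to its own archive file instead of appending to the full dump, skips compression for brevity, and runs entirely in a scratch directory with invented file names.

```shell
#!/bin/sh
# Sketch: full plus incremental archives using GNU tar's --listed-incremental.
# Runs entirely inside the scratch directory given as $1; no real data is touched.
incremental_demo() (
    set -e
    scratch=$1
    mkdir -p "$scratch/data01/mydirectory" "$scratch/backup"
    echo "first" > "$scratch/data01/mydirectory/a.txt"
    cd "$scratch/data01"

    # Level 0 (full) archive; the snapshot file records what was saved.
    tar -cpf "$scratch/backup/full.tar" \
        --listed-incremental="$scratch/backup/snapshot" mydirectory

    # Add a file, then archive again against the same snapshot file.
    echo "second" > mydirectory/b.txt
    tar -cpf "$scratch/backup/incr1.tar" \
        --listed-incremental="$scratch/backup/snapshot" mydirectory

    # List the incremental: it should contain the newly added b.txt.
    tar -tf "$scratch/backup/incr1.tar"
)

demo_dir=$(mktemp -d)
incremental_demo "$demo_dir"
rm -rf "$demo_dir"
```

To restore from such a set, the GNU tar manual describes extracting the full archive first and then each incremental in order, passing --listed-incremental=/dev/null on extraction.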

3.9 Can you give explicit instructions for using Tivoli?

Full information on support for IBM Tivoli backups is available from MIT/IS. The basic steps toward getting TSM running on your workstation follow:

  1. Before deciding on this route please read the MIT/IS Policies and Terms of Service.
  2. If you agree to these terms then register for TSM service with MIT/IS.
  3. Finally, you must download the proper client for your platform. Choose x86 Linux, and in most cases select the LATEST version.
  4. Once you have an account name and configuration from MIT/IS, CS will install and configure the client software on your workstation in order to communicate with MIT/IS's Tivoli backup servers. You are responsible for configuring which directory trees the TSM client will backup.

4.0 NEW USER MIGRATION QUESTIONS AND COMMON PROBLEMS:

4.1 I can't seem to SSH into my workstation any longer, it just hangs. What's happened?

Most likely this is because you're using the old name "hostname.mit.edu" instead of "hostname.lns.mit.edu". Be sure to use the new .lns.mit.edu Fully Qualified Domain Name. If this still doesn't work please let CS know by entering a WREQ, the host may have crashed.

4.2 Why did you switch to a new domain name like this?

The transition is part of a nearly two year plan to migrate off of the MIT 18.77 network onto ESNet, partly because it's cheaper, and partly so we can manage our own DNS SOA (Start Of Authority). By managing our own DNS maps we can add and remove hostnames at will, without waiting for MIT IS.

4.3 I can't log in to my workstation any longer. What happened to my account?

Authentication in White II is now centrally managed with Kerberos instead of local accounts on every host as before. Users will have to request a new account from CS in order to log in to their workstation. It would be best if CS could speak with the user over the phone during the account creation process so we can make certain the user changes his/her password to something secure. CS will also gladly create the account in person, which resolves the issue of changing temporary passwords. Once the account is created it is active on every LNS White II host, so no further account requests for other workstations are necessary.

4.4 Why didn't CS just copy my old password to the new system?

Kerberos stores passwords with a completely different encryption method from standard UNIX /etc/passwd maps. Since CS can't (and shouldn't be able to) decrypt a user's password to pass to the Kerberos account creation tool, it's impossible for CS to migrate the old account passwords over to the new system.

4.5 I've just logged into my new White II account, but all my files are gone! Where are my files?

Your new home-directory is provided to you from one of our RAID hosts and served via AFS to your workstation. As such it's completely new and free space. You didn't lose disk space; we only gave you more disk, stored on our central RAID hosts. Your files are located in /export/oldhome/username or /export/data01/username on whichever workstation you saved them, and should have remained unchanged through the upgrade. CS took great pains to avoid destroying personal data on each host. Please contact us if you believe you are missing any personal files.

4.6 Why didn't you just copy my old home-directory over into the new home-directory space?

Because we have no idea which home-directory is your primary, nor on what workstation it resides. Since the home-directories were previously stored locally, and since most users had multiple accounts on multiple machines, CS staff had no idea which host stored the most important data for each user. And we don't have enough central disk space to store all home-directories from every workstation for each user.

4.7 I know which machine contains my most important files. How do I transfer all of my files at once? What do I do about Netscape bookmarks? And what about my Gnome configuration?

Here is what CS does:

$ cd /export/oldhome/my_username

Gnome and browser settings cannot be moved over. CS recommends that before copying over the original home-directory, one delete the old Gnome configuration and move aside the original .netscape and .mozilla directories (we'll copy your bookmarks from these later).

  $ rm -rf .gnome*
  $ rm -rf .nautilus*
  $ rm -rf .gconf*
  $ rm -rf .gtkrc
  $ mv .netscape .netscape-orig
  $ mv .mozilla .mozilla-orig

Copy the contents of your old home-directory to AFS space:

$ tar -cpf - . | (cd /afs/lns.mit.edu/user/my_username && tar --overwrite -xvpf - )

( ...a bunch of stuff gets copied... )

Change directory to your new home-directory:

$ cd ~

Create a new Netscape profile:

$ netscape

(Accept the license and allow it to populate a new .netscape directory, then quit the application)

$ cp .netscape-orig/bookmarks.html .netscape

Launch Mozilla in order to import your Netscape preferences into a new Mozilla profile.

$ mozilla

Done!

4.8 Where are my digital certificates? Why not just copy .netscape and leave it as is?

Netscape and Mozilla embed the path to your home-directory throughout .netscape and .mozilla. We can't fix this. It's much easier just to save your bookmarks and let Netscape create a new configuration tree on its own. Unfortunately, the upshot of this is that you'll have to obtain new digital certificates and perform other mundane configuration chores within Netscape/Mozilla.

4.9 I'm getting "Quota Exceeded" errors during the copy! What do I do now?!?!

Clear up some space in your old home-directory by deleting redundant files and/or moving them into your data01 directory, then copy it over again. Additionally, you can ask CS for a larger quota; all accounts get 500MB by default. Increasing your quota is a simple enough operation as long as we have enough disk space to hand out.
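Before retrying the copy it may help to compare the size of the old home-directory against the quota. The following is only a sketch: the function name fits_in_quota is made up for illustration, and the 512000 kilobyte default is our reading of the 500MB figure, not an official number.

```shell
# Hypothetical helper: succeed if a directory tree fits within a quota given
# in kilobytes. The 512000 KB default is an assumption (500MB x 1024).
fits_in_quota() {
    dir=$1
    quota_kb=${2:-512000}
    # du -sk prints the total size in kilobytes followed by the path.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    echo "used ${used_kb} KB of ${quota_kb} KB"
    [ "$used_kb" -le "$quota_kb" ]
}
```

For example, fits_in_quota /export/oldhome/my_username || echo "trim before copying" gives a quick answer. The quota actually assigned to an AFS directory can be read with fs listquota.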

4.10 Hey, after the upgrade RSA authentication with ssh doesn't work! Now I can't login without entering a password, which breaks my local CVS repository!

Yeah, we know. Unfortunately, there's little we can do to resolve the situation. What's going on is that your RSA keys, including the list of public keys authorized to log in, are stored in ~/.ssh, which is locked up privately in an AFS volume. The sshd running on every workstation doesn't have permission to read from ~/.ssh until after you authenticate, which means it can't read your authorized public keys in order to verify your key. There's nothing we can do about this without creating a major security hole. Should a solution present itself in OpenSSH we'll implement it pronto. Note that there is discussion toward setting up a departmental CVS server to resolve the primary problem here.

4.11 Why are Ghostview and xpdf so slow? Is this because of AFS?

CS has noticed that Ghostview and xpdf render postscript and pdf files at blazingly slow speeds. They are so slow one can actually watch the page scroll down during the render process. This is not due to slow AFS filesystem access. It seems limited to Redhat-7.3 and is such a common annoyance the Linux newsgroups are full of discussion on the issue; here is one Google thread on the subject. Reading down, one person suggests the Ghostscript package from Redhat's beta Rawhide distribution, but no supported update from Redhat is forthcoming to fix this problem. If we can get the beta package to compile properly we'll likely install it at the next White update, otherwise the fix will have to wait until after Redhat releases an updated package for RH7.3 or we upgrade to RH8.x.

4.12 Every time I login I keep getting a gconf error and then Nautilus dies. What's that all about?

The text of the error message should read:

GConf error:
Failed to contact configuration server
(a likely cause of this is that you 
have an existing configuration server
(gconfd) running, but it isn't reachable
from here - if you're logged in from
two machines at once, you may need to enable
TCP networking for ORBit)

TCP/IP networking is enabled by default! Type cat /etc/orbitrc and see for yourself. The worst part about this bug is that once it happens you'll likely have to completely wipe your .gconf* directories in order to restore gconf back to normal. Unfortunately, you must do this after having logged out. Gconf just doesn't deal well with multiple login instances across multiple machines using the same home-directory. Hopefully Gnome-2.0 will resolve this problem. To wipe ~/.gconf* properly follow these directions:

  1. Completely log out of your Gnome session.
  2. Press: [ctl][alt][F1] at the same time to open a text based console
  3. log into your account
  4. rm -rf .gconf*
  5. log out
  6. press [alt][F7] at the same time to go back to the graphical login box.

If this doesn't solve the problem contact CS with a WREQ. Hopefully we'll be able to do better. Maybe. Welcome to Free Software! :)

4.13 Why do Nautilus desktop icons for "Home", "Trash", and "Start Here" disappear upon being clicked?

Good question. Unfortunately, CS doesn't have an answer. This is one of those strange bugs which just doesn't seem to make any sense whatsoever. It's not a result of an IP Filtering issue, as it happens no matter what IPTables filters are installed in the kernel. The problem doesn't happen if your home directory is set explicitly to a local disk partition. However, if the home-directory exists on any type of network filesystem -- be it NFS or AFS -- the problem crops right up. Strangely, if the filesystem is local but served through the automounter the problem persists. And finally, in the last twist of weirdness, one CS member has noticed that from his home workstation (which is running White II across an AT&T Broadband cable-modem) the problem disappears! We're stumped, and questions posted to the Gnome Nautilus mailing list elicited no responses. Do you, kind reader, have any ideas? Please contact CS with suggestions if so.

Note: Nautilus hasn't actually died, it just isn't painting icons on the desktop any longer. As long as Nautilus is still configured to "Use Nautilus to draw the desktop" (see: Preferences -> Edit Preferences -> Windows & Desktop) you should be able to right click on the desktop to get a context sensitive Nautilus menu.

4.14 I can login to any White II host except my workstation. What's wrong?

If you can log into any White II host but your own, it's likely that the system clock on your workstation has lost synchronization with our Kerberos and AFS servers. Kerberos requires that each workstation's clock be synchronized to within five minutes of the KDC (Key Distribution Center), or all authentication on that system will fail. Each workstation runs ntpd (network time synchronization daemon) against our time server, but it's possible for ntpd to die and the system clock to then drift over time. If you're having this problem take note of the clock in the upper right hand corner of the GDM login box. If this is off by five minutes or more in comparison to other nearby systems then you can assume this to be the cause. Please enter a WREQ and CS will take care of the problem ASAP.
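The five minute rule boils down to a simple comparison of two timestamps. The sketch below is illustrative only: the function name within_kerberos_skew is made up, both times are assumed to be seconds since the epoch, and how the server's time is obtained is left out entirely.

```shell
# Hypothetical helper: succeed if two clocks (seconds since the epoch) agree
# to within max_skew seconds. 300 seconds is the five-minute limit quoted above.
within_kerberos_skew() {
    local_epoch=$1
    server_epoch=$2
    max_skew=${3:-300}
    diff=$((local_epoch - server_epoch))
    # Take the absolute value; the local clock may be fast or slow.
    [ "$diff" -lt 0 ] && diff=$((0 - diff))
    [ "$diff" -le "$max_skew" ]
}
```

The local side can be supplied with date +%s. Treating a skew of exactly 300 seconds as acceptable is a choice made for this sketch.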

5.0 KERBEROS TICKET AND AFS TOKEN QUESTIONS:

5.1 Sometimes authenticating on login takes longer than before. What's going on?

Authentication is now managed by central Kerberos servers, just like with an Athena workstation. Authentication may or may not take longer depending on our server load.

5.2 Sometimes, some number of hours after login, my machine says "permission denied" when I try to access my home-directory. What's going on?

Kerberos works by handing out a master authentication "ticket", called the Ticket Granting Ticket (or TGT), on login, which is then used to generate "service tickets" for various other kinds of network service, AFS being one. This master ticket has a Time To Live (TTL) set by default to ten hours. Once the ticket expires, authentication fails, and the user is forced to re-authenticate. See the Ticket Expiration entry in the Kerberos FAQ for additional details. Since the TGT has a TTL of ten hours it's likely that a "permission denied" message when trying to access your home-directory, even after login, is due to your ticket having expired. The simplest fix is to log in in the morning and log off in the evening, instead of leaving your session running overnight.

5.3 Isn't there a way around this?

No. Kerberos works this way by design. We're working on extending ticket length to 24 hours, but CS cannot change Kerberos at the protocol level. Users can re-authenticate with command line tools, so if your TGT expires and you have a local shell available it's easy enough to generate a new ticket and AFS token.

5.4 Isn't it possible to extract a Kerberos principal into a keyfile for automatic authentication?

Yes. But it's a BAD idea to implement. We use this Kerberos feature to pass administrative level authentication out to our slave servers for cron jobs and such. However, those keyfiles are kept strictly on critical servers with carefully chosen filesystem permissions, on which no one but CS administrative staff has access. Think about it: if CS provided you with a keyfile for your Kerberos principal you would have to store that file on some workstation in one of its local filesystems such as /export/data01 or /export/oldhome. And if anyone EVER gained access to that file they could become YOU on every LNS system at any time they chose. Now consider that data01 and oldhome directories are NFS exported across the LNS network in the clear, no encryption. CS simply cannot do this, no matter how convenient to the LNS community, for very good security reasons.

5.5 If my AFS token expires over time how am I supposed to run long compute jobs?

Simple: authenticate normally by logging in. Move your data and binaries into your /export/data01 directory (or any filesystem local to that workstation). Start your job reading and writing from the local filesystem. Now when your ticket expires the job won't need access to your home-directory and can continue on its merry way.

Catch-22: but now I can't log out because my job is running and I can't read my home-directory because my ticket expired! What am I supposed to do?

In a local shell type:

       $ kinit
         password: (type in your password)
       $ aklog
       $

kinit will create a new TGT after authentication; aklog then uses this ticket to generate an AFS authentication token from one of our AFS database servers. Once you have the new TGT and AFS token you're good to go, without having logged out and back in.
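The two commands can be wrapped so that they only run when the ticket cache really has expired. This is a sketch, not a supported tool: it relies on klist -s from MIT Kerberos 5, which prints nothing and reports through its exit status whether a valid ticket cache exists. The variables KLIST, KINIT and AKLOG are hypothetical overrides added so the logic can be tried without a real realm. Note that it checks only the Kerberos 5 cache, not the AFS token itself.

```shell
# Sketch: renew credentials only when the Kerberos 5 ticket cache is missing
# or expired. KLIST, KINIT and AKLOG are hypothetical override variables;
# by default the real tools are run.
reauth_if_needed() {
    if "${KLIST:-klist}" -s; then
        echo "tickets still valid"
    else
        # kinit prompts for the password; aklog then fetches a fresh AFS token.
        "${KINIT:-kinit}" && "${AKLOG:-aklog}" && echo "re-authenticated"
    fi
}
```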

5.6 Could you please repeat how to re-authenticate a live session with kinit and aklog?

Sure!

In a local shell type:


   $ kinit 
     password: (type in your password)
   $ aklog
   $

Done!

5.7 How do I find out when my tickets will expire?

See: man klist

   $ klist

Ticket cache: FILE:/tmp/krb5cc_1126_GyTlzH
Default principal: gelinas@LNS.MIT.EDU

Valid starting     Expires            Service principal
08/13/02 10:11:00  08/14/02 10:11:00  krbtgt/LNS.MIT.EDU@LNS.MIT.EDU
	renew until 08/14/02 11:09:36


Kerberos 4 ticket cache: /tmp/tkt1126_71e9Iu
Principal: gelinas@LNS.MIT.EDU

  Issued              Expires             Principal
08/13/02 10:11:00  08/14/02 07:26:00  krbtgt.LNS.MIT.EDU@LNS.MIT.EDU
08/13/02 10:09:36  08/13/02 21:54:36  afs@LNS.MIT.EDU
[gelinas@pea gelinas]$ 
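To pull just the Kerberos 5 TGT expiry out of output shaped like the listing above, a small awk filter is enough. This is a sketch that assumes the column layout shown here; other klist versions may print dates differently. The function name tgt_expiry is made up for illustration.

```shell
# Hypothetical helper: read klist output on stdin and print the expiry date
# and time of the Kerberos 5 ticket-granting ticket.
tgt_expiry() {
    # Kerberos 5 ticket lines have five fields: start date, start time,
    # expiry date, expiry time, service principal. Stop at the first match.
    awk '$5 ~ /^krbtgt\// { print $3, $4; exit }'
}

# Typical use (needs a real ticket cache):
#   klist | tgt_expiry
```

The pattern requires a slash after krbtgt, so the Kerberos 4 entry (krbtgt.LNS.MIT.EDU) is skipped.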

5.8 I know of this other site which allows longer tickets (fifteen hours, twenty four hours, etc). Why didn't you change the default ten hour limit to something longer?

We're working on it. For legacy reasons the Redhat pam_krb5afs.so PAM authentication module converts a Kerberos 5 TGT into a Kerberos 4 TGT before generating an AFS token. Recent versions of OpenAFS can actually generate the AFS token directly from a Kerberos 5 TGT. For example, if your session expires and you re-authenticate with kinit and aklog, upon running klist you'll notice that the Kerberos 4 TGT and AFS token are still expired, while the Kerberos 5 TGT and AFS token are fresh. You'll also notice that if you specify kinit -l 24h you'll get a twenty four hour Kerberos 5 TGT and AFS token. Logging in from the graphical GDM login box, however, performs a Kerberos 5 TGT conversion to Kerberos 4, and then generates the AFS token from the Kerberos 4 TGT. It's this conversion process, along with a bug in the PAM authentication module, which limits the AFS token to only its default of ten hours.

5.9 So when are you going to extend our ticket length to twenty four hours by default?

As soon as Redhat fixes the pam_krb5afs.so module such that it either:

a) Performs a direct Kerberos 5 to AFS token conversion. Problem solved.

b) Properly accepts a longer default TTL using the "ticket_lifetime=" stanza in the pam configuration (see: man pam_krb5afs).

The Kerberos 4 spec limits ticket TTL to a little over 21 hours (note the 21 hour 15 minute Kerberos 4 ticket in the klist output under 5.7), so if the pam module is fixed without feature a) we couldn't offer a full 24 hour ticket. But either of these ought to allow us to change the ticket length default with reasonable results. CS has emailed the pam module maintainer and he says that it will be fixed "real soon now". When that may be is anyone's guess.
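The Kerberos 4 ceiling mentioned above can be worked out by hand. The sketch assumes, as our understanding of the classic Kerberos 4 ticket format rather than anything stated in this FAQ, that the lifetime is stored in a single byte counting five-minute units. The function name is made up for illustration.

```shell
# Sketch: the largest lifetime a classic Kerberos 4 ticket can carry, assuming
# a one-byte lifetime field that counts five-minute units.
krb4_max_lifetime() {
    max_units=255        # largest value a single byte can hold
    unit_minutes=5
    total=$((max_units * unit_minutes))
    echo "$((total / 60))h$((total % 60))m"
}
```

That works out to 1275 minutes, which agrees with the Kerberos 4 TGT in the klist listing under 5.7: issued at 10:11:00 and expiring at 07:26:00 the next day.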

6.0 REMOTE KERBEROS/AFS AUTHENTICATION:

6.1 I have an AFS account at a remote location (athena.mit.edu, cern.ch, ir.stanford.edu, etc). How do I log into that account?

It's hard for CS to give explicit instructions because each site can implement a variety of AFS configuration options which affect authentication. AFS ships with a Kerberos 4 implementation by default which some sites use. Others have standardized on a hacked version of AFS with Kerberos 5, including MIT Athena and LNS. And some other sites use DCE, which runs yet another version of Kerberos and AFS. Confusion reigns. The OpenAFS project has officially stated its intention to migrate OpenAFS to Kerberos 5 in the near future, so the various compatibility issues ought to be resolved over time. White II includes utilities to support the traditional Kerberos 4/AFS system along with MIT Kerberos 5/AFS used here, but we make no promises that it will actually work at any arbitrary remote site. If you run into trouble contact a System Administrator at the remote site and ask for documentation or help. If he or she can't solve the problem alone have him or her contact CS. Together we'll likely be able to solve the problem. But maybe not -- no promises.

Assuming you have an account at the remote site, if they are using traditional Kerberos 4 as supplied with the AFS distribution then to log in one would type:

$ klog your_username -cell remote.cell.name
$ aklog -c remote.cell.name

Most sites which use Kerberos 4 as shipped with the AFS distribution expect a "cell name", which by convention should be the same as their DNS domain, usually in lower case.

For a site which uses Kerberos 5 with AFS (such as MIT Athena) type:

$ kinit principal@KERBEROS.REALM
$ aklog -c remote.cell.name

In this situation "principal" refers to your username at the remote site while "KERBEROS.REALM" refers to the remote Kerberos Realm such as ATHENA.MIT.EDU, MEDIA-LAB.MIT.EDU, or ANDREW.CMU.EDU. This is not the same as a DNS domain name, though it's a common naming convention for Kerberos administrators to assign the realm name to be the same as the local DNS domain in UPPER CASE. To add further confusion to the mix, the remote cell name will almost certainly be the same as the remote Kerberos Realm name, but in lower case yet again!
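The naming conventions just described amount to a change of case. The helpers below are a sketch with made-up names; a site is free to ignore the convention, so treat the results as a first guess rather than a guarantee.

```shell
# Hypothetical helpers following the naming conventions described above:
# realm = DNS domain in upper case, cell = DNS domain in lower case.
realm_for_domain() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

cell_for_domain() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

With these, kinit my_username@$(realm_for_domain athena.mit.edu) followed by aklog -c $(cell_for_domain athena.mit.edu) spells out the same pattern as the commands above.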

An explicit example for logging into one's Athena account from a White II host follows:


  $ kinit gelinas@ATHENA.MIT.EDU
    Password for gelinas@ATHENA.MIT.EDU: (my password here)
  $ aklog -c athena.mit.edu
  $ ls /afs/athena.mit.edu/user/g/e/gelinas 
  Mail  OldFiles  Private  Public  README.mail  welcome  www
  $ touch !$/foo
  touch /afs/athena.mit.edu/user/g/e/gelinas/foo
  $ rm !$
  rm /afs/athena.mit.edu/user/g/e/gelinas/foo
  $ ls /afs/lns.mit.edu/user/gelinas
    (.... my files list out ...)

Notice how after authenticating with the remote Kerberos Realm and obtaining an AFS token, my current lns.mit.edu authentication is still functional.

Be aware that in both cases for this to work one must have the relevant AFS database servers listed in the workstation's CellServDB file. For the case with an MIT Kerberos 5 server handling authentication, one must also have Kerberos 5 servers for that realm listed in /etc/krb5.conf. CS happens to know that Athena and cern.ch logins work, but there are many other AFS cells in the known universe.

6.2 How come when I cd to a remote AFS cell I know exists, it isn't there in /afs?

It's not in our CellServDB, a database of all "known" AFS cells distributed by OpenAFS and Transarc. It should be though. Let us know and we'll make certain this is fixed in an update to your (and all LNS lab) hosts.

6.3 OK, so how do I access lns.mit.edu from a remote location? They don't have us listed as /afs/lns.mit.edu either!?!?

See the previous question and contact their CS counterpart. We've just requested entry into the official CellServDB file; unfortunately, this will take time to distribute out to all the remote AFS cells. However, we have set up AFSDB records in our DNS server maps, so any site using DNS to handle AFS lookups should be able to see us now. Again, contact a System Administrator at the remote site for details on solving this problem.

6.4 Is it possible to configure my home PC running Linux, Windows, MacOS X, Sun Solaris, or Digital UNIX to access our AFS cell?

Yes! With caveats. Be aware that OpenAFS doesn't support all platforms and all OS environments, though they claim that Windows NT/2K, MacOS X, Sun Solaris, and Digital UNIX (called Tru64 these days) are all supported above and beyond Linux. CS makes no promises that this will actually work, nor do we claim we have even tried it on anything other than Redhat-7.3 and Debian-3.0 on Intel. So, for the esoteric and commercial platforms you're on your own. Best of luck.

To get this working the software must first support your platform, you must install said software, and it must be configured such that it knows about our Kerberos Realm and AFS cell. If you really want to get fancy you could even set up OpenLDAP and log right into your LNS desktop across the network. For Redhat and Debian Linux, here is what you need:

Now you have the software. Here is all the configuration information you need to access our servers:

The relevant portion of /etc/krb5.conf:

  LNS.MIT.EDU = {
          kdc = kerberos.lns.mit.edu:88
          kdc = kerberos-1.lns.mit.edu:88
          kdc = kerberos-2.lns.mit.edu:88
          admin_server = kerberos.lns.mit.edu:749
          default_domain = lns.mit.edu
  }

The relevant portion of our CellServDB (stored in /usr/vice/etc by default and /etc/openafs on Debian):

>lns.mit.edu              #MIT/LNS Cell
198.125.160.133                 #afsdb1.lns.mit.edu.
198.125.162.11                  #afsdb2.lns.mit.edu.
198.125.161.19                  #afsdb3.lns.mit.edu.

The relevant portion of our /etc/ldap.conf:

# Your LDAP server. Must be resolvable without using LDAP.
host afs1.lns.mit.edu bldg26raid1.lns.mit.edu ctpraid1.lns.mit.edu

# The distinguished name of the search base.
base dc=lns,dc=mit,dc=edu

Finally, you must synchronize the system clock on your computer against our Kerberos servers to within five minutes, or Kerberos authentication will fail. One way to do this is to synchronize against the central MIT time server like so:

  $ rdate -s time.mit.edu
  $

There you go! Note that this data may change at any time. Also note that you are responsible for administering this software and these changes, and that any problems you may experience are yours to resolve. Finally, don't do this without a cable-modem, DSL line, or some direct Internet connection. Modem users will find the performance unacceptable. That's an understatement.
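To confirm that a CellServDB on a home machine really contains our cell, one can list the database servers it records for a given cell name. The filter below is a sketch with a made-up function name; it assumes the layout shown above, where a line beginning with ">" names the cell and each following line starts with a server address.

```shell
# Hypothetical helper: read a CellServDB file on stdin and print the database
# server addresses listed for one cell.
cell_servers() {
    awk -v cell="$1" '
        # A line starting with ">" opens a new cell entry; the name follows the ">".
        /^>/ { in_cell = (substr($1, 2) == cell); next }
        # Inside the wanted cell, the first field of each line is a server address.
        in_cell && NF { print $1 }
    '
}

# Typical use, with the Debian path mentioned above:
#   cell_servers lns.mit.edu < /etc/openafs/CellServDB
```

Empty output means the cell is not listed in that file.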

6.5 Can I access the lns.mit.edu AFS cell from my laptop?

Assuming you've installed the software on your laptop according to the previous question, the issue then becomes: what is my current network environment? Most laptops move from site to site, network to network, with temporary connectivity, hostnames, IP addresses, etc. To further complicate the situation, many sites (including LNS) use a gateway/firewall/router with Network Address Translation doling out RFC1918 private non-routable IP Addresses via DHCP. In this situation your laptop isn't directly connected to the Internet. Remote sites attempting to connect to your laptop will fail unless the gateway has a special rule configured to forward packets assigned to a specific port to your host on the internal network.

All that said, discussion threads on the OpenAFS mailing list suggest that an AFS client does work behind a NAT'd network as long as the database server(s) are on the public Internet (ours are). But CS hasn't tested this configuration, nor do we support this kind of operation. We would be interested in hearing reports from any users who wish to try it out, but -- like in the previous question -- you're on your own.
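A quick way to tell whether a laptop has been handed one of the RFC1918 private addresses discussed in this answer is to test the address against the three reserved ranges. The function below is a sketch with a made-up name, and it does not validate that its argument is a well-formed address.

```shell
# Hypothetical helper: succeed if a dotted-quad IPv4 address falls in one of
# the RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
is_rfc1918() {
    case $1 in
        10.*|192.168.*) return 0 ;;
        172.*)
            # The second octet must lie between 16 and 31 inclusive.
            second=${1#172.}
            second=${second%%.*}
            [ "$second" -ge 16 ] && [ "$second" -le 31 ]
            return ;;
        *) return 1 ;;
    esac
}
```

For instance, is_rfc1918 192.168.1.20 succeeds, while the address of one of our public AFS database servers, 198.125.160.133, does not.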

7.0 EMAIL MIGRATION QUESTIONS:

7.1 Why did you change email so much? Why can't I forward mail to my workstation any longer?

For similar reasons to the Kerberos/AFS migration, we changed email because the previous situation of users sending email to multiple machines on multiple accounts was simply impossible to administer. Here is a list of some problems we faced:

These problems only grew worse as the number of workstations throughout LNS increased, forcing CS to implement some kind of simple and consistent resolution for the user community as a whole.

7.2 So what have you done to fix the problem?

The biggest change is that CS is now NFS exporting the mail spool from mail.lns.mit.edu (the old mitlns.mit.edu with new hardware) to all LNS White II workstations. These hosts mount this spool directory at /var/spool/mail, giving every White II system a single unified mail spool directory. With White II, users who leave their mail on our mail server, no matter which workstation they may login to, will have access to the same email as if it were stored locally on that workstation. All White II hosts change outbound email headers to read "From: user@lns.mit.edu" so that replies go to mail.lns.mit.edu instead of your workstation (this was true for White I as well). The sendmail daemon is turned off to prevent inbound email from being accepted at the workstation (it wouldn't have write access across NFS to the spool directory anyway), and also as a security precaution.

This is the culmination of an eighteen month long project to completely unify email access across all LNS White hosts. Users have been notified for well over a year that they are not to forward mail to their workstations any longer in order to clear out remote address books and such of email addresses pointing to specific workstations. CS believes that any pain experienced now by the user community should be minimal, and the benefits in reduced confusion for new users from the new consistency of our email environment will far outweigh any problems experienced in the short term.

7.3 Why does pine, no matter which workstation I login to, set my outbound header to "From: me@samehost.mit.edu"?

Check your ~/.pinerc. The problem is almost certainly specific to your pine configuration and not a systemic problem.


   $ grep user-domain .pinerc
     user-domain=samehost.mit.edu
   $

pine will automatically rewrite the outbound header to whatever is specified in "user-domain=" for every mail message you send. Change it either by editing the config file or through pine setup like so:

  1. $ pine

    (The pine main menu appears)

  2. Step the menu down twice from "L - FOLDER LIST" to "S - SETUP", then press "Enter".
  3. Press "C" for "Config"
  4. Step the menu down once from "Personal-name =" to "user-domain =", then press "Enter".
  5. At the bottom of the screen "Enter the text to be added :" will appear. Remove the entry by pressing "Back Space" until the field is empty. Press "Enter" when finished.
  6. Press "E" to "Exit Setup".
  7. Select "Yes" to commit changes.

The pine main menu should now appear as before and the changes should take effect immediately. If your ~/.pinerc is not the culprit please contact CS ASAP with a WREQ.
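For those who would rather edit the config file than step through the menus, the same change can be sketched as a one-line sed filter. The function name clear_user_domain is made up for illustration; it mirrors the menu steps above by leaving "user-domain=" in place with an empty value.

```shell
# Hypothetical helper: read a .pinerc on stdin and write it to stdout with the
# user-domain setting emptied, leaving every other line untouched.
clear_user_domain() {
    sed 's/^user-domain=.*$/user-domain=/'
}

# Typical use, after quitting pine and keeping a backup copy:
#   cp ~/.pinerc ~/.pinerc.bak
#   clear_user_domain < ~/.pinerc.bak > ~/.pinerc
```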

7.4 I forward my mail to: my Athena mbox, another University account, hotmail, my ISP, etc. How does the new email system affect me?

It doesn't. Once you forward your email off of our mail server at LNS we have no administrative authority over that email, nor can we provide any services to you should you have a problem with email stored at a foreign location.

7.5 One of those remote sites then forwards my mail back to my local workstation. Why doesn't that work?

Because the workstation's email MTA daemon, sendmail, is turned off. Please stop forwarding email to your local workstation; your mail won't be delivered and will simply bounce. If you wish to read mail locally then leave it on mail.lns.mit.edu and it will appear local to whatever workstation you happen to be using.

7.6 How do I leave my email on the LNS mail server properly?

  1. Log into ralph2.mit.edu.
  2. type "setfwd"
  3. Enter "3" to leave mail on server
  4. Type in your password (it will prompt you)

As long as setfwd returns with no error messages the change should take effect within an hour or so. Please see the LNS mail web page for further details.

7.7 Why won't the mail server allow me to login?

When attempting to ssh into the mail server it appears to allow the login and then closes the connection like so:

[gelinas@pea gelinas]$ ssh gelinas@mail.lns.mit.edu
gelinas@mail.lns.mit.edu's password: (my password here) 
Linux mitlns 2.4.12 #1 SMP Mon Nov 5 11:57:59 EST 2001 i686 unknown

Most of the programs included with the Debian GNU/Linux system are
freely redistributable; the exact distribution terms for each program
are described in the individual files in /usr/share/doc/*/copyright

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
You have mail.
Connection to mail.lns.mit.edu closed.
[gelinas@pea gelinas]$

This is because all users have their shell set to /bin/false on the mail server for security reasons. If mail is left on the server according to the instructions in 7.6, then that email is available directly on every White II host, as well as via IMAP and APOP.

7.8 How do I change my Athena email forwarding such that it won't forward Athena mail to my workstation any longer?

Athena Mail forwarding is explained here. You might also want to read all of the Athena mail policies as well.

7.9 Before the upgrade my workstation had some of my mail, now it's gone. What happened to my mail and how do I get it back?

CS saved all the local mail on each workstation before performing the upgrade. If you had mail stored locally on that workstation, you should find it in your oldhome directory under the name "workstationname-localmail". If it's still not there please open a WREQ and let us know. The mail is almost certainly stored on the workstation and can be recovered easily.

8.0 WEB BROWSING UNDER LINUX:

8.1 Netscape 4 is old and crusty. When will CS support Netscape 6 or 7? What's Mozilla?

Netscape 4 is so old and crusty that Redhat will likely stop shipping and supporting the application in the next major release of Redhat Linux. This means CS must plan a migration to some browser which MIT Athena supports, if only for MIT digital certificates and SAP access. Currently MIT supports the old Netscape 4.79 and recently has announced support for Netscape 6.2.3. The question then becomes: will MIT support the Free Mozilla or will we have to install Netscape 6 or 7?

CS doesn't know the answer to that question. Both Netscape 6 and 7 are based on Mozilla, a Free web browser being developed by AOL/Netscape and the general Free Software community. Mozilla is more recent than the Netscape code and offers some features which AOL removes from their Netscape product line. One such feature blocks popup advertisements, another supports "Tabbed" browsing. If Netscape remains the only browser MIT supports we will have to support both Mozilla and Netscape indefinitely, otherwise CS will recommend a wholesale migration to Mozilla.

If you wish to install Netscape 6 for personal use on an individual workstation then simply download the installer and follow these installation instructions from the Netscape site. Be sure to choose either /export/data01/your_directory or /usr/local as the installation path so you have write permission for the installation.

8.2 What version of Mozilla is installed? How do I block popup advertisements and use "Tabbed Browsing"?

White II ships with Mozilla 1.0, the latest stable release of Mozilla. This is more recent than what ships with Redhat 7.3, but considering the importance of a modern web browser to the user community CS decided to upgrade to a third party package directly from the developers at mozilla.org. This browser supports two important features which every user will like: blocking popup advertisements and tabbed browsing. To block popup ads follow these instructions:

  1. From the "Edit" menu select "Preferences"
  2. Select "Advanced"
  3. In the "Advanced" submenu select "Scripts and Windows"
  4. Unselect checkboxes for "Open unrequested windows" and "Move or resize existing windows"

Done! For tabbed browsing simply press [ctl][t] at the same time to open a new tab instead of a full browser window. This allows you to have multiple web pages open within the same window rather than opening multiple browser windows for each new web site ([ctl][n] will do this).

8.3 With either Netscape or Mozilla, does it make sense to store the browser cache in my AFS home-directory?

No! For best possible performance users should change the path of their browser cache to /export/data01/my_directory if possible. Here's how:

For Mozilla:

  1. From the "Edit" menu select "Preferences".
  2. From the "Advanced" menu select "Cache".
  3. Look for "Disk Cache Folder" and press the "Choose Folder" button.
  4. In the file selector box click your way to /export/data01 (or /export/oldhome) -- it will find your username.
  5. Press "OK" in the file selector box and then "OK" again in the "Preferences" box.

Done!

For Netscape 4:

  1. From the "Edit" menu select "Preferences".
  2. From the "Advanced" menu select "Cache".
  3. Look for "Cache Folder" and press the "Choose" button.
  4. A standard Motif file dialog box will open allowing you to enter an explicit path to a cache location. Select /export/data01/your_directory.
  5. Press "OK" in the file dialog box and then "OK" again in the "Preferences" box.

Done!

8.4 Both Netscape and Mozilla embed xpdf and/or Acrobat Reader in my browser window. How do I change that behavior?

This is due to plugger, a nifty Free Netscape/Mozilla plugin manager which embeds various applications into the browser window. See: man plugger for details on how to configure the application. Modifying plugger's behavior is handled by editing pluggerrc, which lives in a different location under ~/.netscape or ~/.mozilla (the browser configuration directories) depending on which browser is to be affected. Should you wish to disable this feature please note that you'll then have to manually configure support for each of these MIME types as a "helper" application in the browser.

Try [ctl][t] to add a new tab in Mozilla, or [ctl][n] to create a new browser window in either Netscape or Mozilla, in order to read the pdf/ps file and browse the web at the same time.

8.5 What's Galeon, and why should I consider Galeon over Mozilla?

Galeon is an alternate web browser designed specifically for use with Gnome, which uses Mozilla's Gecko rendering engine to render and display web pages. It's a good deal faster than Mozilla because it uses native Gnome/GTK GUI widgets instead of Mozilla's platform independent XUL widget toolkit.

8.6 Does White II support Windows Media, Apple Quicktime, Realplayer, Shockwave, etc. multimedia formats?

Yes and no. CS recently purchased a site license for Codeweavers Crossover Plugin, a commercial application based on the Wine codebase. It supports all of these formats and more. Unfortunately, it wasn't delivered in time for our White II deployment, so installation will have to wait until the next update. Details will be forthcoming as CS prepares to distribute Crossover Plugin to the LNS user community.

9.0 LNS BATCH CLUSTER:

9.1 New batch system coming

With the latest desktop update, CS is rolling out a new batch system which will integrate our dedicated batch nodes with desktop workstations into a single unified cluster. Desktops will run jobs when idle and dedicated nodes will continue to run jobs 24/7. The batch system is called Condor, and its development is spearheaded at the University of Wisconsin. As workstations come up with the new desktop software, they will automatically bind themselves to the new Condor queueing server. Once the migration is complete, LNS users will have well over two hundred CPUs in the cluster available when idle for compute intensive jobs.

The new system offers several advancements over OpenPBS, our previous batch software, including job checkpoint and migration (for binaries linked against the Condor checkpoint libraries), dynamic job suspension and restart (for when users require dedicated use of their desktops), and a great deal of flexibility in job resource allocation.

We will update the FAQ with more detailed information ASAP. However, users who wish to jump in and begin submitting jobs immediately may wish to start by reading the Condor Users' Manual. To submit jobs one need only create a condor_spec file containing the starting directory, the execution binary, the number of running instances per job, and log output files. Then run condor_submit with that condor_spec file as an argument. Given our AFS environment please note these special requirements:

  1. You must store your input data files, job output, log files, and the condor_spec file in NFS space and not in your AFS home directory. Condor jobs cannot access any files in your AFS home directory and will fail should you try.
  2. You must set an Initialdir in the spec file that points to NFS space, not AFS space, for the same reason.
  3. You must submit jobs from within NFS space, not AFS space, again for the same reason.
  4. Please note that marie.lns.mit.edu is now part of the Condor cluster and is a perfectly fine host from which to start jobs if your workstation has yet to be upgraded. Jobs may also be submitted from any workstation which has been upgraded.

An example follows:

marie$ cd /net/pea/data01/gelinas/condor_jobs

(pea is my desktop host, pea.lns.mit.edu; use your own data01 directory on any White host within LNS, including any previous White system)

Create a condor_spec file containing sample lines like so:

marie$ vi condor_spec 

# Binary you plan to run on the remote host
Executable = setiathome

# Vanilla denotes binaries which have NOT been linked against the
# condor libs and as such cannot be checkpointed to a file and then
# migrated to other idle hosts in the condor pool.
Universe = vanilla

# Initialdir sets the CWD for jobs.
Initialdir = /net/pea/data01/gelinas/condor_jobs/setiathome

# Start one binary instance on a remote host for this job submission.
Queue 1
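
As a rough sketch of how the AFS requirements listed earlier might be scripted, the shell function below refuses to write a spec file into AFS space and otherwise generates a vanilla spec which also names output, error, and log files. The function names, the job.out/job.err/job.log file names, and the assumption that AFS is mounted under /afs are illustrative and not part of White II; Output, Error, and Log are ordinary Condor submit commands.

```shell
#!/bin/sh
# Illustrative sketch only. Assumes AFS is mounted under /afs; the
# function names and the job.* file names are made up for this example.

# Succeed if path $1 lies in AFS space.
in_afs_space() {
    case "$1" in
        /afs|/afs/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Write a vanilla-universe condor_spec into job directory $1
# for the executable named $2.
write_condor_spec() {
    jobdir=$1
    exe=$2
    # Resolve symlinks only for the check, so a link into AFS is caught.
    resolved=$(cd "$jobdir" 2>/dev/null && pwd -P) || {
        echo "cannot enter $jobdir" >&2
        return 1
    }
    if in_afs_space "$jobdir" || in_afs_space "$resolved"; then
        echo "$jobdir is in AFS space; use an NFS directory instead" >&2
        return 1
    fi
    # Initialdir keeps the path as given (e.g. /net/host/data01/...),
    # since that is the name other hosts can reach.
    cat > "$jobdir/condor_spec" <<EOF
Executable = $exe
Universe = vanilla
Initialdir = $jobdir
Output = job.out
Error = job.err
Log = job.log
Queue 1
EOF
}
```

For instance, write_condor_spec /net/pea/data01/gelinas/condor_jobs/setiathome setiathome would write a condor_spec into that directory, which could then be handed to condor_submit.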

There are many other variables which allow you to set log output, job input, resource requirements, etc. Please see the Users' Manual for details. To perform the job submission one need only enter the condor_submit command with the spec file as input, like so:

marie$ condor_submit condor_spec
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 11.

condor_q provides a list of jobs submitted from a workstation and their current status:

marie$ condor_q


-- Submitter: marie.lns.mit.edu : <198.125.160.28:59172> : marie.lns.mit.edu
  11.0   gelinas        10/21 17:13   0+00:00:07 R  0   0.1  setiathome        

1 jobs; 0 idle, 1 running, 0 held

Other useful commands include condor_status, condor_rm (delete a job), and condor_compile (for linking your binaries against the Condor libs). The Condor binaries are stored down /lns/condor/bin in AFS project space.

Please note that we're using setiathome for testing and don't actually run it to complete work units.

Finally, if you're running an application binary compiled from source please consider researching how to run condor_compile against your object files in order to link the condor libraries against your binary. By doing so the condor system will be able to checkpoint the state of your job into a file periodically. This checkpoint file can then be copied to another condor host on demand, allowing your job to run whenever CPU resources are available instead of only when the assigned execution node is idle as with vanilla jobs. Another advantage of condor standard jobs is that periodic checkpointing provides a failsafe against job crashes; should the application segfault, or the node crash, the execution state of your last checkpoint would have been saved and thus could be restored with little loss. If you're running jobs which could take weeks or months to complete this feature could be a real life saver.

10.0 MISC QUESTIONS AND TIPS:

10.1 What happened to StarOffice?

Sun has stopped supporting the old StarOffice-5.2 and doesn't allow downloads any longer. Fortunately, Sun released the code to StarOffice and OpenOffice is the result. OpenOffice has better Microsoft Office compatibility and separates each application into its own window and process space. This means no more huge StarOffice desktop to deal with, and the application is smaller and runs faster.

10.2 Great! How do I set up OpenOffice instead?

Run Setup from the application menu, or type:

/lns/openoffice/OpenOffice.org1.0/setup

Here are step by step instructions once the setup program starts:

  1. The introductory splash screen comes up and you are prompted to hit "Next" or "Cancel". Hit "Next".
  2. A text box pops up with information about OpenOffice you might want to read. Once done press "Next".
  3. A "ClickWrap" license appears, explaining the SISSL, GPL, LGPL, and BSD licenses; you must accept it in order to install the application. This is Free Software. Read it if you like (we encourage this) but you needn't worry about the contents of this particular license. Click "Accept".
  4. You are now prompted to enter personal information. You needn't do this, though it may be of some help when creating form letters and such. Enter what information you prefer and click "Next".
  5. A dialogue box will prompt you to choose a "Workstation Installation" or a "Local Installation" in your home directory. Choose the "Workstation Installation" (the default) rather than the "Local Installation", since copying all of the files into your home directory (served across the network) is no faster than starting OpenOffice from our central AFS servers (and could be much slower in certain situations).
  6. It now prompts you for the directory in which you wish to install OpenOffice components. The default is fine, click "Next".
  7. You've answered all the necessary settings questions before installation and the installation wizard tells you so. Click "Install".
  8. Later in the installation a dialog box will claim it can't find a Java runtime environment for OpenOffice. It then defaults to "Java and Javascript are not supported" should you press "OK". Instead press "Browse" and click your way to /usr/java/jdk1.3.1_02, then click "OK". The previous dialog box will recognize the Sun Java Development kit and its associated radio button will default to "Already installed on the system". Now click "OK". At this point the installer does a bunch of pretty stuff most other GUI installers do. Go grab a cup of coffee.
  9. The installer finishes chugging away and a dialog box pops up to tell you that the installation is finished. Click "Complete".

You may now use OpenOffice by selecting its various components from the application menu or typing:

/lns/openoffice/OpenOffice.org1.0/program/soffice

10.3 OpenOffice loads much too slowly. Is there a way to fix this?

OpenOffice loads across AFS space, so it should be cached on your local workstation after the first read. However, if this isn't good enough and you have a good deal of free disk space in your data01 directory, consider a "Local installation" down /export/data01/your_directory. This will install all the OpenOffice components onto the local disk and provide the best performance possible, at the cost of about 100MB or so. Make certain you then start OpenOffice from the new installation path, like so:

   $ /export/data01/your_directory/OpenOffice.org1.0/soffice

instead of from the "Applications" menu, ~/OpenOffice.org1.0/soffice, or /lns/openoffice/OpenOffice.org1.0/program/soffice.
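
One way to avoid starting the wrong copy is a small wrapper which prefers the local installation and falls back to the AFS copy. The sketch below is only a suggestion: first_executable is a hypothetical helper name, and the commented usage assumes your data01 directory is named after your login ($USER).

```shell
#!/bin/sh
# Print the first argument that names an executable regular file.
# (first_executable is a made-up helper, not something White II ships.)
first_executable() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example use, left commented out so that sourcing this file has no
# side effects. Assumes the data01 directory is named after $USER.
# soffice=$(first_executable \
#     "/export/data01/$USER/OpenOffice.org1.0/soffice" \
#     /lns/openoffice/OpenOffice.org1.0/program/soffice) && exec "$soffice" "$@"
```

Listing the local path first means the AFS copy is only used when no local installation exists.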

10.4 What is VMWare and how do I obtain a license?

VMWare is a commercial application which emulates a complete PC inside a window on your desktop, or even full screen. This can be used to run Windows, Linux, BSD, BeOS, FreeDOS -- just about any OS environment available for the typical x86 PC. Most LNS users of VMWare run Windows and use it primarily to run Microsoft Office. Be aware that LNS doesn't provide licenses for Microsoft Windows or Office, so users wishing to obtain a VMWare license to run Windows/Office will have to purchase Microsoft licenses on their own.

10.5 I need a package installed on my workstation not provided with White II. Can I do this on my own?

As long as the application doesn't need root privileges for the installation (most don't) you should have no problem. Most open source packages come in source form and default to /usr/local as the installation tree. On White I and White II /usr/local should be set to mode 777 (rwxrwxrwx) giving all users full write authority. Many binary packages expect installation down /usr/local as well, or can at least be directed to install down this tree. One other option is to install in your personal /export/data01 directory, though then you must set directory permissions for the binaries should you wish to share the application with others on that workstation. If it's a large installation you might want to consider installing in data01 as /usr/local normally defaults to a 1GB disk partition.

10.6 I have a bunch of stuff on host1 that I want copied over to host2. Do I really have to scp this stuff by hand?

Between hosts within LNS, no! And you didn't for White I either. We at CS have taken great pains to make file transfer between hosts as easy as possible. All local data01 and home-directories are exported via NFS to every LNS host. In addition, every White host (I and II) runs an automounter which will mount and unmount any White NFS exported filesystem on demand. Here is an example:

   $ cd /net/ctpflyingmonkey/oldhome/me
   $ ls
(my files on ctpflyingmonkey list out)
   $ cp somefile /net/ctpdingo/oldhome/me
(file is copied from ctpflyingmonkey to ctpdingo)
   $ cd /net/ctpblue/data01/me
   $ cp anotherfile ~
(file in my data01 directory on ctpblue is copied to my AFS home-directory)

Combine this with cp -r, tar, or cpio and you can move entire directory trees from one host to the next.

To copy files to hosts outside the LNS network you'll have to use traditional scp or ftp, or email an attachment. For copying entire directory trees, experience with tar or cpio can be of much benefit. See: man tar or man cpio for details on the various commands. One easy way which saves ownership and permissions properly:


   $ cd /path/to/original
   $ tar -cpf - . | (cd /path/to/copy && tar --overwrite -xvpf - )
(...Bunch of stuff gets copied...)
   $

The first tar operation creates a tarball from "." (the current working directory) and sends it to "-", denoting stdout. From stdout it's piped to a subshell, in which the first action is to change directory to an empty location for the copy and then, should that succeed, extract the contents of "-" (in this case stdin) with the second tar command.
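
If you copy trees often, the same pipe can be wrapped in a small function. What follows is a minimal sketch rather than a supported tool: copy_tree is a made-up name, and the sketch leaves out GNU tar's --overwrite flag and the verbose listing so that it also runs with other tar implementations.

```shell
#!/bin/sh
# Illustrative helper (copy_tree is a made-up name, not a White II tool).
# Copy the tree rooted at directory $1 into the existing directory $2,
# preserving permissions, using a tar pipe.
copy_tree() {
    src=$1
    dst=$2
    if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "usage: copy_tree SRC_DIR EXISTING_DST_DIR" >&2
        return 1
    fi
    # Each cd runs in its own subshell, so the caller's working
    # directory is left unchanged.
    (cd "$src" && tar -cpf - .) | (cd "$dst" && tar -xpf -)
}
```

For example, copy_tree /net/ctpflyingmonkey/oldhome/me /net/ctpdingo/oldhome/me would copy the whole tree between the two hosts used in the earlier example.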

10.7 How do I mount a floppy or CD without root privileges?

Users should have the authority to mount and unmount floppies and CDs without root privileges. See /etc/fstab:

/dev/cdrom              /mnt/cdrom              iso9660 noauto,owner,ro,user 0 0
/dev/fd0                /mnt/floppy             auto    noauto,owner    0 0

pam_console.so (one of the authentication modules used in gdm) changes ownership of /dev/cdrom and /dev/fd0 during login so users (for that session) have ownership over these devices. When combined with the "owner" option in /etc/fstab, end users can simply mount and umount these devices at will. Gnome Nautilus (when it works) makes things a bit easier for end users by automatically mounting and unmounting CDROMs when inserted, just like Windows Explorer. Users can also right click the desktop to get a context-sensitive Nautilus menu which allows explicit mounting and unmounting of CDs and floppy disks.

Floppy users should also know about mtools (see: man mtools), a series of programs which allows the manipulation of MSDOS formatted floppies within Linux. An example follows:

$ mdir a:
(Contents of floppy list out)

$ mcopy file.txt a:
(File from the current directory named "file.txt" is copied to a:, normally the first floppy disk drive)

10.8 I really don't like {/bin/bash,/bin/tcsh,/bin/zsh,etc}. How do I change my shell?

Run chsh at a command line prompt like so:

   $ chsh -s /bin/bash
   SASL/GSSAPI authentication started
   SASL SSF: 56
   SASL installing layers
   SASL/GSSAPI authentication started
   SASL SSF: 56
   SASL installing layers
   $

This changes your shell to /bin/bash in our top level LDAP database. Give it time to propagate out to our slave servers (an hour or so), then log out and log back in. Your shell should be changed!
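
To check whether the change has reached your workstation before logging out, you can look at the seventh field of your passwd entry. The sketch below assumes that getent is available and consults the same LDAP data that login does (an assumption about the White II name-service setup); shell_field is a hypothetical helper name.

```shell
#!/bin/sh
# Read one passwd-format line on stdin and print its login shell,
# which is the seventh colon-separated field.
shell_field() {
    cut -d: -f7
}

# Example use, commented out so that sourcing this file has no side
# effects (assumes getent sees the LDAP-backed passwd entries):
# getent passwd "$USER" | shell_field
```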

10.9 Today I accidentally deleted a file(s) in my home-directory. How do I get my file(s) back without requesting a restore from CS?

Very simple. As long as the file existed at the time of the latest clone operation (that is, it wasn't both created and deleted today, and you didn't delete it before the latest clone ran), you can recover that file (or an entire directory tree) like so:


    $ cd ~/.clone
    $ cp deleted_file ..
    $ cd ..

Done! Note that .clone contains an image of your entire home-directory tree as of the last clone operation (normally midnight every evening), so keep in mind that if you accidentally delete it today you have until midnight tomorrow to make use of your clone volume. cp -r, tar and cpio are good tools for copying entire directory trees should you mistakenly perform an rm -rf. See FAQ entry 10.6 on copying from host1 to host2 for an example on the use of tar in this situation.
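
For restoring a whole directory tree, the tar pipe can be pointed at the clone volume. The following is a sketch under stated assumptions, not a CS-provided tool: restore_from_clone, CLONE_DIR, and RESTORE_DEST are hypothetical names, with the two variables defaulting to ~/.clone and ~ so that the function can first be tried against scratch directories. Note that it will overwrite files of the same name in the destination.

```shell
#!/bin/sh
# Illustrative sketch. CLONE_DIR and RESTORE_DEST are made-up variables
# (defaulting to ~/.clone and ~) so the function can be tried on scratch
# directories first; restore_from_clone is likewise a hypothetical name.

# Restore the file or directory $1 (a path relative to the home
# directory) from the clone volume, recreating parent directories.
restore_from_clone() {
    rel=$1
    clone=${CLONE_DIR:-$HOME/.clone}
    dest=${RESTORE_DEST:-$HOME}
    if [ -z "$rel" ] || [ ! -e "$clone/$rel" ]; then
        echo "not found in clone: $rel" >&2
        return 1
    fi
    # The subshells keep the caller's working directory unchanged.
    (cd "$clone" && tar -cpf - "./$rel") | (cd "$dest" && tar -xpf -)
}
```

For example, restore_from_clone thesis/ch1 would copy ~/.clone/thesis/ch1 back to ~/thesis/ch1.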


J. Maynard Gelinas - Last modified August 20, 2002