Welcome to NimbleDais!

nim.ble[nim-buhl] – adjective – agile, active
da.is[dey-is, dahy-, deys] – noun – a raised platform

NimbleDais is a hub for “All Things Technology” specifically in the fields of operating systems, virtualization, data mining, machine learning, mobile and Web 2.0. My plan is to pick a subject in one of these areas and examine it under the hood. I’ll publish technical reports, block diagrams, code snippets and perform analysis on them. You’re free to use all the materials presented here unless stated otherwise. I hope to make this appealing for novices and experts alike.

If you spend some time and find this useful then I have achieved my goals. Please shoot me an email if you find something in error or require more explanation.

Leave a Comment

Last Project in Twenty Eleven

My work on emulating the Microsoft Kinect system on Windows 7. Not bad for the few hours I put in..

kinect

, ,

Leave a Comment

Cassandra Data Model

Cassandra is a distributed key-value store. When I hear the term ‘key-value’, I am reminded of maps or associative arrays. If you are thinking along the same lines then you are not ways off from understanding the Cassandra’s data model.

Column and Column Family
At a primitive level, the value object for the term ‘key-value’ refers to a column name-value-timestamp entity. All three elements – name, value and timestamp are provided by the applications.

image

Both the column names and values are byte arrays. In addition, the application also specifies the sort ordering on the columns. The columns are sorted by time or column name. Since a column name is a byte array and with a sort order on a column name, it can be useful to store a value in the column name! The column value field may not be used and can be an empty byte array. It is also possible to write a custom sort method for column names. The timestamp value is used for read consistency in case of conflicts and so it is important for applications to use synchronized clocks. Most applications do not use the timestamp values though.

A collection of columns along with the row keys forms a column family. This is analogous to a table in the RDBMS system. Within a column family, each row is uniquely identifiable by a key. Each row in a column family may have a different set of columns. This is the schema-less model in Cassandra. The schema for a column family though is defined upfront and cannot be modified without restarting the Cassandra process.

image

Every operation on a single row is atomic. In addition, rows are sorted by their key values and placed on different nodes in a cluster according to its Partitioner – RandomPartitioner or OrderPreservingPartitioner.

Cassandra also provides for indexes of the type KEYS on the column values called secondary indexes. This allows for range queries on the column values.

Super Column and Super Column Family
A super column is an associative array of columns. Similar to a column, it is a name-value entity but does not have any timestamps associated with it. The value for a super column is a map of columns.

A super column family is a collection of super columns along with the row keys. Similar to columns, the super column names as well as the sub-columns names inside super columns are sorted. In addition, the super column names are indexed though the sub-column names inside super columns are not indexed. Thus all the data for a row key in a super column is accessed together. A super column is typically used for de-normalizing data from other column families.

image

Keyspaces
A keyspace is a top-level container for column or super column families. A keyspace is analogous to a schema in the RDBMS system.

References
Cassandra Data Model
WTF is a SuperColumn? An Intro to the Cassandra Data Model

, , ,

Leave a Comment

SVM From the Ground Up

A couple of months back I wrote a post on SVM but have always wanted to write one that explains SVM to people with some knowledge of linear algebra but no background on machine learning. This is my attempt to do so.

Linear Regression
Consider a training set with m training examples. The notation X denotes the input variables or features for a training set while Y denotes the output variables or target variables.
An ith training example is denoted as (x(i), y(i)). The number of features in a training set is denoted by n. Define learning hypothesis as:

image

The function h(x) is a predictor for the corresponding value of y. When the target variable is continuous, the learning problem is called a regression. When the target variable is discrete, the learning problem is called a classification problem. When y is a linear function of x, then:

image

Gradient Descent
To pick the parameters, define a cost function as:

image

The goal is to minimize the cost function This is the least squares cost function giving rise to the ordinary least squares regression model. An algorithm to minimize the cost function, called gradient descent, works as follows. Start with some initial guess for the parameters. Plug them into the cost function. Make some changes to the parameters so as to make the cost function smaller. Continue this until the parameter values minimize the cost function. Now, a property of gradient descent is that depending on the initial guess, the end result may be different. This is called reaching the local minima. Gradient descent is formally defined as:

image

The learning rate controls the rate of change and is often hand-coded. For m training examples, the update function can be written as:

image

The meaning of convergence is that the change in the value of the parameters doesn’t change the value of the cost function by a lot between successive iterations. Now gradient descent may not seem that useful if the parameter values are different for the initial guess but in this case, the cost function is a convex quadratic function and has a single global minima with no other local minima. This means that gradient descent will always converge to the same value for these parameters. The algorithm given above is called the Batch Gradient Descent. For a very large training set, the algorithm called Stochastic Gradient Descent is used and is as given below:

image

The advantage of this approach is that all parameter updates happen with the first training example. This may not converge to the global minima but usually wanders around the global minima. In practice this works well especially for a large training set.

It is possible to mathematically derive the parameters by taking the derivative of the cost function and equating to 0. Doing so yields the following equation in a closed-form:

image

This is the non-iterative form of gradient descent. This equation is also called the normal equation. It is important to note that the term XTX may not always be invertible. This is the case when the features in the training set are dependent.

Probabilistic Interpretation
This section examines the probabilistic interpretation of linear regression. This will also explain why the cost function (of least squares) is defined as such. Define an error term that captures un-modeled effects or even random noise. In that case the relation between the input features and target variables is given as:

image

Due to the central limit theorem, assume that the error terms are independent and identically distributed (IID)  from the Gaussian probabilistic distribution. The Gaussian distribution density of the error terms is given by:

image

This can be viewed as a function of the model parameters by defining the likelihood function. The likelihood of the parameters is the same as the probability of data. This can be written as:

image

From the principle of maximum likelihood estimation, choose the model parameters to maximize the likelihood function. In other words, choose the model parameters to make the data highly probable. For mathematical convenience, maximize the log likelihood (instead of likelihood) function defined as:

image

Thus maximizing the log likelihood is the same as minimizing

image

which is the least squares cost function defined above. Note that the model parameters do not depend on the variance as long as it is positive.

Classification and Logistic Regression
A classification problem is the one where the target variable is discrete. For simplicity, consider a binary classification problem where y can take any of the two values, 0 or 1. Correspondingly, change the hypothesis so that it outputs a value in the range [0,1]. Furthermore, instead of the output of the hypothesis being the output for the target variable, it is the probability of a value for the target variable.

image

Similar to the least squares regression, the model parameters can be derived using the maximum likelihood estimate of the parameters.

image

For convenience, maximize the log likelihood instead of the likelihood function. The log likelihood for the model parameters is given as:

image

In order to maximize the log likelihood, use the gradient ascent technique. The formula for gradient ascent is given as:

image

Therefore taking the derivative of the log likelihood for the gradient ascent rule yields:

image

The gradient ascent rule therefore appears as:

image

This derivation may appear similar to the derivation for the batch gradient descent, but that is only cosmetically true. In here, the hypothesis function is a sigmoid function as opposed to the linear function for batch gradient descent.

, , , ,

Leave a Comment

Native Boot from VHD

Microsoft shipped a new feature in the client and server SKUs of Windows 7 called the Native Boot from VHD. If imitation is the sincerest form of flattery, then this feature ranks pretty high as a number of Linux distributions have adopted it. Still, from speaking to the technologists in the valley, I don’t believe this feature is widely known or used. This post describes this feature and its use cases, shortcomings and opportunities for the entrepreneurs in the wild.

Basic Functionality
A VHD file is a virtual hard disk file. A VHD file is traditionally used as a disk for virtual machines on Hyper-V or Xen hypervisors. This means that a virtual machine (VM) boots from the contents of the VHD files in a hypervisor. The VHD files are also used as data containers in virtual environments.

Windows 7 has a feature called Native Boot from VHD that allows a physical machine to boot from the VHD files. This does not require any use of hypervisor or virtualization software thereby providing near-native performance! The VHD files reside in the file system on a local disk.

In order to boot a physical machine with VHD, a copy of Windows 7 is installed to a VHD file and a new boot entry for the VHD file on a local disk is created for the Windows loader.

Cascading VHDs
The VHD files come in three flavors – Fixed, Expandable and Differencing. A fixed VHD is sized at the time of creation and permanently stays the same size irrespective of the size of contents inside the VHD. An expandable VHD starts with a minimum size and grows as contents are added to the VHD file. A differencing VHD has one or more read-only parent VHDs and stores the differences in the disk blocks from the parent VHDs. For differencing VHDs, a parent VHD can be either of fixed, expandable or differencing VHD.
The differencing VHDs provide for some interesting use cases that are described in the next section.

Using VHDs
1. Basic Use
A basic use of the Native Boot from VHD feature is to have a new instance of Windows 7 on a machine. An advantage of this approach is that there is no need to look into a crystal ball and create multiple boot partitions for the desired number of Windows instances on a physical machine. Adding a new Windows instance is as simple as dropping a new VHD file and creating a boot entry for it. A second advantage is that a Windows instance inside a VHD file does not incur any performance penalty as suffered from a virtualization solution. A third advantage is that there is no restriction on the order of Windows installations governed by the operating system version. In a traditional setup, a Windows installation overwrites the boot loader thereby requiring a newer version of Windows to be installed only after the older version of Windows. With Native Boot from VHD, since the Windows installation is to a VHD file that does not update the boot loader on a local disk, the Windows version ordering restriction is not present. The Native Boot from VHD feature does require the latest boot loader among all the Windows 7 VHD instances to be installed to a local disk.

2. Steady State
A differencing VHD can be used to enhance the security of a system. A physical machine can be configured to boot from a cascade of VHDs. This cascade is composed of a set of read-only VHDs and a read-write differencing VHD. The content of the read-write VHD is discarded on every boot thereby resetting the state of the system. This is referred to as Steady State further in the post.

3. Cloud-based Desktop
A virtual machine disk in a VHD file format is supported by a variety of hypervisors like ESX, Xen and Hyper-V. Once the desktop environment is cut into a set of VHD files, it is a smooth transition to run the desktop environment inside the data centers or on local machines. This provides high flexibility for managing the cloud based desktop environments.

4. Storage
Use of VHDs improves the manageability of desktop environments. The IT department in an organization only needs to cut and manage a fixed set of VHDs for the base operating systems. When all users use the same set of VHDs, there is an opportunity to save on the disk space with de-duplication feature of the storage subsystem.

Opportunities
The use of VHDs in the enterprise is promising but presents certain road-blocks for wide-scale deployment. Instead of enumerating its shortcomings, I have presented them as opportunities for some budding entrepreneur.

1. VHD Modifications
Microsoft provides the disk management tools for creating VHDs. These tools are easy to use but do not provide an ability for offline modification or conversion of VHD from one type into another. Such tools will be useful in preventing loss of valuable time and efforts put into creation of VHDs. This should not be difficult to accomplish since Microsoft has published the VHD specification.

2. Windows Deployment
The whitepaper on Native Boot from VHD outlines the steps to install Windows 7 to a VHD file and a boot entry for the Windows loader. An automated tool that accomplishes this will go a long way in simplifying the use of this feature.

Further, if a Windows instance in a set of VHD files is required to boot on a variety of physical machines, then the Windows instance is required to be ‘syspreped’. This is an important step in preparing the Windows image for booting on different hardware configurations.

3. Steady State
The use-case outlined above is of providing a steady state by discarding the contents of a differencing disk on every boot. Unfortunately, Windows 7 does not support this out-of-the-box. Still it is not difficult to accomplish this by allocating and switching between two differencing VHDs for a set of VHDs. To be precise, allocate two differencing VHDs and create two boot entries, one for each differencing VHD. Boot into any one configuration. This will force all run-time changes to the active differencing VHD. Clear the contents of the other differencing VHD and set the system to boot into the other configuration on subsequent boot. Repeat this on each boot. This ensures that the system always boots in a clean state each time.

4. Windows Installation on a Physical Machine
The whitepapers from Microsoft on Native Boot from VHD assume that there is a copy of Windows 7 installed to a local disk. When a physical machine boots from a set of VHD files, the VHD files form a virtual disk for that Windows instance. Thus none of the files from the local disk installation of Windows is used except for the boot loader that comprises of the Master Boot Record (MBR), boot manager and BCD store.  Thus any commercial solution that uses the Native Boot from VHD feature should not require a copy of Windows 7 to be installed to a local disk. I have successfully accomplished this at my work place by putting a copy of MBR, boot manager and BCD store to the local disk. This not only saves the local disk space but eliminates the work to keep the local copy of Windows up to date with security patches.

5. VHDs on a Network Share
The above step outlines a technique to do away with the Windows installation on a local disk.  It is possible to take this a step further and do away with requiring the VHDs to be copied to a local disk on a physical machine. In this case a physical machine boots from a set of VHDs located on a network share. There are two ways to accomplish this. One is by updating the pre-boot environment to support SMB or CIFS protocol and having the Windows loader read the files necessary to boot Windows from a network share. Unfortunately, Microsoft does not ship the headers and libraries to build applications in the pre-boot environment. A second approach is to leave all boot critical files for a Windows instance on a local disk (along with the MBR, boot manager and BCD store) and keep the remaining files on a network share. Once the boot critical files in Windows are loaded and the necessary components initialized, the remaining files can be read from the network share. This will be a cornerstone feature for any solution that uses the Native Boot from VHD. This feature provides huge economic gains in terms of manageability and security.

6. Patch Management
A differencing VHD can be used for distributing patches and security updates in an enterprise environment. Since this differencing VHD is required to run on a variety of hardware, it needs to be ‘syspreped’ before distribution. Once the patch VHD is ready, a new boot entry is created on every physical machine. With the new boot entry, the patch VHD is layered on top of existing Windows VHDs. A new read-write differencing VHD is layered on top of the patch VHD for machine specific run-time changes. This allows for easy distribution of patches in a scalable fashion.

Once the number of patch VHD grows, it is possible to fold them into a single VHD for ease of distribution.

7. Application VHDs
A set of VHDs can be used as containers for distributing applications in an enterprise environment. Each application can have its own set of VHDs or a set of VHDs can bundle a suite of applications for ease of distribution and security. The application VHDs need to be ‘syspreped’ before distribution.

8. VHD Management
A commercial solution that uses the Native Boot from VHD feature ought to provide a VHD management solution. The management solution tracks the set of changes in each VHD and the hierarchical ordering of VHDs. The management solution also provides a set of tools for defragmenting the disk blocks inside the VHD files and compressing them. In absence of such a solution, management of VHDs will become cumbersome and offset the advantages of the Native Boot from VHD feature.

9. Locked down Windows Instances
In addition to steady state, a commercial solution can bundle a solution for protecting the local file system along with group policies for a complete lockdown of a Windows instance. This type of solution has a huge demand in educational and commercial venues like schools, libraries, airports and hospitals.

Conclusion
Silicon Valley has a knack for identifying opportunities for weaknesses in products, filling holes, improving ease-of-use and making it commercially viable. The use of Native Boot from VHD feature is a gold mine in this regard ;)

References
Understanding Virtual Hard Disks with Native Boot
Virtual Hard Disk Image Format Specification
What Is Sysprep?

, ,

Leave a Comment

Discriminative, Generative and Discriminant

One fine morning, on my way to work, I found myself in a state of utter confusion over a set of machine learning algorithms that I had read few days back. The confusion was over the use of a certain word and its derivatives. So here it goes..

Of the many machine learning methods, there are learning algorithms that model p(y|x), the conditional distribution of y over x. These algorithms are called discriminative learning algorithms. Few examples of such algorithms are logistic regression and perceptron algorithms. There are other algorithms that try to model p(x|y) and p(y). These algorithms are called generative algorithms. An example of a generative algorithm is the Gaussian Discriminant Analysis or GDA. The confusion that crept in my head was with the use of the word ‘discriminant’ for a generative algorithm versus discriminative algorithm. I am hoping that with this post, I have set my confusion to rest.

,

Leave a Comment

Support Vector Machine

A support vector machine (SVM) is a supervised learning method used for classification and regression. A set of input data with labels is used to train an SVM model. Each input in the training set is marked as belonging to one of the two input classes.  The model then assigns a new input to one of the two classes.

SVM as Hyperplane
An SVM model is a representation of input as points in high dimensional space. Points are mapped so that the two categories are divided by a clear gap that is as wide as possible. New input is mapped into the same space and predicted as belonging to a class based on the side of the gap they fall on. A support vector machine creates a hyperplane that provides a wide separation between the nearest training points of any class. In the figure below, the hyperplane represented by a solid line provides a better categorization than the dotted one.

image

Consider a formal definition of SVM. A training dataset D is a set of n points of the form:

image

where yi indicates the class to which a point xi belongs. The goal is to find a maximum-margin hyperplane that divides the points belonging to class yi = 1 and yi = –1. An equation for a hyperplane is given as: w.x – b = 0 where w is a vector normal to the hyperplane and b/||w|| represents the offset of the hyperplane from the origin along the normal vector.

Form two parallel hyperplanes at the margin of the data points such that there are no points between them. These hyperplanes can be described by the equations:
w.x – b = 1 and w.x – b = –1.

image

The distance between the hyperplanes at the margin of the data points is 2/||w||. So in order to maximize the margin, minimize ||w||. The constraints for the data points are given by the equations: w.xi – b >= 1 and w.xi – b <= –1.
Writing this as a single equation yields: yi(w.xi – b ) >= 1.
Thus the constrained optimization problem can be described as:
Minimize ||w|| (in w, b), subject to yi(w.xi – b ) >= 1.
Substituting ||w|| by 1/2||w||2 for convenience, the constrained optimization problem becomes:
Minimize 1/2||w||2 (in w, b), subject to yi(w.xi – b ) >= 1.

Primal Lagrangian Form
The constrained optimization problem can be written in a primal Lagrangian form as:

image

In this form, the Lagrangian multipliers play an adversarial role in enforcing the margin constraints. Taking partial derivatives of the Lagrangian function w.r.t w and b and equating them to 0 yields:

image

Note that w is a linear combination of the data points xi. Per the Karush-Kuhn-Tucker (KKT) complementary slackness condition:image

When yi(w.xi – b) <> 1 => the Lagrange multiplier is 0 and xi is irrelevant. When the Lagrange multiplier <> 0 => yi(w.xi –b) = 1 and xi is a support vector. Thus support vectors correspond to the non-zero values of Lagrange multipliers.

Substituting the above relations in the primal form gives:

image

Dual Lagrangian Form
A dual Lagrangian function is defined as the minimum value of the Lagrangian over primal variables. As per Slater conditions, it follows that:

image

The left hand side, solved above, is the primal form while the right hand side is the dual form. The dual form of the solution is obtained by:

image

References
Support Vector Machine in Wikipedia
Constrained Optimization Problems from Andrew Ng’s CS229 course

, , ,

Leave a Comment

Design Patterns in Map-Reduce

Map-Reduce is a programming paradigm for parallel processing on large scale data. Map-Reduce programs typically run on a cluster of machines built from commodity hardware. There is a master controller that delegates data processing task to a set of mappers and reducers. The master controller keeps tab on the data portions handed to each and whether they are successfully processed. In the event that an instance of a mapper or reducer fails, the master controller clears its intermediate state and restarts one. The master controller selects a machine in a cluster to run an instance of a mapper or reducer based on its proximity to data. Typically a user specifies the number of reducers while the system selects the number of mappers.

Broadly speaking, mappers read files or chunks of files and generate a sequence of key-value pairs. This intermediate form of data is grouped such that a reducer processes all the data for a single key value. The reducer performs combination or reduction on the key-value pairs generating a single key and associated values as its output. The output from reducers may be fed into other instances of mappers and reducers. Thus it is possible to cascade a series of mappers and reducers for some implementation.

There are a number of challenges with using map-reduce for data processing. First and foremost is the problem of retrofitting a solution to the map-reduce paradigm. Second is forming a solution that is efficient and scalable. I’ve seen cases where a map-reduce solution on a cluster of machines underperforms the execution time on a single machine.

The map-reduce programming is not for any problem related to large datasets. In fact it is mostly suited for specific type of problems that deal with large amounts of mostly read-only data in large files that can be redundantly placed on a large number of machines. Further the algorithm needs to be such that it can be easily parallelized.

Over the years, some interesting design patterns have emerged as commonly applied solutions to recurring problems with large data set using map-reduce. In this blog, my goal is to provide a collection of these design patterns. I’ve another section here that provides specific examples for the use of map-reduce programming. These examples may or may not correspond to a design pattern described here. My goal is to add newer design patterns and examples as I come across them over the next several months. So keep visiting this post for more information. I will quote the sources for the material in here for your reference. If you know the techniques that are not covered here, kindly point them to me and I will update my entries. This is pretty light weight for now but stay tuned.

Design Patterns
Combiners
In the case where a reducer applies an associative and commutative operation, there is an option to push some of the reducer functionality into a mapper. This sub-task in a mapper is the combiner. This doesn’t eliminate the need for reducers but is absolutely good for reducing the communication overhead between the mappers and reducers.  In the word count example,  a combiner aggregates the key-value pairs for a document or an instance of a mapper.

Examples
Word Count
In this example, the input is a very large repository of documents and the goal is to count the number of occurrences for each word in the collection.  A file in the repository is an input element to a mapper. The mapper parses the document and generates a sequence of key-value pairs such that the key is a word in a document and its value is 1 for frequency. Thus the output of a mapper is a sequence of key-value pairs of the form:

(w1, 1), (w2, 1),…,(wn, 1)

The master controller merges the output of mappers into as many intermediate files as the number of reducers. This is done in a way that each reducer processes all the key-value pairs for the same key.

The reducer takes the sequence of key-value pairs and generates as output a key-value pair with the same key and aggregated sum of values for the sequence.

This gives the frequency of all words in a repository using map-reduce.

Matrix-Vector Multiplication
Given a matrix M of size n x n and a vector v of length n, the matrix-vector product is a vector of length n whose ith element xi is given by:

mm

In the first case, assume that n is a large number though vector v can still fit in main memory and is part of an input to a mapper. A mapper takes as input a row-wise chunk of matrix M of size j and vector v and produces a key-value pair of the form (i, mij * vj1). Thus, each term of sum for xi1 is output with the same key. The reducer sums all the values associated with a key. This completes the matrix-vector multiplication using map-reduce.

In the second case, assume that n is so large that vector v cannot fit in main memory. It is still possible to use the algorithm in the previous case, though this may require frequent disk accesses. A better approach is to partition the columns of a matrix and rows of a vector into k blocks such that a single block of a vector can fit in main memory. The implementation of a mapper and reducer is same as in the previous case with the differences being the size and splitting of data chunks for these processes.

image

In the third case, partition the matrix into k2 blocks and vector into k blocks. A mapper gets a single square block of the matrix and the corresponding vector block. A single vector block is sent to k mappers. Both the matrix and vector blocks can fit in main memory for fast computation.

image

Matrix Multiplication
Given a matrix M with element mij in row i and column j and matrix N with element njk in row j and column k, the product P = MN is the matrix P with element pik in row i and column k given as:
 

image

A requirement for matrix multiplication is that the number of columns of M equal the number of rows of N. In here, I have provided two somewhat similar solutions for matrix multiplication using Map-Reduce.

The first solution is a cascade of two map-reduce operations. In the first stage, the mapper generates a key-value pair of the form (j, (M, i, mij)) and (j, (N, k, njk)). The reducer examines the value sets from M and combines them with the value sets from N to produce a key-value pair of the form (j, (i, k, mij * njk)). In the second stage, the mapper groups all the value sets for a single key of the form (j, [((i1, k1), v1), ((i2, k2), v2),…,((ip, kp), vp)]) and generates a key-value pair of the form ((i1, k1), v1), ((i2, k2), v2),…,((ip, kp), vp). The reducer in the second stage produces a sum of values associated with a key. This completes the matrix multiplication using map-reduce in two stages.

The second solution performs the matrix multiplication in one map-reduce pass. The mapper produces key-value pairs for each element mij in M of the form ((i,k), (M, j, mij)) for k = 1, 2,…, up to the number of columns of N. Similarly the mapper produces key-value pairs for each element njk in N of the form ((i, k), (N, j, njk) for i = 1, 2,…, up to the number of rows of M. The reducer has two lists of values corresponding to matrices M and N for each key (i,k). The reducer matches the values on the lists for the same values of j and multiplies the third element in the value set. This is then summed to produce a value output for the key (i,k). This completes the matrix multiplication using map-reduce in one pass.

References
Mining of Massive Datasets – Anand Rajaraman and Jeffrey D. Ullman

Leave a Comment

Singular Value Decomposition (SVD)

During my friend Charles’s visit to the bay area, we had an interesting conversation on SVD as a learning technique. For those of you who are not familiar with SVD, let me first describe what SVD is and how it can be used.

In linear algebra, the Singular Value Decomposition is a factorization of a rectangular matrix of the form:

image

  • U is an orthogonal matrix. The columns of U are orthonormal eigenvectors of AAT and also called the left singular vectors of A. UTU = I
  • V is an orthogonal matrix. The columns of V are orthonormal eigenvectors of ATA and also called the right singular vectors of A. VTV = I
  • S is a diagonal matrix. The diagonal entries are called the singular values of A. The non-zero singular values are the square roots of eigenvalues from U or V in descending order
  • The diagonal matrix is uniquely determined by A though the matrices U and V are not.

Let’s look at an example to better understand the concept of SVD.
Define matrix A as:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]   -3   -2   -1

Correspondingly, AT is:
     [,1] [,2]
[1,]    1   -3
[2,]    2   -2
[3,]    3   -1

In order to find U, compute AAT:
     [,1] [,2]
[1,]   14  -10
[2,]  -10   14

The next step is to find the eigenvalues for AAT. Solving the equation for eigenvectors gives the eigenvalues for a matrix. The equation for eigenvector is defined as:

image

Writing this in an equation form yields two linear equations:
14 * x1 – 10 * x2 = lambda * x1
=> (14 – lambda) * x1 – 10 * x2 = 0
-10 * x1 + 14 * x2 = lambda * x2
=> –10 * x1 + (14 – lambda) * x2 = 0

Equating the determinant of coefficient matrix to 0 gives lambda values or eigenvalues as 24 and 4. Plugging these values into the two linear equation above yields:
x1 = –x2 and
x1 = x2

Now form a set of eigenvectors, corresponding to each eigenvalue, such that the resultant matrix is easier to work with. Note that the eigenvector for the largest eigenvalue is placed in the first column, eigenvector for the second largest value is placed in the second column and so on.
     [,1] [,2]
[1,]   -1   -1
[2,]    1   -1

Now we need to convert the above matrix into an orthonormal matrix. This is done by applying the Gram-Schmidt orthonormalization method shown below:

image

Normalizing the first column vector gives:
           [,1]
[1,] -0.7071068
[2,]  0.7071068

Applying the Gram-Schmidt process to the second column vector gives:
     [,2]     
[1,]   -1
[2,]   -1

Normalizing the second column vector gives:
           [,2] 
[1,] -0.7071068
[2,] -0.7071068

The resultant matrix U is given as:
           [,1]       [,2]
[1,] -0.7071068 -0.7071068
[2,]  0.7071068 -0.7071068

Similarly, it’s possible to calculate V based on ATA. The resultant matrix V is:
           [,1]          [,2]       [,3]
[1,] -0.5773503  7.071068e-01  0.4082483
[2,] -0.5773503  6.234162e-19 -0.8164966
[3,] -0.5773503 -7.071068e-01  0.4082483

Correspondingly, VT is:
           [,1]          [,2]       [,3]
[1,] -0.5773503 -5.773503e-01 -0.5773503
[2,]  0.7071068  6.234162e-19 -0.7071068
[3,]  0.4082483 -8.164966e-01  0.4082483


The diagonal matrix S is composed of square root of non-zero eigenvalues on the main diagonal in descending order. The eigenvalues for both U and V is the same:
         [,1] [,2] [,3]
[1,] 4.898979    0    0
[2,] 0.000000    2    0

Putting all together:
U * S * VT = A
> U %*% S %*% t(V)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]   -3   -2   -1

Visualization of SVD

image

The matrix decomposition can also be visualized as three geometric transformations – a rotation followed by a stretch followed by another rotation. VT describes an orthonormal basis in the domain, U describes an orthonormal basis in the co-domain and S describes the stretch for vectors in VT that leads to vectors in U.

An example below illustrates a transformation of a circle by a matrix. This is shown in two different ways. The first part is a transformation by a matrix. The second part illustrates each stage of transformation by the ordered matrix components.

svd

Applications of SVD
Dimensionality Reduction/Data Compression
SVD is a method for identifying and ordering the dimensions along which the data points exhibit the most variation. Once the dimensions are identified, it’s possible to find the best approximation of the original data points using fewer dimensions.

In the SVD method explained above, S is a diagonal matrix with non-zero eigenvalues in descending order. These values indicate the variance of the linearly independent components along each dimension. Restricting the number of singular values in S has an effect of dimensionality reduction on the data set. This method is also called the reduced SVD method.

In an example below, the least significant eigenvalue in S (last value along the diagonal) is dropped thereby reducing the size of matrix S as shown by the shaded portion. This results in further reduction in size of VT again shown by the shaded portion. Overall, this reduces the rank of the product matrix providing a good approximation to matrix A. This type of compression is therefore lossy unless the dropped eigenvalues are all 0s.image

Noise Filtering/SVD Filtering
In any experiment related to data collection, measurement errors manifest in form of noise. The technique of reduced SVD discards insignificant eigenvalues thereby not only leading to dimensionality reduction but noise filtering as well! The details on the SVD reduction technique are given in the previous sections.

Latent Semantic Indexing/Analysis
SVD can be seen as a method for transforming correlated variables into a set of uncorrelated ones that better expose the various relationships among the original data items. With SVD, the representation of items that share substructure become more similar to each other and items that were dissimilar become more dissimilar as well. In practical terms, the method of SVD on a document corpus yields various topics for those documents. This aspect is the SVD learning technique along the lines of unsupervised clustering.

References
Singular Value Decomposition in Wikipedia
Introduction to the Singular Value Decomposition
Singular Value Decomposition Tutorial

, , ,

Leave a Comment

Arbitration of Message Signaled Interrupts in Windows

Line Based Interrupt

A traditional line based interrupt (LBI) is a side-band signal from a device to an interrupt controller. The interrupt controller in turn grounds a pin attached to a processor thereby interrupting the processor’s sequence of instruction execution. The processor saves a context for the current instruction, checks the interrupt controller for the device that needs attention, performs a lookup on its interrupt descriptor table (IDT) and invokes the interrupt service routine (ISR) for the driver that owns the device. This is a brief and succinct description of how line based interrupts work. The traditional interrupt model is plagued with a number of problems namely:

  • A mis-configured device may wrongly trigger an interrupt line thereby causing enough interruptions to the processor that the system can no longer perform any useful work. This is typically referred to as an interrupt storm.
  • An interrupt, by the very nature of being side-band and asynchronous, may get processed before the cause for the interrupt is complete. As an example, a device posts a set of DMA write transactions and triggers it’s interrupt line. The driver’s ISR runs wherein it evicts the write buffers based on the status register in the device but the DMA writes are still in transition. This may cause a plethora of problems ranging from memory corruption and data loss to system instability and escalation of privileges.
  • Devices share interrupt lines which forces an ordering on the servicing of ISRs. The operating system typically traverses an ISR chain until it finds a driver that asserts its interrupt after which the traversal moves back to the head. This puts the drivers at the tail of an ISR chain at some disadvantage.

Message Signaled Interrupts (MSI)

In the early 2000, the PCI-SIG committee defined an in-band posted write message as an interrupt mechanism for PCI/PCIe devices known as MSI. MSI is a bus master cycle similar to DMA write at a fixed target with fixed payload. The target is primarily a local APIC address and the payload comprises of an interrupt vector for IDT lookup. Also note that MSI is an edge-triggered interrupt mechanism.

A key advantage of this approach is that an interrupt will never reach its destination before any of the prior posted writes. For the above example this ensures that a driver never evicts it’s write buffers before the completion of write operation. A second advantage is that with an MSI interrupt a device can convey more information than the mere fact that it needs attention from the processor. A third advantage is that due to an abundance of data values, devices no longer need to share interrupt vectors.

MSI Capability Structure

The PCI SIG committee has defined the MSI capability and MSI-X capability structures for identification and configuration of message signaled interrupts functionality in a capable device. A device can have both MSI and MSI-X capability structure. An MSI capability has a 16-bit message control field, 32/64-bit message address field, 16-bit message data field and optional 32-bit fields for masking and pending of MSI interrupts. MSI can support up to 32 vectors. The message address and data fields are described in detail later.

MSICapability

The MSI-X is a separate extension to basic MSI functionality. MSI-X provides the ability for each vector to use an independent address and data values.

MSI-X Capability
MSIXCap

MSI-X Table Structure

MSIXTable

Windows PnP Resource Arbitration

Before delving into the arbitration of MSI resources, let’s look at the process of device enumeration in Windows. A bus driver enumerates its devices and reports them to the PnP subsystem. For each device on the bus, the PnP manager queries the bus driver for boot resources (resources that are currently in use) and resource requirements. The PnP manager also queries the hardware and compatible IDs of a device. Using these IDs, the PnP manager loads a function driver and optional filter drivers thereby building a device stack. The PnP manager then sends a filter resource requirements request with the set of requirements originally retrieved from the bus driver. This is an opportunity for other drivers in the stack to update the requirements list. Thereafter, for each resource type, the PnP manager invokes the corresponding resource arbiter to reserve resources for a device. Once the PnP manager has successfully reserved resources of each type in the requirements list, it sends a Start Irp with the resource lists(raw and translated) to the device stack. The bus driver programs raw resources into the device while the function driver records the translated resources for use by the driver. This is a simplified view of PnP resource arbitration in Windows.

A resource arbiter is registered with the PnP manager and can also be thought of as a producer of a resource. An arbiter keeps track of range of resources reserved for the devices in it’s hierarchy. When arbitrating a resource, the PnP manager traverses up the device hierarchy to locate the nearest arbiter for a resource requirement of a device. An arbiter may have its resource allocated from another arbiter (of the same resource type) higher up in its hierarchy.

Similar to arbiters there are translators in the device hierarchy that are also registered with the PnP manager. The job of a translator is to map resource ranges from the north of its hierarchy to the south of the hierarchy and vice versa.

MSI Resource Arbitration

The PCI bus driver reads the MSI/MSI-X capability on a device and generates an MSI resource requirement for it. Here is an illustration of an MSI resource requirement for a network card:
Preferred Descriptor 7 – Interrupt (0×2) Device Exclusive (0×1)
  Flags (0×07) – LATCHED MESSAGE
  0xfffffffe – 0xfffffffe
  IrqPolicySpecifiedProcessors 0/1

The MinimumVector – MaximumVector range of 0xfffffffe – 0xfffffffe is a token request for a single MSI message. The requirement policy provides a set of target processors for an MSI resource. In this case, the requested target set has processor 0.

There is only one MSI resource arbiter in the entire hierarchy of devices. This arbiter is present in the ACPI driver in Windows. There is no translator for an MSI resource in Windows. Here is the output of an MSI resource assigned to a device in the interrupt resource arbiter in ACPI:
0: kd> !arbiter 4

DEVNODE fffffa80017e2d90 (ACPI_HAL\PNP0C08\0)
  Interrupt Arbiter “ACPI_IRQ” at fffff88000faf3c0
    Allocated ranges:
      0000000000000002 – 0000000000000002   B   fffffa80021dbe40

00000000fffffffa – 00000000fffffffa       fffffa80021bc060  (e1yexpress)

Possible allocation:
  < none >

Again, the 00000000fffffffa value is just a token for an MSI resource assignment in the arbiter. The MSI tokens are designed as such to not overlap with the IRQ values that fall in the range 0 – ff. Let’s examine the raw and translated resource descriptors for an MSI resource in the device node:
Raw resource descriptor:
Entry 5 – Interrupt (0×2) Device Exclusive (0×1)
  Flags (0×07) – LATCHED MESSAGE
  Message Count 1, Vector 0xfffffffa, Group 0, Affinity 0×1

Translated resource descriptor:
Entry 5 – Interrupt (0×2) Device Exclusive (0×1)
  Flags (0×07) – LATCHED MESSAGE
  Level 0×8, Vector 0×80, Group 0, Affinity 0×1

The Vector field value in the translated descriptor is the same value that’s passed to the IoConnectInterruptEx call for the fully specified connection parameter while registering a message service routine. As a result the IDT table has the following entry for vector 80:
0: kd> !idt

80:    fffffa8002290690 ndis!ndisMiniportMessageIsr (KINTERRUPT fffffa8002290600)

Check out the whitepaper on enhancements to the interrupt architecture for details on MSI resource requirement and assignment policies.

MSI Address and Payload

For MSI resources, the PCI bus driver uses an API exported by the ACPI driver to obtain the MSI address and payload information corresponding to MSI messages. This information is programmed in the device during the processing of IRP_MN_START_DEVICE Irp. Let’s examine the contents of an MSI capability in a device as programmed by the PCI driver:
0: kd> !pci 100 0 19 0

d0: CapID          05 MSI Capability
d1: NextPtr        e0
d2: MsgCtrl        64BitCapable MSIEnable MultipleMsgEnable:0 (0×1) MultipleMsgCapable:0 (0×1)
d4: MsgAddr        fee00000
d8: MsgAddrHi      0
dc: MsData         4080

The MSI Message Address field is interpreted as:
MSIMesgAddr

The Destination ID field identifies the target processor. The ID for processor 0 on this system is 0 as seen in the output of local APIC in the debugger:
0: kd> !apic
Apic @ fffe0000  ID:0 (50014) …

The RH field is a redirection hint. When this field is 0, the interrupt is directed to the processor listed in the Destination ID field.

The DM field is a Destination Mode field. This field is ignored when the RH bit is 0.

The MSI Message Data register field is interpreted as:
MSIMessData

The vector field value is same as the vector value in the translated descriptor.

The Fixed Delivery Mode indicates that the entire set in the destination field receives the interrupt message.

MSI messages have edge triggered semantics and so the Level for Trigger Mode field is set to 1.

Thus in order to trigger a Message Signaled Interrupt, the device(in this example) sends a payload of 0×4080 to the address 0xFEE00000.

Multiple Messages

A device can request and be assigned multiple MSI/MSI-X messages. For MSI, the number of messages assigned to a device is programmed by the PCI driver in the Multiple Message Enable field of the MSI capability in the device. There is a single value for the Message Address field in the capability structure. The device uses this same address for all the messages assigned to it. There is a single value for the Message Data field as well in the MSI capability structure. The device increments the lower bits of the Message Data value for the allocated messages. In fact, the encoding of the Multiple Message Enable field indicates the lower bits in the Message Data field that can be modified by the device to generate the allocated number of messages.

An interesting way of looking at the maximum number of MSI messages that can be assigned to a device in Windows is by examining the IDT table. An IDT table is a 16 by 16 table for each of 0 to 255 IRQs. Row 0 represents a range of vectors from 00 to 0F. Row 1 is a range of vectors from 10 to 1F and so on. Thus each row has at most 16 vectors at the same IRQL. This is also a limiting number for the maximum number of MSI messages that can be assigned to a device in Windows.

For MSI-X, the MSI-X Table contains at least one entry for every allocated vector. The Message Address and Message Data fields are used unmodified in the generation of an MSI-X message by the device. The number of MSI-X messages that can be assigned to a device in Windows Vista is 1024. MSI-X also provides for interesting use of IRQ policies and processor affinity. Check the reference section for more information on this.

References
Interrupt Architecture Enhancements in Windows
Introduction to Message-Signaled Interrupts
NUMA I/O Optimizations
Mainstream NUMA and the TCP/IP stack, Part IV: Parallelizing TCP/IP

, , ,

Leave a Comment

Xen Hypervisor and GPU Pass-through for Windows 7

For the last few weeks, I’ve been working on GPU pass-through for Windows 7 on Xen hypervisor. I’ve outlined the step-by-step procedure to replicate this at your end. Enjoy!

Hardware Details

  • Intel Quad Core i7 920s
  • DX5850 chipset
  • 6GB RAM
  • NVidia Quadro FX 3800 without FLR capability

Build and Setup

  1. Host Operating System
    a. Installation
    I picked Ubuntu as the host operating system and installed the Ubuntu 9.10 Karmic Koala x86_x64 version on my machine. I created two partitions for this setup – a primary partition of type ext3 journaling file system and a smaller partition as swap space. The primary partition is mounted as root partition. After selecting the defaults the installation completed without any problems. Here’s the output of the linux version on my first login:
    $ uname –a
    Linux 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009 x86_64 GNU/Linux

    Save a copy of grub.cfg at /boot/grub since a kernel update automatically overrides this file.

    b. Updates
    Once the system is up and running the update manager accumulates a list of fixes for the system. Install these updates to upgrade to Ubuntu release 10.04.1 LTS. The output of the linux version in this case is:
    $ uname –a
    Linux 2.6.31-22-generic #67-Ubuntu SMP Fri Oct 16 18:51:35 UTC 2010 x86_64 GNU/Linux

    c. Build Environment
    Install these packages for the build environment:
    $ sudo apt-get install -y openssh-server acpidump libcurl4-openssl-dev xserver-xorg-dev mercurial gitk build-essential libncurses5-dev uuid-dev gawk gettext texinfo bcc dpkg-dev debhelper iasl bridge-utils pciutils-dev xvnc4viewer python-dev xfig tetex-base tetex-bin uuid-dev

    d. Xen Kernel
    Update the host operating system with PVOPS kernel for Xen hypervisor. Download the kernel sources from Jeremy’s share:
    $ sudo git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-2.6-xen

    Switch to a stable branch:
    $ sudo git checkout -b xen/stable-2.6.32.x origin/xen/stable-2.6.32.x
    $ sudo git status
    # On branch xen/stable-2.6.32.x
    nothing to commit (working directory clean)

    Configure the kernel with these changes:
    $ sudo make menuconfig
    Processor type and features —>
        [*] Paravirtualized guest support —>
            [*] Xen guest support
            [*] Enable Xen privileged domain support
            [*] Enable support for Xen PCI passthrough devices
            [ ] KVM paravirtualized clock
            [ ] KVM Guest support
            Processor family (Generic-x86-x64) —>
            (X) Generic-x86-64
        [ ] AMD IOMMU support
        [ ]    AMD MCE features
        [ ]    AMD microcode patch loading support
        [ ] Old style AMD Opteron NUMA detection

    Power management and ACPI options —>
        [ ]    Power Management Debug Support

    Bus options (PCI etc.) —>
        <*> PCI Stub driver
        < > Xen PCI Frontend
        [ ] Interrupts on hypertransport devices
        [ ] PCI IOV support
        < > PCCard (PCMCIA/CardBus) support —>

    Networking support —>
        [ ] Amateur Radio support —>
        < > CAN bus subsystem support —>
        < > IrDA (infrared) subsystem support —>
        < > Bluetooth subsystem support —>
        [ ] Wireless —>
        < > WiMAX Wireless Broadband support —>
        < > RF switch subsystem support —>
        < > Plan 9 Resource Sharing Support (9P2000) (Experimental) —>

    Device Drivers —>
        < > Parallel port support —>
        [ ] ISDN support —>
            Graphics support —>
            < > /dev/agpgart (AGP Support) —>
            < > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) —>
            < > Lowlevel video output switch controls
            < > Support for frame buffer devices —>
                Display device support
                < > Display panel/monitor support
        [ ] Auxiliary Display support —>
        <*> Xen /dev/xen/evtchn device
        [*] Backend driver support
        <*>    Xen backend network device
        <*>    Block-device backend driver
        <*>    Block-device tap backend driver
        <*>    PCI-device backend driver
                        PCI Backend Mode (Passthrough) —>
                            (X) Passthrough
        [*]          PCI Backend Debugging
        <*>    Xen filesystem
        <*>    userspace grant access device driver

    Security options –>
        [ ] NSA SELinux Support

    [ ] Virtualization —>

    Further, edit the .config file with the following changes (note the marker to comment these config options):
    # CONFIG_VGA_ARB=y
    # CONFIG_VGA_CONSOLE=y

    Build the kernel:
    $ sudo make –j8
    $ sudo make modules_install install
    $ sudo chmod g-s /usr/src -R
    $ sudo make deb-pkg
    $ sudo dpkg –i ../linux-image*2.6.32.25*.deb
    $ sudo depmod 2.6.32.25
    $ sudo update-initramfs –c –k 2.6.32.25

    Update the grub entry to boot into PVOPS kernel for Xen. Add the following to /etc/grub.d/40_custom
    menuentry “Ubuntu, Xen Linux 2.6.32-25-generic” {
            recordfail=1
            if [ -n ${have_grubenv} ]; then save_env recordfail; fi
            set quiet=1
            insmod ext2
            set root=(hd0,1)
            search –no-floppy –fs-uuid –set 9308233e-80de-4dd4-a78e-3159c946fdbb
            linux /boot/vmlinuz-2.6.32.25 root=UUID=9308233e-80de-4dd4-a78e-3159c946fdbb ro quiet splash
            initrd /boot/initrd.img-2.6.32.25
    }

    $ sudo update-grub

    On reboot confirm the linux version:
    $ sudo reboot
    $ uname –a
    Linux 2.6.32.25 #6 SMP Mon Oct 25 15:43:45 PDT 2010 x86_64 GNU/Linux

  2. Hypervisor
    a. Download the source code
    Download the source for the latest stable version of Xen hypervisor
    $ sudo hg clone http://xenbits.xensource.com/xen-4.0-testing.hg
    $ cd xen-4.0-testing.hg
    $ sudo hg identify
    tip                            21368:5221fcc3df64
    RELEASE-4.0.1                  21324:b536ebfba183

    $ sudo hg tags
    tip                            21368:5221fcc3df64
    RELEASE-4.0.1                  21324:b536ebfba183

    Switch to the release version:
    $ cd xen-4.0-testing.hg
    $ sudo hg update -r RELEASE-4.0.1

    b. Build the hypervisor
    $ sudo make xen
    $ sudo make tools
    $ sudo make install-xen
    $ sudo make install-tools PYTHON_PREFIX_ARG=
    $ sudo update-rc.d xend defaults 20 21
    $ sudo update-rc.d xendomains defaults 21 20

    c. Deploy and boot
    Update the grub entry:
    $ sudo vi /etc/grub.d/40_custom
    menuentry “Xen 4.0.1 / Ubuntu 10.04.1 kernel 2.6.32.25″ {
            recordfail=1
            if [ -n ${have_grubenv} ]; then save_env recordfail; fi
            set quiet=1
            insmod ext2
            set root=(hd0,1)
            search –no-floppy –fs-uuid –set 9308233e-80de-4dd4-a78e-3159c946fdbb
            multiboot /boot/xen.gz dom0_mem=1024M
            module /boot/vmlinuz-2.6.32.25 dummy=dummy nopat root=UUID=9308233e-80de-4dd4-a78e-3159c946fdbb ro console=hvc0 earlyprintk=xen nomodeset
            module /boot/initrd.img-2.6.32.25
    }

    Disable the hidden menu feature in grub
    $ sudo vi /etc/default/grub
    # GRUB_HIDDEN_TIMEOUT=0

    $ sudo update-grub

    Verify that the system booted in Xen Domain 0 (Some of the output below is useful in debugging failures):
    $ sudo xm list
    Name                        ID   Mem VCPUs      State   Time(s)
    Domain-0                     0  1024     8     r—–     17.5

    $ ls /proc/xen
    capabilities  privcmd  xenbus  xsd_kva  xsd_port

    $ cat /proc/xen/capabilities
    control_d

    $ ls -alt /dev/xen
    total 0
    drwxr-xr-x 16 root root   3920 2010-10-26 10:48 ..
    drwxr-xr-x  2 root root     80 2010-10-26 10:47 .
    crw-rw—-  1 root root 10, 61 2010-10-26 10:47 evtchn
    crw-rw—-  1 root root 10, 60 2010-10-26 10:47 gntdev

    $ cat /proc/misc
    60 xen/gntdev
    61 xen/evtchn

  3. Windows 7 in a Virtual Machine
    a.
    Install and run Windows 7 in a virtual machine
    Create a disk image file of size 50 gigs:

    $ dd if=/dev/zero of=win7.img bs=1M count=51200

    Create a Xen configuration file for HVM based Win7 virtual machine. Start with a template from Xen sources:
    $ cp /etc/xen/xmexample.hvm /home/vinod/work/win7.hvm

    Here is the content after updates to the config file. The config options are at their defaults in most cases. Note that the virtual ethernet interface is disabled for now.
    kernel = “hvmloader”
    builder = ‘hvm’
    memory = 4096
    name = “Win7_64″
    vcpus = 4
    #vif = [ 'type=ioemu, bridge=xenbr0' ]
    disk = [ 'file: /home/vinod/work/win7.img,hda,w', 'file:/home/vinod/work/win7x64ent.iso,hdc:cdrom,r' ]
    device_model = ‘qemu-dm’
    boot = “dc”
    sdl = 0
    opengl = 1
    vnc = 1
    vncpasswd = ”
    stdvga = 0
    serial = ‘pty’
    tsc_mode = 0

    In order to connect to Windows 7 over VNC:
    $ sudo xvncviewer 127.0.0.1

    Switch the boot order once the OS is installed. Also switch the default shell to bash:
    $ sudo readlink /bin/sh
    dash
    $ sudo ln -s -f /bin/bash /bin/sh
    $ sudo readlink /bin/sh
    /bin/bash

    b. Configuring Virtual Ethernet Interface for Windows 7
    The virtual machine creation fails when the virtual ethernet interface is enabled. Here is the error output:
    $ sudo xm create win7.hvm
    Using config file “./win7.hvm”.
    Error: Device 0 (vif) could not be connected. Could not find bridge device xenbr0

    Turns out there isn’t any bridge device by the name ‘xenbr0’ instead there is ‘tmpbridge’ that seems to be remnant of a failed operation:
    $ sudo brctl show
    bridge name     bridge id               STP enabled     interfaces
    tmpbridge       8000.000000000000       no

    To debug this issue add the following two lines in the script at /etc/xen/scripts/network-bridge:
    exec 2>/tmp/log
    set –x

    Thereafter reattempt to create the virtual machine:
    $ sudo service xend stop
    $ sudo service xend start
    $ sudo xm create win7.hvm

    Examine the log at /tmp/log. Also shown are the lines around the error(in bold):
    + ifdown eth0
    ifdown: interface eth0 not configured
    + ip link set eth0 name peth0
    RTNETLINK answers: Device or resource busy
    ++ _release_lock /var/run/xen-hotplug/network-bridge
    ++ trap sigerr ERR
    ++ rm -rf /var/run/xen-hotplug/network-bridge
    ++ sigerr
    ++ exit 1

    op_start () {
        if [ "${bridge}" = "null" ] ; then
            return
        fi
        if link_exists “$pdev”; then
            # The device is already up.
            return
        fi
        claim_lock “network-bridge”
        create_bridge ${tdev}
        preiftransfer ${netdev}
        transfer_addrs ${netdev} ${tdev}
        # Remember slaves for bonding interface.
        if [ -e /sys/class/net/${netdev}/bonding/slaves ]; then
            slaves=`cat /sys/class/net/${netdev}/bonding/slaves`
        fi
        # Remember the IP details for do_ifup.
        get_ip_info ${netdev}
        if ! ifdown ${netdev}; then
            ip link set ${netdev} down
            ip addr flush ${netdev}
        fi
        ip link set ${netdev} name ${pdev}
        ip link set ${tdev} name ${bridge}

    It’s pretty obvious from the log that the network bridge service failed to bring down the network interface since it was not configured. Update the file that contains the interface configuration information:
    $ sudo vi /etc/network/interfaces

    Add the following two lines:
    auto eth0
    iface eth0 inet dhcp

    On reboot examine the bridge information:
    $ sudo brctl show
    bridge name     bridge id               STP enabled     interfaces
    eth0            8000.001cc0a2647d       no              peth0

    This time the bridge name is eth0. Update the VM config file and create a VM:
    vif = [ 'type=ioemu, bridge=eth0' ]

    $ sudo xm list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0  1024     8     r—–     83.3
    Win7_64                                      2  4096     4     -b—-    110.8

    $ sudo brctl show
    bridge name     bridge id               STP enabled     interfaces
    eth0            8000.001cc0a2647d       no              peth0
                                                            tap2.0
                                                            vif2.0

    Viola! Both the Dom0 and Windows 7 VM have network access over the same physical ethernet. For more explanation on bridge configuration in Xen refer to this link: http://wiki.xensource.com/xenwiki/XenNetworking

  4. PCI Express Device as Pass-through for Windows 7
    Coming Soon!
     
  5. GPU as Pass-through for Windows 7
    Coming Soon!

, , , , ,

Leave a Comment