NASC's Affymetrix Service

Help and Advice

NASC can help

The NASC Affymetrix service offers help and advice on both experimental design and the analysis of Affymetrix data. We can assist with the initial design of an experiment, such as growth conditions, sampling and replicates. Assistance with analysis covers data obtained from the Affymetrix GeneChip service or downloaded from the NASCArrays website. We can help users analyse CSV spreadsheet data using Excel, and .CEL files using free or commercial analysis packages. We can help with low-level analysis such as normalisation, analysis of replicate variation and finding up- or down-regulated genes, and with more complex analysis on request. For extended analysis of array data we recommend Bioconductor.

Some very good tutorials on installing Bioconductor are available at the Bioconductor site, and it is free software. Many scripts and graphical interfaces are available for R/Bioconductor; we particularly recommend AffylmGUI, a graphical user interface for Affymetrix data analysis using the limma microarray package. If you have any analysis questions, please contact affy@arabidopsis.info.

Some ideas for analysis of data

Introduction

Microarray analysis is a large subject, made even larger because there is no one "correct" way of analysing data. Many pieces of software are available for analysing data, at various costs: a few are free, many are free to academic users, and some are sold. The purpose of this web page is to demonstrate some simple analysis you can do with our sample datasets, using free and commonly available software.

We're assuming you have some spreadsheets from us already. You can get spreadsheets from NASCArrays. A description of how to do this is available on the download help page.

For a more detailed description, please have a look at this page: AffyAnalysisWithExcel.html. For help or advice on any of the above topics, please contact affy@arabidopsis.info.

About the spreadsheets

A description of what you get in the Affymetrix spreadsheets is on NASCArrays' data help page.

Normalisation performed on Affymetrix Results

All the spreadsheets presented in the database have been trivially normalised using the Affymetrix standard procedure. A so-called "Scaling Factor" was applied using the Affymetrix software. It is calculated by removing the top and bottom 2% of signal values, then finding the value that adjusts the mean of the remaining 96% to 100. All the results are multiplied by this factor to give the normalised results, which allows different experiments to be compared. More detailed normalisation methods are available; however, most literature on the subject tends to focus on two-colour arrays.
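The scaling step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Affymetrix software itself: the function names, the sample values and the way the 2% cut-off is rounded are all assumptions.

```python
def scaling_factor(signals, target=100.0, trim=0.02):
    """Return the factor that brings the trimmed mean of `signals` to `target`."""
    ordered = sorted(signals)
    # Number of values dropped at each end (rounding down is an assumption)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]   # the central ~96% of the values
    trimmed_mean = sum(kept) / len(kept)
    return target / trimmed_mean


def normalise(signals, target=100.0, trim=0.02):
    """Multiply every signal by the scaling factor."""
    factor = scaling_factor(signals, target, trim)
    return [s * factor for s in signals]


# Hypothetical signal values, for illustration only
raw = [float(v) for v in range(1, 101)]
factor = scaling_factor(raw)
scaled = normalise(raw)
```

After scaling, the mean of the central 96% of `scaled` is 100, which is what makes two separately scaled chips roughly comparable.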

Analysis with Excel

A lot of the obvious questions about microarray data can be answered using only Excel (or another spreadsheet program). This section steps through how to answer simple questions from the data spreadsheets downloadable from NASCArrays.

The first important features in Excel are the sort buttons.

Excel buttons for sorting

Pressing these buttons sorts the entire spreadsheet according to the column containing the currently selected cell. For instance, if you select a cell in the signal column and press "sort descending" (the button labelled Z-A), the entire spreadsheet will be sorted on the signal. Scrolling to the top will show you the gene that was expressed most on the chip, with the rest following in descending order. Choosing "sort ascending" (A-Z) will put the gene that was expressed the least at the top. Any of the columns can be sorted. To get the spreadsheet back the way it started, sort ascending on the spotid.

Caution! Do not select an entire column and press one of the sort buttons. If you do, only that column will be sorted, leaving the rest of the spreadsheet as it was before. This scrambles the data on the spreadsheet, making it meaningless.

Other spreadsheet programs have this facility too, but they may work differently.
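For readers working outside a spreadsheet, the same sorting behaviour, and the scrambling that the caution warns about, can be sketched in Python. The rows and the gene names here are hypothetical; only the spotid and signal columns come from the description above.

```python
# Hypothetical rows: (spotid, gene, signal)
rows = [
    (1, "geneA", 120.5),
    (2, "geneB", 980.0),
    (3, "geneC", 15.2),
]

# Sort descending on signal: each row moves as a unit,
# so the gene stays attached to its own signal value.
by_signal = sorted(rows, key=lambda r: r[2], reverse=True)

# Sort ascending on spotid to get back the original order.
restored = sorted(by_signal, key=lambda r: r[0])

# What goes wrong when one column is sorted on its own:
# the signals are reordered but the other columns are not.
signals_only = sorted((r[2] for r in rows), reverse=True)
scrambled = [(r[0], r[1], s) for r, s in zip(rows, signals_only)]
```

In `scrambled`, geneA is now paired with the signal that belongs to geneB, which is exactly the kind of silent corruption to avoid.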

Excel can be used to compare two sets of results. One easy way is to calculate a "log ratio". Download an experiment, or two slides from your selection, and find a pair of slides you want to compare. Find a free column and calculate the log of the ratio of the signal column of one slide to the signal column of the other. For example, type "=log(A1/B1)", where A and B are the signal columns, into the top empty cell of the free column, then copy and paste it down the rest of the column. The result is the "log ratio". A log ratio above +2 or below -2 is traditionally considered a significant change, although this rule of thumb is being replaced by more statistical methods. Note that Excel's LOG function uses base 10 unless you give a base as a second argument, for example "=log(A1/B1, 2)" for base 2.

Once you've calculated the log ratio into a column, you can also find the most upregulated and downregulated genes by sorting on the log ratio.
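The log-ratio calculation and the sort on it can be sketched in Python as follows. The signal values are hypothetical, and the default of base 10 is an assumption chosen to match what Excel's LOG function does when no base is given; pass `base=2` if you prefer base-2 log ratios.

```python
import math

# Hypothetical signal values for two slides, keyed by spotid
slide_a = {1: 400.0, 2: 100.0, 3: 25.0}
slide_b = {1: 100.0, 2: 100.0, 3: 100.0}


def log_ratio(a, b, base=10):
    """Log of the ratio a/b; both signals must be positive."""
    return math.log(a / b, base)


# One log ratio per spot
ratios = {spot: log_ratio(slide_a[spot], slide_b[spot]) for spot in slide_a}

# Sort descending on the log ratio: most upregulated first,
# most downregulated last.
ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
most_up = ranked[0]
most_down = ranked[-1]
```

A spot with equal signal on both slides has a log ratio of 0, so positive values mean higher signal on the first slide and negative values mean lower.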

If the experiment you are studying has replicates, you can perform Student's t-test. To do this you need two groups of replicates to compare. Use Excel's "ttest" function for this.
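As a rough illustration of what the t-test compares, the sketch below computes the equal-variance (pooled) two-sample t statistic for one gene using only the Python standard library. The replicate values and the function name are hypothetical. Unlike Excel's function, which returns a probability, this sketch stops at the t statistic and degrees of freedom; turning those into a p-value needs the t distribution from a statistics package.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled estimate of the common variance of the two groups
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    # Standard error of the difference between the two means
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical replicate signals for one gene
control = [100.0, 110.0, 90.0]
treated = [200.0, 210.0, 190.0]

t, df = student_t(treated, control)
```

A large absolute value of `t` relative to the degrees of freedom indicates that the difference between the group means is large compared with the variation among replicates.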

You can download a sample spreadsheet (pictured) with log-ratio and t-test calculated as an example.

A screenshot of an example difference spreadsheet

For a more detailed description, please have a look at this page: AffyAnalysisWithExcel.html.

For more complex analysis of array data we recommend Bioconductor.

Some very good tutorials on installing Bioconductor are available at the Bioconductor site. Many scripts and graphical interfaces are now available for R/Bioconductor. We particularly recommend AffylmGUI, a graphical user interface for Affymetrix data analysis using the limma microarray package.