Open Data Sets in the Microfabrication (Semiconductor) Industry for Data Science Part - 1

Written by Jaganadh Gopinadhan

Introduction

The pandemic brought unique challenges to many industries. The semiconductor industry, one of the primary builders of digital infrastructure, was also affected. The U.S. alone reported a severe shortage of semiconductors, and the Federal Government announced plans to support the industry.

Advanced data and analytics applications have always been part of the roadmap of microfabrication industry leaders. One of the critical challenges the industry is always trying to solve is improving yield. Innovations that address the yield problem are directly related to applying advanced Data Mining or Data Science/Artificial Intelligence solutions. Unlike in other industries, the limited availability of open data, the complexity of the domain, and Intellectual Property concerns are limiting factors for open innovation.

There are very few open data sets in the field, although many research papers and other literature are available. Access to quality data is limited to academics or industry experts. This series of notes attempts to introduce open data sets in the industry, their use-case scenarios, and related literature. The notes adhere to known, published research papers. Due to the nature of the industry and its intellectual property, the notes may limit the discussion of code and approach.

Eigenvector Metal Etch Data

This data set is one of the oldest open data sets related to a microfabrication/semiconductor process. The data was released by Eigenvector, a niche industry analytics company, and was used in a research study published in 1999 [1]. The first author, Barry M. Wise, is one of the company's founding members. The data was taken from a LAM 9600 Metal Etcher [2]. Etching is used in microfabrication to chemically remove layers from the surface of a wafer during manufacturing [4].

The data was released in MATLAB format, suitable for analysis using the PLS Toolbox [3] developed by Eigenvector. There are three .mat files: MACHINE_Data.mat, OES_DATA.mat, and RFM_DATA.mat. A detailed note on the data and its attributes is available in references [1] and [2]. Since the data is in MATLAB format, we created a Python-based parser to convert the data into pandas DataFrames. This parser should enable Data Miners and Data Scientists to explore the data using Open Source tools like Python or R.

The Eigenvector Etch Data Parser

The Eigenvector Etch Data Parser was developed to read the MATLAB data files published by Eigenvector [2]. The data is from a LAM 9600 Metal Etching Machine and was collected in the mid-1990s. The parser reads each file and combines the calibration sensor data and the test sensor data into a single DataFrame. The parser adds a field to the data, 'fault_name', which helps the user distinguish the normal/calibration wafers from the test wafers (with defects). We tested the parser in Python 3 environments only; if you need Python 2 compatibility, please test it and file a bug report or pull request as applicable. The source code is released under the Apache 2.0 license and is available at https://github.com/jaganadhg/egvsemicon.
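To make the idea of a single DataFrame with a 'fault_name' field more concrete, here is a minimal sketch. It is not the code of the parser in the repository. It assumes the per-wafer sensor readings have already been read from a .mat file (for example with scipy.io.loadmat, a common reader for MATLAB files) into a list of 2-D arrays. The function name wafers_to_frame, the wafer_id column, the sensor names, and the label values are all hypothetical, and the numbers are synthetic stand-ins rather than real etch data.

```python
import numpy as np
import pandas as pd


def wafers_to_frame(wafers, variables, fault_names):
    """Stack per-wafer sensor arrays into one long DataFrame.

    wafers      : list of 2-D arrays, one per wafer, shape (time_steps, n_variables)
    variables   : column names, one per sensor variable
    fault_names : one label per wafer (hypothetical values, e.g. "calibration")
    """
    frames = []
    for wafer_id, (readings, fault) in enumerate(zip(wafers, fault_names)):
        frame = pd.DataFrame(np.asarray(readings), columns=variables)
        frame.insert(0, "wafer_id", wafer_id)   # keep track of the source wafer
        frame["fault_name"] = fault             # same label on every row of the wafer
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Synthetic stand-ins for calibration and test wafers (NOT real etch data).
rng = np.random.default_rng(0)
variables = ["sensor_a", "sensor_b"]            # hypothetical sensor names
calibration = [rng.normal(size=(5, 2)) for _ in range(3)]
test = [rng.normal(size=(4, 2)) for _ in range(2)]

# Assumed labelling: one shared label for calibration wafers, one per test fault.
labels = ["calibration"] * len(calibration) + ["fault_1", "fault_2"]
etch_df = wafers_to_frame(calibration + test, variables, labels)

# Number of wafers per label
print(etch_df.groupby("fault_name")["wafer_id"].nunique())
```

With a layout like this, filtering on fault_name separates normal wafers from faulty ones, and grouping on wafer_id recovers the time series of each individual wafer.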

Data Mining/Data Science and Next Steps

Detailed notes describing the data and parser are available at https://github.com/jaganadhg/egvsemicon/blob/main/EGV_Data_exploreer.ipynb. We are not venturing into any detailed analytics solution within the scope of the current notes. The industry uses a range of techniques to solve these problems, from simple univariate analysis to Deep Learning. From the data description, one can infer the nature of the data preprocessing and feature engineering techniques required. Domain understanding, or active guidance from process engineers in the field, may help you start an exciting project. A good starting point is the original paper [1].

Competing Interests

This notebook is intended to introduce the Eigenvector Metal Etch Data Parser (https://github.com/jaganadhg/egvsemicon) and the data [2]. The authors declare that no proprietary information related to the authors, affiliated company, or its approach, methodologies, and IPR is discussed in these notes. The authors declare that they have no competing interests.

References

[1] B.M. Wise, N.B. Gallagher, S.W. Butler, D.D. White, Jr. and G.G. Barna, “A Comparison of Principal Components Analysis, Multi-way Principal Components Analysis, Tri-linear Decomposition and Parallel Factor Analysis for Fault Detection in a Semiconductor Etch Process”, J. Chemometrics, 13, 379-396 (1999)

[2] https://eigenvector.com/resources/data-sets/

[3] https://eigenvector.com/software/pls-toolbox/

[4] https://en.wikipedia.org/wiki/Etching_(microfabrication)

Written on January 30, 2022
The Opinions Expressed In This Post Are My Own And Not Necessarily Those Of My Employer.