Common issues with automated STDF data upload


Semiconductor companies thinking of providing a modern yield analysis system to help their engineers become more efficient should seriously consider automating the upload of their STDF data to a relational yield management database. This enables faster queries and analysis, because the analysis software no longer has to parse the STDF data every time a user wants to open it. It also centralizes storage of the semiconductor data and avoids having to download all of the data to the analysis software.

However, a team implementing automated storage of STDF data in a central database should be aware of several issues that must be resolved or planned for. In this article I’ll show you what to look out for.

The first issue

How to organize the STDF data into a database schema.

Simply following the STDF spec and storing the data in the STDF record structure is inefficient, because STDF is designed to save data as efficiently as possible while testing. In the database, however, your concern is retrieving the data as quickly as possible while using as little storage as possible. The database design must also allow for test program revisions, a varying number of tests stored per part (for example, when failing parts stop testing early), the ability to track test limit changes across datalogs for later comparison, and other variations in the data.
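To make those design points concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not yieldHUB's schema, and every table and column name is hypothetical. It shows three of the ideas above: each test program revision gets its own row, test limits are keyed by datalog so limit changes can be compared later, and results live in a narrow one-row-per-measurement table, so a part that executed fewer tests simply has fewer rows.

```python
import sqlite3

# Hypothetical, simplified schema; all names are illustrative.
SCHEMA = """
CREATE TABLE test_program (
    program_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    revision   TEXT NOT NULL,
    UNIQUE (name, revision)            -- one row per program revision
);
CREATE TABLE datalog (
    datalog_id INTEGER PRIMARY KEY,
    program_id INTEGER NOT NULL REFERENCES test_program(program_id),
    lot_id     TEXT NOT NULL,
    file_name  TEXT NOT NULL
);
CREATE TABLE test_def (
    test_id    INTEGER PRIMARY KEY,
    test_num   INTEGER NOT NULL,       -- STDF TEST_NUM
    test_name  TEXT NOT NULL,
    units      TEXT,
    UNIQUE (test_num, test_name)
);
-- Limits are keyed by datalog, so a limit change is a new row, not an overwrite.
CREATE TABLE test_limit (
    datalog_id INTEGER NOT NULL REFERENCES datalog(datalog_id),
    test_id    INTEGER NOT NULL REFERENCES test_def(test_id),
    lo_limit   REAL,
    hi_limit   REAL,
    PRIMARY KEY (datalog_id, test_id)
);
CREATE TABLE part (
    part_key   INTEGER PRIMARY KEY,
    datalog_id INTEGER NOT NULL REFERENCES datalog(datalog_id),
    part_id    TEXT NOT NULL,          -- STDF PART_ID
    hard_bin   INTEGER,
    soft_bin   INTEGER
);
-- One narrow row per measurement: a part that stopped testing early
-- simply has fewer rows instead of NULL-filled columns.
CREATE TABLE test_result (
    part_key   INTEGER NOT NULL REFERENCES part(part_key),
    test_id    INTEGER NOT NULL REFERENCES test_def(test_id),
    value      REAL,
    PRIMARY KEY (part_key, test_id)
);
"""


def limit_changes(conn):
    """List tests whose low or high limit differs between datalogs."""
    return conn.execute("""
        SELECT t.test_num, t.test_name,
               COUNT(DISTINCT l.lo_limit) AS lo_variants,
               COUNT(DISTINCT l.hi_limit) AS hi_variants
        FROM test_limit AS l
        JOIN test_def AS t ON t.test_id = l.test_id
        GROUP BY t.test_id, t.test_num, t.test_name
        HAVING COUNT(DISTINCT l.lo_limit) > 1
            OR COUNT(DISTINCT l.hi_limit) > 1
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO test_program VALUES (1, 'PROG_A', 'rev1')")
    conn.executemany("INSERT INTO datalog VALUES (?, 1, 'LOT1', ?)",
                     [(1, "a.stdf"), (2, "b.stdf")])
    conn.execute("INSERT INTO test_def VALUES (1, 100, 'VDD_LEAK', 'uA')")
    # The high limit was tightened between the two datalogs.
    conn.executemany("INSERT INTO test_limit VALUES (?, 1, ?, ?)",
                     [(1, 0.0, 5.0), (2, 0.0, 4.5)])
    print(limit_changes(conn))  # [(100, 'VDD_LEAK', 1, 2)]
```

The narrow result table trades a larger row count for flexibility: new tests or program revisions never require schema changes, at the cost of more rows and larger indices.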

The second issue

The storage overhead of individual test results.

Test results are the largest data set stored in the database, and on top of the raw values there is the overhead of the indices needed for fast queries. In our experience with yieldHUB, test results typically account for 60% of the database storage.
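The arithmetic behind this is worth doing before sizing storage, because the result table grows with parts multiplied by tests per part. The sketch below is only a back-of-the-envelope estimate: every input, including the 20-byte row and the 50% index overhead, is a hypothetical placeholder rather than a yieldHUB figure, and real database engines add per-row and per-page overhead on top.

```python
def estimate_result_storage(parts, tests_per_part, bytes_per_row, index_overhead):
    """Back-of-the-envelope size, in bytes, of a test-result table plus indices.

    All inputs are assumptions to be measured on your own data and engine;
    index_overhead is the index size expressed as a fraction of the table size.
    """
    rows = parts * tests_per_part          # one row per measured value
    table_bytes = rows * bytes_per_row     # raw result rows
    index_bytes = table_bytes * index_overhead
    return table_bytes + index_bytes


if __name__ == "__main__":
    # Hypothetical inputs: 1 million parts, 1,000 parametric tests per part,
    # 20 bytes per row (8-byte part key + 4-byte test key + 8-byte double,
    # ignoring per-row engine overhead) and indices at 50% of the table size.
    total = estimate_result_storage(1_000_000, 1_000, 20, 0.5)
    print(f"{total / 1e9:.0f} GB")  # prints "30 GB"
```

Doubling either the part count or the number of tests doubles the estimate, which is why the result table quickly dwarfs lot, wafer and part summary data.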

The third issue


Backing up a large database.

Once the database grows beyond a few hundred gigabytes, backup can become a problem. Backup copies need to stay online for days or even weeks, so their storage requirements can easily exceed that of the database itself. Depending on the type of database used, a full backup can lock the database or make it read-only, so the application needs to handle this situation properly. Plan backups so that the database stays available to users as much as possible while still ensuring that, if the database becomes corrupted, you can restore it from the backups.
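As an illustration of an online backup that avoids locking users out, the sketch below uses SQLite's online backup API through `sqlite3.Connection.backup` (Python 3.7+) purely as a stand-in; production server databases have their own online or incremental backup tools. SQLite holds a read lock on the source only while each batch of pages is copied, and the hypothetical `pause_s` delay in the progress callback gives other connections a turn between batches. The function name and parameters are assumptions for this example.

```python
import sqlite3
import time


def online_backup(src_path, dst_path, pages_per_step=1024, pause_s=0.05):
    """Copy a live SQLite database to dst_path in batches of pages.

    Returns a list of (pages_copied, total_pages) recorded after each batch.
    """
    progress_log = []

    def on_progress(status, remaining, total):
        # Called after each batch, once SQLite has released the source's
        # read lock; pausing here lets other connections use the database.
        progress_log.append((total - remaining, total))
        if remaining:
            time.sleep(pause_s)

    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst, pages=pages_per_step, progress=on_progress)
    finally:
        dst.close()
        src.close()
    return progress_log
```

SQLite restarts the copy if another connection modifies the source between batches, so on a write-heavy database this kind of backup is best scheduled in a quiet period.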

The fourth issue


Should you give non-experts access to the database? Would you allow users who may not be proficient in SQL to query it directly?

Ideally, the application already gives users what they need, so they don’t need to access the database directly. If users must be able to query the database directly, make sure they are trained, and consider creating database views that make the data easier for non-SQL experts to use. Note that users can innocently run queries that ask the server to return all of the data to their client application, so make sure you can terminate runaway queries.
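A minimal sketch of both safeguards, again using Python's sqlite3 module only for illustration: it defines a hypothetical flattened view over assumed tables `part`, `test_def` and `test_result`, and runs ad-hoc queries with a time limit (through `Connection.set_progress_handler`) and a cap on returned rows. Server databases offer their own equivalents, such as statement timeouts and the ability to kill a session; the names here are assumptions.

```python
import sqlite3
import time

# Hypothetical view that hides the joins from users who are not SQL experts.
# It assumes tables part, test_def and test_result already exist.
RESULTS_VIEW = """
CREATE VIEW IF NOT EXISTS v_results AS
SELECT p.part_id, t.test_name, r.value
FROM test_result AS r
JOIN part AS p ON p.part_key = r.part_key
JOIN test_def AS t ON t.test_id = r.test_id
"""


def run_guarded(conn, sql, params=(), max_seconds=5.0, max_rows=10_000):
    """Run an ad-hoc query with a time limit and a cap on returned rows.

    If the statement is still executing after max_seconds, sqlite3 raises
    sqlite3.OperationalError.
    """
    deadline = time.monotonic() + max_seconds
    # SQLite calls the handler every 10,000 virtual-machine instructions;
    # a truthy return value aborts the running statement.
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 10_000)
    try:
        cursor = conn.execute(sql, params)
        return cursor.fetchmany(max_rows)  # never ship the whole table
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler
```

An administrator can also abort a query on demand from another thread with `Connection.interrupt()`.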


The design of yieldHUB’s database was completed after several years of iteration and testing with datasets representing many different STDF storage scenarios. Our largest databases are several terabytes in size, roughly the same as the size of the compressed datalogs they store. Fast-growing companies find no slowdown as they ramp, and modern cloud-based technology is as fast as having a server beside you.

The server

Of course, the server hardware must be capable of handling the expected amount of data and number of simultaneous users. When planning an automated system for STDF data upload to a database, buy as many CPUs, as much RAM and as much storage as the budget allows. Also plan for future growth by deploying a server chassis that can be upgraded simply by adding CPUs and RAM.

The discussion above applies equally to data formats other than STDF generated by modern testers. STDF seems to be the most popular format, although yieldHUB now supports and integrates dozens of other formats from different semiconductor manufacturing stages, tester types and foundries.