The importance of accurately identifying cardholder data
Anyone who has been through a PCI assessment will tell you that the first step, and probably one of the most time-consuming and critical activities of a PCI compliance exercise, is to accurately determine and document the scope, and hence the cardholder data environment (CDE).
Don’t be confused: while interconnected, the CDE and the scope are two different things. The former consists of the people, processes and systems that store, process or transmit account data (see below). The latter is defined as the CDE plus all system components (network devices, servers, computing devices, databases and applications) connected to it which, if compromised, could impact the CDE. Simply put, if the CDE were your vault room, the scope would be the vault room plus any surrounding rooms (including the cellar and the attic) that give access to it.
Failing to properly determine and document the scope could ruin the compliance certification effort (through a scope disagreement with the auditor or acquirer) and put the organization at risk of a data breach, with the usual consequences (fines, loss of business, etc.). The scope must also be considered a moving target, which is why PCI suggests confirming it at least annually: "At least annually and prior to the annual assessment, the assessed entity should confirm the accuracy of their PCI DSS scope".
Accurately locating account data is also a prerequisite for meeting the following requirements:
PCI DSS 3.1
- 3.1 - Keep cardholder data storage to a minimum by implementing data retention and disposal policies, procedures and processes that include … a quarterly process for identifying and securely deleting stored cardholder data that exceeds defined retention;
- 3.3 - Mask PAN when displayed (the first six and last four digits are the maximum number of digits to be displayed);
- 3.4 - Render PAN unreadable anywhere it is stored (including on portable digital media, backup media, and in logs);
- 6.4.3 - Production data (live PANs) are not used for testing or development.
PCI DSS Designated Entities Supplemental Validation
- DE.2.1 - Document and confirm the accuracy of the scope at least quarterly and upon significant changes;
- DE.2.5 - Implement a data-discovery methodology to confirm the scope and to locate all sources and locations of clear-text PAN at least quarterly;
- DE.2.5.1 - Ensure the effectiveness of methods used for data discovery. The effectiveness of data-discovery methods must be confirmed at least annually;
- DE.2.5.2 - Implement response procedures to be initiated upon the detection of clear-text PAN outside of the CDE.
What is meant by account data?
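PCI DSS requirements apply to organizations where account data is stored, processed or transmitted. Account data consists of cardholder data and/or sensitive authentication data:
- Cardholder data: primary account number (PAN), cardholder name, expiration date and service code;
- Sensitive authentication data (SAD): full track data (magnetic-stripe data or equivalent on a chip), CAV2/CVC2/CVV2/CID, and PINs/PIN blocks.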
What to look for?
You can’t expect to adequately protect your data if you don’t know what to look for.
Since storing sensitive authentication data (SAD) after authorisation is forbidden, the primary account number (PAN) is the determining factor when locating cardholder data. But what does it look like?
A PAN is made of three parts and is typically 16 digits long (American Express cards use 15, and the ISO/IEC 7812 standard allows up to 19). The first part, the first six digits, is the issuer identification number (IIN), which identifies the card issuer. Examples: Visa: 4*****; AMEX: 34**** or 37****; Diners Club International: 36****; Mastercard: 51**** to 55****. The second part, running from the seventh digit to the second-to-last digit, is the cardholder's individual account number. The final part, the last digit, is known as the "check digit" or "checksum". It is computed with the Luhn formula (also called the mod 10 algorithm) and is used to validate the PAN.
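The Luhn check is simple enough to sketch. The Python below is a minimal illustration, not taken from any PCI document: `luhn_valid` implements the standard mod 10 check, while `card_brand` is a hypothetical helper that recognises only the example prefixes listed above. A real discovery tool would need a complete, up-to-date IIN table.

```python
DIGITS = set("0123456789")


def luhn_valid(pan: str) -> bool:
    """Return True if the ASCII digit string passes the Luhn (mod 10) check."""
    if not pan or not set(pan) <= DIGITS:
        return False
    total = 0
    # Walk from the rightmost digit (the check digit) leftwards and double
    # every second digit; a doubled value above 9 has 9 subtracted from it.
    for position, char in enumerate(reversed(pan)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    # For a valid PAN the weighted sum is a multiple of 10.
    return total % 10 == 0


def card_brand(pan: str) -> str:
    """Guess the brand from the example IIN prefixes only (hypothetical helper)."""
    if len(pan) < 2 or not set(pan) <= DIGITS:
        return "Unknown"
    first_two = int(pan[:2])
    if pan[0] == "4":
        return "Visa"
    if first_two in (34, 37):
        return "American Express"
    if first_two == 36:
        return "Diners Club International"
    if 51 <= first_two <= 55:
        return "Mastercard"
    return "Unknown"


print(luhn_valid("4111111111111111"), card_brand("4111111111111111"))  # True Visa
print(luhn_valid("4111111111111112"))  # False
```

Changing any single digit breaks the check, which is why discovery tools use it to discard most random digit runs. It does not prove a number is a real card, only that it is shaped like one.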
NOTE: PCI DSS 3.1 requirement 3.3 (mask PAN when displayed): no more than the first six and the last four digits may be visible on a screen, a receipt, or any other media used by the organization.
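A display routine that respects this limit might look like the following minimal Python sketch. It assumes the input holds a single PAN, possibly written with spaces or hyphens; `mask_pan` is a hypothetical helper name.

```python
DIGITS = "0123456789"


def mask_pan(pan: str, mask_char: str = "*") -> str:
    """Mask everything except the first six and last four digits of a PAN."""
    # Drop separators such as spaces and hyphens before counting digits.
    digits = "".join(ch for ch in pan if ch in DIGITS)
    if len(digits) <= 10:
        # With ten digits or fewer, "first six + last four" would hide nothing.
        raise ValueError("input too short to be a PAN")
    return digits[:6] + mask_char * (len(digits) - 10) + digits[-4:]


print(mask_pan("4111 1111 1111 1111"))  # 411111******1111
print(mask_pan("378282246310005"))      # 378282*****0005
```

The requirement sets a maximum, not a target. Showing fewer digits, for example only the last four, is also acceptable, so a business with no need for the first six can mask more.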
Legitimate vs illegitimate PAN data location
You can’t expect to adequately protect your data if you don’t know where it is.
When it comes to determining the location of PAN data, organisations logically start from what they know. They generally have a good idea of where PAN data is stored in the IT components supporting the company's revenue stream and sales channels. This is what I call the legitimate location. But can we claim that this is the only place where PAN data resides? As Verizon has underlined, in 66% of data breaches the organization didn’t know the data was on the system that was compromised. Card data breaches have taught us that PAN data can exist even where organizations claim they don’t store it. Firstly, PAN data could be stored in databases or systems not directly associated with the sales channels or the core application, such as those used by marketing, customer support or HR. Secondly, unauthorized copies of PAN data could be left lying around on the network; this is what I call illegitimate data.
Where to search for PAN data?
When it comes to PAN data, there is no such thing as an unusual storage location. It can be anywhere, and certainly where one does not expect it.
Forensic investigators working on breach cases report regularly finding PAN data on email servers, in audit logs, on test servers and user workstations, and even in browser caches and Skype sessions.
The following components should be on your checklist: application servers, file servers, database servers, email servers, production, development, test and QA environments, workstations and desktops, gateways and proxies, load balancers, web application firewalls and middleware components. Within them, the following should be examined: file systems, text files, XML files, data queue folders, log files (transaction, debug mode, network traffic), batch files, office documents (e.g. Word, spreadsheets, PDF), temp files, emails, database tables, removable media, backups and archives, browser caches, and audio recordings.
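Database tables are among the hardest items on that list to check by hand, because PANs often hide in free-text columns such as notes or comments. The Python sketch below assumes a SQLite database, chosen only because Python's standard library can open it; production database servers need their own drivers and catalog queries. It walks every column of every table, flags runs of 13 to 19 digits (optionally separated by spaces or hyphens) that pass the Luhn check, and reports them masked. The 13-to-19-digit window and the `scan_sqlite` and `quote_ident` names are assumptions of this sketch.

```python
import re
import sqlite3
from typing import List, Tuple

# Assumption: a PAN candidate is 13 to 19 ASCII digits, optionally separated
# by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b", re.ASCII)


def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check on a string of ASCII digits."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            # Doubling then subtracting 9 equals summing the product's two digits.
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier so unusual table or column names stay safe."""
    return '"' + name.replace('"', '""') + '"'


def scan_sqlite(conn: sqlite3.Connection) -> List[Tuple[str, str, str]]:
    """Return (table, column, masked PAN) for every Luhn-valid candidate found."""
    hits = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [row[1] for row in
                   conn.execute(f"PRAGMA table_info({quote_ident(table)})")]
        for column in columns:
            query = f"SELECT {quote_ident(column)} FROM {quote_ident(table)}"
            for (value,) in conn.execute(query):
                # PANs usually sit in TEXT columns but are sometimes stored as INTEGERs.
                if not isinstance(value, (str, int)):
                    continue
                for match in PAN_CANDIDATE.finditer(str(value)):
                    digits = re.sub(r"[ -]", "", match.group())
                    if luhn_valid(digits):
                        masked = digits[:6] + "*" * (len(digits) - 10) + digits[-4:]
                        hits.append((table, column, masked))
    return hits


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE customers (id INTEGER, notes TEXT, card TEXT)")
    demo.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
        (1, "paid with 4111 1111 1111 1111", None),
        (2, "order ref 4111111111111112", "5555555555554444"),
    ])
    for hit in scan_sqlite(demo):
        print(hit)
```

On the demo data the scan flags the card number buried in the free-text `notes` column as well as the one in the `card` column. It ignores the order reference, which fails the Luhn check. Results are masked so that the scan report does not itself become a new, illegitimate copy of PAN data.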
Document your hunting process
Make sure to include a documented PAN data hunting process in your PCI library.
Not only should the legitimate and illegitimate locations of PAN data be determined and documented, but also how they were determined. PCI DSS offers no guideline or recommendation on how to search for PAN data, but page 10 of PCI DSS 3.1 states: "The entity retains documentation that shows how PCI DSS scope was determined. The documentation is retained for assessor review and/or for reference during the next annual PCI DSS scope confirmation activity." It goes on: "For each PCI DSS assessment, the assessor is required to validate that the scope of the assessment is accurately defined and documented."
The PAN hunter toolkit
Finding information within a single database is cumbersome; finding it across an organisation's entire network is a huge challenge.
The toolkit of the perfect PAN hunter consists of soft and technical analysis tools.
Soft Analysis
A major part of the legitimate PAN data can be identified through documentation reviews (process documentation, diagrams, vendor documentation, etc.) and interviews with personnel from the sales channels, business owners, customer support, HR, finance and IT. Here are a few beacon questions to help clear the path to the legitimate PAN data:
- What are the sales channels and payment methods?
- Are payments taken via email, instant messaging such as Skype, or phone?
- How is PAN data, along with other sensitive data, collected through the sales channels?
- Is PAN exchanged via email?
- How is PAN data processed, and by which departments?
- What is the authorisation process?
- How are customer refunds and chargebacks handled?
- What is the rationale for keeping/storing PAN data after authorisation?
- Do you keep a backup of PAN data?
- Is PAN data stored in a data warehouse?
- Can customers manage (add, remove, update) their payment data?
- If payments can be taken by phone, are calls recorded?
- Is PAN data used as a primary key?
- How are payment modules tested? Is PAN data used for this purpose?
- What are the IT and network components supporting the payment flow and customer information management?
- Is PAN data included in transaction or system logs?
Note: The only goal of this exercise is to identify all possible locations of PAN data, not to validate the security controls.
Technical Analysis
The volume of data and the number of places to search are so large (essentially the whole network is at stake) that automated searching for PANs across all systems, databases, files and network traffic is the only viable way to get close to a complete picture. These solutions are known as card data discovery or scan tools. Examples are provided below. Don’t be fooled: these tools help automate the discovery process, but their results require careful review to separate the wheat from the chaff. For instance, about one random digit string in ten passes the Luhn check, so false positives are common.
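To make the idea concrete, here is a minimal Python sketch of the core loop such a tool runs over a file system. It is limited, by assumption, to files readable as text: Office documents, PDFs, archives, images and audio need format-specific extraction, which commercial tools add. `scan_tree` is a hypothetical name, and the 13-to-19-digit window is an assumption of the sketch. Every hit still needs the manual review described above.

```python
import os
import re
from typing import Iterator, Tuple

# Assumption: 13 to 19 ASCII digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b", re.ASCII)


def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check on a string of ASCII digits."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def scan_tree(root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, masked PAN) for each Luhn-valid candidate under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary junk through without crashing the scan.
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        for match in PAN_CANDIDATE.finditer(line):
                            digits = re.sub(r"[ -]", "", match.group())
                            if luhn_valid(digits):
                                masked = digits[:6] + "*" * (len(digits) - 10) + digits[-4:]
                                yield path, line_number, masked
            except OSError as error:
                # An unreadable file is a blind spot: report it rather than hide it.
                print(f"skipped {path}: {error}")


if __name__ == "__main__":
    import sys
    for hit in scan_tree(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(*hit, sep="\t")
```

Decoding everything as UTF-8 keeps the sketch short but has a cost. UTF-16 text, common on Windows, would be missed because the NUL characters between its digits break up every candidate. That is exactly the kind of gap that makes tool results worth reviewing rather than trusting.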
Here is a test that every organisation should perform to help identify PAN data on the components supporting the revenue stream or core application, or to determine how card data flows through its payment systems. First, perform a number of different end-to-end transactions through the core/payment application with a single, known payment card number. Then, once the transactions have completed, scan the systems for that specific known number.
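The search step of this test can be sketched in a few lines of Python. The sketch makes several assumptions. It only looks at files readable as text. It allows an optional space or hyphen between the digits of the known number. It does not count a longer digit run that merely contains the number. `trace_known_pan` and `known_pan_pattern` are hypothetical names. A real trace should also cover databases, appliance logs and encoded forms (for example Base64 inside message payloads), which this sketch ignores.

```python
import os
import re
from typing import Dict


def known_pan_pattern(pan: str) -> re.Pattern:
    """Build a regex matching the known PAN with optional space/hyphen separators."""
    digits = re.sub(r"\D", "", pan)
    if not digits:
        raise ValueError("no digits in the known PAN")
    # (?<!\d) and (?!\d) stop the PAN from matching inside a longer digit run.
    return re.compile(r"(?<!\d)" + r"[ -]?".join(digits) + r"(?!\d)", re.ASCII)


def trace_known_pan(root: str, pan: str) -> Dict[str, int]:
    """Return {path: occurrence count} for every text-readable file containing the PAN."""
    pattern = known_pan_pattern(pan)
    hits: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    count = len(pattern.findall(handle.read()))
            except OSError as error:
                # Report unreadable files: each one is a place the trace did not look.
                print(f"skipped {path}: {error}")
                continue
            if count:
                hits[path] = count
    return hits


if __name__ == "__main__":
    import sys
    # Usage: python trace_pan.py <root directory> <known card number>
    for path, count in sorted(trace_known_pan(sys.argv[1], sys.argv[2]).items()):
        print(f"{count}\t{path}")
```

Because the number is known in advance, there is no need for the Luhn filter used in a general scan. Every hit is a genuine trace of the test transactions, which makes the result a direct map of where card data actually flows.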
Conclusion
PAN data is an odd bird! Discovering its nests across the network is the first challenge in a PCI journey. In fact, it is a project in itself, often code-named Project #0, and it is the foundation of PCI compliance. Not giving it the effort, time and budget it deserves could jeopardise the entire PCI compliance effort. Project #0 must address both the legitimate and the illegitimate sides of the equation through soft and technical analysis.
Questions
- How do you address the card data discovery?
- Is card data discovery part of your PCI tooling?
- How do you confirm/validate scope?