【AWS Introduction】 Introduction to Amazon S3

Data storage types

Data generated by computers, such as text, images and video, must be saved before it can be used again, whether for system management, web development or enterprise applications.

The types of data storage typically include the following:

1.File Storage

File storage uses the traditional top-down tree structure, with data organized into folders, subfolders and files (compare File Explorer in Windows 10/11, or Finder in macOS and the directory structure beneath it).

File storage is hierarchical: each file has a complete path and a file name, and a file name consists of a name and an extension (suffix).
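
As a small illustration of how a hierarchical path breaks down into folders, name and extension, here is a sketch using Python's standard pathlib module. The path itself is hypothetical and chosen only for the example.

```python
from pathlib import PurePosixPath

# Hypothetical path, used only to illustrate the parts of a file path.
path = PurePosixPath("/home/alice/reports/2024/summary.pdf")

print(path.parent)  # /home/alice/reports/2024  (the folder hierarchy)
print(path.name)    # summary.pdf               (the full file name)
print(path.stem)    # summary                   (the name)
print(path.suffix)  # .pdf                      (the extension)
```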

Files are stored on hard disks and NAS (Network Attached Storage).

File storage is very easy to operate, but it requires a well-planned folder structure. As the file system grows, the cost of file storage keeps rising and becomes relatively high.

File storage protocols include NFS, SMB, etc.

2.Block Storage

Block storage is used for larger databases, structured files, applications and similar workloads. A file is divided into many blocks, each with a unique ID, and these blocks can be stored on different disks (hard disks and other storage devices).

A commonly used form of block storage is the SAN (Storage Area Network).
Block storage is faster and more flexible than file storage, but it is also more complex.

Block storage protocols include iSCSI, FC (Fibre Channel), etc.
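
The idea of dividing a file into uniquely identified blocks can be sketched in a few lines of Python. This is a toy in-memory model, not how a real SAN works: the function names, the 4096-byte block size and the use of random UUIDs as block IDs are all assumptions made for illustration.

```python
import uuid

def split_into_blocks(data, block_size=4096):
    """Split data into fixed-size blocks.

    Returns (block_map, store): block_map lists the block IDs in order,
    store maps each block ID to its bytes. Both names are hypothetical.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    store = {}
    block_map = []
    for offset in range(0, len(data), block_size):
        block_id = uuid.uuid4().hex  # unique ID for this block
        store[block_id] = data[offset:offset + block_size]
        block_map.append(block_id)
    return block_map, store

def reassemble(block_map, store):
    """Rebuild the original data by reading the blocks in map order."""
    return b"".join(store[block_id] for block_id in block_map)

data = bytes(range(256)) * 40           # 10240 bytes of sample data
block_map, store = split_into_blocks(data)
print(len(block_map))                   # 3 blocks: 4096 + 4096 + 2048 bytes
print(reassemble(block_map, store) == data)
```

Because each block is addressed only by its ID, the entries in the store could live on different devices; the ordered block map is what lets a file system put them back together.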

3.Object Storage

Object storage is relatively new. It stores large amounts of unstructured data as objects and manages them in a flat address space. Each object is assigned a unique identifier and metadata, and object storage is usually used in cloud environments. An object has three components:

ID, a unique identifier;
Metadata, a description of the data;
Data, the actual content.

Object storage is used for static content such as movies, pictures, music and other unstructured data. The data is widely distributed and easy to access via HTTP requests, that is, it is stored and retrieved over the web.
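
The three components of an object, and the flat address space that holds them, can be modeled with a minimal in-memory sketch. The class names StoredObject and FlatObjectStore, the sample key and the content_type metadata field are hypothetical; a real object store is a networked service, not a Python dictionary.

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    object_id: str                                # ID: the unique identifier
    data: bytes                                   # Data: the actual content
    metadata: dict = field(default_factory=dict)  # Metadata: describes the data

class FlatObjectStore:
    """Toy store with a single flat namespace: no folders, only IDs."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, data, **metadata):
        self._objects[object_id] = StoredObject(object_id, data, dict(metadata))

    def get(self, object_id):
        return self._objects[object_id]

store = FlatObjectStore()
store.put("movies/trailer.mp4", b"\x00\x01", content_type="video/mp4")

obj = store.get("movies/trailer.mp4")
print(obj.object_id, obj.metadata["content_type"], len(obj.data))
```

Note that the slash in the ID is just a character in the identifier; the store does not treat it as a directory separator.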

Definition of Amazon S3

Amazon Simple Storage Service, abbreviated as S3, is the object storage service of AWS. It is where individuals, applications and a range of AWS services store their data.

Amazon S3 is well suited to scenarios such as:

Backing up documents and maintaining log files and disaster recovery images
Analyzing static big data
Hosting static websites

Compared with traditional storage methods, S3 offers cheap and reliable storage. Where necessary, Amazon S3 can also be closely integrated with operations running inside or outside AWS.

As described earlier, block storage divides the data on a physical storage device into separate blocks, whose use is managed by a file system.

NTFS is a commonly used file system in Windows, while Btrfs and ext4 are file systems used in Linux. On behalf of the operating system, the file system allocates space for files and data and provides access whenever the operating system needs to read data.

Object storage systems like Amazon S3 provide what can be thought of as a flat surface on which to store data. This simple design avoids some of the operating system-related complexities of block storage and allows anyone to easily use any amount of professionally designed and carefully maintained storage capacity.

Amazon S3 Service Architecture

Amazon S3 stores files in buckets, also known as “storage buckets”.

By default, each AWS account was long limited to 100 buckets (AWS raised the default quota to 10,000 general purpose buckets in late 2024). As with other AWS services, you can ask AWS to raise this quota based on your usage.

An S3 bucket and its contents exist in a single AWS Region, but the name chosen for the bucket must be globally unique across the S3 system.

1.Prefixes and delimiters

A bucket has no directory structure like that of file storage, but prefixes and delimiters can be used to give its contents a structure.

A prefix is a plain text string that indicates an organizational level. The delimiter (typically /) tells S3 to treat objects with names such as contact/phonenumber.pdf as grouped under the prefix contact/.
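
To make the grouping concrete, the following sketch imitates, in plain Python and without calling AWS, how a listing with a prefix and a delimiter rolls keys up into groups. The function list_keys and every sample key except contact/phonenumber.pdf are hypothetical; this is a local approximation of the behavior, not the S3 API itself.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Return (contents, common_prefixes) for keys that start with prefix.

    A key whose remainder after the prefix still contains the delimiter is
    rolled up into a common prefix instead of being listed individually.
    """
    contents = []
    common_prefixes = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # An empty delimiter means no grouping at all.
        head, sep, _ = rest.partition(delimiter) if delimiter else (rest, "", "")
        if sep:
            rolled_up = prefix + head + delimiter
            if rolled_up not in common_prefixes:
                common_prefixes.append(rolled_up)
        else:
            contents.append(key)
    return contents, common_prefixes

keys = [
    "contact/phonenumber.pdf",
    "contact/address.pdf",
    "invoices/2024/jan.pdf",
    "readme.txt",
]

top_contents, top_prefixes = list_keys(keys)
print(top_contents)   # ['readme.txt']
print(top_prefixes)   # ['contact/', 'invoices/']

contact_contents, contact_prefixes = list_keys(keys, prefix="contact/")
print(contact_contents)  # ['contact/address.pdf', 'contact/phonenumber.pdf']
```

The bucket itself stays flat: the "folders" appear only because the listing groups keys that share a prefix up to the delimiter.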

With the AWS CLI, objects are addressed using an S3 URI of the form:
s3://bucketname/filename
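
AWS CLI commands such as aws s3 ls and aws s3 cp accept addresses in this form. As a rough sketch of how such an address splits into a bucket name and an object name, here is a small helper; parse_s3_uri is a hypothetical function written for this example, not part of any AWS tool.

```python
def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the key.
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key

bucket, key = parse_s3_uri("s3://bucketname/contact/phonenumber.pdf")
print(bucket)  # bucketname
print(key)     # contact/phonenumber.pdf
```

Only the first slash separates the bucket from the object name; any later slashes belong to the object name and act as prefix delimiters.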

2.Large objects

Although there is no theoretical limit on the total amount of data in an S3 bucket, a single object may not exceed 5 TB in size, and a single upload operation cannot exceed 5 GB.

To reduce the risk of data loss or an aborted upload, AWS recommends using multipart upload for any object larger than 100 MB (i.e., large objects).

Multipart upload divides a large object into multiple smaller parts and transmits them separately to the S3 target. If the transmission of one part fails, that part can be retried without affecting the others.
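
The splitting step can be sketched as a simple calculation of byte ranges. The function plan_parts and the 100 MiB default part size are assumptions for illustration; the 5 MiB minimum part size (for every part except the last) and the 10,000-part maximum are S3's documented multipart limits. The sketch only plans the parts and does not upload anything.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload

def plan_parts(object_size, part_size=100 * MIB):
    """Return a list of (part_number, start, end_exclusive) byte ranges."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    for number, start in enumerate(range(0, object_size, part_size), start=1):
        end = min(start + part_size, object_size)
        parts.append((number, start, end))
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts

parts = plan_parts(250 * MIB)
for number, start, end in parts:
    print(number, start, end)  # three parts: 100 MiB, 100 MiB and 50 MiB
```

Each range can be sent and retried independently, which is why a failure in one part does not force the whole object to be uploaded again.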

If you want to transfer large files to an S3 bucket, Amazon S3 Transfer Acceleration can help speed up the upload.

AWS Management Console

The AWS Management Console is used to manage and control all of your cloud computing resources. After registering an AWS international or domestic account, you can use the console to configure resources and then carry out application development, operations and so on.

In addition to managing EC2, the AWS management console can also manage S3 and many other AWS services.

Amazon S3 storage classes

S3 stores data in buckets with no minimum or maximum storage limits; capacity is determined entirely by your storage requirements.

S3 offers different storage classes to meet the storage needs of different users, and users pay only for the storage they use. The main S3 storage classes are:

1.Amazon S3 Standard

This is the general-purpose S3 storage class. S3 Standard provides object storage with high durability, availability and performance for frequently accessed data. Because S3 Standard delivers low latency and high throughput, it is suitable for a wide range of use cases, including cloud applications, dynamic websites, content distribution, mobile and gaming applications, and big data analytics.

2.Amazon S3 Intelligent-Tiering

This is the first AWS cloud storage class designed for data with unknown or changing access patterns. It automatically moves data to the most cost-effective access tier based on access frequency, reducing storage costs at the granular object level without affecting performance or incurring retrieval fees or operational overhead.

3.Amazon S3 Express One Zone

This is a high-performance storage class that keeps data in a single Availability Zone and is built to deliver consistent single-digit millisecond data access for frequently accessed data and latency-sensitive applications. Compared with S3 Standard, S3 Express One Zone can improve data access speed by up to 10 times and reduce request costs by up to 50%.

4.Amazon S3 Standard-Infrequent Access (S3 Standard-IA)

It is suitable for data that is accessed infrequently but must be available quickly when needed. S3 Standard-IA offers the high durability, high throughput and low latency of S3 Standard, with a low per-GB storage price and a per-GB retrieval fee.

5.Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)

It is suitable for data that is accessed infrequently but must be available quickly when needed. Most other S3 storage classes store data across at least three Availability Zones (AZs), whereas S3 One Zone-IA stores data in a single AZ and costs 20% less than S3 Standard-IA. It is well suited to users who want a lower-cost option for infrequently accessed data and do not require the availability and resilience of S3 Standard or S3 Standard-IA.

6.Amazon S3 Glacier

Amazon S3 Glacier is used for long-term file archiving. It is built specifically for data archiving and aims to provide the highest performance, the most retrieval flexibility and the lowest-cost archive storage in the cloud.

Three archive storage classes are available, each optimized for different access patterns and storage durations:

o Amazon S3 Glacier Instant Retrieval
o Amazon S3 Glacier Flexible Retrieval
o Amazon S3 Glacier Deep Archive

Amazon S3 encryption

New objects uploaded to S3 are encrypted at rest by default.

To protect data at rest, S3 supports two encryption approaches:

Server-side encryption. The S3 platform encrypts data when it is saved and decrypts it when an authenticated user retrieves it.
Client-side encryption. The data is encrypted before it is transmitted to S3. Encryption can be done either with an AWS KMS-managed customer master key (CMK), or with a client-side master key that you provide through the Amazon S3 encryption client.

Server-side encryption significantly reduces the complexity of the process and is the usual choice. However, regulators sometimes require you to maintain full control over the encryption keys, in which case client-side encryption becomes the only option.