How Apache Computes ETags

I wanted to see how Apache generates the ETag for static files, so I looked into it.

The documentation for the FileETag directive describes it as follows:

The FileETag directive allows you to choose which of these — if any — should be used. The recognized keywords are:

INode
  The file's i-node number will be included in the calculation
MTime
  The date and time the file was last modified will be included
Size
  The number of bytes in the file will be included
All
  All available fields will be used. This is equivalent to:
  FileETag INode MTime Size
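
As a rough illustration of how these keywords select fields, here is a minimal Python sketch. The helper name file_etag and its signature are hypothetical (not part of Apache); it assumes the inode-size-mtime ordering and lowercase hex formatting of the 2.2.x source, takes the mtime in microseconds, and does not handle the None keyword.

```python
# Hypothetical helper for illustration only; not an Apache API.
def file_etag(ino, size, mtime_us, keywords=("INode", "MTime", "Size")):
    """Join the selected fields as lowercase hex in inode-size-mtime order."""
    selected = {k.lower() for k in keywords}
    if "all" in selected:
        selected = {"inode", "mtime", "size"}
    fields = []
    # The output order is fixed; the order of the keywords does not matter.
    if "inode" in selected:
        fields.append(ino)
    if "size" in selected:
        fields.append(size)
    if "mtime" in selected:
        fields.append(mtime_us)
    return '"' + "-".join(format(v, "x") for v in fields) + '"'


print(file_etag(1483654, 45, 1232339399000000))
# prints "16a386-2d-460ce601ebfc0"
print(file_etag(1483654, 45, 1232339399000000, keywords=("MTime", "Size")))
# prints "2d-460ce601ebfc0"
```

Dropping INode, as in the second call, yields a tag that no longer depends on the file's inode number.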

With "All", the response looks like this:

$ HEAD http://localhost/
200 OK
Connection: close
Date: Tue, 06 Jul 2010 04:22:52 GMT
Accept-Ranges: bytes
ETag: "16a386-2d-460ce601ebfc0"
Server: Apache/2.2.9 (Ubuntu) PHP/5.2.6-2ubuntu4.6 with Suhosin-Patch
Vary: Accept-Encoding
Content-Length: 45
Content-Type: text/html
Last-Modified: Mon, 19 Jan 2009 04:29:59 GMT
Client-Date: Tue, 06 Jul 2010 04:22:52 GMT
Client-Peer: 127.0.0.1:80
Client-Response-Num: 1

The calculation is done in the following code (the Apache source version here is 2.2.15):

httpd-2.2.15/modules/http/http_etag.c
#define ETAG_WEAK "W/"
#define CHARS_PER_UINT64 (sizeof(apr_uint64_t) * 2)
/*
 * Construct an entity tag (ETag) from resource information.  If it's a real
 * file, build in some of the file characteristics.  If the modification time
 * is newer than (request-time minus 1 second), mark the ETag as weak - it
 * could be modified again in as short an interval.  We rationalize the
 * modification time we're given to keep it from being in the future.
 */
AP_DECLARE(char *) ap_make_etag(request_rec *r, int force_weak)
{
...
    /*
     * Make an ETag header out of various pieces of information.  We use
     * the last-modified date and, if we have a real file, the
     * length and inode number - note that this doesn't have to match
     * the content-length (i.e. includes), it just has to be unique
     * for the file.
     *
     * If the request was made within a second of the last-modified date,
     * we send a weak tag instead of a strong one, since it could
     * be modified again later in the second, and the validation
     * would be incorrect.
     */
    if ((r->request_time - r->mtime > (1 * APR_USEC_PER_SEC)) &&
        !force_weak) {
        weak = NULL;
        weak_len = 0;
    }
    else {
        weak = ETAG_WEAK;
        weak_len = sizeof(ETAG_WEAK);
    }

    if (r->finfo.filetype != 0) {
        /*
         * ETag gets set to [W/]"inode-size-mtime", modulo any
         * FileETag keywords.
         */
        etag = apr_palloc(r->pool, weak_len + sizeof("\"--\"") +
                          3 * CHARS_PER_UINT64 + 1);
        next = etag;
        if (weak) {
            while (*weak) {
                *next++ = *weak++;
            }
        }
        *next++ = '"';
        bits_added = 0;
        if (etag_bits & ETAG_INODE) {
            next = etag_uint64_to_hex(next, r->finfo.inode); // Python: hex(statinfo.st_ino)
            bits_added |= ETAG_INODE;
        }
        if (etag_bits & ETAG_SIZE) {
            if (bits_added != 0) {
                *next++ = '-';
            }
            next = etag_uint64_to_hex(next, r->finfo.size);  // Python: hex(statinfo.st_size)
            bits_added |= ETAG_SIZE;
        }
        if (etag_bits & ETAG_MTIME) {
            if (bits_added != 0) {
                *next++ = '-';
            }
            next = etag_uint64_to_hex(next, r->mtime);       // Python: hex(int(statinfo.st_mtime * 1000000))
        }
        *next++ = '"';
        *next = '\0';
    }
    else {
        /*
         * Not a file document, so just use the mtime: [W/]"mtime"
         */
...
    }

    return etag;
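
The weak/strong decision at the top of that excerpt can be restated in Python. This is only a sketch of the comparison shown above: weak_prefix is a made-up name, and both timestamps are assumed to be integer microseconds, as with apr_time_t.

```python
USEC_PER_SEC = 1_000_000  # stands in for APR_USEC_PER_SEC


# Hypothetical helper for illustration only; not an Apache API.
def weak_prefix(request_time_us, mtime_us, force_weak=False):
    """Return "W/" when the tag would be weak, or "" when it would be strong."""
    if request_time_us - mtime_us > USEC_PER_SEC and not force_weak:
        return ""   # modified more than one second before the request
    return "W/"     # modified too recently, or the caller forced a weak tag
```

The comparison is strict, so a file modified exactly one second before the request still gets a weak tag.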

Note that request_rec->mtime is defined here:

httpd-2.2.15/include/httpd.h
/**
 * @brief A structure that represents the current request
 */
struct request_rec {
...
    /** Last modified time of the requested resource */
    apr_time_t mtime;

And apr_time_t is typedef'd here:

httpd-2.2.15/srclib/apr/include/apr_time.h
/** number of microseconds since 00:00:00 january 1, 1970 UTC */
typedef apr_int64_t apr_time_t;
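
Since apr_time_t counts microseconds as an integer, one way to get a comparable value in Python 3 is to start from the integer nanosecond field st_mtime_ns rather than the float st_mtime, which may lose precision once sub-second digits are involved. A minimal sketch follows; mtime_usec is a hypothetical helper, and whether Apache sees the same sub-second value as os.stat is an assumption I have not verified.

```python
import os
import tempfile


# Hypothetical helper for illustration only.
def mtime_usec(path):
    """File modification time as integer microseconds since the Unix epoch."""
    return os.stat(path).st_mtime_ns // 1000


# Demonstrate on a temporary file whose mtime is set to a known value.
with tempfile.NamedTemporaryFile() as f:
    known_ns = 1232339399 * 10**9  # 2009-01-19 04:29:59 UTC, in nanoseconds
    os.utime(f.name, ns=(known_ns, known_ns))
    print(format(mtime_usec(f.name), "x"))
```

Integer division keeps the whole computation in exact integer arithmetic, so no rounding can creep in before the hex conversion.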

So in Python (a Python 2 session) it works out like this:

>>> import os
>>> statinfo = os.stat('/var/www/index.html')

>>> statinfo.st_ino
1483654L
>>> hex(_)
'0x16a386L'

>>> statinfo.st_size
45L
>>> hex(_)
'0x2dL'

>>> statinfo.st_mtime
1232339399.0
>>> hex(int(_ * 1000000))
'0x460ce601ebfc0L'

Putting it all together:

>>> print '%x-%x-%x' % (statinfo.st_ino,
...                     statinfo.st_size,
...                     int(statinfo.st_mtime * 1000000))

16a386-2d-460ce601ebfc0

This matches the value in the response header shown at the beginning, ETag: "16a386-2d-460ce601ebfc0".
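
The check also works in the other direction: the ETag can be split back into its three fields, and the mtime field compared with the Last-Modified header. A minimal Python 3 sketch, assuming the tag was produced with all three fields in inode-size-mtime order; decode_etag is a hypothetical helper, not something Apache provides.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


# Hypothetical helper for illustration only.
def decode_etag(etag):
    """Split an [W/]"inode-size-mtime" ETag into three integers."""
    if etag.startswith("W/"):
        etag = etag[2:]
    ino, size, mtime_us = (int(part, 16) for part in etag.strip('"').split("-"))
    return ino, size, mtime_us


ino, size, mtime_us = decode_etag('"16a386-2d-460ce601ebfc0"')
modified = datetime.fromtimestamp(mtime_us // 1_000_000, tz=timezone.utc)
print(ino, size, format_datetime(modified, usegmt=True))
# prints: 1483654 45 Mon, 19 Jan 2009 04:29:59 GMT
```

The decoded size equals the Content-Length of 45, and the decoded time equals the Last-Modified header in the response above.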
