Counting Service

A counting service is a fundamental service: many business features need counts, such as likes, favorites, and views. It must count reliably, support long-term capacity planning, and stay isolated from specific business logic.

Counting elements:

  • who

  • counting object id

  • counting object type

  • counting increment

  • create time

A single count is built by combining the elements above. In a business context, the counting needs are roughly as follows:

  • Number of views for an article

  • Number of likes for a comment or answer

  • Number of purchases for a product

  • And so on.

enum ContentType {
    ContentType_UNKNOW              = 0;
    ContentType_ARTICLE             = 1;
    ContentType_QUORA               = 2;
    ContentType_TOPIC               = 3;
    ContentType_WINELIST            = 4;
    ContentType_WINECOMMENT         = 5;
    ContentType_TIMELINE            = 6;
}

enum CountingType {
    CountingType_UNKNOW             = 0;
    CountingType_PRAISE             = 1;
    CountingType_LIKE               = 2;
    CountingType_COMMENT            = 3;
    CountingType_VIEW               = 4;
    CountingType_READ               = 5;
    CountingType_FORWORD            = 6;
    CountingType_SHARE              = 7;
    CountingType_DOWN               = 8;
    CountingType_REPORT             = 9;
    CountingType_BUY                = 10;
    CountingType_SALE               = 11;
    CountingType_TSASTE             = 12;
    CountingType_FAVORITE           = 13;
}
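To make the elements concrete, here is a small sketch (the CountEvent type and its field names are mine, purely for illustration; the actual storage format is described below) of how a single "member likes an article" action maps onto the counting elements and enum values:

package main

import (
    "fmt"
    "time"
)

// CountEvent is a hypothetical illustration of one counting event;
// the service itself stores the three-segment structure described below.
type CountEvent struct {
    MemberID     int64 // who
    ContentID    int64 // counting object id
    ContentType  int32 // counting object type
    CountingType int32 // what is being counted
    Increment    bool  // true for +1, false for cancel
    CreateTime   int64 // create time, in milliseconds
}

func main() {
    // Member 1001 likes article 42.
    event := CountEvent{
        MemberID:     1001,
        ContentID:    42,
        ContentType:  1, // ContentType_ARTICLE
        CountingType: 2, // CountingType_LIKE
        Increment:    true,
        CreateTime:   time.Now().UnixMilli(),
    }
    fmt.Printf("%+v\n", event)
}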

In addition to plain totals, there are other needs, such as obtaining a recent count list, for example the last 20 people who viewed an article.

Storage Structure

To meet the long-term requirements of the system, the storage structure needs to occupy as little space as possible and be highly efficient. Therefore, I have adopted a three-segment storage structure:

Segment 1: time_stamp & increment & counting_type

Segment 2: content_id & member_id

Segment 3: count

Three numbers constitute one counting record, with each segment stored as an int64, so a single record takes 24 bytes. In theory, 10 GB of storage holds over 400 million counting records, which is quite space-efficient.

Some powers of two that matter for the bit layout below:

2^8  = 256
2^16 = 65536
2^32 ≈ 4.3 billion
---
ONE:
+---+---------------------+------------------+--------------------+
|   |counting_type id(16) | increment id(1)  |  timestamp(ms)(41) |
+---+---------------------+------------------+--------------------+

TWO:
+-----------------------------+
| content_id(32) | creater(32)|
+-----------------------------+

THREE:
+-----------------------------+
| total (32)                  | 
+-----------------------------+

query:
[ONE,TWO,THREE]: content_id occupies the high 32 bits of segment TWO, so every record for one piece of content satisfies
    (content_id << 32) <= TWO  &&  TWO < ((content_id + 1) << 32)

and the result of such a range query is a list of records:

[[ONE,TWO,THREE],[ONE,TWO,THREE],[ONE,TWO,THREE]]
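To make the range trick concrete, here is a minimal sketch (the function and constant names are mine, not the service's actual code) of packing segment TWO and deriving the query bounds for one content_id:

package main

import "fmt"

// Sketch only: the names here are illustrative assumptions.
const contentIDShift = 32 // content_id occupies the high 32 bits of segment TWO

// packTwo builds segment TWO: content_id in the high bits, member_id in the low bits.
func packTwo(contentID, memberID int64) int64 {
    return contentID<<contentIDShift | (memberID & 0xFFFFFFFF)
}

// rangeFor returns [lo, hi) covering every segment-TWO value for one content_id,
// regardless of which member produced the count.
func rangeFor(contentID int64) (lo, hi int64) {
    return contentID << contentIDShift, (contentID + 1) << contentIDShift
}

func main() {
    two := packTwo(42, 1001)
    lo, hi := rangeFor(42)
    fmt.Println(two >= lo && two < hi) // true: the record for content 42 / member 1001 falls inside the range
}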

Data Structure

To add some fun to the code, I have chosen a team from a movie as the names for the data structures and assigned their roles based on character traits.

type (

    // MetaItem bundles the three segments of one counting record.
    MetaItem struct {
        hOne   *HuBayi
        sTwo   *Shirley
        wThree WangKaiXuan
        hsw    []int64
        H      int64 // packed value of segment ONE
        S      int64 // packed value of segment TWO
        W      int64 // segment THREE: the running total
    }

    // HuBayi: Team Core - The one in charge of every action: "Captain"
    HuBayi struct {
        time_stamp    int64
        increment     bool
        counting_type int64
    }

    // Shirley: Attentive and meticulous, responsible for intelligence work - "Spy" (or "Intelligence Officer") - records who the target is.
    Shirley struct {
        content_id int64
        member_id  int64
    }

    // WangKaiXuan: Greedy and responsible for managing assets and recording the total count - "Treasurer".
    WangKaiXuan int64
)

There is a small trick in the order of these three segments, which will be revealed in the querying section later.

Data Structure Construction:



func NewMetaItem(h *HuBayi, s *Shirley, w WangKaiXuan) *MetaItem {
    meta := &MetaItem{
        hOne:   h,
        sTwo:   s,
        wThree: w,
    }

    meta.Setup()
    return meta
}

// Setup packs the three segments into their int64 representations.
func (this *MetaItem) Setup() {

    this.H = this.hOne.Value()
    this.S = this.sTwo.Value()
    this.W = int64(this.wThree)
}
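As a usage sketch, assuming this runs inside the same package (the fields are unexported), that "time" is imported, and that Shirley exposes a Value() method symmetric to HuBayi's (it is not shown in the article), constructing one counting record might look like this:

// Sketch only: member 1001 likes article 42; 108 is the running total after this like.
func exampleLike() *MetaItem {
    return NewMetaItem(
        &HuBayi{
            time_stamp:    time.Now().UnixNano() / int64(time.Millisecond),
            increment:     true,
            counting_type: 2, // CountingType_LIKE
        },
        &Shirley{
            content_id: 42,
            member_id:  1001,
        },
        WangKaiXuan(108),
    )
}

The resulting item.H, item.S, and item.W are the three int64 values that get persisted.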

As an example, let's use the character "Hu Bayi":

/*
+---+---------------------+------------------+--------------------+
|0 |counting_type id(16) | increment id(1)   |  timestamp(ms)41  |
+---+--------------------+-------------------+--------------------+
*/
// Value packs counting_type, increment and the epoch-relative timestamp into one int64 (segment ONE).
func (this *HuBayi) Value() int64 {

    var incrementInt int64
    if this.increment {
        incrementInt = 1
    } else {
        incrementInt = 0
    }

    return this.counting_type<<CountingTypeShift |
        incrementInt<<IncrmentShift |
        (this.time_stamp - twepoch)

}

// Parse is the inverse of Value: it recovers counting_type, increment and the absolute timestamp.
func (this *HuBayi) Parse(value int64) (int64, int64, int64) {

    time_stamp := (value & MaxTimestamp) + twepoch

    increment := (value >> IncrmentShift) & MaxIncrment

    counting_type := (value >> CountingTypeShift) & MaxCountingType

    return counting_type, increment, time_stamp
}
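The shift and mask constants used above (and twepoch) are not listed in the article; a plausible set of definitions, assuming the 16/1/41-bit layout from the diagram and a recent custom epoch, would be:

// Assumed layout: timestamp in the low 41 bits, increment in bit 41,
// counting_type in bits 42-57. twepoch here is 2017-01-01 as an example;
// the real epoch used by the service is not given in the article.
const (
    twepoch = int64(1483228800000) // custom epoch, in milliseconds

    TimestampBits    = 41
    IncrmentBits     = 1
    CountingTypeBits = 16

    IncrmentShift     = TimestampBits                // 41
    CountingTypeShift = TimestampBits + IncrmentBits // 42

    MaxTimestamp    = int64(1)<<TimestampBits - 1    // mask for the low 41 bits
    MaxIncrment     = int64(1)<<IncrmentBits - 1     // 1
    MaxCountingType = int64(1)<<CountingTypeBits - 1 // 65535
)

With these widths, Parse() exactly inverts Value(), provided counting_type fits in 16 bits and the time elapsed since twepoch fits in 41 bits of milliseconds (roughly 69 years).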

Scalability

  • 16 bits are reserved for counting_type, which completely satisfies any future expansion requirements.

  • To achieve maximum scalability, timestamps are not counted from the 1970 Unix epoch but from a recent custom epoch (twepoch), so the 41-bit millisecond field theoretically lets the system run for several more decades.

Querying

To query the total count, we only need to retrieve the last counting record, as Wang Kaixuan always records the final result after each counting operation. Here is an explanation of how to perform the query:

ONE:
+---+---------------------+------------------+--------------------+
|   |counting_type id(16) | increment id(1)  |  timestamp(ms)(41) |
+---+---------------------+------------------+--------------------+

TWO:
+-----------------------------+
| content_id(32) | creater(32)|
+-----------------------------+

As mentioned earlier, the query is turned into pure integer comparisons, which are far cheaper than string matching. Because content_id occupies the high bits of segment TWO, all records for one piece of content fall into a contiguous range, and because Wang Kaixuan stores the running total after every operation, sorting that range by recency and taking the newest record immediately yields the current total for that content_id and counting_type.


// Range bounds: every segment-TWO value whose high 32 bits equal content_id.
content_id_select := content_id << model.ContentIdIdShift
content_id_select_max := (content_id + 1) << model.ContentIdIdShift

err = this.withDBContext(func(c *mgo.Collection) error {

    var err error
    query := bson.M{}

    // "s" is segment TWO; select the prefix range for this content_id.
    query["s"] = bson.M{
        "$gte": content_id_select,
        "$lt":  content_id_select_max,
    }

    // The newest record carries the current total, so sort by _id descending and take one.
    err = c.Find(query).Sort("-_id").One(&it)
    return err

}, this.dbSession,
    getDBName(content_type),
    getCollectionName(counting_type))

To find the desired record, we use content_id_select and content_id_select_max to define a range and then look for the newest record within that range. Other query operations can be handled the same way, such as retrieving the most recent likes list, checking whether a user has liked a certain piece of content, or finding the time of the like (a sketch of the recent-list variant follows below). Finding the current count within a range works as follows:

  • Sort the counting records in the range [content_id_select, content_id_select_max) by recency, in descending order.

  • Take the latest record for the given content_id and counting_type within that range.

  • Return the total it carries, since Wang Kaixuan records the running total on every operation.

This approach allows us to efficiently search for records within a specific range without the need to iterate through all the records, which would be much slower.
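For the recent-list case (for example, the last 20 members who liked a piece of content), a minimal sketch in the same style and context as the query code above, with an assumed result type, could be:

// Sketch only: last 20 counting records for this content, newest first.
query := bson.M{"s": bson.M{
    "$gte": content_id << model.ContentIdIdShift,
    "$lt":  (content_id + 1) << model.ContentIdIdShift,
}}

var recent []MetaRecord // hypothetical struct mirroring the persisted [ONE, TWO, THREE]
err = c.Find(query).Sort("-_id").Limit(20).All(&recent)

// The low 32 bits of each record's "s" give the member_id (who liked it),
// and HuBayi.Parse on its "h" recovers the counting_type and the like time.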

Database

Since the system is designed in a database-agnostic way, it can be implemented on various databases, including MongoDB, and performance should not differ significantly between them.

Index

It is a good idea to create indexes on the "h" and "s" fields ("_id" is indexed by default), since the queries use them to locate specific records. With the indexes in place, the database can jump straight to the relevant range instead of scanning the entire collection, which significantly improves query performance.
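As a sketch with mgo, assuming the same collection handle c as in the query code above, the index could be created like this:

// Index on segment TWO ("s") to cover the range scan; "_id" already has a
// default index, and "h" can be indexed the same way if queried directly.
err := c.EnsureIndex(mgo.Index{
    Key:        []string{"s"},
    Background: true,
})
if err != nil {
    // handle the error, e.g. log it and abort startup
}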

Performance

Real-time retrieval of billions of records in a single table

This system has low space consumption and high read/write performance, with a small amount of code. It has been running in production for many years.

End
